If you’re already familiar with SQL then you may well be thinking about how to add Hadoop skills to your toolbelt as an option for data processing.
From a querying perspective, using Apache Hive provides a familiar interface to data held in a Hadoop cluster and is a great way to get started. Apache Hive is data warehouse infrastructure built on top of Apache Hadoop for providing data summarization, ad-hoc query, and analysis of large datasets. It provides a mechanism to project structure onto the data in Hadoop and to query that data using a SQL-like language called HiveQL (HQL).
If you really want to get to grips with Hive, then take a look at the
full language manual.
Retrieving Information
| Function |
MySQL |
Hive |
| Retrieving Information (General) |
SELECT from_columns FROM table WHERE conditions; |
SELECT from_columns FROM table WHERE conditions; |
| Retrieving All Values |
SELECT * FROM table; |
SELECT * FROM table; |
| Retrieving Some Values |
SELECT * FROM table WHERE rec_name = "value"; |
SELECT * FROM table WHERE rec_name = "value"; |
| Retrieving With Multiple Criteria |
SELECT * FROM TABLE WHERE rec1 = "value1" AND rec2 = "value2"; |
SELECT * FROM TABLE WHERE rec1 = "value1" AND rec2 = "value2"; |
| Retrieving Specific Columns |
SELECT column_name FROM table; |
SELECT column_name FROM table; |
| Retrieving Unique Output |
SELECT DISTINCT column_name FROM table; |
SELECT DISTINCT column_name FROM table; |
| Sorting |
SELECT col1, col2 FROM table ORDER BY col2; |
SELECT col1, col2 FROM table ORDER BY col2; |
| Sorting Reverse |
SELECT col1, col2 FROM table ORDER BY col2 DESC; |
SELECT col1, col2 FROM table ORDER BY col2 DESC; |
| Counting Rows |
SELECT COUNT(*) FROM table; |
SELECT COUNT(*) FROM table; |
| Grouping With Counting |
SELECT owner, COUNT(*) FROM table GROUP BY owner; |
SELECT owner, COUNT(*) FROM table GROUP BY owner; |
| Maximum Value |
SELECT MAX(col_name) AS label FROM table; |
SELECT MAX(col_name) AS label FROM table; |
| Selecting from multiple tables (Join same table using alias w/”AS”) |
SELECT pet.name, comment FROM pet, event WHERE pet.name = event.name; |
SELECT pet.name, comment FROM pet JOIN event ON (pet.name = event.name) |
Metadata
| Function |
MySQL |
Hive |
| Selecting a database |
USE database; |
USE database; |
| Listing databases |
SHOW DATABASES; |
SHOW DATABASES; |
| Listing tables in a database |
SHOW TABLES; |
SHOW TABLES; |
| Describing the format of a table |
DESCRIBE table; |
DESCRIBE (FORMATTED|EXTENDED) table; |
| Creating a database |
CREATE DATABASE db_name; |
CREATE DATABASE db_name; |
| Dropping a database |
DROP DATABASE db_name; |
DROP DATABASE db_name (CASCADE); |
Current SQL Compatibility
Command Line
| Function |
Hive |
| Run Query |
hive -e 'select a.col from tab1 a' |
| Run Query Silent Mode |
hive -S -e 'select a.col from tab1 a' |
| Set Hive Config Variables |
hive -e 'select a.col from tab1 a' -hiveconf hive.root.logger=DEBUG,console |
| Use Initialization Script |
hive -i initialize.sql |
| Run Non-Interactive Script |
hive -f script.sql |
No comments:
Post a Comment