This post explains the different options available for exporting a Hive table (ORC, Parquet, or Text) to a CSV file.
Expected output: a CSV file with a comma delimiter and a header row.
Method 1:
hive -e 'select * from table_orc_data;' | sed 's/\t/,/g' > ~/output.csv
Pros: Simple to use, and column headers can be printed by setting hive.cli.print.header=true, but the default output delimiter is tab.
Cons: The tab delimiter needs to be converted to ',', which can be time-consuming when exporting large files.
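If the header row is needed directly from Method 1, the same command can be run with the header property enabled; a minimal sketch, assuming the same table as above:

hive -e 'set hive.cli.print.header=true; select * from table_orc_data;' | sed 's/\t/,/g' > ~/output.csv

Note that on some Hive versions the printed headers are prefixed with the table name (e.g. table_orc_data.id); setting hive.resultset.use.unique.column.names=false turns that prefix off.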
Method 2:
$ hadoop fs -cat hdfs://servername/user/hive/warehouse/databasename/table_csv_export_data/* > ~/output.csv
Pros: Simple, with a comma as the delimiter in the CSV output, as long as the table is already stored as comma-delimited text.
Cons: No column headers.
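As an alternative to cat, hadoop fs -getmerge concatenates all the files under the table's directory into a single local file in one step; a sketch, assuming the same warehouse path:

$ hadoop fs -getmerge hdfs://servername/user/hive/warehouse/databasename/table_csv_export_data/ ~/output.csv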
Method 3: (My personal favorite)
-- Step 3a: Create a CSV table with a dummy header row (the column names) as its first row.
CREATE TABLE table_csv_export_data
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
AS
select
'id' as id
,'first_name' as first_name
,'last_name' as last_name
,'join_date' as join_date;
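At this point the new table should contain a single row holding the column names; a quick sanity check, assuming the same warehouse path used in Step 3c:

$ hadoop fs -cat hdfs://servername/user/hive/warehouse/databasename/table_csv_export_data/*
id,first_name,last_name,join_date

The INSERT INTO in Step 3b appends new files alongside this header file rather than rewriting it, which is why the header should end up at the top of the final output.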
-- Step 3b: Now insert the actual data into the table.
INSERT INTO table_csv_export_data
SELECT
id
,first_name
,last_name
,join_date
FROM
table_orc_data;
-- Step 3c: Export the table's underlying HDFS files to a local CSV file.
hadoop fs -cat hdfs://servername/user/hive/warehouse/databasename/table_csv_export_data/* > ~/output.csv
Pros: CSV output with header columns.
Cons: An extra bit of script is needed to add the header info as a row, but the final output is quick and exactly as desired.
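For repeat exports, the three steps of Method 3 can be wrapped in a small shell script; a minimal sketch, assuming the same table and path names used above:

#!/bin/bash
# Hypothetical wrapper around Method 3 (table and path names as used above).
hive -e "
DROP TABLE IF EXISTS table_csv_export_data;
CREATE TABLE table_csv_export_data
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
AS
select 'id' as id, 'first_name' as first_name, 'last_name' as last_name, 'join_date' as join_date;
INSERT INTO table_csv_export_data
select id, first_name, last_name, join_date from table_orc_data;
"
hadoop fs -cat hdfs://servername/user/hive/warehouse/databasename/table_csv_export_data/* > ~/output.csv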