Best Way to Export a Hive Table to a CSV File

This post explains the different options available for exporting a Hive table (ORC, Parquet, or Text) to a CSV file.


Expected output: a CSV file with a comma delimiter and a header row.


Method 1:

hive -e 'select * from table_orc_data;' | sed 's/\t/,/g' > ~/output.csv

Pros: Simple to use, and the output can include the column header; however, the default output is tab-delimited.
Cons: The tab delimiter has to be converted to ',', which can be time-consuming when exporting large tables.
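
Note: if the header does not appear in your environment, the Hive CLI can be told to print column names by enabling hive.cli.print.header; a minimal variant of the same command:

hive --hiveconf hive.cli.print.header=true -e 'select * from table_orc_data;' | sed 's/\t/,/g' > ~/output.csv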


Method 2:


$ hadoop fs -cat hdfs://servername/user/hive/warehouse/databasename/table_csv_export_data/* > ~/output.csv

Pros: Simple, and the output is already comma-delimited, so no conversion is needed.
Cons: No column headers, and the table must already be stored as comma-delimited text.
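
A simple workaround for the missing header is to write the header line yourself and then append the HDFS files; a sketch assuming the four columns used in Method 3:

echo "id,first_name,last_name,join_date" > ~/output.csv
hadoop fs -cat hdfs://servername/user/hive/warehouse/databasename/table_csv_export_data/* >> ~/output.csv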


Method 3: (My personal favorite)

Step 3a: Create a CSV table with the column headers as a dummy first row.

CREATE TABLE table_csv_export_data
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
AS
SELECT
'id' as id
,'first_name' as first_name
,'last_name' as last_name
,'join_date' as join_date;




Step 3b: Now insert the actual data into the table; a sketch is shown below.
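
A minimal sketch of the insert, assuming the source data lives in table_orc_data (the table queried in Method 1) and that it exposes the same four columns; the casts are only there because the header table's columns were all created as strings:

INSERT INTO TABLE table_csv_export_data
SELECT
cast(id as string)
,first_name
,last_name
,cast(join_date as string)
FROM table_orc_data;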
Step 3c: Pull the table's files from the warehouse directory down to a local CSV file, using the same approach as Method 2; see the sketch below.
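
A sketch, assuming the warehouse path shown in Method 2:

hadoop fs -cat hdfs://servername/user/hive/warehouse/databasename/table_csv_export_data/* > ~/output.csv
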
Pros: CSV output with a header row.
Cons: An extra statement is needed to add the header as a row, but the final output is produced quickly and in the desired format.
