This post explains the different options available for exporting a Hive table (ORC, Parquet, or Text) to a CSV file.
Expected output: a CSV file with a comma delimiter and a header row.
Method 1:
hive -e 'select * from table_orc_data;' | sed 's/\t/,/g' > ~/output.csv
Pros: Simple to use, and column headers can be printed by setting hive.cli.print.header=true, but the default output delimiter is tab.
Cons: The tab delimiter needs to be converted to ',', which can be time-consuming when exporting large files.
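If the header row is needed directly from Method 1, the same command can be run with the header property enabled; a minimal sketch, assuming the same table as above:

hive -e 'set hive.cli.print.header=true; select * from table_orc_data;' | sed 's/\t/,/g' > ~/output.csv

Note that on some Hive versions the printed headers are prefixed with the table name (e.g. table_orc_data.id); setting hive.resultset.use.unique.column.names=false turns that prefix off.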
Method 2:
$ hadoop fs -cat hdfs://servername/user/hive/warehouse/databasename/table_csv_export_data/* > ~/output.csv
Pros: Simple, with a comma as the delimiter in the CSV output, as long as the table is already stored as comma-delimited text.
Cons: No column headers.
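As an alternative to cat, hadoop fs -getmerge concatenates all the files under the table's directory into a single local file in one step; a sketch, assuming the same warehouse path:

$ hadoop fs -getmerge hdfs://servername/user/hive/warehouse/databasename/table_csv_export_data/ ~/output.csv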
Method 3: (My personal favorite)
-- Step 3a: Create a CSV table with a dummy header row (the column names) as its first row.
CREATE TABLE table_csv_export_data
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
AS
select
'id' as id
,'first_name' as first_name
,'last_name' as last_name
,'join_date' as join_date;
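At this point the new table should contain a single row holding the column names; a quick sanity check, assuming the same warehouse path used in Step 3c:

$ hadoop fs -cat hdfs://servername/user/hive/warehouse/databasename/table_csv_export_data/*
id,first_name,last_name,join_date

The INSERT INTO in Step 3b appends new files alongside this header file rather than rewriting it, which is why the header should end up at the top of the final output.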
-- Step 3b: Now insert the actual data into the table.
INSERT INTO table_csv_export_data
SELECT
id
,first_name
,last_name
,join_date
FROM
table_orc_data;
-- Step 3c: Export the table's underlying HDFS files to a local CSV file.
hadoop fs -cat hdfs://servername/user/hive/warehouse/databasename/table_csv_export_data/* > ~/output.csv
Pros: CSV output with header columns.
Cons: An extra bit of script is needed to add the header info as a row, but the final output is quick and exactly as desired.
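For repeat exports, the three steps of Method 3 can be wrapped in a small shell script; a minimal sketch, assuming the same table and path names used above:

#!/bin/bash
# Hypothetical wrapper around Method 3 (table and path names as used above).
hive -e "
DROP TABLE IF EXISTS table_csv_export_data;
CREATE TABLE table_csv_export_data
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
AS
select 'id' as id, 'first_name' as first_name, 'last_name' as last_name, 'join_date' as join_date;
INSERT INTO table_csv_export_data
select id, first_name, last_name, join_date from table_orc_data;
"
hadoop fs -cat hdfs://servername/user/hive/warehouse/databasename/table_csv_export_data/* > ~/output.csv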