Hadoop/Hive : Loading data from .csv on a local machine

mel · Oct 11, 2013 · Viewed 131.2k times

As this is coming from a newbie...

I had Hadoop and Hive set up for me, so I can run Hive queries on my computer against data on an AWS cluster. Can I run Hive queries on .csv data stored on my computer, like I did with MS SQL Server?

How do I load .csv data into Hive, then? What does this have to do with Hadoop, and which mode should I run it in?

What settings should I care about so that, if I do something wrong, I can always go back and run queries on Amazon without compromising what was set up for me earlier?

Answer

Adewole Kayode · Sep 26, 2015

Let me walk you through the following simple steps.

Steps:

First, create a table in Hive using the field names in your csv file. Let's say, for example, that your csv file contains three fields (id, name, salary) and you want to create a table in Hive called "Staff". Use the code below to create the table.

hive> CREATE TABLE Staff (id INT, name STRING, salary DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
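If your csv file starts with a header row (id,name,salary), the statement above would load that row as data. A minimal sketch of one way around this, assuming Hive 0.13 or later, is to set the skip.header.line.count table property. The table name staff_with_header is only a hypothetical name for illustration.

```sql
-- Sketch only: same columns as Staff, but the first line of each
-- loaded file is treated as a header and skipped when querying.
-- Assumes Hive 0.13+; staff_with_header is a hypothetical table name.
CREATE TABLE staff_with_header (
  id     INT,
  name   STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count' = '1');
```

If your file has no header row, the plain CREATE TABLE statement is all you need.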

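Second, now that your table is created in Hive, load the data from your csv file into the "Staff" table. The LOCAL keyword tells Hive to read the file from your local filesystem rather than from HDFS, and OVERWRITE replaces any data already in the table.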

hive> LOAD DATA LOCAL INPATH '/home/yourcsvfile.csv' OVERWRITE INTO TABLE Staff;
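If you would rather add a second file to the table than replace what is already there, the same statement can be written without OVERWRITE. This is only a sketch, and the path below is a hypothetical placeholder for another csv file on your machine.

```sql
-- Sketch only: without OVERWRITE, the file is added to the table's
-- existing data instead of replacing it.
-- '/home/anothercsvfile.csv' is a hypothetical path, not a real file.
LOAD DATA LOCAL INPATH '/home/anothercsvfile.csv' INTO TABLE Staff;
```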

Lastly, display the contents of your "Staff" table in Hive to check that the data was loaded successfully.

hive> SELECT * FROM Staff;
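On a large file, SELECT * can print a lot of rows. A few lighter checks, sketched here on the assumption that the table was created as shown above, can confirm the load just as well. With a delimited text table, values that cannot be parsed into the declared column type come back as NULL, so NULL ids usually point to a header row or a wrong delimiter.

```sql
-- Sketch only: quick sanity checks after loading.
DESCRIBE Staff;                                 -- confirm column names and types
SELECT COUNT(*) FROM Staff;                     -- compare with the line count of the csv file
SELECT * FROM Staff LIMIT 10;                   -- eyeball the first few rows
SELECT * FROM Staff WHERE id IS NULL LIMIT 10;  -- rows whose id did not parse as INT
```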

Thanks.