As this is coming from a newbie...
I had Hadoop and Hive set up for me, so I can run Hive queries on my computer against data on an AWS cluster. Can I run Hive queries on .csv data stored on my computer, like I did with MS SQL Server?
How do I load .csv data into Hive then? What does it have to do with Hadoop, and which mode should I run it in?
What settings should I care about so that, if I do something wrong, I can always go back and run queries on Amazon without compromising what was set up for me earlier?
Let me walk you through the following simple steps:
Steps:
First, create a table in Hive whose columns match the fields in your CSV file. Let's say, for example, that your CSV file contains three fields (id, name, salary) and you want to create a Hive table called "staff". Use the statement below to create the table.
hive> CREATE TABLE Staff (id INT, name STRING, salary DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
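If your CSV file starts with a header line (id,name,salary), the table above would load that line as a data row. A minimal sketch of one way around this, assuming Hive 0.13 or later; the table name staff_with_header is only an illustration:

```sql
-- Hypothetical variant of the table for a CSV file whose first line is a header.
CREATE TABLE IF NOT EXISTS staff_with_header (
  id     INT,
  name   STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
-- Tells Hive to skip the first line of each file when reading the table.
TBLPROPERTIES ('skip.header.line.count' = '1');
```

Note that a plain comma delimiter does not handle quoted fields that themselves contain commas, so this sketch assumes a simple CSV without such fields.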
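Second, now that the table exists in Hive, load the data from your CSV file into the "staff" table. The LOCAL keyword tells Hive to read the file from your computer's filesystem; Hive then copies it into the table's storage location on Hadoop.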
hive> LOAD DATA LOCAL INPATH '/home/yourcsvfile.csv' OVERWRITE INTO TABLE Staff;
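The two optional keywords in the LOAD DATA statement change what happens to existing rows and where the file is read from. A short sketch of the variants; the HDFS path below is hypothetical:

```sql
-- Without OVERWRITE, the file's rows are appended to whatever is already in the table.
LOAD DATA LOCAL INPATH '/home/yourcsvfile.csv' INTO TABLE Staff;

-- Without LOCAL, the path refers to the Hadoop filesystem (hypothetical path),
-- and Hive moves the file into the table's location rather than copying it.
LOAD DATA INPATH '/user/yourname/yourcsvfile.csv' INTO TABLE Staff;
```

Since OVERWRITE replaces all existing data in the target table, it is worth double-checking the table name before running it against anything that was set up for you.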
Lastly, display the contents of your "staff" table to check that the data was loaded successfully.
hive> SELECT * FROM Staff;
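For a larger file, a few narrower checks may be more practical than selecting everything. A sketch, assuming the Staff table defined above:

```sql
-- The row count should match the number of data lines in the CSV file.
SELECT COUNT(*) FROM Staff;

-- Values that could not be parsed as INT are stored as NULL,
-- so this surfaces lines such as a stray header row.
SELECT * FROM Staff WHERE id IS NULL;

-- Look at a handful of rows instead of the whole table.
SELECT * FROM Staff LIMIT 10;
```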
Thanks.