Writing data to Hadoop

Steve Severance · Oct 7, 2009 · Viewed 52.8k times

I need to write data into Hadoop (HDFS) from external sources like a Windows box. So far I have been copying the data onto the namenode and using HDFS's put command to ingest it into the cluster. In my browsing of the code I didn't see an API for doing this. I am hoping someone can show me that I am wrong and that there is an easy way to code external clients against HDFS.

Answer

Peter Wippermann · Oct 27, 2009

There is an API in Java. You can use it by including the Hadoop code in your project. The JavaDoc is quite helpful in general, but of course you have to know what you are looking for *g*: http://hadoop.apache.org/common/docs/

For your particular problem, have a look at: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html (this applies to the latest release, consult other JavaDocs for different versions!)

A typical call would be: FileSystem.get(new JobConf()).create(new Path("however.file")); which returns a stream you can handle with regular Java I/O.
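To make that call a bit more concrete, here is a minimal sketch of an external client writing a small text file. It assumes the Hadoop client libraries are on the classpath and that the machine can reach the namenode and datanodes. The class name HdfsWriteSketch, the helper writeText, and the example URI are made up for illustration, and it uses a plain Configuration with an explicit file system URI rather than the JobConf from the one-liner above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class HdfsWriteSketch {

    // Hypothetical helper: writes `text` as UTF-8 to `target` on the
    // file system identified by `fsUri`, overwriting any existing file.
    static void writeText(String fsUri, String target, String text) throws IOException {
        Configuration conf = new Configuration();
        // FileSystem.get may return a cached, shared instance, so this
        // sketch closes only the stream and leaves the FileSystem open.
        FileSystem fs = FileSystem.get(URI.create(fsUri), conf);
        // create() returns an FSDataOutputStream, which is an ordinary
        // java.io.OutputStream; `true` means overwrite if the file exists.
        try (OutputStream out = fs.create(new Path(target), true)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            // Example arguments (assumed values, adjust to your cluster):
            //   hdfs://namenode.example.com:8020 /tmp/however.file
            System.err.println("usage: HdfsWriteSketch <fs-uri> <target-path>");
            return;
        }
        writeText(args[0], args[1], "hello from an external client\n");
    }
}
```

Passing file:/// as the URI targets the local file system instead, which is a handy way to try the code without a cluster.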