Wildcard in Hadoop's FileSystem listing API calls

snooze92 · Jul 9, 2014 · Viewed 15.2k times

tl;dr: To be able to use wildcards (globs) in the listed paths, one simply has to use globStatus(...) instead of listStatus(...).

Answer

Mukesh S · Jul 9, 2014

Instead of listStatus, you can try Hadoop's globStatus. Hadoop provides two FileSystem methods for processing globs:

public FileStatus[] globStatus(Path pathPattern) throws IOException
public FileStatus[] globStatus(Path pathPattern, PathFilter filter) throws IOException

An optional PathFilter can be specified to restrict the matches further.
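
To make the two overloads concrete, here is a minimal sketch of how they might be called. The class name GlobExample, the helper glob(...), the path /logs/2014/07/*/part-* and the ".tmp" filter are all made up for illustration; only FileSystem, Path, FileStatus, PathFilter and globStatus come from Hadoop. It assumes hadoop-common is on the classpath and Java 8+ (for the lambda).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class GlobExample {

    // Hypothetical helper: expands a glob pattern and returns the matching paths.
    // Pass filter == null to use the single-argument overload.
    static List<Path> glob(FileSystem fs, String pattern, PathFilter filter) throws IOException {
        Path pathPattern = new Path(pattern);
        FileStatus[] matches = (filter == null)
                ? fs.globStatus(pathPattern)
                : fs.globStatus(pathPattern, filter);

        List<Path> result = new ArrayList<>();
        // globStatus can return null when the pattern contains no glob
        // characters and the path does not exist, so guard against it.
        if (matches == null) {
            return result;
        }
        for (FileStatus status : matches) {
            result.add(status.getPath());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Uses whatever file system the default configuration points at.
        FileSystem fs = FileSystem.get(new Configuration());

        // Illustrative filter: drop anything whose name ends in ".tmp".
        PathFilter noTmp = path -> !path.getName().endsWith(".tmp");

        // Illustrative pattern: every part file under any day of July 2014.
        for (Path p : glob(fs, "/logs/2014/07/*/part-*", noTmp)) {
            System.out.println(p);
        }
    }
}
```

The same pattern passed to listStatus would be treated as a literal path, which is why the wildcard only works through globStatus.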

For a fuller description, see Hadoop: The Definitive Guide.

Hope it helps!