How would I loop through all the file names in a subdirectory on Google Cloud Storage with Python?

sometimesiwritecode · May 27, 2017

Say I have a subdirectory in a bucket on Google Cloud Storage, and its address is:

gs://test-monkeys-example/training_data/cats

In this cats subdirectory I have a bunch of images of cats, all of which are JPGs. How would I, in Python, loop through the cats subdirectory and print the names of all the files in it?

Something like:

for x in directory('gs://test-monkeys-example/training_data/cats'):
    print(x)

Obviously directory('gs://test-monkeys-example/training_data/cats') is not how to do this and is just pseudocode. How would I do this?

Answer

Brandon Yarbrough · May 29, 2017

Google Cloud Storage lets you restrict a listing to the objects whose names begin with a given prefix. You can do this from the client library like so:

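from google.cloud import storage

client = storage.Client()
# Replace 'mybucket' with your bucket name, e.g. 'test-monkeys-example'.
bucket = client.bucket('mybucket')
# The trailing slash keeps the prefix from also matching e.g. 'training_data/cats_extra/'.
for blob in bucket.list_blobs(prefix='training_data/cats/'):
    print(blob.name)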
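
Note that the listing above yields full object names (such as training_data/cats/some_image.jpg) and also descends into anything nested below the prefix. If you only want the bare file names directly inside cats, something along these lines might work. This is a rough sketch, not the only way to do it: split_gs_url, iter_file_names and print_file_names are hypothetical helper names made up for this example, and it assumes default credentials are already configured for storage.Client().

```python
from urllib.parse import urlparse


def split_gs_url(url):
    """Split 'gs://bucket/some/path' into ('bucket', 'some/path/')."""
    parsed = urlparse(url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError("expected a gs://bucket/path URL, got %r" % url)
    prefix = parsed.path.strip("/")
    # A trailing slash keeps 'cats' from also matching 'cats_extra/...'.
    return parsed.netloc, (prefix + "/" if prefix else "")


def iter_file_names(bucket, prefix):
    """Yield the names of objects directly under prefix, relative to it."""
    # delimiter="/" limits the listing to a single "directory" level.
    for blob in bucket.list_blobs(prefix=prefix, delimiter="/"):
        if blob.name.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        yield blob.name[len(prefix):]


def print_file_names(gs_url):
    """Print every file name directly under the given gs:// URL."""
    # Imported here so the helpers above can be used without the library.
    from google.cloud import storage

    bucket_name, prefix = split_gs_url(gs_url)
    client = storage.Client()  # assumes default credentials are set up
    for name in iter_file_names(client.bucket(bucket_name), prefix):
        print(name)
```

With that in place, print_file_names('gs://test-monkeys-example/training_data/cats') would print one file name per line. Drop the delimiter argument if you do want everything nested below the prefix as well.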