I want to list all the folders inside a given Google Cloud Storage bucket or folder using the Google Cloud Storage API.
For example, if gs://abc/xyz contains three folders, gs://abc/xyz/x1, gs://abc/xyz/x2, and gs://abc/xyz/x3, the API should return all three folders in gs://abc/xyz.
This is easy to do with gsutil:
    gsutil ls gs://abc/xyz
But I need to do it using Python and the Google Cloud Storage API.
This question is about listing the folders inside a bucket/folder. None of the suggestions worked for me, and after experimenting with the google.cloud.storage SDK, I suspect it is not possible (as of November 2019) to list the sub-directories of any path in a bucket. It is possible with the REST API, so I wrote this little wrapper:
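from google.api_core import page_iterator
from google.cloud import storage


def _item_to_value(iterator, item):
    # Each entry under "prefixes" is already a plain string; pass it through.
    return item


def list_directories(bucket_name, prefix):
    # A trailing slash restricts matches to names inside that folder.
    # An empty prefix is left alone so the bucket root can be listed.
    if prefix and not prefix.endswith('/'):
        prefix += '/'

    extra_params = {
        "projection": "noAcl",
        "prefix": prefix,
        "delimiter": '/'
    }

    gcs = storage.Client()

    # Objects-list endpoint of the JSON API for this bucket.
    path = "/b/" + bucket_name + "/o"

    # Note: this relies on the client's private _connection attribute.
    iterator = page_iterator.HTTPIterator(
        client=gcs,
        api_request=gcs._connection.api_request,
        path=path,
        items_key='prefixes',
        item_to_value=_item_to_value,
        extra_params=extra_params,
    )

    return list(iterator)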
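The prefix and delimiter parameters are what make the endpoint return folder-like names. As a rough illustration only, the sketch below imitates that grouping locally on a plain list of object names, without calling the API. The function name simulate_prefixes and the sample object names are made up for this example, and it assumes the documented behaviour that each returned prefix is the object name up to and including the first delimiter after the requested prefix.

```python
def simulate_prefixes(object_names, prefix, delimiter="/"):
    """Imitate how an objects-list call groups names into 'prefixes'.

    Hypothetical helper for illustration; it does not contact Cloud Storage.
    """
    found = set()
    for name in object_names:
        # Only names under the requested prefix are considered.
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut != -1:
            # Keep everything up to and including the first delimiter
            # that follows the prefix; deeper levels collapse into it.
            found.add(prefix + rest[:cut + 1])
    return sorted(found)


# Hypothetical object names, chosen to mirror the example in this answer.
objects = [
    "dog-bark/datasets/v1/train.csv",
    "dog-bark/datasets/v1/test.csv",
    "dog-bark/datasets/v2/train.csv",
    "dog-bark/README.md",
]
print(simulate_prefixes(objects, "dog-bark/datasets/"))
```

Objects that sit directly under the prefix, with no further delimiter in their names, produce no entry, which is why only the sub-folders show up.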
For example, if my-bucket contains the folders dog-bark/datasets/v1 and dog-bark/datasets/v2, then calling list_directories('my-bucket', 'dog-bark/datasets') will return:
['dog-bark/datasets/v1/', 'dog-bark/datasets/v2/']
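For comparison, the same listing may also be reachable without touching the private _connection attribute. The sketch below is not the wrapper's approach: it swaps in the public Client.list_blobs call with a delimiter, and it assumes an installed google-cloud-storage version in which the iterator returned by list_blobs exposes a prefixes attribute once its pages have been consumed. The name list_subfolders is hypothetical.

```python
def list_subfolders(client, bucket_name, prefix):
    """Return the folder-like prefixes directly under `prefix`.

    `client` is expected to be a google.cloud.storage.Client. Assumption:
    the iterator returned by list_blobs has a `prefixes` attribute.
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"

    blobs = client.list_blobs(bucket_name, prefix=prefix, delimiter="/")

    # `prefixes` is filled in only as result pages are fetched,
    # so drain the iterator before reading it.
    for _ in blobs:
        pass

    return sorted(blobs.prefixes)


# Usage (needs credentials and network access):
# from google.cloud import storage
# print(list_subfolders(storage.Client(), "my-bucket", "dog-bark/datasets"))
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object when no credentials are available.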