How to get list of folders in a given bucket using Google Cloud API

Shamshad Alam picture Shamshad Alam · May 6, 2016 · Viewed 23.4k times · Source

I wanted to get all the folders inside a given Google Cloud bucket or folder using Google Cloud Storage API.

For example if gs://abc/xyz contains three folders gs://abc/xyz/x1, gs://abc/xyz/x2 and gs://abc/xyz/x3. The API should return all three folder in gs://abc/xyz.

It can easily be done using gsutil

gsutil ls gs://abc/xyz

But I need to do it using python and Google Cloud Storage API.

Answer

Antony Harfield picture Antony Harfield · Nov 23, 2019

This question is about listing the folders inside a bucket/folder. None of the suggestions worked for me and after experimenting with the google.cloud.storage SDK, I suspect it is not possible (as of November 2019) to list the sub-directories of any path in a bucket. It is possible with the REST API, so I wrote this little wrapper...

from google.api_core import page_iterator
from google.cloud import storage

def _item_to_value(iterator, item):
    return item

def list_directories(bucket_name, prefix):
    if not prefix.endswith('/'):
        prefix += '/'

    extra_params = {
        "projection": "noAcl",
        "prefix": prefix,
        "delimiter": '/'
    }

    gcs = storage.Client()

    path = "/b/" + bucket_name + "/o"

    iterator = page_iterator.HTTPIterator(
        client=gcs,
        api_request=gcs._connection.api_request,
        path=path,
        items_key='prefixes',
        item_to_value=_item_to_value,
        extra_params=extra_params,
    )

    return [x for x in iterator]

For example, if you have my-bucket containing:

  • dog-bark
    • datasets
      • v1
      • v2

Then calling list_directories('my-bucket', 'dog-bark/datasets') will return:

['dog-bark/datasets/v1', 'dog-bark/datasets/v2']