Using boto3, I can access my AWS S3 bucket:
s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket-name')
Now, the bucket contains a folder first-level, which itself contains several sub-folders named with a timestamp, for instance 1456753904534.
I need to know the name of these sub-folders for another job I'm doing and I wonder whether I could have boto3 retrieve those for me.
So I tried:
objs = bucket.meta.client.list_objects(Bucket='my-bucket-name')
which returns a dictionary whose key 'Contents' gives me all the third-level files instead of the second-level timestamp directories. In fact I get a list containing entries like:
{u'ETag': '"etag"', u'Key': 'first-level/1456753904534/part-00014',
 u'LastModified': datetime.datetime(2016, 2, 29, 13, 52, 24, tzinfo=tzutc()),
 u'Owner': {u'DisplayName': 'owner', u'ID': 'id'},
 u'Size': size, u'StorageClass': 'storageclass'}
You can see that the specific files, in this case part-00014, are retrieved, while I'd like to get the name of the directory alone.
In principle I could strip the directory names out of all the paths, but it's ugly and expensive to retrieve everything at the third level just to get the second!
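For completeness, that stripping approach can be sketched with a small pure-Python helper (second_level_prefixes is a name I'm making up here, not a boto3 API):

```python
def second_level_prefixes(keys, top='first-level/'):
    """Extract the unique second-level 'directory' names from full
    object keys such as 'first-level/1456753904534/part-00014'.
    Note this only works after fetching every third-level key."""
    names = set()
    for key in keys:
        rest = key[len(top):]            # e.g. '1456753904534/part-00014'
        if '/' in rest:                  # skip plain files directly under `top`
            names.add(rest.split('/', 1)[0])
    return sorted(names)
```

This confirms why the approach is expensive: every key under first-level/ must be listed just to recover the timestamp names.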
I also tried something reported here:
for o in bucket.objects.filter(Delimiter='/'):
    print(o.key)
but I do not get the folders at the desired level.
Is there a way to solve this?
The piece of code below returns ONLY the 'subfolders' in a 'folder' of an S3 bucket.
import boto3

bucket = 'my-bucket'
# Make sure the prefix ends with /
prefix = 'prefix-name-with-slash/'

client = boto3.client('s3')
result = client.list_objects(Bucket=bucket, Prefix=prefix, Delimiter='/')
for o in result.get('CommonPrefixes', []):
    print('sub folder :', o.get('Prefix'))
For more details, see https://github.com/boto/boto3/issues/134
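One caveat: list_objects returns at most 1,000 entries per call, so if the folder holds more sub-folders than that, some prefixes are silently missed. boto3's paginators can aggregate CommonPrefixes across pages; a sketch, where list_subfolders is a hypothetical helper name and the bucket/prefix values are placeholders:

```python
def list_subfolders(client, bucket, prefix):
    """Collect the immediate 'sub-folder' prefixes under `prefix`,
    walking every result page (each page holds at most 1,000 entries)."""
    paginator = client.get_paginator('list_objects_v2')
    folders = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/'):
        for cp in page.get('CommonPrefixes', []):
            folders.append(cp['Prefix'])
    return folders

# Usage (assumes AWS credentials are configured):
# import boto3
# client = boto3.client('s3')
# print(list_subfolders(client, 'my-bucket', 'first-level/'))
```

Passing the client in as an argument also makes the helper easy to reuse across buckets or to exercise with a stub in tests.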