How to check if a given object exists in a Google Cloud Storage bucket through bash

Virat · Feb 8, 2018 · Viewed 7.1k times

I would like to programmatically check whether an object exists in a particular Google Cloud Storage bucket. Based on the object's availability, I would perform further operations.

I have gone through https://cloud.google.com/storage/docs/gsutil/commands/stat and the documentation mentions that "gsutil -q" is useful for writing scripts, because the exit status will be 0 for an existing object and 1 for a non-existent object. But when I use the command, it does not work properly. Has anyone tried this before?

#!/bin/bash
gsutil -q stat gs://<bucketname>/object

return_value=$?

if [ $return_value != 0 ]; then
    echo "folder exist"
else
    echo "folder does not exist"
fi

Answer

dsesto · Feb 13, 2018

I see that you have already found the answer to your issue. However, I will post this answer to give more context on how the gsutil stat command works and why your code was not working.

gsutil is a Python application that is used for accessing and working with Cloud Storage using the Command Line Interface. It has many commands available, and the one that you used is gsutil stat, which outputs the metadata of an object retrieving the minimum possible data without having to list all the objects in a bucket. This command is also strongly consistent, which makes it an appropriate solution for certain types of applications.

Running the gsutil stat gs://<BUCKET_NAME>/<BUCKET_OBJECT> command returns something like the following:

gs://<BUCKET_NAME>/<BUCKET_OBJECT>.png:
    Creation time:          Tue, 06 Feb 2018 14:49:58 GMT
    Update time:            Tue, 06 Feb 2018 14:49:58 GMT
    Storage class:          MULTI_REGIONAL
    Content-Length:         6119
    Content-Type:           image/png
    Hash (crc32c):          <CRC32C_HASH>
    Hash (md5):             <MD5_HASH>
    ETag:                   <ETAG>
    Generation:             <TIMESTAMP>
    Metageneration:         1

However, if you run it with the -q option, its exit status will be 0 if the object exists, or 1 if it does not, which makes it useful for writing scripts such as the one you shared.
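As a minimal sketch of that exit-status check, the test can be folded directly into the if statement. Note that status 0 means the object exists, whereas the script in the question prints "folder exist" when the status is non-zero. The function names object_exists and report_object, and the URL in the commented call, are hypothetical names chosen for illustration. A non-zero status may also come from other failures (for example, missing credentials), so "does not exist" here really means "could not be confirmed".

```shell
#!/bin/bash
# Sketch: let the exit status of `gsutil -q stat` drive the branch directly.
# object_exists and report_object are illustrative names, not part of gsutil.

# Succeeds (status 0) when gsutil reports that the object exists.
object_exists() {
    gsutil -q stat "$1"
}

# Prints a message based on the exit status of object_exists.
report_object() {
    if object_exists "$1"; then
        echo "object exists"
    else
        echo "object does not exist"
    fi
}

# Example call (uncomment and substitute your own object URL):
# report_object "gs://my_bucket/my_subdirectory/my_nested_file.txt"
```

Testing the command itself in the if avoids the intermediate return_value variable and the risk of comparing it the wrong way round.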

Finally, there are some additional points to keep in mind when working with subdirectories inside a bucket:

  • A command such as gsutil -q stat gs://my_bucket/my_subdirectory will retrieve the data of an object called my_subdirectory, not of a directory itself.
  • A command such as gsutil -q stat gs://my_bucket/my_subdirectory/ will operate on the subdirectory entry itself (gsutil looks for an object named my_subdirectory/, with the trailing slash) and not on the nested files, so it will only tell you whether that entry exists (this is why your code was failing).
  • You have to use something like gsutil -q stat gs://my_bucket/my_subdirectory/my_nested_file.txt in order to retrieve the metadata of a file nested under a subdirectory.

So, in short, your issue was that the paths were not specified precisely. It is not that gsutil is overly sensitive about paths; this behavior is intended, because a bucket can contain a file and a folder with the same name, and you should be able to retrieve either of them. The trailing / is what indicates whether you mean the directory or the file:

gs://my_bucket/
  |_ my_subdirectory        #This is a file
  |_ my_subdirectory/       #This is a folder
     |_ my_nested_file.txt  #This is a nested file
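
To make the same-name situation above concrete, here is a rough sketch that checks both forms of a path, with and without the trailing /, and reports which of them gsutil stat finds. The function name classify_path is hypothetical. As far as I understand the gsutil documentation, the trailing-slash check succeeds only when an object literally named my_subdirectory/ exists, so treat the "folder" result with that caveat in mind.

```shell
#!/bin/bash
# Sketch: report whether a path exists as a file, as a folder entry, or both.
# classify_path is an illustrative name, not part of gsutil.
classify_path() {
    local base="${1%/}"   # strip one trailing slash, if present
    local as_file=no
    local as_folder=no

    # Without the trailing slash, gsutil looks for an object with that name.
    if gsutil -q stat "$base"; then
        as_file=yes
    fi

    # With the trailing slash, gsutil looks for the folder entry itself.
    if gsutil -q stat "$base/"; then
        as_folder=yes
    fi

    echo "file=$as_file folder=$as_folder"
}

# Example call (uncomment and substitute your own path):
# classify_path "gs://my_bucket/my_subdirectory"
```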