How can I implement an optional parameter to an AWS Glue Job?
I have created a job that currently have a string parameter (an ISO 8601 date string) as an input that is used in the ETL job. I would like to make this parameter optional, so that the job use a default value if it is not provided (e.g. using datetime.now and datetime.isoformatin my case). I have tried using getResolvedOptions:
import sys
from awsglue.utils import getResolvedOptions
args = getResolvedOptions(sys.argv, ['ISO_8601_STRING'])
However, when I am not passing an --ISO_8601_STRING
job parameter I see the following error:
awsglue.utils.GlueArgumentError: argument --ISO_8601_STRING is required
matsev and Yuriy solutions is fine if you have only one field which is optional.
I wrote a wrapper function for python that is more generic and handle different corner cases (mandatory fields and/or optional fields with values).
import sys
from awsglue.utils import getResolvedOptions
def get_glue_args(mandatory_fields, default_optional_args):
"""
This is a wrapper of the glue function getResolvedOptions to take care of the following case :
* Handling optional arguments and/or mandatory arguments
* Optional arguments with default value
NOTE:
* DO NOT USE '-' while defining args as the getResolvedOptions with replace them with '_'
* All fields would be return as a string type with getResolvedOptions
Arguments:
mandatory_fields {list} -- list of mandatory fields for the job
default_optional_args {dict} -- dict for optional fields with their default value
Returns:
dict -- given args with default value of optional args not filled
"""
# The glue args are available in sys.argv with an extra '--'
given_optional_fields_key = list(set([i[2:] for i in sys.argv]).intersection([i for i in default_optional_args]))
args = getResolvedOptions(sys.argv,
mandatory_fields+given_optional_fields_key)
# Overwrite default value if optional args are provided
default_optional_args.update(args)
return default_optional_args
Usage :
# Defining mandatory/optional args
mandatory_fields = ['my_mandatory_field_1','my_mandatory_field_2']
default_optional_args = {'optional_field_1':'myvalue1', 'optional_field_2':'myvalue2'}
# Retrieve args
args = get_glue_args(mandatory_fields, default_optional_args)
# Access element as dict with args[‘key’]