How to use Parameters in Python Luigi

Joel Katz · Jan 26, 2017 · Viewed 8.7k times

How do I pass parameters to Luigi? Say I have a Python file called FileFinder.py with a class named getFiles:

class getFiles(luigi.Task):

and I want to pass in a directory to this class such as:

C://Documents//fileName

and then use this parameter in my run method

def run(self):

How do I run this from the command line and add the parameter for use in my code? I am accustomed to running this file from the command line like this:

python FileFinder.py getFiles --local-scheduler

What do I add to my code to use a parameter, and how do I add that parameter to the command line argument?

Also, as an extension of this question, how would I use multiple arguments, or arguments of different data types such as strings or lists?

Answer

Toterich · Jan 26, 2017

As you have already figured out, you can pass arguments to luigi via

--param-name param-value

on the command line. Inside your code, you have to declare these variables as class attributes by instantiating the Parameter class or one of its subclasses. The subclasses tell luigi that the variable has a data type other than string. Here is an example that uses two command line arguments, one Int and one List:

import luigi

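class testClass(luigi.Task):
    # Parameters are declared as class attributes; the subclass sets the type.
    int_var = luigi.IntParameter()
    list_var = luigi.ListParameter()

    def run(self):
        print('Integer Param + 1 = %i' % (self.int_var + 1))

        # The list parameter arrives as a tuple, so copy it into a list first.
        list_var = list(self.list_var)
        list_var.append('new_elem')
        print('List Param with added element: ' + str(list_var))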

Note that luigi actually converts ListParameter values to tuples, so if you want to do list operations on them, you have to convert them back to a list first (this is a known issue, but it doesn't look like it will be fixed soon).
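To make the tuple point concrete, here is a small sketch in plain Python that does not need luigi at all. The tuple (1, 2, 3) is only a stand-in for what self.list_var would hold inside run(), going by the note above, and with_extra_element is a hypothetical helper name chosen for illustration:

```python
# Stand-in for the value of self.list_var inside run() when the task is
# called with --list-var '[1,2,3]' (assumption: it arrives as a tuple).
list_var = (1, 2, 3)


def with_extra_element(values, extra):
    """Return a new list holding the given values plus one extra element."""
    editable = list(values)  # copy the tuple into a mutable list
    editable.append(extra)   # list operations work on the copy
    return editable


# Tuples have no append method, which is why the conversion is needed.
print(hasattr(list_var, 'append'))
print(with_extra_element(list_var, 'new_elem'))
```

The original tuple is left untouched, which is probably what you want anyway, since luigi treats parameter values as fixed for the lifetime of the task.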

You can invoke the above module from the command line like this (I have saved the code as a file called "testmodule.py" and made the call from inside the same directory):

luigi --module testmodule testClass --int-var 3 --list-var '[1,2,3]' --local-scheduler

Note that any _ in a variable name has to be replaced by - on the command line, so int_var becomes --int-var. The call yields (along with many status messages):

Integer Param + 1 = 4
List Param with added element: [1, 2, 3, 'new_elem']