Best way to check function arguments?

carmellose picture carmellose · Oct 30, 2013 · Viewed 131.8k times · Source

I'm looking for an efficient way to check variables of a Python function. For example, I'd like to check arguments type and value. Is there a module for this? Or should I use something like decorators, or any specific idiom?

def my_function(a, b, c):
    """An example function I'd like to check the arguments of."""
    # check that a is an int
    # check that 0 < b < 10
    # check that c is not an empty string

Answer

Cecil Curry picture Cecil Curry · Jun 22, 2016

In this elongated answer, we implement a Python 3.x-specific type checking decorator based on PEP 484-style type hints in less than 275 lines of pure-Python (most of which is explanatory docstrings and comments) – heavily optimized for industrial-strength real-world use complete with a py.test-driven test suite exercising all possible edge cases.

Feast on the unexpected awesome of bear typing:

>>> @beartype
... def spirit_bear(kermode: str, gitgaata: (str, int)) -> tuple:
...     return (kermode, gitgaata, "Moksgm'ol", 'Ursus americanus kermodei')
>>> spirit_bear(0xdeadbeef, 'People of the Cane')
AssertionError: parameter kermode=0xdeadbeef not of <class "str">

As this example suggests, bear typing explicitly supports type checking of parameters and return values annotated as either simple types or tuples of such types. Golly!

O.K., that's actually unimpressive. @beartype resembles every other Python 3.x-specific type checking decorator based on PEP 484-style type hints in less than 275 lines of pure-Python. So what's the rub, bub?

Pure Bruteforce Hardcore Efficiency

Bear typing is dramatically more efficient in both space and time than all existing implementations of type checking in Python to the best of my limited domain knowledge. (More on that later.)

Efficiency usually doesn't matter in Python, however. If it did, you wouldn't be using Python. Does type checking actually deviate from the well-established norm of avoiding premature optimization in Python? Yes. Yes, it does.

Consider profiling, which adds unavoidable overhead to each profiled metric of interest (e.g., function calls, lines). To ensure accurate results, this overhead is mitigated by leveraging optimized C extensions (e.g., the _lsprof C extension leveraged by the cProfile module) rather than unoptimized pure-Python (e.g., the profile module). Efficiency really does matter when profiling.

Type checking is no different. Type checking adds overhead to each function call type checked by your application – ideally, all of them. To prevent well-meaning (but sadly small-minded) coworkers from removing the type checking you silently added after last Friday's caffeine-addled allnighter to your geriatric legacy Django web app, type checking must be fast. So fast that no one notices it's there when you add it without telling anyone. I do this all the time! Stop reading this if you are a coworker.

If even ludicrous speed isn't enough for your gluttonous application, however, bear typing may be globally disabled by enabling Python optimizations (e.g., by passing the -O option to the Python interpreter):

$ python3 -O
# This succeeds only when type checking is optimized away. See above!
>>> spirit_bear(0xdeadbeef, 'People of the Cane')
(0xdeadbeef, 'People of the Cane', "Moksgm'ol", 'Ursus americanus kermodei')

Just because. Welcome to bear typing.

What The...? Why "bear"? You're a Neckbeard, Right?

Bear typing is bare-metal type checking – that is, type checking as close to the manual approach of type checking in Python as feasible. Bear typing is intended to impose no performance penalties, compatibility constraints, or third-party dependencies (over and above that imposed by the manual approach, anyway). Bear typing may be seamlessly integrated into existing codebases and test suites without modification.

Everyone's probably familiar with the manual approach. You manually assert each parameter passed to and/or return value returned from every function in your codebase. What boilerplate could be simpler or more banal? We've all seen it a hundred times a googleplex times, and vomited a little in our mouths everytime we did. Repetition gets old fast. DRY, yo.

Get your vomit bags ready. For brevity, let's assume a simplified easy_spirit_bear() function accepting only a single str parameter. Here's what the manual approach looks like:

def easy_spirit_bear(kermode: str) -> str:
    assert isinstance(kermode, str), 'easy_spirit_bear() parameter kermode={} not of <class "str">'.format(kermode)
    return_value = (kermode, "Moksgm'ol", 'Ursus americanus kermodei')
    assert isinstance(return_value, str), 'easy_spirit_bear() return value {} not of <class "str">'.format(return_value)
    return return_value

Python 101, right? Many of us passed that class.

Bear typing extracts the type checking manually performed by the above approach into a dynamically defined wrapper function automatically performing the same checks – with the added benefit of raising granular TypeError rather than ambiguous AssertionError exceptions. Here's what the automated approach looks like:

def easy_spirit_bear_wrapper(*args, __beartype_func=easy_spirit_bear, **kwargs):
    if not (
        isinstance(args[0], __beartype_func.__annotations__['kermode'])
        if 0 < len(args) else
        isinstance(kwargs['kermode'], __beartype_func.__annotations__['kermode'])
        if 'kermode' in kwargs else True):
            raise TypeError(
                'easy_spirit_bear() parameter kermode={} not of {!r}'.format(
                args[0] if 0 < len(args) else kwargs['kermode'],
                __beartype_func.__annotations__['kermode']))

    return_value = __beartype_func(*args, **kwargs)

    if not isinstance(return_value, __beartype_func.__annotations__['return']):
        raise TypeError(
            'easy_spirit_bear() return value {} not of {!r}'.format(
                return_value, __beartype_func.__annotations__['return']))

    return return_value

It's long-winded. But it's also basically* as fast as the manual approach. * Squinting suggested.

Note the complete lack of function inspection or iteration in the wrapper function, which contains a similar number of tests as the original function – albeit with the additional (maybe negligible) costs of testing whether and how the parameters to be type checked are passed to the current function call. You can't win every battle.

Can such wrapper functions actually be reliably generated to type check arbitrary functions in less than 275 lines of pure Python? Snake Plisskin says, "True story. Got a smoke?"

And, yes. I may have a neckbeard.

No, Srsly. Why "bear"?

Bear beats duck. Duck may fly, but bear may throw salmon at duck. In Canada, nature can surprise you.

Next question.

What's So Hot about Bears, Anyway?

Existing solutions do not perform bare-metal type checking – at least, none I've grepped across. They all iteratively reinspect the signature of the type-checked function on each function call. While negligible for a single call, reinspection overhead is usually non-negligible when aggregated over all calls. Really, really non-negligible.

It's not simply efficiency concerns, however. Existing solutions also often fail to account for common edge cases. This includes most if not all toy decorators provided as stackoverflow answers here and elsewhere. Classic failures include:

  • Failing to type check keyword arguments and/or return values (e.g., sweeneyrod's @checkargs decorator).
  • Failing to support tuples (i.e., unions) of types accepted by the isinstance() builtin.
  • Failing to propagate the name, docstring, and other identifying metadata from the original function onto the wrapper function.
  • Failing to supply at least a semblance of unit tests. (Kind of critical.)
  • Raising generic AssertionError exceptions rather than specific TypeError exceptions on failed type checks. For granularity and sanity, type checking should never raise generic exceptions.

Bear typing succeeds where non-bears fail. All one, all bear!

Bear Typing Unbared

Bear typing shifts the space and time costs of inspecting function signatures from function call time to function definition time – that is, from the wrapper function returned by the @beartype decorator into the decorator itself. Since the decorator is only called once per function definition, this optimization yields glee for all.

Bear typing is an attempt to have your type checking cake and eat it, too. To do so, @beartype:

  1. Inspects the signature and annotations of the original function.
  2. Dynamically constructs the body of the wrapper function type checking the original function. Thaaat's right. Python code generating Python code.
  3. Dynamically declares this wrapper function via the exec() builtin.
  4. Returns this wrapper function.

Shall we? Let's dive into the deep end.

# If the active Python interpreter is *NOT* optimized (e.g., option "-O" was
# *NOT* passed to this interpreter), enable type checking.
if __debug__:
    import inspect
    from functools import wraps
    from inspect import Parameter, Signature

    def beartype(func: callable) -> callable:
        '''
        Decorate the passed **callable** (e.g., function, method) to validate
        both all annotated parameters passed to this callable _and_ the
        annotated value returned by this callable if any.

        This decorator performs rudimentary type checking based on Python 3.x
        function annotations, as officially documented by PEP 484 ("Type
        Hints"). While PEP 484 supports arbitrarily complex type composition,
        this decorator requires _all_ parameter and return value annotations to
        be either:

        * Classes (e.g., `int`, `OrderedDict`).
        * Tuples of classes (e.g., `(int, OrderedDict)`).

        If optimizations are enabled by the active Python interpreter (e.g., due
        to option `-O` passed to this interpreter), this decorator is a noop.

        Raises
        ----------
        NameError
            If any parameter has the reserved name `__beartype_func`.
        TypeError
            If either:
            * Any parameter or return value annotation is neither:
              * A type.
              * A tuple of types.
            * The kind of any parameter is unrecognized. This should _never_
              happen, assuming no significant changes to Python semantics.
        '''

        # Raw string of Python statements comprising the body of this wrapper,
        # including (in order):
        #
        # * A "@wraps" decorator propagating the name, docstring, and other
        #   identifying metadata of the original function to this wrapper.
        # * A private "__beartype_func" parameter initialized to this function.
        #   In theory, the "func" parameter passed to this decorator should be
        #   accessible as a closure-style local in this wrapper. For unknown
        #   reasons (presumably, a subtle bug in the exec() builtin), this is
        #   not the case. Instead, a closure-style local must be simulated by
        #   passing the "func" parameter to this function at function
        #   definition time as the default value of an arbitrary parameter. To
        #   ensure this default is *NOT* overwritten by a function accepting a
        #   parameter of the same name, this edge case is tested for below.
        # * Assert statements type checking parameters passed to this callable.
        # * A call to this callable.
        # * An assert statement type checking the value returned by this
        #   callable.
        #
        # While there exist numerous alternatives (e.g., appending to a list or
        # bytearray before joining the elements of that iterable into a string),
        # these alternatives are either slower (as in the case of a list, due to
        # the high up-front cost of list construction) or substantially more
        # cumbersome (as in the case of a bytearray). Since string concatenation
        # is heavily optimized by the official CPython interpreter, the simplest
        # approach is (curiously) the most ideal.
        func_body = '''
@wraps(__beartype_func)
def func_beartyped(*args, __beartype_func=__beartype_func, **kwargs):
'''

        # "inspect.Signature" instance encapsulating this callable's signature.
        func_sig = inspect.signature(func)

        # Human-readable name of this function for use in exceptions.
        func_name = func.__name__ + '()'

        # For the name of each parameter passed to this callable and the
        # "inspect.Parameter" instance encapsulating this parameter (in the
        # passed order)...
        for func_arg_index, func_arg in enumerate(func_sig.parameters.values()):
            # If this callable redefines a parameter initialized to a default
            # value by this wrapper, raise an exception. Permitting this
            # unlikely edge case would permit unsuspecting users to
            # "accidentally" override these defaults.
            if func_arg.name == '__beartype_func':
                raise NameError(
                    'Parameter {} reserved for use by @beartype.'.format(
                        func_arg.name))

            # If this parameter is both annotated and non-ignorable for purposes
            # of type checking, type check this parameter.
            if (func_arg.annotation is not Parameter.empty and
                func_arg.kind not in _PARAMETER_KIND_IGNORED):
                # Validate this annotation.
                _check_type_annotation(
                    annotation=func_arg.annotation,
                    label='{} parameter {} type'.format(
                        func_name, func_arg.name))

                # String evaluating to this parameter's annotated type.
                func_arg_type_expr = (
                    '__beartype_func.__annotations__[{!r}]'.format(
                        func_arg.name))

                # String evaluating to this parameter's current value when
                # passed as a keyword.
                func_arg_value_key_expr = 'kwargs[{!r}]'.format(func_arg.name)

                # If this parameter is keyword-only, type check this parameter
                # only by lookup in the variadic "**kwargs" dictionary.
                if func_arg.kind is Parameter.KEYWORD_ONLY:
                    func_body += '''
    if {arg_name!r} in kwargs and not isinstance(
        {arg_value_key_expr}, {arg_type_expr}):
        raise TypeError(
            '{func_name} keyword-only parameter '
            '{arg_name}={{}} not a {{!r}}'.format(
                {arg_value_key_expr}, {arg_type_expr}))
'''.format(
                        func_name=func_name,
                        arg_name=func_arg.name,
                        arg_type_expr=func_arg_type_expr,
                        arg_value_key_expr=func_arg_value_key_expr,
                    )
                # Else, this parameter may be passed either positionally or as
                # a keyword. Type check this parameter both by lookup in the
                # variadic "**kwargs" dictionary *AND* by index into the
                # variadic "*args" tuple.
                else:
                    # String evaluating to this parameter's current value when
                    # passed positionally.
                    func_arg_value_pos_expr = 'args[{!r}]'.format(
                        func_arg_index)

                    func_body += '''
    if not (
        isinstance({arg_value_pos_expr}, {arg_type_expr})
        if {arg_index} < len(args) else
        isinstance({arg_value_key_expr}, {arg_type_expr})
        if {arg_name!r} in kwargs else True):
            raise TypeError(
                '{func_name} parameter {arg_name}={{}} not of {{!r}}'.format(
                {arg_value_pos_expr} if {arg_index} < len(args) else {arg_value_key_expr},
                {arg_type_expr}))
'''.format(
                    func_name=func_name,
                    arg_name=func_arg.name,
                    arg_index=func_arg_index,
                    arg_type_expr=func_arg_type_expr,
                    arg_value_key_expr=func_arg_value_key_expr,
                    arg_value_pos_expr=func_arg_value_pos_expr,
                )

        # If this callable's return value is both annotated and non-ignorable
        # for purposes of type checking, type check this value.
        if func_sig.return_annotation not in _RETURN_ANNOTATION_IGNORED:
            # Validate this annotation.
            _check_type_annotation(
                annotation=func_sig.return_annotation,
                label='{} return type'.format(func_name))

            # Strings evaluating to this parameter's annotated type and
            # currently passed value, as above.
            func_return_type_expr = (
                "__beartype_func.__annotations__['return']")

            # Call this callable, type check the returned value, and return this
            # value from this wrapper.
            func_body += '''
    return_value = __beartype_func(*args, **kwargs)
    if not isinstance(return_value, {return_type}):
        raise TypeError(
            '{func_name} return value {{}} not of {{!r}}'.format(
                return_value, {return_type}))
    return return_value
'''.format(func_name=func_name, return_type=func_return_type_expr)
        # Else, call this callable and return this value from this wrapper.
        else:
            func_body += '''
    return __beartype_func(*args, **kwargs)
'''

        # Dictionary mapping from local attribute name to value. For efficiency,
        # only those local attributes explicitly required in the body of this
        # wrapper are copied from the current namespace. (See below.)
        local_attrs = {'__beartype_func': func}

        # Dynamically define this wrapper as a closure of this decorator. For
        # obscure and presumably uninteresting reasons, Python fails to locally
        # declare this closure when the locals() dictionary is passed; to
        # capture this closure, a local dictionary must be passed instead.
        exec(func_body, globals(), local_attrs)

        # Return this wrapper.
        return local_attrs['func_beartyped']

    _PARAMETER_KIND_IGNORED = {
        Parameter.POSITIONAL_ONLY, Parameter.VAR_POSITIONAL, Parameter.VAR_KEYWORD,
    }
    '''
    Set of all `inspect.Parameter.kind` constants to be ignored during
    annotation- based type checking in the `@beartype` decorator.

    This includes:

    * Constants specific to variadic parameters (e.g., `*args`, `**kwargs`).
      Variadic parameters cannot be annotated and hence cannot be type checked.
    * Constants specific to positional-only parameters, which apply to non-pure-
      Python callables (e.g., defined by C extensions). The `@beartype`
      decorator applies _only_ to pure-Python callables, which provide no
      syntactic means of specifying positional-only parameters.
    '''

    _RETURN_ANNOTATION_IGNORED = {Signature.empty, None}
    '''
    Set of all annotations for return values to be ignored during annotation-
    based type checking in the `@beartype` decorator.

    This includes:

    * `Signature.empty`, signifying a callable whose return value is _not_
      annotated.
    * `None`, signifying a callable returning no value. By convention, callables
      returning no value are typically annotated to return `None`. Technically,
      callables whose return values are annotated as `None` _could_ be
      explicitly checked to return `None` rather than a none-`None` value. Since
      return values are safely ignorable by callers, however, there appears to
      be little real-world utility in enforcing this constraint.
    '''

    def _check_type_annotation(annotation: object, label: str) -> None:
        '''
        Validate the passed annotation to be a valid type supported by the
        `@beartype` decorator.

        Parameters
        ----------
        annotation : object
            Annotation to be validated.
        label : str
            Human-readable label describing this annotation, interpolated into
            exceptions raised by this function.

        Raises
        ----------
        TypeError
            If this annotation is neither a new-style class nor a tuple of
            new-style classes.
        '''

        # If this annotation is a tuple, raise an exception if any member of
        # this tuple is not a new-style class. Note that the "__name__"
        # attribute tested below is not defined by old-style classes and hence
        # serves as a helpful means of identifying new-style classes.
        if isinstance(annotation, tuple):
            for member in annotation:
                if not (
                    isinstance(member, type) and hasattr(member, '__name__')):
                    raise TypeError(
                        '{} tuple member {} not a new-style class'.format(
                            label, member))
        # Else if this annotation is not a new-style class, raise an exception.
        elif not (
            isinstance(annotation, type) and hasattr(annotation, '__name__')):
            raise TypeError(
                '{} {} neither a new-style class nor '
                'tuple of such classes'.format(label, annotation))

# Else, the active Python interpreter is optimized. In this case, disable type
# checking by reducing this decorator to the identity decorator.
else:
    def beartype(func: callable) -> callable:
        return func

And leycec said, Let the @beartype bring forth type checking fastly: and it was so.

Caveats, Curses, and Empty Promises

Nothing is perfect. Even bear typing.

Caveat I: Default Values Unchecked

Bear typing does not type check unpassed parameters assigned default values. In theory, it could. But not in 275 lines or less and certainly not as a stackoverflow answer.

The safe (...probably totally unsafe) assumption is that function implementers claim they knew what they were doing when they defined default values. Since default values are typically constants (...they'd better be!), rechecking the types of constants that never change on each function call assigned one or more default values would contravene the fundamental tenet of bear typing: "Don't repeat yourself over and oooover and oooo-oooover again."

Show me wrong and I will shower you with upvotes.

Caveat II: No PEP 484

PEP 484 ("Type Hints") formalized the use of function annotations first introduced by PEP 3107 ("Function Annotations"). Python 3.5 superficially supports this formalization with a new top-level typing module, a standard API for composing arbitrarily complex types from simpler types (e.g., Callable[[Arg1Type, Arg2Type], ReturnType], a type describing a function accepting two arguments of type Arg1Type and Arg2Type and returning a value of type ReturnType).

Bear typing supports none of them. In theory, it could. But not in 275 lines or less and certainly not as a stackoverflow answer.

Bear typing does, however, support unions of types in the same way that the isinstance() builtin supports unions of types: as tuples. This superficially corresponds to the typing.Union type – with the obvious caveat that typing.Union supports arbitrarily complex types, while tuples accepted by @beartype support only simple classes. In my defense, 275 lines.

Tests or It Didn't Happen

Here's the gist of it. Get it, gist? I'll stop now.

As with the @beartype decorator itself, these py.test tests may be seamlessly integrated into existing test suites without modification. Precious, isn't it?

Now the mandatory neckbeard rant nobody asked for.

A History of API Violence

Python 3.5 provides no actual support for using PEP 484 types. wat?

It's true: no type checking, no type inference, no type nuthin'. Instead, developers are expected to routinely run their entire codebases through heavyweight third-party CPython interpreter wrappers implementing a facsimile of such support (e.g., mypy). Of course, these wrappers impose:

  • A compatibility penalty. As the official mypy FAQ admits in response to the frequently asked question "Can I use mypy to type check my existing Python code?": "It depends. Compatibility is pretty good, but some Python features are not yet implemented or fully supported." A subsequent FAQ response clarifies this incompatibility by stating that:
    • "...your code must make attributes explicit and use a explicit protocol representation." Grammar police see your "a explicit" and raise you an implicit frown.
    • "Mypy will support modular, efficient type checking, and this seems to rule out type checking some language features, such as arbitrary runtime addition of methods. However, it is likely that many of these features will be supported in a restricted form (for example, runtime modification is only supported for classes or methods registered as dynamic or ‘patchable’)."
    • For a full list of syntactic incompatibilities, see "Dealing with common issues". It's not pretty. You just wanted type checking and now you refactored your entire codebase and broke everyone's build two days from the candidate release and the comely HR midget in casual business attire slips a pink slip through the crack in your cubicle-cum-mancave. Thanks alot, mypy.
  • A performance penalty, despite interpreting statically typed code. Fourty years of hard-boiled computer science tells us that (...all else being equal) interpreting statically typed code should be faster, not slower, than interpreting dynamically typed code. In Python, up is the new down.
  • Additional non-trivial dependencies, increasing:
    • The bug-laden fragility of project deployment, especially cross-platform.
    • The maintenance burden of project development.
    • Possible attack surface.

I ask Guido: "Why? Why bother inventing an abstract API if you weren't willing to pony up a concrete API actually doing something with that abstraction?" Why leave the fate of a million Pythonistas to the arthritic hand of the free open-source marketplace? Why create yet another techno-problem that could have been trivially solved with a 275-line decorator in the official Python stdlib?

I have no Python and I must scream.