I have a Jupyter notebook that I plan to run repeatedly. It contains functions, and the code is structured like this:
def construct_url(data):
    ...
    return url

def scrape_url(url):
    ...  # fetch url, extract data
    return parsed_data

for i in mylist:
    url = construct_url(i)
    data = scrape_url(url)
    ...  # use the data to do analysis
I'd like to write tests for construct_url and scrape_url. What's the most sensible way to do this?
Some approaches I've considered:
Python standard testing tools, such as doctest and unittest, can be used directly in a notebook.
A notebook cell with a function and a test case in a docstring:
def add(a, b):
    '''
    This is a test:
    >>> add(2, 2)
    5
    '''
    return a + b
A notebook cell (the last one in the notebook) that runs all test cases in the docstrings:
import doctest
doctest.testmod(verbose=True)
Output:
Trying:
    add(2, 2)
Expecting:
    5
**********************************************************************
File "__main__", line 4, in __main__.add
Failed example:
    add(2, 2)
Expected:
    5
Got:
    4
1 items had no tests:
    __main__
**********************************************************************
1 items had failures:
   1 of   1 in __main__.add
1 tests in 2 items.
0 passed and 1 failed.
***Test Failed*** 1 failures.
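The same doctest approach could be applied to construct_url from the question. The following is only a sketch: the function body, the example.com base URL, and the q parameter are assumptions invented for illustration, since the question does not show the real implementation. It uses doctest.run_docstring_examples, which checks the docstring of a single function instead of the whole module.

```python
import doctest
from urllib.parse import urlencode


# Hypothetical stand-in for the question's construct_url. The base URL and
# the "q" query parameter are assumptions made only for this sketch.
def construct_url(data):
    '''
    Build a search URL for one item.

    >>> construct_url('jupyter')
    'https://example.com/search?q=jupyter'
    >>> construct_url('unit tests')
    'https://example.com/search?q=unit+tests'
    '''
    return 'https://example.com/search?' + urlencode({'q': data})


# Check only this function's docstring examples.
# Nothing is printed when every example passes.
doctest.run_docstring_examples(
    construct_url,
    {'construct_url': construct_url},  # names visible to the examples
    name='construct_url',
)
```

Because the function does no network access, its examples can run every time the notebook is executed without slowing it down.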
A notebook cell with a function:
def add(a, b):
    return a + b
A notebook cell (the last one in the notebook) that contains a test case. The last line in the cell runs the test case when the cell is executed:
import unittest

class TestNotebook(unittest.TestCase):

    def test_add(self):
        self.assertEqual(add(2, 2), 5)

unittest.main(argv=[''], verbosity=2, exit=False)
Output:
test_add (__main__.TestNotebook) ... FAIL
======================================================================
FAIL: test_add (__main__.TestNotebook)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-15-4409ad9ffaea>", line 6, in test_add
    self.assertEqual(add(2, 2), 5)
AssertionError: 4 != 5
----------------------------------------------------------------------
Ran 1 test in 0.001s
FAILED (failures=1)
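For a function like scrape_url, which fetches a page, a unittest test case can replace the network call with a canned response so that the test does not depend on the remote site. This is a sketch under stated assumptions: the scrape_url body below (fetching with urllib.request.urlopen and returning the page title) is hypothetical, as is the example.com URL. The runner is built explicitly here so that only this one test class is executed.

```python
import unittest
import urllib.request
from unittest import mock


# Hypothetical stand-in for the question's scrape_url: fetch the page and
# return the text of its <title> element. The parsing rule is an assumption.
def scrape_url(url):
    with urllib.request.urlopen(url) as response:
        html = response.read().decode('utf-8')
    start = html.index('<title>') + len('<title>')
    end = html.index('</title>')
    return html[start:end]


class TestScrapeUrl(unittest.TestCase):

    def test_extracts_title(self):
        # Fake object returned by urlopen; it is used as a context manager,
        # so the canned bytes are attached to __enter__'s return value.
        fake_response = mock.MagicMock()
        fake_response.__enter__.return_value.read.return_value = (
            b'<html><title>Hello</title></html>'
        )
        with mock.patch('urllib.request.urlopen',
                        return_value=fake_response) as fake_urlopen:
            self.assertEqual(scrape_url('https://example.com/page'), 'Hello')
        # The function should have requested exactly the URL it was given.
        fake_urlopen.assert_called_once_with('https://example.com/page')


# Run only this test class and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestScrapeUrl)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Patching works here because scrape_url looks up urlopen through the urllib.request module at call time. If the function imported urlopen directly by name, the patch target would have to be the name in the notebook's own namespace instead.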
While debugging a failed test, it is often useful to halt the test case execution at some point and run a debugger. For this, insert the following code just before the line at which you want the execution to halt:
import pdb; pdb.set_trace()
For example:
def add(a, b):
    '''
    This is the test:
    >>> add(2, 2)
    5
    '''
    import pdb; pdb.set_trace()
    return a + b
For this example, the next time you run the doctest, the execution will halt just before the return statement and the Python debugger (pdb) will start. You will get a pdb prompt directly in the notebook, which will allow you to inspect the values of a and b, step over lines, etc.
Note: Starting with Python 3.7, the built-in breakpoint() can be used instead of import pdb; pdb.set_trace().
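One practical difference is that breakpoint() goes through sys.breakpointhook, so it can be switched off without editing the function. The sketch below, which assumes Python 3.7 or later, temporarily swaps the hook for a no-op so that the cell runs straight through, then restores whatever hook was installed before.

```python
import sys


def add(a, b):
    breakpoint()  # calls sys.breakpointhook(); the default hook starts pdb
    return a + b


# Remember the current hook so it can be put back afterwards.
original_hook = sys.breakpointhook

# Replace the hook with a function that does nothing, so add() does not stop.
sys.breakpointhook = lambda *args, **kwargs: None
try:
    result = add(2, 2)
finally:
    # Restore the previous hook so later breakpoint() calls behave as before.
    sys.breakpointhook = original_hook

print(result)
```

Removing the three hook lines makes the same cell stop at the breakpoint again.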
I created a Jupyter notebook for experimenting with the techniques I have just described. You can try it out with