How can I understand a .pyc file content

Niya Simon C picture Niya Simon C · Sep 14, 2015 · Viewed 8.8k times · Source

I have a .pyc file. I need to understand the content of that file to know how the disassembler works of python, i.e. how can I generate a output like dis.dis(function) from .pyc file content.

for e.g.

>>> def sqr(x):  
...     return x*x
...
>>> import dis
>>> dis.dis(sqr)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                0 (x)
              6 BINARY_MULTIPLY     
              7 RETURN_VALUE        

I need to get a output like this using the .pyc file.

Answer

Martijn Pieters picture Martijn Pieters · Sep 14, 2015

.pyc files contain some metadata and a marshaled code object; to load the code object and disassemble that use:

import dis, marshal, sys

header_sizes = [
    # (size, first version this applies to)
    # pyc files were introduced in 0.9.2 way, way back in June 1991.
    (8,  (0, 9, 2)),  # 2 bytes magic number, \r\n, 4 bytes UNIX timestamp
    (12, (3, 6)),     # added 4 bytes file size
    # bytes 4-8 are flags, meaning of 9-16 depends on what flags are set
    # bit 0 not set: 9-12 timestamp, 13-16 file size
    # bit 0 set: 9-16 file hash (SipHash-2-4, k0 = 4 bytes of the file, k1 = 0)
    (16, (3, 7)),     # inserted 4 bytes bit flag field at 4-8 
    # future version may add more bytes still, at which point we can extend
    # this table. It is correct for Python versions up to 3.9
]
header_size = next(s for s, v in reversed(header_sizes) if sys.version_info >= v)

with open(pycfile, "rb") as f:
    metadata = f.read(header_size)  # first header_size bytes are metadata
    code = marshal.load(f)          # rest is a marshalled code object

dis.dis(code)

Demo with the bisect module:

>>> import bisect
>>> import dis, marshal
>>> import sys
>>> header_sizes = [(8, (0, 9, 2)), (12, (3, 6)), (16, (3, 7))]
>>> header_size = next(s for s, v in reversed(header_sizes) if sys.version_info >= v)
>>> pycfile = getattr(bisect, '__cached__', pycfile.__file__)
>>> with open(pycfile, "rb") as f:
...     metadata = f.read(header_size)  # first header_size bytes are metadata
...     code = marshal.load(f)          # rest is bytecode
... 
>>> dis.dis(code)
  1           0 LOAD_CONST               0 ('Bisection algorithms.')
              2 STORE_NAME               0 (__doc__)

  3           4 LOAD_CONST              12 ((0, None))
              6 LOAD_CONST               3 (<code object insort_right at 0x10694f3a0, file "/.../lib/python3.8/bisect.py", line 3>)
              8 LOAD_CONST               4 ('insort_right')
             10 MAKE_FUNCTION            1 (defaults)
             12 STORE_NAME               1 (insort_right)

 15          14 LOAD_CONST              13 ((0, None))
             16 LOAD_CONST               5 (<code object bisect_right at 0x10694f2f0, file "/.../lib/python3.8/bisect.py", line 15>)
             18 LOAD_CONST               6 ('bisect_right')
             20 MAKE_FUNCTION            1 (defaults)
             22 STORE_NAME               2 (bisect_right)

 36          24 LOAD_CONST              14 ((0, None))
             26 LOAD_CONST               7 (<code object insort_left at 0x10694f240, file "/.../lib/python3.8/bisect.py", line 36>)
             28 LOAD_CONST               8 ('insort_left')
             30 MAKE_FUNCTION            1 (defaults)
             32 STORE_NAME               3 (insort_left)

 49          34 LOAD_CONST              15 ((0, None))
             36 LOAD_CONST               9 (<code object bisect_left at 0x10694f190, file "/.../lib/python3.8/bisect.py", line 49>)
             38 LOAD_CONST              10 ('bisect_left')
             40 MAKE_FUNCTION            1 (defaults)
             42 STORE_NAME               4 (bisect_left)

 71          44 SETUP_FINALLY           12 (to 58)

 72          46 LOAD_CONST               1 (0)
             48 LOAD_CONST              11 (('*',))
             50 IMPORT_NAME              5 (_bisect)
             52 IMPORT_STAR
             54 POP_BLOCK
             56 JUMP_FORWARD            20 (to 78)

 73     >>   58 DUP_TOP
             60 LOAD_NAME                6 (ImportError)
             62 COMPARE_OP              10 (exception match)
             64 POP_JUMP_IF_FALSE       76
             66 POP_TOP
             68 POP_TOP
             70 POP_TOP

 74          72 POP_EXCEPT
             74 JUMP_FORWARD             2 (to 78)
        >>   76 END_FINALLY

 77     >>   78 LOAD_NAME                2 (bisect_right)
             80 STORE_NAME               7 (bisect)

 78          82 LOAD_NAME                1 (insort_right)
             84 STORE_NAME               8 (insort)
             86 LOAD_CONST               2 (None)
             88 RETURN_VALUE

Disassembly of <code object insort_right at 0x10694f3a0, file "/.../lib/python3.8/bisect.py", line 3>:
 12           0 LOAD_GLOBAL              0 (bisect_right)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (x)
              6 LOAD_FAST                2 (lo)
              8 LOAD_FAST                3 (hi)
             10 CALL_FUNCTION            4
             12 STORE_FAST               2 (lo)

 13          14 LOAD_FAST                0 (a)
             16 LOAD_METHOD              1 (insert)
             18 LOAD_FAST                2 (lo)
             20 LOAD_FAST                1 (x)
             22 CALL_METHOD              2
             24 POP_TOP
             26 LOAD_CONST               1 (None)
             28 RETURN_VALUE

Disassembly of <code object bisect_right at 0x10694f2f0, file "/.../lib/python3.8/bisect.py", line 15>:
 26           0 LOAD_FAST                2 (lo)
              2 LOAD_CONST               1 (0)
              4 COMPARE_OP               0 (<)
              6 POP_JUMP_IF_FALSE       16

 27           8 LOAD_GLOBAL              0 (ValueError)
             10 LOAD_CONST               2 ('lo must be non-negative')
             12 CALL_FUNCTION            1
             14 RAISE_VARARGS            1

 28     >>   16 LOAD_FAST                3 (hi)
             18 LOAD_CONST               3 (None)
             20 COMPARE_OP               8 (is)
             22 POP_JUMP_IF_FALSE       32

 29          24 LOAD_GLOBAL              1 (len)
             26 LOAD_FAST                0 (a)
             28 CALL_FUNCTION            1
             30 STORE_FAST               3 (hi)

 30     >>   32 LOAD_FAST                2 (lo)
             34 LOAD_FAST                3 (hi)
             36 COMPARE_OP               0 (<)
             38 POP_JUMP_IF_FALSE       80

 31          40 LOAD_FAST                2 (lo)
             42 LOAD_FAST                3 (hi)
             44 BINARY_ADD
             46 LOAD_CONST               4 (2)
             48 BINARY_FLOOR_DIVIDE
             50 STORE_FAST               4 (mid)

 32          52 LOAD_FAST                1 (x)
             54 LOAD_FAST                0 (a)
             56 LOAD_FAST                4 (mid)
             58 BINARY_SUBSCR
             60 COMPARE_OP               0 (<)
             62 POP_JUMP_IF_FALSE       70
             64 LOAD_FAST                4 (mid)
             66 STORE_FAST               3 (hi)
             68 JUMP_ABSOLUTE           32

 33     >>   70 LOAD_FAST                4 (mid)
             72 LOAD_CONST               5 (1)
             74 BINARY_ADD
             76 STORE_FAST               2 (lo)
             78 JUMP_ABSOLUTE           32

 34     >>   80 LOAD_FAST                2 (lo)
             82 RETURN_VALUE

Disassembly of <code object insort_left at 0x10694f240, file "/.../lib/python3.8/bisect.py", line 36>:
 45           0 LOAD_GLOBAL              0 (bisect_left)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (x)
              6 LOAD_FAST                2 (lo)
              8 LOAD_FAST                3 (hi)
             10 CALL_FUNCTION            4
             12 STORE_FAST               2 (lo)

 46          14 LOAD_FAST                0 (a)
             16 LOAD_METHOD              1 (insert)
             18 LOAD_FAST                2 (lo)
             20 LOAD_FAST                1 (x)
             22 CALL_METHOD              2
             24 POP_TOP
             26 LOAD_CONST               1 (None)
             28 RETURN_VALUE

Disassembly of <code object bisect_left at 0x10694f190, file "/.../lib/python3.8/bisect.py", line 49>:
 60           0 LOAD_FAST                2 (lo)
              2 LOAD_CONST               1 (0)
              4 COMPARE_OP               0 (<)
              6 POP_JUMP_IF_FALSE       16

 61           8 LOAD_GLOBAL              0 (ValueError)
             10 LOAD_CONST               2 ('lo must be non-negative')
             12 CALL_FUNCTION            1
             14 RAISE_VARARGS            1

 62     >>   16 LOAD_FAST                3 (hi)
             18 LOAD_CONST               3 (None)
             20 COMPARE_OP               8 (is)
             22 POP_JUMP_IF_FALSE       32

 63          24 LOAD_GLOBAL              1 (len)
             26 LOAD_FAST                0 (a)
             28 CALL_FUNCTION            1
             30 STORE_FAST               3 (hi)

 64     >>   32 LOAD_FAST                2 (lo)
             34 LOAD_FAST                3 (hi)
             36 COMPARE_OP               0 (<)
             38 POP_JUMP_IF_FALSE       80

 65          40 LOAD_FAST                2 (lo)
             42 LOAD_FAST                3 (hi)
             44 BINARY_ADD
             46 LOAD_CONST               4 (2)
             48 BINARY_FLOOR_DIVIDE
             50 STORE_FAST               4 (mid)

 66          52 LOAD_FAST                0 (a)
             54 LOAD_FAST                4 (mid)
             56 BINARY_SUBSCR
             58 LOAD_FAST                1 (x)
             60 COMPARE_OP               0 (<)
             62 POP_JUMP_IF_FALSE       74
             64 LOAD_FAST                4 (mid)
             66 LOAD_CONST               5 (1)
             68 BINARY_ADD
             70 STORE_FAST               2 (lo)
             72 JUMP_ABSOLUTE           32

 67     >>   74 LOAD_FAST                4 (mid)
             76 STORE_FAST               3 (hi)
             78 JUMP_ABSOLUTE           32

 68     >>   80 LOAD_FAST                2 (lo)
             82 RETURN_VALUE(

Note that this is separates out the top level code object, defining the module, and the code objects of functions and classes. In Python 3.6 and older the dis.dis() function won't recurse. In those versions, if you wanted to analyse the functions contained, you'll need to load the nested code objects from the top-level code.co_consts array. For example, the insort_right function's code object is loaded with LOAD_CONST 3, so you look for the code object at that index:

>>> code.co_consts[3]
<code object insort_right at 0x10694f3a0, file "/.../lib/python3.8/bisect.py", line 3>
>>> dis.dis(code.co_consts[3])
 12           0 LOAD_GLOBAL              0 (bisect_right)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (x)
              6 LOAD_FAST                2 (lo)
              8 LOAD_FAST                3 (hi)
             10 CALL_FUNCTION            4
             12 STORE_FAST               2 (lo)

 13          14 LOAD_FAST                0 (a)
             16 LOAD_METHOD              1 (insert)
             18 LOAD_FAST                2 (lo)
             20 LOAD_FAST                1 (x)
             22 CALL_METHOD              2
             24 POP_TOP
             26 LOAD_CONST               1 (None)
             28 RETURN_VALUE

I personally would avoid trying to parse the .pyc file with anything other than the matching Python version and marshal module. The marshal format is basically an internal serialisation format that changes with the needs of Python itself. New features like list comprehensions and with statements and async/await require new additions to the format, which is not published other than as C source code.

If you do go this route, and manage to read a code object by other means than using the module, you'll have to parse out the disassembly from the various attributes of the code object; see the dis module source for details on how to do this (you'll have to use the co_firstlineno and co_lnotab attributes to create a bytecode-offset-to-linenumber map, for example).