I'm just trying to write a really basic script that'll take some input text and compress it with lzw, using this package: http://packages.python.org/lzw/
I've never tried any encoding with python before and am thoroughly confused =( - I also can't find any documentation online about it, other than the package info.
Here's what I have:
import lzw
file = lzw.readbytes("collectemailinfo.txt", buffersize=1024)
enc = lzw.compress(file)
print enc
Any help or pointers of any kind would be much appreciated!
Thanks =)
Here is the package API : http://packages.python.org/lzw/lzw-module.html
You can read psuedo-code of compression and decompression here
Is there anything else you are confused about?
Here is an example:
Python
In this version the dicts contain mixed typed data:
def compress(uncompressed):
"""Compress a string to a list of output symbols."""
# Build the dictionary.
dict_size = 256
dictionary = dict((chr(i), chr(i)) for i in xrange(dict_size))
# in Python 3: dictionary = {chr(i): chr(i) for i in range(dict_size)}
w = ""
result = []
for c in uncompressed:
wc = w + c
if wc in dictionary:
w = wc
else:
result.append(dictionary[w])
# Add wc to the dictionary.
dictionary[wc] = dict_size
dict_size += 1
w = c
# Output the code for w.
if w:
result.append(dictionary[w])
return result
def decompress(compressed):
"""Decompress a list of output ks to a string."""
# Build the dictionary.
dict_size = 256
dictionary = dict((chr(i), chr(i)) for i in xrange(dict_size))
# in Python 3: dictionary = {chr(i): chr(i) for i in range(dict_size)}
w = result = compressed.pop(0)
for k in compressed:
if k in dictionary:
entry = dictionary[k]
elif k == dict_size:
entry = w + w[0]
else:
raise ValueError('Bad compressed k: %s' % k)
result += entry
# Add w+entry[0] to the dictionary.
dictionary[dict_size] = w + entry[0]
dict_size += 1
w = entry
return result
How to use:
compressed = compress('TOBEORNOTTOBEORTOBEORNOT')
print (compressed)
decompressed = decompress(compressed)
print (decompressed)
Output:
['T', 'O', 'B', 'E', 'O', 'R', 'N', 'O', 'T', 256, 258, 260, 265, 259, 261, 263]
TOBEORNOTTOBEORTOBEORNOT
NOTE: this example is taken from here