Python script to search PII

Novice123 · May 16, 2012 · Viewed 8.1k times

I would like to write a script that searches a file system for Personally Identifiable Information (PII), such as card numbers, and reports what it finds. It should cover plain-text (.txt) files as well as Excel (.xls), Word, and PDF files.

Any starting tips, or suggestions for which library to use, are welcome.

I'd also like advice on an efficient way to scan large files for patterns such as credit card numbers.

Answer

Don Johnson · Oct 17, 2015

Give piianalyzer a shot: https://pypi.python.org/pypi/piianalyzer/0.1.0

Alternatively, you can write your own scanner and use a common regular expression collection such as CommonRegex: https://github.com/madisonmay/CommonRegex
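If you go the write-your-own route, a minimal sketch using only the standard library might look like the one below. It is not how piianalyzer or CommonRegex work internally. The pattern (13 to 16 digits with optional single spaces or hyphens), the function names, and the choice to scan only `*.txt` files are all assumptions made for illustration. Candidates are filtered with the Luhn checksum to cut down on false positives, and files are read line by line so that large files never have to fit in memory. Excel, Word, and PDF files would need a text-extraction step first, which this sketch does not cover.

```python
import re
from pathlib import Path

# Assumed pattern: 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,15}(?!\d)")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Return Luhn-valid card-like numbers in text, with separators removed."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def scan_text_file(path):
    """Yield (line_number, number) pairs, reading one line at a time."""
    with open(path, "r", encoding="utf-8", errors="ignore") as handle:
        for line_no, line in enumerate(handle, start=1):
            for number in find_card_numbers(line):
                yield line_no, number


def scan_tree(root):
    """Yield (path, line_number, number) for every .txt file under root."""
    for path in Path(root).rglob("*.txt"):
        for line_no, number in scan_text_file(path):
            yield path, line_no, number
```

Calling `scan_tree("/some/directory")` (a placeholder path) yields one tuple per hit, which you can print or write to a report. Note that reading line by line will miss a number that is split across two lines; if that matters for your data, read in overlapping chunks instead.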