What’s a good Python profanity filter library?

Question 1

What’s a good Python profanity filter library?

python nlp profanity

Paul D. Waite · Aug 20, 2010 · Viewed 24.9k times · Source

Answer

Answer

I didn't found any Python profanity library, so I made one myself.

Parameters

`filterlist`

A list of regular expressions that match a forbidden word. Please do not use \b, it will be inserted depending on inside_words.

Example: ['bad', 'un\w+']

`ignore_case`

Default: True

Self-explanatory.

`replacements`

Default: "$@%-?!"

A string with characters from which the replacements strings will be randomly generated.

Examples: "%&$?!" or "-" etc.

`complete`

Default: True

Controls if the entire string will be replaced or if the first and last chars will be kept.

`inside_words`

Default: False

Controls if words are searched inside other words too. Disabling this

Module source

(examples at the end)

"""
Module that provides a class that filters profanities

"""

__author__ = "leoluk"
__version__ = '0.0.1'

import random
import re

class ProfanitiesFilter(object):
    def __init__(self, filterlist, ignore_case=True, replacements="$@%-?!", 
                 complete=True, inside_words=False):
        """
        Inits the profanity filter.

        filterlist -- a list of regular expressions that
        matches words that are forbidden
        ignore_case -- ignore capitalization
        replacements -- string with characters to replace the forbidden word
        complete -- completely remove the word or keep the first and last char?
        inside_words -- search inside other words?

        """

        self.badwords = filterlist
        self.ignore_case = ignore_case
        self.replacements = replacements
        self.complete = complete
        self.inside_words = inside_words

    def _make_clean_word(self, length):
        """
        Generates a random replacement string of a given length
        using the chars in self.replacements.

        """
        return ''.join([random.choice(self.replacements) for i in
                  range(length)])

    def __replacer(self, match):
        value = match.group()
        if self.complete:
            return self._make_clean_word(len(value))
        else:
            return value[0]+self._make_clean_word(len(value)-2)+value[-1]

    def clean(self, text):
        """Cleans a string from profanity."""

        regexp_insidewords = {
            True: r'(%s)',
            False: r'\b(%s)\b',
            }

        regexp = (regexp_insidewords[self.inside_words] % 
                  '|'.join(self.badwords))

        r = re.compile(regexp, re.IGNORECASE if self.ignore_case else 0)

        return r.sub(self.__replacer, text)


if __name__ == '__main__':

    f = ProfanitiesFilter(['bad', 'un\w+'], replacements="-")    
    example = "I am doing bad ungood badlike things."

    print f.clean(example)
    # Returns "I am doing --- ------ badlike things."

    f.inside_words = True    
    print f.clean(example)
    # Returns "I am doing --- ------ ---like things."

    f.complete = False    
    print f.clean(example)
    # Returns "I am doing b-d u----d b-dlike things."

Question 2

Like https://stackoverflow.com/questions/1521646/best-profanity-filter, but for Python — and I’m looking for libraries I can run and control myself locally, as opposed to web services.

(And whilst it’s always great to hear your fundamental objections of principle to profanity filtering, I’m not specifically looking for them here. I know profanity filtering can’t pick up every hurtful thing being said. I know swearing, in the grand scheme of things, isn’t a particularly big issue. I know you need some human input to deal with issues of content. I’d just like to find a good library, and see what use I can make of it.)

What’s a good Python profanity filter library?

Answer

Parameters

filterlist

ignore_case

replacements

complete

inside_words

Module source

Related questions

`filterlist`

`ignore_case`

`replacements`

`complete`

`inside_words`