Check if a string is a possible abbrevation for a name

Question 1

Check if a string is a possible abbrevation for a name

python string-matching slug abbreviation text-analysis

Björn Lindqvist · Sep 7, 2011 · Viewed 9.7k times · Source

Answer

Answer

This passes all the tests, including a few extra I created. It uses recursion. Here are the rules that I used:

The first letter of the abbreviation must match the first letter of the text
The rest of the abbreviation (the abbrev minus the first letter) must be an abbreviation for:
- the remaining words, or
- the remaining text starting from any position in the first word.

tests=(
    ('fck','fc kopenhavn',True),
    ('fco','fc kopenhavn',False),
    ('irl','in real life',True),
    ('irnl','in real life',False),    
    ('ifk','ifk gotebork',True),   
    ('ifko','ifk gotebork',False),    
    ('aik','allmanna idrottskluben',True),
    ('aid','allmanna idrottskluben',True),
    ('manu','manchester united',True), 
    ('fz','faz zoo',True), 
    ('fzz','faz zoo',True),
    ('fzzz','faz zoo',False),    
    )

def is_abbrev(abbrev, text):
    abbrev=abbrev.lower()
    text=text.lower()
    words=text.split()
    if not abbrev:
        return True
    if abbrev and not text:
        return False
    if abbrev[0]!=text[0]:
        return False
    else:
        return (is_abbrev(abbrev[1:],' '.join(words[1:])) or
                any(is_abbrev(abbrev[1:],text[i+1:])
                    for i in range(len(words[0]))))

for abbrev,text,answer in tests:
    result=is_abbrev(abbrev,text)
    print(abbrev,text,result,answer)
    assert result==answer

Question 2

I'm trying to develop a python algorithm to check if a string could be an abbrevation for another word. For example

fck is a match for fc kopenhavn because it matches the first characters of the word. fhk would not match.
fco should not match fc kopenhavn because no one irl would abbrevate FC Kopenhavn as FCO.
irl is a match for in real life.
ifk is a match for ifk goteborg.
aik is a match for allmanna idrottskluben.
aid is a match for allmanna idrottsklubben. This is not a real team name abbrevation, but I guess it is hard to exclude it unless you apply domain specific knowledge on how Swedish abbrevations are formed.
manu is a match for manchester united.

It is hard to describe the exact rules of the algorithm, but I hope my examples show what I'm after.

Update I made a mistake in showing the strings with the matching letters uppercased. In the real scenario, all letters are lowercase so it is not as easy as just checking which letters are uppercased.

Check if a string is a possible abbrevation for a name

Answer

Related questions