How to remove unconverted data from a Python datetime object

Ben Keating · Feb 18, 2011

I have a database of mostly correct datetimes, but a few are broken, like so: Sat Dec 22 12:34:08 PST 20102015

Without the invalid year, this was working for me:

import time
from datetime import datetime

end_date = soup('tr')[4].contents[1].renderContents()  # soup: the parsed page
end_date = time.strptime(end_date, "%a %b %d %H:%M:%S %Z %Y")  # -> struct_time
end_date = datetime.fromtimestamp(time.mktime(end_date))  # -> datetime

But once I hit an object with an invalid year, I get ValueError: unconverted data remains: 2, which is great, but I'm not sure how best to strip the bad characters out of the year. They range from 2 to 6 unconverted characters.
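The error message itself carries the leftover text after a fixed prefix, so its length is exactly the number of characters that would need to be dropped. A small sketch to inspect it (the helper name `leftover_text` is hypothetical). It leaves out the time-zone field because, per the Python docs, `%Z` in `strptime` only accepts `UTC`, `GMT` and the names in the machine's `time.tzname`, and it assumes the default C (English) locale for `%a`/`%b`:

```python
from datetime import datetime

PREFIX = "unconverted data remains: "

def leftover_text(value, fmt):
    """Return the trailing text strptime refused to parse, or '' if it parsed."""
    try:
        datetime.strptime(value, fmt)
    except ValueError as err:
        message = str(err)
        if message.startswith(PREFIX):
            # Everything after the fixed prefix is the unparsed tail.
            return message[len(PREFIX):]
        raise  # some other parse failure, not just trailing junk
    return ""

# %Y consumes exactly four digits, so "2015" is left over here.
print(leftover_text("Sat Dec 22 12:34:08 20102015", "%a %b %d %H:%M:%S %Y"))  # 2015
```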

Any pointers? I would just slice end_date, but I'm hoping there is a datetime-safe strategy.

Answer

Adam Rosenfield · Feb 18, 2011

Unless you want to rewrite strptime (a very bad idea), the only real option is to slice end_date and chop off the extra characters at the end, assuming that doing so gives the result you intend.
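If the corruption always looks like the sample (extra digits appended to a real four-digit year at the end of the string), the slice can be done up front, before parsing. A minimal sketch under that assumption; `trim_year` is a hypothetical helper, and the sample uses `UTC` instead of the question's `PST` because `%Z` in `strptime` only accepts `UTC`, `GMT` and the machine's own `time.tzname` names:

```python
from datetime import datetime

def trim_year(value):
    """Cut the last whitespace-separated field down to four characters.

    Assumes (hypothetically) that the year is the final field and that any
    stray characters were appended after the real four-digit year.
    """
    head, sep, year = value.rpartition(" ")
    return head + sep + year[:4]

raw = "Sat Dec 22 12:34:08 UTC 20102015"
print(trim_year(raw))  # Sat Dec 22 12:34:08 UTC 2010

parsed = datetime.strptime(trim_year(raw), "%a %b %d %H:%M:%S %Z %Y")
print(parsed)  # 2010-12-22 12:34:08
```

Because it keeps only four characters, the same cut works whether 2 or 6 stray characters were appended; unlike catching the exception, though, it silently assumes the extras always trail the real year.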

For example, you can catch the ValueError, slice, and try again:

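import time

UNCONVERTED = 'unconverted data remains: '

def parse_prefix(line, fmt):
    """Parse the leading part of line with fmt, ignoring trailing junk."""
    try:
        t = time.strptime(line, fmt)
    except ValueError as v:
        if v.args and v.args[0].startswith(UNCONVERTED):
            # The message ends with the leftover text; drop that many chars.
            line = line[:-(len(v.args[0]) - len(UNCONVERTED))]
            t = time.strptime(line, fmt)
        else:
            raise  # a real format mismatch, not just extra characters
    return t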
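Since the question ultimately wants a `datetime`, the same retry idea can be built directly on `datetime.strptime`, which raises the same "unconverted data remains" message and avoids the round trip through `time.mktime` (which interprets the struct in the machine's local time zone). A sketch; `parse_prefix_dt` is a hypothetical name, and the sample swaps the question's `PST` for `UTC` because `%Z` only accepts `UTC`, `GMT` and the local `time.tzname` names:

```python
from datetime import datetime

UNCONVERTED = "unconverted data remains: "

def parse_prefix_dt(line, fmt):
    """Parse the leading part of line with fmt and return a datetime."""
    try:
        return datetime.strptime(line, fmt)
    except ValueError as err:
        message = str(err)
        if not message.startswith(UNCONVERTED):
            raise  # a genuine mismatch, not just trailing junk
        extra = len(message) - len(UNCONVERTED)
        # Drop exactly the leftover characters and parse again.
        return datetime.strptime(line[:-extra], fmt)

print(parse_prefix_dt("Sat Dec 22 12:34:08 UTC 20102015",
                      "%a %b %d %H:%M:%S %Z %Y"))  # 2010-12-22 12:34:08
```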

Example usage:

parse_prefix(
    '2015-10-15 11:33:20.738 45162 INFO core.api.wsgi yadda yadda.',
    '%Y-%m-%d %H:%M:%S'
) # -> time.struct_time(tm_year=2015, tm_mon=10, tm_mday=15, tm_hour=11, tm_min=33, ...