I have a database of mostly correct datetimes, but a few are broken, like so: Sat Dec 22 12:34:08 PST 20102015
Without the invalid year, this was working for me:
end_date = soup('tr')[4].contents[1].renderContents()
end_date = time.strptime(end_date,"%a %b %d %H:%M:%S %Z %Y")
end_date = datetime.fromtimestamp(time.mktime(end_date))
But once I hit an object with an invalid year, I get ValueError: unconverted data remains: 2, which is great, but I'm not sure how best to strip the bad characters out of the year. They range from 2 to 6 unconverted characters.
Any pointers? I would just slice end_date, but I'm hoping there is a datetime-safe strategy.
Unless you want to rewrite strptime (a very bad idea), the only real option you have is to slice end_date and chop off the extra characters at the end, assuming that this gives you the result you intend.
For example, you can catch the ValueError, slice, and try again:
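import time

def parse_prefix(line, fmt):
    try:
        t = time.strptime(line, fmt)
    except ValueError as v:
        if len(v.args) > 0 and v.args[0].startswith('unconverted data remains: '):
            # The message ends with the leftover text; 26 is the length of the
            # 'unconverted data remains: ' prefix, so this drops exactly the leftover.
            line = line[:-(len(v.args[0]) - 26)]
            t = time.strptime(line, fmt)
        else:
            raise
    return t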
For example:
parse_prefix(
    '2015-10-15 11:33:20.738 45162 INFO core.api.wsgi yadda yadda.',
    '%Y-%m-%d %H:%M:%S'
)  # -> time.struct_time(tm_year=2015, tm_mon=10, tm_mday=15, tm_hour=11, tm_min=33, ...
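Since the question ultimately wants a datetime object, the same catch-and-retry idea can be sketched with datetime.strptime, which raises the same "unconverted data remains" message. This is only a sketch: the name parse_datetime_prefix and the PREFIX constant are made up for illustration, the sample string leaves out the time zone because the zone names that %Z accepts depend on the local settings, and it assumes that the first four digits of the broken year are the ones you want.

```python
from datetime import datetime

# Hypothetical helper name and constant, chosen for this illustration only.
PREFIX = "unconverted data remains: "

def parse_datetime_prefix(text, fmt):
    """Parse the leading part of `text` that matches `fmt`, ignoring trailing extras."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError as err:
        message = str(err)
        if not message.startswith(PREFIX):
            # Some other parsing problem: let the caller see it.
            raise
        extra = message[len(PREFIX):]
        # strptime reports the leftover text, so cut off exactly that many characters.
        return datetime.strptime(text[:-len(extra)], fmt)

# Assumption: the time zone is omitted here, and "2010" is taken to be the intended year.
end_date = parse_datetime_prefix("Sat Dec 22 12:34:08 20102015", "%a %b %d %H:%M:%S %Y")
print(end_date)  # 2010-12-22 12:34:08
```

Whether the leading four digits really are the right year is something only the data can tell you, so it may be worth logging every value that needed the retry.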