Convert a ISODate string to mongoDB native ISODate data type

texens picture texens · Jun 21, 2013 · Viewed 12.4k times · Source

My application generates logs in JSON format. The logs looks something like this :

{"LogLevel":"error","Datetime":"2013-06-21T11:20:17Z","Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '@LOGIN_ID', which was not supplied."}

Currently, I'm pushing in the aforementioned log line as it is into mongoDB. But mongoDB stores the Datetime as a string (which is expected). Now that I want to run some data crunching job on these logs, I'd prefer to store the Datetime as mongoDB's native ISODate data type.

There are 3 ways I can think of for doing this :

i) parse every JSON log line and convert the string to ISODate type in the application code and then insert it. Cons : I'll have to parse each and every line before pushing it to mongoDB, which is going to be a little expensive

ii) After every insert run a query to convert the last inserted document's string date time to ISODate using

element.Datetime = ISODate(element.Datetime);

Cons : Again expensive, as I'm gonna be running one extra query per insert iii) Modify my logs at generation point so that I don't have to do any parsing at application code level, or run an update query after every insert

Also, just curious, is there a way I can configure mongoDB to auto convert datetime strings to its native isodate format ?

TIA

EDIT: I'm using pymongo for inserting the json logs

My file looks something like this :

{"LogLevel":"error","Datetime":"2013-06-21T11:20:17Z","Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '@LOGIN_ID', which was not supplied."}

There are hundreds of lines like the one mentioned above. And this is how I'm inserting them into mongodb:

for line in logfile:
    collection.insert(json.loads(line))

The following will fix my problem:

for line in logfile:
    data = json.loads(line)
    data["Datetime"] = datetime.strptime(data["Datetime"], "%Y-%M-%DTHH:mmZ")
    collection.insert(data)

What I want to do is get rid of the extra manipulation of datetime I'm having to do above. Hope this clarifies the problem.

Answer

rmartinsjr picture rmartinsjr · Nov 12, 2013

Looks like you already have the answer... I would stick with:

for line in logfile:
    data = json.loads(line)
    data["Datetime"] = datetime.strptime(data["Datetime"], "%Y-%M-%DTHH:mmZ")
    collection.insert(data)

I had a similar problem, but I didn't known beforehand where I should replace it by a datetime object. So I changed my json information to something like:

{"LogLevel":"error","Datetime":{"__timestamp__": "2013-06-21T11:20:17Z"},"Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '@LOGIN_ID', which was not supplied."}

and parsed json with:

json.loads(data, object_hook=logHook)

with 'logHook' defined as:

def logHook(d):
    if '__timestamp__' in d:
        return datetime.strptime(d['__timestamp__'], "%Y-%M-%DTHH:mmZ")
    return d

This logHook function could also be extended to replace many other 'variables' with elif, elif, ...

Hope this helps!