My application generates logs in JSON format. The logs look something like this:
{"LogLevel":"error","Datetime":"2013-06-21T11:20:17Z","Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '@LOGIN_ID', which was not supplied."}
Currently, I'm pushing the aforementioned log lines into MongoDB as they are. But MongoDB stores the Datetime as a string (which is expected). Now that I want to run some data crunching jobs on these logs, I'd prefer to store Datetime as MongoDB's native ISODate data type.
There are three ways I can think of to do this:
i) Parse every JSON log line and convert the string to an ISODate in the application code, then insert it. Cons: I'll have to parse each and every line before pushing it to MongoDB, which is going to be a little expensive.
ii) After every insert, run a query to convert the last inserted document's datetime string to an ISODate, using something like
element.Datetime = ISODate(element.Datetime);
Cons: Again expensive, as I'm going to be running one extra query per insert (see the sketch after this list).
iii) Modify my logs at the generation point so that I don't have to do any parsing at the application code level, or run an update query after every insert.
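To illustrate option ii, it would be something along these lines in pymongo (untested, the names are just placeholders):

from datetime import datetime
import json

data = json.loads(line)
doc_id = collection.insert(data)  # first round trip: store the raw log document
# second round trip: overwrite the string with a real datetime (stored as an ISODate)
parsed = datetime.strptime(data["Datetime"], "%Y-%m-%dT%H:%M:%SZ")
collection.update({"_id": doc_id}, {"$set": {"Datetime": parsed}})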
Also, just curious: is there a way I can configure MongoDB to auto-convert datetime strings to its native ISODate format?
TIA
EDIT: I'm using pymongo to insert the JSON logs.
My file looks something like this :
{"LogLevel":"error","Datetime":"2013-06-21T11:20:17Z","Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '@LOGIN_ID', which was not supplied."}
There are hundreds of lines like the one mentioned above, and this is how I'm inserting them into MongoDB:
for line in logfile:
    collection.insert(json.loads(line))
The following will fix my problem:
for line in logfile:
    data = json.loads(line)
    # "%Y-%m-%dT%H:%M:%SZ" matches strings like "2013-06-21T11:20:17Z"
    data["Datetime"] = datetime.strptime(data["Datetime"], "%Y-%m-%dT%H:%M:%SZ")
    collection.insert(data)
What I want to do is get rid of the extra manipulation of datetime I'm having to do above. Hope this clarifies the problem.
Looks like you already have the answer... I would stick with:
from datetime import datetime

for line in logfile:
    data = json.loads(line)
    data["Datetime"] = datetime.strptime(data["Datetime"], "%Y-%m-%dT%H:%M:%SZ")
    collection.insert(data)
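As a side note that goes beyond your original snippet: pymongo converts Python datetime objects to BSON dates on its own, so you could also batch the parsed documents and pass a list to collection.insert() to cut down on round trips. A rough sketch (the batch size is arbitrary):

from datetime import datetime
import json

batch = []
for line in logfile:
    data = json.loads(line)
    data["Datetime"] = datetime.strptime(data["Datetime"], "%Y-%m-%dT%H:%M:%SZ")
    batch.append(data)
    if len(batch) >= 1000:          # arbitrary batch size
        collection.insert(batch)    # insert() also accepts a list of documents
        batch = []
if batch:
    collection.insert(batch)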
I had a similar problem, but I didn't know beforehand where the string should be replaced by a datetime object. So I changed my JSON information to something like:
{"LogLevel":"error","Datetime":{"__timestamp__": "2013-06-21T11:20:17Z"},"Module":"DB","Method":"ExecuteSelect","Request":"WS_VALIDATE","Error":"Procedure or function 'WS_VALIDATE' expects parameter '@LOGIN_ID', which was not supplied."}
and parsed the JSON with:
json.loads(data, object_hook=logHook)
with 'logHook' defined as:
def logHook(d):
    if '__timestamp__' in d:
        return datetime.strptime(d['__timestamp__'], "%Y-%m-%dT%H:%M:%SZ")
    return d
This logHook function could also be extended to handle many other special keys with additional elif branches.
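A sketch of what such an extension might look like (the '__date__' key is purely hypothetical, just to show the shape):

from datetime import datetime

def logHook(d):
    if '__timestamp__' in d:
        return datetime.strptime(d['__timestamp__'], "%Y-%m-%dT%H:%M:%SZ")
    elif '__date__' in d:
        # hypothetical key holding a date-only value such as "2013-06-21"
        return datetime.strptime(d['__date__'], "%Y-%m-%d")
    return d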
Hope this helps!