Application log aggregation, management and notifications

Matthew Savage · Mar 25, 2009

I'm wondering what everyone is using for logging, log management and log aggregation on their systems.

I am working in a company which uses .NET for all its applications, and all systems are Windows-based. Currently each application looks after its own logging and notifications of failures (e.g. if app A fails it will send out its own 'call for help' to an admin).

While this current practice works, it's a bit hacky and hard to manage. I've been trying to find some options for making this work better and I've come up with the following:

  • log4net & Chainsaw (ah, if it works).
  • Logging via log4net or another framework into a central database & rolling our own management tool.
  • Logging to the Windows event log and using MOM or System Center Operations Manager to aggregate and manage each of these servers & their apps.
  • A hand-rolled solution to suck all the log files into one point and work some magic across them.

Essentially, what we are after is something that can pull all the log entries together and allow some analytics to be run across them. It should also support event-based rules, for example sending out a warning email when an application has logged 30 or more warning-level entries in the last x minutes.
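
To make the alerting requirement concrete, here is a minimal, language-neutral sketch of that kind of rule written in Python (our real apps are .NET, so treat it as pseudocode that happens to run). It keeps a sliding window of warning timestamps per application and calls a notifier once the count reaches a threshold. The class name `WarningRateAlert`, the `notify` callback and the 5-minute window are all made up for illustration; a real setup would plug an email sender into `notify` and feed `record` from whatever aggregates the logs.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class WarningRateAlert:
    """Hypothetical rule: alert when one app logs too many warnings in a time window."""

    def __init__(self, threshold, window, notify):
        self.threshold = threshold          # e.g. 30 warnings
        self.window = window                # e.g. timedelta(minutes=5); "x" is an assumption
        self.notify = notify                # callback(app_name, count), e.g. sends an email
        self._events = defaultdict(deque)   # app name -> timestamps of recent warnings

    def record(self, app, timestamp):
        """Register one warning-level entry; return True if an alert was raised."""
        events = self._events[app]
        events.append(timestamp)

        # Discard warnings that have fallen out of the sliding window.
        cutoff = timestamp - self.window
        while events and events[0] <= cutoff:
            events.popleft()

        if len(events) >= self.threshold:
            self.notify(app, len(events))
            events.clear()  # start counting afresh so we don't alert on every later entry
            return True
        return False


if __name__ == "__main__":
    monitor = WarningRateAlert(
        threshold=30,
        window=timedelta(minutes=5),
        notify=lambda app, count: print(f"ALERT: {app} logged {count} warnings"),
    )
    start = datetime(2009, 3, 25, 12, 0, 0)
    for second in range(30):
        monitor.record("app-a", start + timedelta(seconds=second))
```

Clearing the window after an alert is just one possible choice; a cool-down period per application would work too and might suit noisy apps better.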

So is there anything I've missed, or something someone else can suggest?

Answer

Paul Stevens · Mar 26, 2009

If you can, I'd recommend writing to the event log and creating rules in SCOM to monitor it. We use this extensively and it works well. We have even put together pieces of code that monitor certain elements of our apps and write values to the event log, where SCOM parses for the errors and graphs those, plus informational entries, into reports showing stats over a given time.

I am, however, quite keen on rewriting some of that into WMI and having SCOM poll the WMI service for those same counters, as writing queue lengths to the event log every 15 minutes seems a little wasteful ;)