What I want is an /etc/init.d script that starts MongoDB more reliably, even after the server went down hard -- it should attempt an auto-repair if the database is left in a locked state.
Yes, I could script this myself, but I think somebody out there must have done this already.
I noticed that after a server goes down hard, MongoDB is left in a state where it won't restart via the /etc/init.d/mongod script. Obviously the lock file(s) need to be removed and mongod needs to be run with the --repair option and the correct --dbpath before it can be started again (roughly the sequence sketched below the output). In some cases one also needs to change the ownership of the db files back to the user that runs mongod. An additional problem is that the standard /etc/init.d/mongod script does not report a failure in this situation, but joyfully and incorrectly returns an "OK" status, claiming that mongod was started although it wasn't.
$ sudo /etc/init.d/mongod start
Starting mongod: forked process: 9220
all output going to: /data/mongo/log/mongod.log
[ OK ]
$ sudo /etc/init.d/mongod status
mongod dead but subsys locked
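What I end up doing by hand to recover looks roughly like this -- just a sketch, not a hardened script; the dbpath and the mongod user are taken from my setup and may differ on yours:

$ sudo /etc/init.d/mongod stop
$ sudo rm /data/mongo/db/mongod.lock                        # remove the stale lock file
$ sudo -u mongod mongod --repair --dbpath /data/mongo/db    # repair as the mongod user so ownership stays correct
$ sudo chown -R mongod:mongod /data/mongo/db                # fix ownership in case the repair ran as root
$ sudo /etc/init.d/mongod start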
The OS is either CentOS or Fedora.
Does anybody have a modified /etc/init.d script, or a pointer to one, that attempts a repair automatically in that situation? Or is there another tool that functions as a watchdog for mongod?
Any opinions on why it might be a bad idea to try to automatically repair mongodb?
$ sudo /etc/init.d/mongod status
mongod dead but subsys locked
$ sudo ls -l /var/lib/mongo/mongod.lock
-rw-r--r--. 1 mongod mongod 5 Nov 19 11:52 /var/lib/mongo/mongod.lock
$ sudo tail -50 /data/mongo/log/mongod.log
**************
old lock file: /data/mongo/db/mongod.lock. probably means unclean shutdown
recommend removing file and running --repair
see: http://dochub.mongodb.org/core/repair for more information
*************
Sat Nov 19 11:55:44 exception in initAndListen std::exception: old lock file, terminating
Sat Nov 19 11:55:44 dbexit:
Sat Nov 19 11:55:44 shutdown: going to close listening sockets...
Sat Nov 19 11:55:44 shutdown: going to flush oplog...
Sat Nov 19 11:55:44 shutdown: going to close sockets...
Sat Nov 19 11:55:44 shutdown: waiting for fs preallocator...
Sat Nov 19 11:55:44 shutdown: closing all files...
Sat Nov 19 11:55:44 closeAllFiles() finished
Sat Nov 19 11:55:44 dbexit: really exiting now
So the first thing to mention is journaling. Journaling is effectively billed as "fast repair": it is on by default in 2.0+, and on startup it performs that "repair" automatically.
So if your disks can handle the extra write throughput of journaling, this may solve your problem.
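If you are on 1.8.x, or just want to be explicit about it, you can enable journaling yourself; the config path below is the stock RPM location and otherwise an assumption about your setup:

# /etc/mongod.conf -- old ini-style format used by the 1.8/2.0 RPM packages
journal = true

# or equivalently on the command line:
$ sudo -u mongod mongod --journal --dbpath /data/mongo/db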
Any opinions on why it might be a bad idea to try to automatically repair mongodb?
The #1 issue with repairing MongoDB automatically is simply one of time.
If you have a 200GB database, the system will need to do the following when repairing:
1) 200GB read
2) 200GB write
3) 200GB reads + a large number of writes
If you look at those numbers, that's a serious amount of drive thrashing to perform a repair.
But most production installs run replica sets. In this case, instead of repairing, you can just restore from a backup. Restoring from a backup only writes the data once, and it's a process you should already have in place.
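A rough sketch of that path for a replica set member -- the backup location (/backups/mongo/latest) is made up, and the dbpath is taken from your question:

$ sudo /etc/init.d/mongod stop
$ sudo mv /data/mongo/db /data/mongo/db.broken         # keep the damaged files around, just in case
$ sudo cp -a /backups/mongo/latest /data/mongo/db      # restore a file-level backup (one pass of writes)
$ sudo chown -R mongod:mongod /data/mongo/db
$ sudo /etc/init.d/mongod start                        # the member then catches up from the primary via the oplog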
Despite the init.d script returning OK, your system monitoring should tell you that the DB is not up.
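Something as small as this is enough to catch the false OK, because it asks the server directly instead of trusting the init script's exit status (a sketch; host, port and the admin db are just the defaults):

# ping mongod itself rather than relying on what /etc/init.d/mongod reported
if [ "$(mongo --quiet --eval 'db.adminCommand({ping: 1}).ok' localhost:27017/admin 2>/dev/null)" = "1" ]; then
    echo "mongod is answering"
else
    echo "mongod is down" >&2
    exit 1
fi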