Python scikit-learn n_jobs

Bruno Hanzen · Sep 24, 2015 · Viewed 31.3k times

This is not a real issue, but I'd like to understand:

  • running sklearn from the Anaconda distribution on a Windows 7 system with 4 cores and 8 GB of RAM
  • fitting a KMeans model on a table of 200,000 samples × 200 values
  • running with n_jobs = -1 (after adding the if __name__ == '__main__': line to my script): I see the script start 4 processes with 10 threads each. Each process uses about 25% of the CPU (total: 100%). This seems to work as expected.
  • running with n_jobs = 1: it stays on a single process (not a surprise), with 20 threads, and also uses 100% of the CPU.

My question: what is the point of using n_jobs (and joblib) if the library uses all cores anyway? Am I missing something? Is this Windows-specific behaviour?

Answer

Sim · Mar 21, 2019
  • what is the point of using n_jobs (and joblib) if the library uses all cores anyway?

It does not use all cores anyway. If you set n_jobs to -1, it will use all cores. If you set it to 1 or 2, it will use only one or two cores (tested with scikit-learn 0.20.3 under Linux).
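To make the meaning of the n_jobs values concrete, here is a minimal sketch of how a value is turned into a worker count. It follows the convention documented for joblib and scikit-learn (-1 means all CPUs, and for values below -1, n_cpus + 1 + n_jobs workers are used). The function name resolve_n_jobs and its n_cpus parameter are hypothetical, written only for illustration; this is not the library's actual code.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Hypothetical helper: map an n_jobs value to a number of workers.

    Mirrors the documented convention only; not joblib's implementation.
    """
    if n_cpus is None:
        # os.cpu_count() may return None, so fall back to a single CPU.
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, and so on (never below 1).
        return max(n_cpus + 1 + n_jobs, 1)
    # Positive values are taken as given (this sketch does not cap them).
    return n_jobs


if __name__ == "__main__":
    # Assume the 4-core machine from the question.
    for n_jobs in (1, 2, -1, -2):
        print(n_jobs, "->", resolve_n_jobs(n_jobs, n_cpus=4))
```

On the 4-core machine from the question, this maps n_jobs = 1 to one worker, 2 to two workers, -1 to four workers and -2 to three.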