Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads

itsamineral picture itsamineral · Dec 20, 2016 · Viewed 38.5k times · Source

Can somebody please explain the following TensorFlow terms

  1. inter_op_parallelism_threads

  2. intra_op_parallelism_threads

or, please, provide links to the right source of explanation.

I have conducted a few tests by changing the parameters, but the results have not been consistent to arrive at a conclusion.

Answer

mrry picture mrry · Dec 20, 2016

The inter_op_parallelism_threads and intra_op_parallelism_threads options are documented in the source of the tf.ConfigProto protocol buffer. These options configure two thread pools used by TensorFlow to parallelize execution, as the comments describe:

// The execution of an individual op (for some op types) can be
// parallelized on a pool of intra_op_parallelism_threads.
// 0 means the system picks an appropriate number.
int32 intra_op_parallelism_threads = 2;

// Nodes that perform blocking operations are enqueued on a pool of
// inter_op_parallelism_threads available in each process.
//
// 0 means the system picks an appropriate number.
//
// Note that the first Session created in the process sets the
// number of threads for all future sessions unless use_per_session_threads is
// true or session_inter_op_thread_pool is configured.
int32 inter_op_parallelism_threads = 5;

There are several possible forms of parallelism when running a TensorFlow graph, and these options provide some control multi-core CPU parallelism:

  • If you have an operation that can be parallelized internally, such as matrix multiplication (tf.matmul()) or a reduction (e.g. tf.reduce_sum()), TensorFlow will execute it by scheduling tasks in a thread pool with intra_op_parallelism_threads threads. This configuration option therefore controls the maximum parallel speedup for a single operation. Note that if you run multiple operations in parallel, these operations will share this thread pool.

  • If you have many operations that are independent in your TensorFlow graph—because there is no directed path between them in the dataflow graph—TensorFlow will attempt to run them concurrently, using a thread pool with inter_op_parallelism_threads threads. If those operations have a multithreaded implementation, they will (in most cases) share the same thread pool for intra-op parallelism.

Finally, both configuration options take a default value of 0, which means "the system picks an appropriate number." Currently, this means that each thread pool will have one thread per CPU core in your machine.