I want to be able to program multiple threads with GNU Octave so that it will utilize multiple processors.
I installed GNU Octave on Fedora 17 Linux with:
yum install octave
That installed the latest version of Octave, 3.6.2. It works great, but multiplying two huge matrices bogs down the one CPU that Octave uses. It would be nice if the matrix multiplication used all of the cores, since in this case the CPU is obviously the bottleneck.
Can Octave fully utilize multi-core processors and run on multiple threads? Is there a library or compile-time flag for this?
Solution
Octave itself is a single-threaded application that runs on one core. You can get Octave to use libraries like ATLAS which utilize multiple cores. So while Octave only uses one core, when you encounter a heavy operation, Octave calls functions in ATLAS that utilize many CPUs.
I was able to do this. First compile ATLAS from source code and make it available to your system so that Octave can find it and use its library functions. ATLAS tunes itself to your system and number of cores. When you build Octave from source and specify ATLAS, Octave uses it, so when Octave does a heavy operation like a huge matrix multiplication, ATLAS decides how many CPUs to use.
I was unable to get this to work on Fedora, but on Gentoo I could get it to work.
I used these two links: ftp://ftp.gnu.org/gnu/octave/
http://math-atlas.sourceforge.net/
I ran the following Octave code before and after the ATLAS install:
tic
bigMatrixA = rand(3000000,80);
bigMatrixB = rand(80,30);
bigMatrixC = bigMatrixA * bigMatrixB;
toc
disp("done");
The matrix multiplication goes much faster using multiple processors. In this run it was about 6 times faster than with a single core:
Without ATLAS: Elapsed time is 3.22819 seconds.
With ATLAS: Elapsed time is 0.529 seconds.
The three libraries I am using which speed things up are blas-atlas, cblas-atlas, and lapack-atlas.
If Octave can use these instead of the default BLAS and LAPACK libraries, then it will utilize multiple cores.
It is not easy, and it takes some programming skill, to get Octave to compile from source with ATLAS.
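One way to check which BLAS/LAPACK libraries a given Octave build actually links against is to inspect its shared-library dependencies with `ldd`. This is only a rough sketch: the function names `filter_blas` and `blas_deps` are made up for illustration, and the location of the Octave library on your system is an assumption you will have to fill in yourself.

```shell
# filter_blas
# Reads `ldd`-style output on stdin and keeps only the lines whose
# library names mention ATLAS, BLAS or LAPACK (case-insensitive).
filter_blas() {
    grep -i -E 'atlas|blas|lapack'
}

# blas_deps FILE
# Lists the shared libraries FILE depends on that look like a
# BLAS/LAPACK implementation. Prints nothing if none are found.
blas_deps() {
    ldd "$1" 2>/dev/null | filter_blas
}

# Hypothetical usage (the path is an assumption, adjust to your install):
#   blas_deps /path/to/liboctave.so
```

If the output names ATLAS libraries rather than the reference BLAS and LAPACK, the build is at least linked against them.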
Drawbacks to using ATLAS:
ATLAS adds a lot of overhead when it splits an operation across multiple threads. Sure, it goes much faster if all you are doing is huge matrix multiplications, but most Octave commands can't be multi-threaded by ATLAS. If extracting every bit of processing power out of your cores is the top priority, then you'll have much better luck writing your program to run in parallel with itself. (Split your program into 8 equivalent programs that each work on 1/8th of the problem, run them all simultaneously, and reassemble the results when all are done.)
ATLAS helps a single-threaded Octave program behave a little more like a multi-threaded app, but it is no silver bullet. ATLAS won't make your single-threaded Octave program max out your 2-, 4-, 6-, or 8-core processor. You'll notice a performance boost, but the boost will leave you searching for a better way to use all of the processor. The answer is writing your program to run in parallel with itself, and this takes a lot of programming skill.
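The "split it into 8 programs and reassemble" idea can be sketched from the shell with background jobs. This is a minimal sketch, not a finished tool: `run_chunks` is a name I made up, and the `octave_worker.sh` wrapper and `process_chunk` function in the usage comment are hypothetical placeholders for your own code. It assumes each worker prints its partial result to standard output.

```shell
# run_chunks N CMD [ARGS...]
# Starts N copies of CMD in the background. Copy k is invoked as
#   CMD ARGS... k N
# and its standard output is captured in a temporary file. Once every
# copy has exited, the files are concatenated in chunk order.
run_chunks() {
    n=$1
    shift
    tmpdir=$(mktemp -d) || return 1
    k=1
    while [ "$k" -le "$n" ]; do
        "$@" "$k" "$n" > "$tmpdir/part_$k" &
        k=$((k + 1))
    done
    wait    # block until all background workers have exited
    k=1
    while [ "$k" -le "$n" ]; do
        cat "$tmpdir/part_$k"
        k=$((k + 1))
    done
    rm -rf "$tmpdir"
}

# Hypothetical usage: octave_worker.sh is an assumed wrapper script that runs
#   octave --quiet --eval "process_chunk($1, $2)"
# where process_chunk is your own Octave function that prints its result.
#   run_chunks 8 ./octave_worker.sh > combined_results.txt
```

Each worker is a separate Octave process, so the operating system can schedule them on different cores. The sketch does not check the workers' exit codes, which a real script would want to do before trusting the combined output.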
Suggestion
Put your energy into vectorizing your heaviest operations and distributing the work over n simultaneously running processes or threads. If you are waiting too long for a process to run, most likely the lowest-hanging fruit for speeding it up is a more efficient algorithm or data structure.