I'm developing a program that needs to do heavy linear algebra calculations.
Currently I'm using LAPACK/BLAS routines, but I need to fully exploit my machine (a 24-core Xeon X5690 server).
I've found projects like PBLAS and ScaLAPACK, but they all seem to focus on distributed computing with MPI. I have no cluster available; all computations will be done on a single server, so using MPI looks like overkill.
Does anyone have any suggestions?
As @larsmans mentioned, with a tuned, multithreaded library (say, MKL) you still use the same LAPACK + BLAS interfaces; you just link against a version optimized for your platform. MKL is great, but expensive. Open-source options include ATLAS and OpenBLAS (the successor to GotoBLAS), both of which provide tuned, multithreaded BLAS implementations.
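As a quick way to see the effect of a threaded BLAS without writing any C, here is a minimal sketch using NumPy, assuming it is linked against OpenBLAS or MKL (the usual case for binary installs). The environment variable names (`OPENBLAS_NUM_THREADS`, `MKL_NUM_THREADS`) are the standard thread-count controls for those two libraries; the `4` is an arbitrary example value.

```python
import os

# Thread count generally must be set before NumPy (and its BLAS) is loaded.
# OPENBLAS_NUM_THREADS controls OpenBLAS; MKL reads MKL_NUM_THREADS instead.
os.environ.setdefault("OPENBLAS_NUM_THREADS", "4")
os.environ.setdefault("MKL_NUM_THREADS", "4")

import time
import numpy as np

n = 2000
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
b = rng.standard_normal((n, n))

t0 = time.perf_counter()
c = a @ b  # dispatched to the BLAS dgemm routine under the hood
elapsed = time.perf_counter() - t0

print(f"{n}x{n} matrix multiply took {elapsed:.2f}s")
```

Rerunning with the thread count set to 1 versus the number of physical cores should show the scaling you get for free from the tuned library, with no changes to your own calling code.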
I'd also agree with Mark's comment: depending on which LAPACK routines you're using, the distributed-memory approach with MPI might actually be faster than the multithreaded one, even on a single node. That's unlikely to be the case for BLAS routines, but for something more complicated (say, the eigenvalue/eigenvector routines in LAPACK) it's worth testing. While MPI function calls do add overhead, working in a distributed-memory mode means you don't have to worry as much about false sharing, synchronizing access to shared variables, and so on.