Scipy sparse matrices - purpose and usage of different implementations

DilithiumMatrix · Apr 2, 2013

Scipy has many different types of sparse matrices available. What are the most important differences between these types, and what is the difference in their intended usage?

I'm developing code in Python based on sample code [1] in Matlab. One section of the code uses sparse matrices, which seem to have a single (annoying) type in Matlab, and I'm trying to figure out which type I should use [2] in Python.

[1] This is for a class. Most people are doing the project in Matlab, but I like to create unnecessary work and confusion, apparently.

[2] This is an academic question: I have the code working properly with the 'CSR' format, but I'm interested in knowing what the optimal usages are.

Answer

Will · Jun 1, 2013

Sorry if I'm not answering this completely enough, but hopefully I can provide some insight.

CSC (Compressed Sparse Column) and CSR (Compressed Sparse Row) are more compact and more efficient for arithmetic and matrix-vector products, but they are awkward to construct "from scratch" because changing their sparsity structure is expensive. COO (Coordinate) and DOK (Dictionary of Keys) are easier to construct, and can then be converted to CSC or CSR via matrix.tocsc() or matrix.tocsr().
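As a rough illustration of that build-then-convert workflow, here is a minimal sketch. The 3x3 matrix and its values are made up for the example; only the scipy.sparse calls themselves come from the library.

```python
import numpy as np
from scipy import sparse

# COO: build from parallel arrays of (row, column, value) triplets.
# The entries below are arbitrary example data.
rows = np.array([0, 0, 1, 2])
cols = np.array([0, 2, 1, 2])
vals = np.array([4.0, 1.0, 5.0, 6.0])
coo = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3))

# DOK: fill entries one at a time, like a dictionary keyed by (row, column).
dok = sparse.dok_matrix((3, 3), dtype=np.float64)
dok[0, 0] = 4.0
dok[0, 2] = 1.0
dok[1, 1] = 5.0
dok[2, 2] = 6.0

# Convert to a compressed format once construction is finished.
csr = coo.tocsr()
csc = dok.tocsc()

print(csr.toarray())
print(csc.toarray())
```

COO tends to suit the case where all the entries are known up front as arrays, while DOK suits incremental, element-by-element filling. Either way, the conversion is done once at the end, before any heavy arithmetic.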

CSC is generally more efficient for extracting column vectors and for column-wise operations, because the nonzero values are stored column by column, each alongside its row index.

CSR matrices are the opposite: the nonzero values are stored row by row, each alongside its column index, so they are more efficient for extracting row vectors and for row-wise operations.
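To make the two layouts concrete, the sketch below builds the same small matrix in both formats and inspects the three underlying arrays (data, indices, indptr). The matrix is an arbitrary example, and the values in the comments are what these arrays should contain for it.

```python
import numpy as np
from scipy import sparse

# Arbitrary example matrix with four nonzero entries.
dense = np.array([[4.0, 0.0, 1.0],
                  [0.0, 5.0, 0.0],
                  [0.0, 0.0, 6.0]])

csr = sparse.csr_matrix(dense)
csc = sparse.csc_matrix(dense)

# CSR: values are laid out row by row.
# indptr[i]:indptr[i+1] is the slice of indices/data belonging to row i,
# and indices holds the column number of each stored value.
print(csr.indptr)   # values: 0, 2, 3, 4
print(csr.indices)  # values: 0, 2, 1, 2
print(csr.data)     # values: 4, 1, 5, 6

# CSC: values are laid out column by column.
# indptr[j]:indptr[j+1] is the slice belonging to column j,
# and indices holds the row number of each stored value.
print(csc.indptr)   # values: 0, 1, 2, 4
print(csc.indices)  # values: 0, 1, 0, 2
print(csc.data)     # values: 4, 5, 1, 6

# Row extraction reads a contiguous slice in CSR;
# column extraction reads a contiguous slice in CSC.
first_row = csr[0, :].toarray()
last_col = csc[:, 2].toarray()
print(first_row)
print(last_col)
```

Both formats support both kinds of slicing, so the choice affects speed rather than correctness: slicing along the "wrong" axis still works, it just cannot read one contiguous chunk of the stored arrays.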