I want to convert this csc_matrix into a pandas DataFrame.
In each bracketed pair, the first number should become the index and the second the column, with the number at the end as the data.
I need this for feature selection in text analysis: the first number identifies the document, the second identifies the feature (word), and the last is the TF-IDF score.
Having the data in a DataFrame lets me treat the text analysis problem as an ordinary data analysis problem.
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix
csc = csc_matrix(np.array(
    [[0, 0, 4, 0, 0, 0],
     [1, 0, 0, 0, 2, 0],
     [2, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 1],
     [4, 0, 3, 2, 0, 0]]))
# Return a Coordinate (coo) representation of the Compressed Sparse Column (csc) matrix.
coo = csc.tocoo(copy=False)
# Access `row`, `col` and `data` properties of coo matrix.
>>> pd.DataFrame({'index': coo.row, 'col': coo.col, 'data': coo.data}
...              )[['index', 'col', 'data']].sort_values(['index', 'col']
...              ).reset_index(drop=True)
   index  col  data
0      0    2     4
1      1    0     1
2      1    4     2
3      2    0     2
4      2    3     1
5      3    5     1
6      4    0     4
7      4    2     3
8      4    3     2
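
For reuse, the steps above can be wrapped in a small helper. This is only a sketch: the function name `sparse_to_long_frame`, its `feature_names` parameter, and the placeholder word list `words` are hypothetical and not part of SciPy or pandas. It assumes the matrix stores no explicit zeros (any that are stored would show up as rows with `data` equal to 0).

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix


def sparse_to_long_frame(matrix, feature_names=None):
    """Return one row per stored entry as (index, col, data).

    `feature_names` is an optional, hypothetical argument: a sequence of
    labels (for example vocabulary words), one per matrix column.
    """
    coo = matrix.tocoo()
    frame = pd.DataFrame({'index': coo.row, 'col': coo.col, 'data': coo.data})
    # Sort by document, then by column position, and renumber the rows.
    frame = frame.sort_values(['index', 'col']).reset_index(drop=True)
    if feature_names is not None:
        # Swap integer column positions for their labels.
        frame['col'] = np.asarray(feature_names)[frame['col'].to_numpy()]
    return frame


csc = csc_matrix(np.array(
    [[0, 0, 4, 0, 0, 0],
     [1, 0, 0, 0, 2, 0],
     [2, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 1],
     [4, 0, 3, 2, 0, 0]]))

long_df = sparse_to_long_frame(csc)

# Placeholder labels standing in for a real vocabulary (assumption).
words = ['w0', 'w1', 'w2', 'w3', 'w4', 'w5']
named_df = sparse_to_long_frame(csc, feature_names=words)
```

Mapping the labels after sorting keeps the rows in column-position order rather than alphabetical order of the words.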