How to filter a CSV file without Pandas? (Best Substitute for Pandas in Pythonista)

zeh · Nov 20, 2016 · Viewed 7.5k times

I am trying to do some data analysis in Pythonista 3 (an iOS app for Python). However, pandas depends on C libraries, so it cannot be built on the iOS device.

Is there any substitute for Pandas? Would numpy be an option for data of type string?

The data set I have at the moment is the history of messages between my friends and me.

The whole history is in one CSV file. Each row has the columns 'day_of_the_week', 'date', 'time_of_message', 'author_of_message', 'message_body'.

The goal of the analysis is to produce a report of our chat for the past year.

I want to be able to count the number of messages each friend sent. I also want to plot a histogram of the hours at which each friend sent messages. Then I want to do some word counting, both per person and for the group as a whole.

In Pandas I know how to do that. For example:

import pandas as pd
df = pd.read_csv("messages.csv")
number_of_messages_friend1 = len(df[df.author_of_message == 'friend1'])

How can I filter a csv file without Pandas?

Answer

JonB · Nov 20, 2016

Since Pythonista does have numpy, you will want to look at recarrays, which are numpy's approach to this type of problem. The following worked out of the box in Pythonista for me:

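import numpy as np
# recfromcsv reads the header row as field names and guesses column types
df = np.recfromcsv('messages.csv')
# text fields came back as bytes here, hence the b'' literal
len(df[df.author_of_message == b'friend1'])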

Depending on your data format, you may find that recfromcsv "just works", since it tries to guess data types, or you might need to customize things a bit. See genfromtxt for a number of options, such as explicitly specifying data types or using converters to turn string dates into datetime objects. recfromcsv is just a convenience wrapper around genfromtxt.

https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html#
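If recfromcsv needs tweaking, or is missing (as far as I know, NumPy 2.0 and later no longer expose it at the top level), calling genfromtxt directly might look roughly like the sketch below. The sample rows are made up for illustration, and the encoding argument assumes NumPy 1.14 or newer, which the NumPy bundled with Pythonista may not be. Also note that genfromtxt splits on every comma, so message bodies that contain commas would need different handling.

```python
import io
import numpy as np

# Hypothetical sample using the column layout described in the question.
# In practice you would pass the path 'messages.csv' instead of this buffer.
sample = io.StringIO(
    "day_of_the_week,date,time_of_message,author_of_message,message_body\n"
    "Mon,2016-11-14,09:15,friend1,hello\n"
    "Mon,2016-11-14,21:40,friend2,hi there\n"
    "Tue,2016-11-15,09:50,friend1,lunch?\n"
)

messages = np.genfromtxt(
    sample,
    delimiter=",",     # comma-separated fields
    names=True,        # take field names from the header row
    dtype=None,        # guess a type for each column
    encoding="utf-8",  # assumption: NumPy >= 1.14; gives str instead of bytes
)

# With str fields, the comparison no longer needs a b'' literal.
friend1_count = len(messages[messages["author_of_message"] == "friend1"])
print(friend1_count)
```

Field access here uses messages["author_of_message"] because genfromtxt returns a plain structured array; messages.view(np.recarray) should give back the attribute-style access used above.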

Once the data is in a recarray, many of the simple indexing operations work the same as in pandas. Note that you may need to compare strings against b-prefixed literals (bytes objects), as shown above, unless you convert the fields to unicode strings.
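As a rough illustration of both points, the sketch below builds a small recarray with bytes fields (hypothetical rows standing in for the parsed CSV), filters it with a b-prefixed literal, then decodes the fields to unicode and computes per-author counts and an hour-of-day tally. It assumes times are formatted as HH:MM, which the question does not state.

```python
import numpy as np

# Hypothetical rows; bytes values mimic what recfromcsv returned above.
rows = [
    (b"09:15", b"friend1"),
    (b"21:40", b"friend2"),
    (b"09:50", b"friend1"),
]
messages = np.rec.fromrecords(rows, names="time_of_message,author_of_message")

# Boolean-mask filtering with attribute access, as in pandas.
friend1 = messages[messages.author_of_message == b"friend1"]

# Decode the bytes fields once so later comparisons can use plain strings.
authors = np.char.decode(messages.author_of_message, "utf-8")
times = np.char.decode(messages.time_of_message, "utf-8")

# Messages per author, roughly what value_counts() would give in pandas.
names, counts = np.unique(authors, return_counts=True)
per_author = dict(zip(names.tolist(), counts.tolist()))

# Hour-of-day tally for one author (assumes HH:MM time strings).
hours = np.array([int(t[:2]) for t in times[authors == "friend1"]])
per_hour = np.bincount(hours, minlength=24)

print(len(friend1), per_author, per_hour[9])
```

The per_hour array has one slot per hour, so it can be handed to a bar plot directly to get the histogram the question asks for.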