Filling missing values using numpy.genfromtxt

Hooked picture Hooked · Jun 25, 2013 · Viewed 15.5k times · Source

Despite the advice from the previous questions:

-9999 as missing value with numpy.genfromtxt()

Using genfromtxt to import csv data with missing values in numpy

I still am unable to process a text file that ends with a missing value,

a.txt:

1 2 3
4 5 6
7 8

I've tried multiple arrangements of options of missing_values, filling_values and can not get this to work:

import numpy as np

sol = np.genfromtxt("a.txt", 
                    dtype=float,
                    invalid_raise=False, 
                    missing_values=None,
                    usemask=True,
                    filling_values=0.0)
print sol

What I would like to get is:

[[1.0 2.0 3.0]
 [4.0 5.0 6.0]
 [7.0 8.0 0.0]]

but instead I get:

/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py:1641: ConversionWarning: Some errors were detected !
    Line #3 (got 2 columns instead of 3)
  warnings.warn(errmsg, ConversionWarning)
[[1.0 2.0 3.0]
 [4.0 5.0 6.0]]

Answer

unutbu picture unutbu · Jun 25, 2013

Using pandas:

import pandas as pd

df = pd.read_table('data', sep='\s+', header=None)
df.fillna(0, inplace=True)
print(df)
#    0  1  2
# 0  1  2  3
# 1  4  5  6
# 2  7  8  0

pandas.read_table replaces missing data with NaNs. You can replace those NaNs with some other value using df.fillna.

df is a pandas.DataFrame. You can access the underlying NumPy array with df.values:

print(df.values)
# [[ 1.  2.  3.]
#  [ 4.  5.  6.]
#  [ 7.  8.  0.]]