Checking whether data frame is copy or view in Pandas

nick_eu · Nov 12, 2014

Is there an easy way to check whether two data frames are different copies or views of the same underlying data that doesn't involve manipulations? I'm trying to get a grip on when each is generated, and given how idiosyncratic the rules seem to be, I'd like an easy way to test.

For example, I thought "id(df.values)" would be stable across views, but it doesn't seem to be:

import pandas as pd

# Make two data frames that are views of the same data.
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], index=['row1', 'row2'],
                  columns=['a', 'b', 'c', 'd'])
df2 = df.iloc[0:2, :]

# Demonstrate they are views:
df.iloc[0,0] = 99
df2.iloc[0,0]
Out[70]: 99

# Now try and compare the id on values attribute
# Different despite being views! 

id(df.values)
Out[71]: 4753564496

id(df2.values)
Out[72]: 4753603728

# And we can of course compare df and df2
df is df2
Out[73]: False

Other answers I've looked up try to give rules, but they don't seem consistent, and they don't answer this question of how to test:

And of course: http://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy

UPDATE: Comments below seem to answer the question -- looking at the df.values.base attribute rather than df.values attribute does it, as does a reference to the df._is_copy attribute (though the latter is probably very bad form since it's an internal).

Answer

nick_eu · Nov 12, 2014

Answers from HYRY and Marius in comments!

One can check either by:

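  • testing the identity of the values.base attribute rather than the values attribute, as in:

    df.values.base is df2.values.base instead of df.values is df2.values.

    (df.values and df2.values are distinct array objects even when they wrap the same buffer, which is why their ids differ; .base points at the array that actually owns the memory.)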

  • or using the (admittedly internal) _is_view attribute (df2._is_view is True).
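As a rough sketch of a third option that avoids internals: NumPy's np.shares_memory can compare the arrays returned by to_numpy(). This is not from the comments above; the helper name shares_data is made up for illustration, and the check assumes single-dtype frames, because to_numpy() on a mixed-dtype frame builds a fresh array and would report no sharing regardless. Results may also depend on the pandas version in use.

```python
import numpy as np
import pandas as pd


def shares_data(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Hypothetical helper (not from the thread): True if the two frames'
    value arrays overlap in memory. Only meaningful for single-dtype frames."""
    return bool(np.shares_memory(a.to_numpy(), b.to_numpy()))


# Same frame as in the question.
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                  index=['row1', 'row2'], columns=['a', 'b', 'c', 'd'])
df_slice = df.iloc[0:2, :]   # row slice, as in the question
df_copy = df.copy()          # explicit deep copy, owns its own data

slice_shares = shares_data(df, df_slice)
copy_shares = shares_data(df, df_copy)
print("slice shares memory with df:", slice_shares)
print("copy shares memory with df: ", copy_shares)
```

Unlike comparing .base, this asks directly whether the two buffers overlap, so it does not rely on both arrays reporting the same owner object.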

Thanks everyone!