What is meant by shift in dataframe?

rithwik kukunuri · Jun 21, 2017

I am stuck on the following lines:

import math
import quandl
import pandas as pd
import numpy as np
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn import preprocessing, model_selection, svm
from sklearn.linear_model import LinearRegression


df = quandl.get('WIKI/GOOGL')

df = df[['Adj. Open','Adj. High','Adj. Low','Adj. Close','Adj. Volume']]

df['HL_PCT'] = (df["Adj. High"] - df['Adj. Close'])/df['Adj. Close'] * 100
df['PCT_CHANGE'] = (df["Adj. Close"] - df['Adj. Open'])/df['Adj. Open'] * 100

df = df[['Adj. Close','HL_PCT','PCT_CHANGE','Adj. Open']]

forecast_col = 'Adj. Close'

df.fillna(-99999,inplace = True)

forecast_out = int(math.ceil(.1*len(df)))

df['label'] = df[forecast_col].shift(-forecast_out)
print(df.head())

I can't understand what is meant by df[forecast_col].shift(-forecast_out).

Please explain this command and what it does.

Answer

Akshay Kandul · Jun 21, 2017

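The shift method of pandas.DataFrame (and of pandas.Series, which is what df[forecast_col] returns) shifts the data by the desired number of periods, with an optional time freq. Without freq, the values move along the index and the vacated positions are filled with NaN; with freq, the index labels themselves are shifted and the values stay attached to them. For further information, see the pandas documentation for shift.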
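The difference between the two modes can be sketched as follows. This is only an illustration: the dates, the values, and the names s, by_rows and by_time are made up and do not come from the question.

```python
import pandas as pd

# Hypothetical daily series, used only for illustration.
s = pd.Series([10.0, 20.0, 30.0],
              index=pd.date_range("2000-01-01", periods=3, freq="D"))

# No freq: the values slide down one row, the index is unchanged,
# and the vacated first position becomes NaN.
by_rows = s.shift(1)

# With freq: the index labels move forward by one day,
# the values are untouched and no NaN is introduced.
by_time = s.shift(1, freq="D")

print(by_rows)
print(by_time)
```

The code in the question uses the first mode (no freq), so only the values move.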

Here is a small example of column values being shifted:

import pandas as pd 
import numpy as np
df = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-03", "2000-03-05", "2000-01-03", "2000-03-05",
             "2000-03-05", "2000-07-03", "2000-01-03", "2000-07-03", "2000-07-03"],
    "variable": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D"],
    "no": [1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3],
    "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
              0.119209, -1.044236, -0.861849, None],
})

Below are the column values before shifting:

df['value']

output

0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4    1.212112
5   -0.173215
6    0.119209
7   -1.044236
8   -0.861849
9         NaN
Name: value, dtype: float64

The shift function moves the values by the given number of periods.

For example, shift with a positive integer moves the row values downwards:

df['value'].shift(1)

output

0         NaN
1    0.469112
2   -0.282863
3   -1.509059
4   -1.135632
5    1.212112
6   -0.173215
7    0.119209
8   -1.044236
9   -0.861849
Name: value, dtype: float64

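shift with a negative integer moves the row values upwards (here row 8 is NaN because the original last value was already missing, and row 9 is the newly vacated position):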

df['value'].shift(-1)

output

0   -0.282863
1   -1.509059
2   -1.135632
3    1.212112
4   -0.173215
5    0.119209
6   -1.044236
7   -0.861849
8         NaN
9         NaN
Name: value, dtype: float64
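
Applied to the code in the question, df[forecast_col].shift(-forecast_out) moves the 'Adj. Close' values up by forecast_out rows. Each row's label is therefore the adjusted close forecast_out rows later, and the last forecast_out rows get NaN because no later value exists for them. Below is a minimal sketch with a made-up fifteen-row frame; the prices are hypothetical stand-ins for the Quandl data, and the dropna step (with the name trainable) is an assumption about what usually follows, not part of the question.

```python
import math
import pandas as pd

# Hypothetical prices standing in for the Quandl data in the question.
df = pd.DataFrame({"Adj. Close": [100.0 + i for i in range(15)]})

forecast_col = "Adj. Close"

# Same formula as in the question: 10% of the rows, rounded up.
# With 15 rows this gives 2.
forecast_out = int(math.ceil(.1 * len(df)))

# Each row's label is the 'Adj. Close' value forecast_out rows further down.
df["label"] = df[forecast_col].shift(-forecast_out)

print(df.head())
print(df.tail())

# Assumed follow-up step: drop the trailing rows that have no label yet.
trainable = df.dropna()
```

In this sketch row 0 has an 'Adj. Close' of 100.0 and a label of 102.0, the value two rows below it, and the last two rows have NaN labels.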