error in plm regression

Ruslan Sayakhov · Apr 27, 2017 · Viewed 16k times

Colleagues, I have panel data:

    Company year       Beta     NI   Sales  Export Hedge      FL     QR     AT Foreign
1       1 2010 -2.2052800 293000 1881000 78.6816     0 23.5158  1.289 0.6554    3000
2       1 2011 -2.2536069 316000 2647000 81.4885     0 21.7945 1.1787 0.8282   22000
3       1 2012  0.3258693 363000 2987000 82.4908     0 24.5782 1.2428  0.813  -11000
4       1 2013  0.4006030 549000 4546000 79.4325     0 31.4168 0.6038 0.7905   71000
5       1 2014 -0.4508811 348000 5376000 79.2411     0 37.1451 0.6563  0.661  -64000
6       1 2015  0.1494696 355000 5038000 77.1735     0 33.3852 0.9798 0.5483   37000

But R throws an error when I try to use the plm package for the regression:

panel <- read.csv("Panel.csv",  header=T, sep=";")
p=plm(data=panel,Beta~NI, model="within",index=c("id","year"))


Error in pdim.default(index[[1]], index[[2]]) : 
  duplicate couples (id-time)
In addition: Warning messages:
1: In pdata.frame(data, index) :
  duplicate couples (id-time) in resulting pdata.frame
 to find out which, use e.g. table(index(your_pdataframe), useNA = "ifany")
2: In is.pbalanced.default(index[[1]], index[[2]]) :
  duplicate couples (id-time)

3: In is.pbalanced.default(index[[1]], index[[2]]) :
  duplicate couples (id-time)

I searched for this error online and read that it is connected with the company id and the year, but I did not find a way to avoid it. Also, when I run na.omit(panel), R does not show the error, but it is important for me to keep the rows with NA values and all companies in the data. Please tell me what to do about this problem. Thank you.

Answer

Marco Sandri · Apr 27, 2017

Let us consider the Produc dataset in the plm package.

library(plm)
data("Produc", package = "plm")
head(Produc)

    state year region     pcap     hwy   water    util       pc   gsp    emp unemp
1 ALABAMA 1970      6 15032.67 7325.80 1655.68 6051.20 35793.80 28418 1010.5   4.7
2 ALABAMA 1971      6 15501.94 7525.94 1721.02 6254.98 37299.91 29375 1021.9   5.2
3 ALABAMA 1972      6 15972.41 7765.42 1764.75 6442.23 38670.30 31303 1072.3   4.7
4 ALABAMA 1973      6 16406.26 7907.66 1742.41 6756.19 40084.01 33430 1135.5   3.9
5 ALABAMA 1974      6 16762.67 8025.52 1734.85 7002.29 42057.31 33749 1169.8   5.5
6 ALABAMA 1975      6 17316.26 8158.23 1752.27 7405.76 43971.71 33604 1155.4   7.7

In this dataset, information is collected over time (17 years) on the same sample units (48 US states).

table(Produc$state, Produc$year)
                 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986
  ALABAMA           1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
  ARIZONA           1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
  ARKANSAS          1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
  CALIFORNIA        1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
  ...

plm requires that each (state, year) pair be unique.

any(table(Produc$state, Produc$year)!=1)
[1] FALSE
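The same uniqueness check can be run on any panel before calling plm. The sketch below uses a made-up toy data frame and assumes the identifier columns are named Company and year, as in the data printed in the question. It passes useNA = "ifany" so that rows with missing identifiers are counted too; since the question reports that na.omit(panel) makes the error disappear, rows with NA identifiers are one possible cause, though the post does not confirm this.

```r
# Hypothetical toy data: one repeated pair and two rows with missing identifiers
panel <- data.frame(
  Company = c(1, 1, 1, 2, NA, NA),
  year    = c(2010, 2011, 2011, 2010, NA, NA)
)

# Count rows per (Company, year) pair, treating NA as its own level
counts <- as.data.frame(
  table(Company = panel$Company, year = panel$year, useNA = "ifany")
)

# Pairs that occur more than once break the uniqueness requirement
violations <- counts[counts$Freq > 1, ]
print(violations)
```

Any row printed here is an (id, time) combination that plm would reject.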

The command plm works nicely with this dataset:

plmFit1 <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state","year"))
summary(plmFit1)


Oneway (individual) effect Within Model
Call:
plm(formula = log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, 
    data = Produc, index = c("state", "year"))

Balanced Panel: n=48, T=17, N=816

Residuals :
    Min.  1st Qu.   Median  3rd Qu.     Max. 
-0.12000 -0.02370 -0.00204  0.01810  0.17500 

Coefficients :
             Estimate  Std. Error t-value  Pr(>|t|)    
log(pcap) -0.02614965  0.02900158 -0.9017    0.3675    
log(pc)    0.29200693  0.02511967 11.6246 < 2.2e-16 ***
log(emp)   0.76815947  0.03009174 25.5273 < 2.2e-16 ***
unemp     -0.00529774  0.00098873 -5.3582 1.114e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    18.941
Residual Sum of Squares: 1.1112
R-Squared:      0.94134
Adj. R-Squared: 0.93742
F-statistic: 3064.81 on 4 and 764 DF, p-value: < 2.22e-16

Now we duplicate one of the (state, year) pairs:

 Produc[2,2] <- 1970
 any(table(Produc$state, Produc$year)>1)
 [1] TRUE

and plm now generates the same error message that you described above:

zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
      data = Produc, index = c("state","year"))

Error in pdim.default(index[[1]], index[[2]]) : 
  duplicate couples (id-time)
In addition: Warning messages:
1: In pdata.frame(data, index) :
  duplicate couples (id-time) in resulting pdata.frame
 to find out which, use e.g. table(index(your_pdataframe), useNA = "ifany")
2: In is.pbalanced.default(index[[1]], index[[2]]) :
  duplicate couples (id-time)

3: In is.pbalanced.default(index[[1]], index[[2]]) :
  duplicate couples (id-time)
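To find the offending rows in your own data, base R's duplicated() is enough. This is a minimal sketch on made-up values, again assuming the identifier columns are Company and year (the question's call uses index=c("id","year"), while the printed data shows a Company column). It lists every row involved in a repeated pair and then keeps only the first row of each pair, which is appropriate only if the extra rows really are redundant copies. Rows with NA in other variables, such as Beta, are left in place.

```r
# Hypothetical toy panel with one duplicated (Company, year) pair
panel <- data.frame(
  Company = c(1, 1, 1, 2, 2),
  year    = c(2010, 2011, 2011, 2010, 2011),
  Beta    = c(-2.21, -2.25, 0.33, 0.40, NA)
)

keys <- panel[, c("Company", "year")]

# Flag every row whose (Company, year) pair occurs more than once
is_dup   <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
dup_rows <- panel[is_dup, ]
print(dup_rows)

# One possible fix: keep the first row of each pair, leaving NA values untouched
panel_unique <- panel[!duplicated(keys), ]
print(panel_unique)
```

After inspecting dup_rows you may prefer to correct a mistyped year or company id by hand rather than drop a row.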

Hope this can help you.