I'm using the R caret package to generate a model. I'm using PCA in the pre-process for dimensionality reduction and then trying to generate a logistic regression model.
I'm getting this error:
Error in contrasts<-
(*tmp*
, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
credit <- read.csv('~Loans Question/RequiredAttributesWithLoanStatus.csv')
credit$LoanStatus <- as.factor(credit$LoanStatus)
str(credit)
'data.frame': 8580 obs. of 45 variables:
$ ListingCategory : int 1 7 3 1 1 7 1 1 1 1 ...
$ IncomeRange : int 3 4 6 4 4 3 3 4 3 3 ...
$ StatedMonthlyIncome : num 2583 4326 10500 4167 5667 ...
$ IncomeVerifiable : logi TRUE TRUE TRUE FALSE TRUE TRUE ...
$ DTIwProsperLoan : num 1.8e-01 2.0e-01 1.7e-01 1.0e+06 1.8e-01 4.4e-01 2.2e-01 2.0e-01 2.0e-01 3.1e-01 ...
$ EmploymentStatusDescription: Factor w/ 7 levels "Employed","Full-time",..: 1 4 1 7 1 1 1 1 1 1 ...
$ Occupation : Factor w/ 65 levels "","Accountant/CPA",..: 37 37 20 14 43 58 48 37 37 37 ...
$ MonthsEmployed : int 4 44 159 67 26 16 209 147 24 9 ...
$ BorrowerState : Factor w/ 48 levels "AK","AL","AR",..: 22 32 5 5 14 28 4 10 10 34 ...
$ BorrowerCity : Factor w/ 3089 levels "AARONSBURG","ABERDEEN",..: 1737 3059 2488 654 482 719 895 1699 2747 1903 ...
$ BorrowerMetropolitanArea : Factor w/ 1 level "(Not Implemented)": 1 1 1 1 1 1 1 1 1 1 ...
$ LenderIndicator : int 0 0 0 1 0 0 0 0 1 0 ...
$ GroupIndicator : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
$ GroupName : Factor w/ 83 levels "","00 Used Car Loans",..: 1 1 1 47 1 1 1 1 1 1 ...
$ ChannelCode : int 90000 90000 90000 80000 40000 40000 90000 90000 80000 90000 ...
$ AmountParticipation : int 0 0 0 0 0 0 0 0 0 0 ...
$ MonthlyDebt : int 247 785 1631 817 644 1524 427 817 654 749 ...
$ CurrentDelinquencies : int 0 0 0 0 0 0 0 1 0 1 ...
$ DelinquenciesLast7Years : int 0 10 0 0 0 0 0 0 0 0 ...
$ PublicRecordsLast10Years : int 0 1 0 0 0 0 1 0 1 0 ...
$ PublicRecordsLast12Months : int 0 0 0 0 0 0 0 0 0 0 ...
$ FirstRecordedCreditLine : Factor w/ 4719 levels "1/1/00 0:00",..: 3032 2673 1197 2541 4698 4345 3150 925 4452 2358 ...
$ CreditLinesLast7Years : int 53 30 36 26 7 22 15 20 34 32 ...
$ InquiriesLast6Months : int 2 8 5 0 0 0 0 3 0 0 ...
$ AmountDelinquent : int 0 0 0 0 0 0 0 63 0 15 ...
$ CurrentCreditLines : int 10 10 18 10 4 11 6 10 7 8 ...
$ OpenCreditLines : int 9 10 15 8 3 8 5 7 7 8 ...
$ BankcardUtilization : num 0.26 0.69 0.94 0.69 0.81 0.38 0.55 0.24 0.03 0 ...
$ TotalOpenRevolvingAccounts : int 9 7 12 10 3 5 4 5 4 6 ...
$ InstallmentBalance : int 48648 14827 0 0 0 30916 0 21619 41340 15447 ...
$ RealEstateBalance : int 0 0 577745 0 0 0 191296 0 0 126039 ...
$ RevolvingBalance : int 5265 9967 94966 50511 37871 22463 19550 2436 1223 3236 ...
$ RealEstatePayment : int 0 0 4159 0 0 0 1303 0 0 1279 ...
$ RevolvingAvailablePercent : int 78 52 36 45 18 61 44 74 96 76 ...
$ TotalInquiries : int 8 11 15 2 0 0 1 7 1 1 ...
$ TotalTradeItems : int 53 30 36 26 7 22 15 20 34 32 ...
$ SatisfactoryAccounts : int 52 23 36 26 7 19 15 18 34 29 ...
$ NowDelinquentDerog : int 0 0 0 0 0 0 0 1 0 1 ...
$ WasDelinquentDerog : int 1 7 0 0 0 3 0 1 0 2 ...
$ OldestTradeOpenDate : int 5092001 5011977 12011984 4272000 9081993 9122000 6161987 11181999 9191990 4132000 ...
$ DelinquenciesOver30Days : int 0 6 0 0 0 13 0 2 0 2 ...
$ DelinquenciesOver60Days : int 0 4 0 0 0 0 0 0 0 1 ...
$ DelinquenciesOver90Days : int 0 10 0 0 0 0 0 0 0 0 ...
$ IsHomeowner : logi FALSE FALSE TRUE FALSE FALSE FALSE ...
$ LoanStatus : Factor w/ 4 levels "1","2","3","4": 4 2 2 4 4 4 4 4 4 3 ...
summary(credit)
ListingCategory IncomeRange StatedMonthlyIncome IncomeVerifiable
Min. : 0.000 Min. :1.000 Min. : 0 Mode :logical
1st Qu.: 1.000 1st Qu.:3.000 1st Qu.: 3167 FALSE:784
Median : 2.000 Median :4.000 Median : 4750 TRUE :7796
Mean : 4.997 Mean :4.089 Mean : 5755 NA's :0
3rd Qu.: 7.000 3rd Qu.:5.000 3rd Qu.: 7083
Max. :20.000 Max. :7.000 Max. :250000
DTIwProsperLoan EmploymentStatusDescription
Min. : 0.0 Employed :7182
1st Qu.: 0.1 Full-time : 416
Median : 0.2 Not employed : 122
Mean : 91609.4 Other : 475
3rd Qu.: 0.3 Part-time : 7
Max. :1000000.0 Retired : 32
Self-employed: 346
Occupation MonthsEmployed BorrowerState
Other :2421 Min. :-23.00 CA :1056
Professional :1040 1st Qu.: 26.00 FL : 608
Computer Programmer : 345 Median : 68.00 NY : 574
Executive : 334 Mean : 97.44 TX : 532
Administrative Assistant: 325 3rd Qu.:139.00 IL : 443
Teacher : 301 Max. :755.00 GA : 343
(Other) :3814 NA's :5 (Other):5024
BorrowerCity BorrowerMetropolitanArea LenderIndicator
CHICAGO : 121 (Not Implemented):8580 Min. :0.00000
NEW YORK : 91 1st Qu.:0.00000
BROOKLYN : 88 Median :0.00000
HOUSTON : 64 Mean :0.09196
LAS VEGAS: 53 3rd Qu.:0.00000
ATLANTA : 51 Max. :1.00000
(Other) :8112
GroupIndicator GroupName
Mode :logical :8326
FALSE:8325 We do not accept new membership requests: 39
TRUE :255 BORROWERS - LARGEST GROUP : 29
NA's :0 LendersClub : 17
Debt Consolidators : 12
Have Money - Will Bid : 10
(Other) : 147
ChannelCode AmountParticipation MonthlyDebt CurrentDelinquencies
Min. :40000 Min. :0 Min. : 0.0 Min. : 0.0000
1st Qu.:80000 1st Qu.:0 1st Qu.: 364.0 1st Qu.: 0.0000
Median :80000 Median :0 Median : 708.0 Median : 0.0000
Mean :77196 Mean :0 Mean : 885.5 Mean : 0.4119
3rd Qu.:90000 3rd Qu.:0 3rd Qu.: 1205.2 3rd Qu.: 0.0000
Max. :90000 Max. :0 Max. :30213.0 Max. :21.0000
DelinquenciesLast7Years PublicRecordsLast10Years PublicRecordsLast12Months
Min. : 0.000 Min. : 0.0000 Min. :0.00000
1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.:0.00000
Median : 0.000 Median : 0.0000 Median :0.00000
Mean : 4.009 Mean : 0.2809 Mean :0.01364
3rd Qu.: 3.000 3rd Qu.: 0.0000 3rd Qu.:0.00000
Max. :99.000 Max. :11.0000 Max. :4.00000
FirstRecordedCreditLine CreditLinesLast7Years InquiriesLast6Months
12/1/93 0:00: 20 Min. : 2.0 Min. : 0.0000
3/1/95 0:00 : 19 1st Qu.: 16.0 1st Qu.: 0.0000
6/1/90 0:00 : 17 Median : 24.0 Median : 1.0000
6/1/89 0:00 : 16 Mean : 26.1 Mean : 0.9994
12/1/90 0:00: 15 3rd Qu.: 34.0 3rd Qu.: 1.0000
2/1/94 0:00 : 14 Max. :115.0 Max. :15.0000
(Other) :8479
AmountDelinquent CurrentCreditLines OpenCreditLines BankcardUtilization
Min. : 0 Min. : 0.000 Min. : 0.000 Min. :0.0000
1st Qu.: 0 1st Qu.: 5.000 1st Qu.: 5.000 1st Qu.:0.2500
Median : 0 Median : 9.000 Median : 8.000 Median :0.5400
Mean : 1195 Mean : 9.345 Mean : 8.306 Mean :0.5182
3rd Qu.: 0 3rd Qu.:12.000 3rd Qu.:11.000 3rd Qu.:0.7900
Max. :179158 Max. :54.000 Max. :42.000 Max. :2.2300
TotalOpenRevolvingAccounts InstallmentBalance RealEstateBalance
Min. : 0.000 Min. : 0 Min. : 0
1st Qu.: 3.000 1st Qu.: 3338 1st Qu.: 0
Median : 6.000 Median : 14453 Median : 26154
Mean : 6.441 Mean : 24900 Mean : 109306
3rd Qu.: 9.000 3rd Qu.: 32238 3rd Qu.: 176542
Max. :44.000 Max. :739371 Max. :1938421
NA's :328
RevolvingBalance RealEstatePayment RevolvingAvailablePercent TotalInquiries
Min. : 0 Min. : 0.0 Min. : 0.00 Min. : 0.00
1st Qu.: 2799 1st Qu.: 0.0 1st Qu.: 29.00 1st Qu.: 2.00
Median : 8784 Median : 346.5 Median : 52.00 Median : 3.00
Mean : 19555 Mean : 830.5 Mean : 51.46 Mean : 3.91
3rd Qu.: 21110 3rd Qu.: 1382.2 3rd Qu.: 75.00 3rd Qu.: 5.00
Max. :695648 Max. :13651.0 Max. :100.00 Max. :36.00
TotalTradeItems SatisfactoryAccounts NowDelinquentDerog WasDelinquentDerog
Min. : 2.0 Min. : 1.00 Min. : 0.0000 Min. : 0.000
1st Qu.: 16.0 1st Qu.: 14.00 1st Qu.: 0.0000 1st Qu.: 0.000
Median : 24.0 Median : 21.00 Median : 0.0000 Median : 1.000
Mean : 26.1 Mean : 23.34 Mean : 0.4119 Mean : 2.343
3rd Qu.: 34.0 3rd Qu.: 30.25 3rd Qu.: 0.0000 3rd Qu.: 3.000
Max. :115.0 Max. :113.00 Max. :21.0000 Max. :32.000
OldestTradeOpenDate DelinquenciesOver30Days DelinquenciesOver60Days
Min. : 1011957 Min. : 0.000 Min. : 0.000
1st Qu.: 4101996 1st Qu.: 0.000 1st Qu.: 0.000
Median : 7191993 Median : 1.000 Median : 0.000
Mean : 6934230 Mean : 4.332 Mean : 1.908
3rd Qu.:10011990 3rd Qu.: 5.000 3rd Qu.: 2.000
Max. :12312004 Max. :99.000 Max. :73.000
DelinquenciesOver90Days IsHomeowner LoanStatus
Min. : 0.000 Mode :logical 1:1847
1st Qu.: 0.000 FALSE:4264 2:1262
Median : 0.000 TRUE :4316 3: 256
Mean : 4.009 NA's :0 4:5215
3rd Qu.: 3.000
Max. :99.000
try(na.fail(credit))
glmFit <- train(LoanStatus~., credit, method = "glm", family=binomial, preProcess=c("pca"),
trControl = trainControl(method = "cv"))
Error in contrasts<-
(*tmp*
, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
logregFit <- train(LoanStatus~., credit, method = "logreg", family=binomial, preProcess=c("pca"),
trControl = trainControl(method = "cv"))
Error in contrasts<-
(*tmp*
, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
Looking at the error message and your dataset's variables, the variable BorrowerMetropolitanArea
has only one level (Actually it has no predictive value at all if all samples have the same value). I guess this is causing the problem in the contrasts
function when you use PCA to pre-process the dataset.
Try calling the train
funcion on the dataset without the the variable BorrowerMetropolitanArea
.