I’m trying to program logistic regression with stochastic gradient descent in R. I have been following Andrew Ng's example and using his data set "ex2data1.txt".
The algorithm runs, but the theta estimates are not what I expected. I tried changing the whole algorithm to fix this, but I was not able to find the error that is causing the problem. It would be very helpful if someone could check the example and tell me why the thetas are not being calculated correctly. I really appreciate it.
Regarding the programming, I'm not using matrix calculation or any function already implemented in R. I'm only using sums and subtractions in loops, because I want to run the code in Hadoop and cannot use matrix calculus or built-in R functions such as "sum" or "sqrt".
Stochastic Gradient Descent is:
Loop {
    for i = 1 to m {
        θj := θj + α(y(i) - hθ(x(i))) xj(i)
    }
}
And the logistic regression hypothesis is:

hθ(x) = 1 / (1 + exp(-θᵀx))
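For reference, this is how I read a single update of that rule in R (just a sketch to make the dot product θᵀx explicit; it uses sum(), which I would have to replace with a loop in the Hadoop version, and assumes theta, alpha, x and y as defined below):

# hypothesis for sample i: h = 1 / (1 + exp(-theta . x[i,]))
h_i <- 1/(1 + exp(-sum(theta * x[i, ])))
# update every theta_j at once: theta_j := theta_j + alpha * (y(i) - h_i) * xj(i)
theta <- theta + alpha * (y[i] - h_i) * x[i, ]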
My code is:
data1 <- read.table("~/ex2data1.txt", sep = ",")
names(data1) <- c("Exam1", "Exam2", "Admit")
# Shuffle the data for stochastic gradient descent
ss <- data1[sample(nrow(data1), size = nrow(data1), replace = FALSE), ]
x <- with(ss, matrix(cbind(1, Exam1), nrow = nrow(ss)))
y <- c(ss$Admit)
m <- nrow(x)
# startup parameters
iterations<-1
j<-vector()
alpha<-0.05
theta<-c(0,0)
# My loop
while (iterations <= 10) {
  coste <- c(0, 0)
  suma <- 0
  for (i in 1:m) {
    # h <- 1/(1 + exp(-Q*x))
    h <- 1/(1 + exp((-theta) * x[i, ]))
    # Cost(hQ(x), y) = y(i)*log(hQ(x)) + (1 - y(i))*log(1 - hQ(x))
    cost <- ((y[i] * log(h)) + ((1 - y[i]) * log(1 - h)))
    # sum(cost) for i = 1 to m
    suma <- suma + cost
    # Differences = (hQ(x(i)) - y(i)) * x(i)
    difference <- (h - y[i]) * x[i, ]
    # sum the differences
    coste <- coste + difference
    # theta update: Qj := Qj - alpha * (1/m) * sum((hQ(x(i)) - y(i)) * xj(i))
    theta[1] <- (theta[1] - alpha * 1/m * (coste[1]))
    theta[2] <- (theta[2] - alpha * 1/m * (coste[2]))
  }
  # J(Q) = (-1/m) * sum(y(i)*log(hQ(x)) + (1 - y(i))*log(1 - hQ(x)))
  j[iterations] <- (-1/m) * suma
  iterations <- iterations + 1
}
If I compare my thetas with the ones from R's glm:

Call: glm(formula = y ~ x[, 2], family = binomial("logit"), data = data1)

Coefficients:
  (Intercept): -4.71816
  x[, 2]:       0.08091

My thetas:
  Intercept: 0.4624024
  x[, 2]:    1.3650234
I have implemented a solution in R for Ng's other example set, ex2data2.txt. Here is my code:
sigmoid <- function(z) {
  return(1/(1 + exp(-z)))
}

# Map the two input features to all polynomial terms of X1 and X2 up to the 6th power
mapFeature <- function(X1, X2) {
  degree <- 6
  out <- rep(1, length(X1))
  for (i in 1:degree) {
    for (j in 0:i) {
      out <- cbind(out, (X1^(i - j)) * (X2^j))
    }
  }
  return(out)
}
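Just to illustrate what mapFeature returns (my own check, not part of the original script): with degree 6 it expands the two inputs into an intercept column plus every monomial X1^(i-j) * X2^j for i = 1..6, i.e. 28 columns in total.

# quick sanity check of the expanded feature dimension (assumes mapFeature as defined above)
dim(mapFeature(c(0.5, -0.2), c(1.0, 0.3)))  # 2 rows, 28 columns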
## Cost Function (regularized logistic regression; the intercept theta[1] is not penalized)
fr <- function(theta, X, y, lambda) {
  m <- length(y)
  return(1/m * sum(-y * log(sigmoid(X %*% theta)) - (1 - y) *
    log(1 - sigmoid(X %*% theta))) + lambda/2/m * sum(theta[-1]^2))
}
## Gradient
grr <- function(theta, X, y, lambda) {
  m <- length(y)  # define m locally instead of relying on the global variable
  return(1/m * t(X) %*% (sigmoid(X %*% theta) - y) + lambda/m *
    c(0, theta[-1]))
}
data <- read.csv("ex2data2.txt", header = FALSE)
X <- as.matrix(data[, c(1, 2)])
y <- data[, 3]
X <- mapFeature(X[, 1], X[, 2])
m <- nrow(X)
n <- ncol(X)
initial_theta <- rep(0, n)
lambda <- 1
res <- optim(initial_theta, fr, grr, X, y, lambda,
             method = "BFGS", control = list(maxit = 100000))