How to convert a deep learning gradient descent equation into Python

Eranga Atugoda · Aug 23, 2017 · Viewed 7.5k times

I've been following an online tutorial on deep learning. It has a practical exercise on gradient descent and cost calculations where I've been struggling to reproduce the given answers after converting the equations to Python code. I hope you can kindly help me get the correct answer.

Following are the equations used for the calculations, i.e. the standard forward and backward propagation steps for logistic regression:
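
$$A = \sigma(w^T X + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{(i)}\right) \right]$$

$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T, \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left(a^{(i)} - y^{(i)}\right)$$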

Following is the function given to calculate the gradients, the cost, etc. The values need to be computed without using for loops, using matrix (vectorized) operations instead:

import numpy as np

def propagate(w, b, X, Y):
    """
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size
         (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """

    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A =                                      # compute activation
    cost =                                   # compute cost
    ### END CODE HERE ###

    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw =
    db =
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost

Following is the data given to test the above function:

w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))

Following is the expected output of the above:

Expected Output:
dw  [[ 0.99993216] [ 1.99980262]]
db  0.499935230625
cost    6.000064773192205

For the above propagate function I have used the replacements below, but the output is not what is expected. Please kindly help me understand how to get the expected output:

A = sigmoid(X)
cost = -1*((np.sum(np.dot(Y,np.log(A))+np.dot((1-Y),(np.log(1-A))),axis=0))/m)
dw = (np.dot(X,((A-Y).T)))/m
db = np.sum((A-Y),axis=0)/m

Following is the sigmoid function used to calculate the activation:

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + np.exp(-z))
    ### END CODE HERE ###

    return s
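
As a quick sanity check, this sigmoid returns 0.5 at zero and applies elementwise to arrays:

print(sigmoid(0))                  # 0.5
print(sigmoid(np.array([0, 2])))   # [0.5        0.88079708]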

I hope someone can help me understand how to solve this, as I can't continue with the rest of the tutorial without understanding it. Many thanks.

Answer

Besher · Nov 1, 2018

You can calculate A, cost, dw, and db as follows:

A = sigmoid(np.dot(w.T, X) + b)                                  # activations, shape (1, m)
cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # cross-entropy cost

dw = 1 / m * np.dot(X, (A - Y).T)   # gradient of the cost w.r.t. w, same shape as w
db = 1 / m * np.sum(A - Y)          # gradient of the cost w.r.t. b, a scalar
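
Compared with the attempt in the question, two things change: the activation is computed from the linear combination np.dot(w.T, X) + b rather than from X directly, and the cost uses elementwise products (*) rather than np.dot, because Y and A are both (1, m) row vectors. A quick check with hypothetical values illustrates the difference:

import numpy as np

Y = np.array([[1, 0]])                   # shape (1, 2)
logA = np.log(np.array([[0.9, 0.1]]))    # shape (1, 2)

print((Y * logA).shape)      # (1, 2): elementwise product, which the cost then sums over
# np.dot(Y, logA) raises a ValueError: shapes (1, 2) and (1, 2) are not aligned
print(np.dot(Y, logA.T))     # [[-0.10536052]]: a dot product would need a transpose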

Here sigmoid is:

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))    
    return s
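
Putting it all together, a minimal self-contained sketch (using the answer's formulas and the test data from the question) reproduces the expected output:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                                  # forward pass, shape (1, m)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # cross-entropy cost
    dw = 1 / m * np.dot(X, (A - Y).T)                                # gradient w.r.t. w
    db = 1 / m * np.sum(A - Y)                                       # gradient w.r.t. b
    return {"dw": dw, "db": db}, np.squeeze(cost)

w, b, X, Y = np.array([[1], [2]]), 2, np.array([[1, 2], [3, 4]]), np.array([[1, 0]])
grads, cost = propagate(w, b, X, Y)
print("dw =", grads["dw"])    # [[0.99993216] [1.99980262]]
print("db =", grads["db"])    # 0.499935230625
print("cost =", cost)         # 6.000064773192205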