
Feedforward Propagation from Andrew Ng's Machine Learning Lecture

Neural networks are a series of stacked layers: the deeper the network, the more layers it has. Layer 1 is called the input layer and Layer 3 is called the output layer. The intermediate layers are called hidden layers, and their number varies with the complexity of the network. In this case, Layer 2 is the hidden layer. The input vector is \(x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \end{bmatrix}\)

\[\theta = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \\ \end{bmatrix}\]

\(a = g(\theta^T x)\), where \(g\) is the activation function, \(x\) is the input vector, and \(a\) is the activation output.

\[a^{(2)}_1 = g( \theta^{(1)}_{10} x_0 + \theta^{(1)}_{11} x_1 + \theta^{(1)}_{12} x_2 + \theta^{(1)}_{13} x_3)\] \[a^{(2)}_2 = g( \theta^{(1)}_{20} x_0 + \theta^{(1)}_{21} x_1 + \theta^{(1)}_{22} x_2 + \theta^{(1)}_{23} x_3)\] \[a^{(2)}_3 = g( \theta^{(1)}_{30} x_0 + \theta^{(1)}_{31} x_1 + \theta^{(1)}_{32} x_2 + \theta^{(1)}_{33} x_3)\]
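To make the indexing concrete, here is a minimal NumPy sketch of these three equations. The sigmoid activation and the numeric input values are assumptions for illustration, not part of the lecture:

```python
import numpy as np

def g(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + np.exp(-z))

# Input with the bias unit x_0 = 1 prepended (values are illustrative).
x = np.array([1.0, 0.5, -1.2, 0.8])    # [x_0, x_1, x_2, x_3]

# theta1[j-1, i] corresponds to theta^(1)_{ji}; random values for the sketch.
theta1 = np.random.randn(3, 4)

# One equation per hidden unit: a^(2)_j = g(sum_i theta^(1)_{ji} * x_i)
a2 = np.array([g(theta1[j] @ x) for j in range(3)])
print(a2)   # [a^(2)_1, a^(2)_2, a^(2)_3]
```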

Dimension of the weight matrix

Let S be the number of units in Layer 1.

S = 3 (discarding the bias unit)

Let T be the number of units in Layer 2.

T = 3 (discarding the bias activation unit)

The dimension of the weight matrix is given by \(T \times (S + 1)\): one row per unit in Layer 2, and one column per unit in Layer 1 plus one for the bias.

The dimension of the weight matrix \(\theta^{(1)}\) is therefore 3 × 4:


\(\begin{bmatrix} \theta^{(1)}_{10} & \theta^{(1)}_{11} & \theta^{(1)}_{12} & \theta^{(1)}_{13} \\ \theta^{(1)}_{20} & \theta^{(1)}_{21} & \theta^{(1)}_{22} & \theta^{(1)}_{23} \\ \theta^{(1)}_{30} & \theta^{(1)}_{31} & \theta^{(1)}_{32} & \theta^{(1)}_{33} \\ \end{bmatrix}\)
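As a quick sanity check of this rule in code (NumPy, with illustrative values):

```python
import numpy as np

S = 3  # units in Layer 1, discarding the bias unit
T = 3  # units in Layer 2, discarding the bias activation unit

# One row per Layer 2 unit, one column per Layer 1 unit plus the bias.
theta1 = np.random.randn(T, S + 1)
print(theta1.shape)   # (3, 4)
```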

In vectorized form (with \(x_0 = 1\), the bias column of the weight matrix plays the role of a separate bias vector \(b\)):

\[\begin{bmatrix} a^{(2)}_1 \\ a^{(2)}_2 \\ a^{(2)}_3 \\ \end{bmatrix} = g \left( \begin{bmatrix} \theta^{(1)}_{10} & \theta^{(1)}_{11} & \theta^{(1)}_{12} & \theta^{(1)}_{13} \\ \theta^{(1)}_{20} & \theta^{(1)}_{21} & \theta^{(1)}_{22} & \theta^{(1)}_{23} \\ \theta^{(1)}_{30} & \theta^{(1)}_{31} & \theta^{(1)}_{32} & \theta^{(1)}_{33} \\ \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ \end{bmatrix} \right)\]
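The same step in vectorized NumPy; this sketch keeps the bias column inside the matrix, so no separate \(b\) vector appears (activation and values are again illustrative assumptions):

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

x = np.array([1.0, 0.5, -1.2, 0.8])   # bias unit x_0 = 1 prepended
theta1 = np.random.randn(3, 4)        # Theta^(1), bias column included

# One matrix-vector product replaces the three per-unit dot products.
a2 = g(theta1 @ x)
print(a2.shape)   # (3,)
```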

The hypothesis function is given by,

\[h_{\theta}(x) = g( \theta_{10}^{(2)} a_{0}^{(2)} + \theta_{11}^{(2)} a_{1}^{(2)} + \theta_{12}^{(2)} a_{2}^{(2)} + \theta_{13}^{(2)} a_{3}^{(2)} )\]

We introduce the shorthand \(z\) for the weighted sum inside the activation function:

\[z^{(2)}_1 = \theta^{(1)}_{10} x_0 + \theta^{(1)}_{11} x_1 + \theta^{(1)}_{12} x_2 + \theta^{(1)}_{13} x_3\]

with \(z^{(2)}_2\) and \(z^{(2)}_3\) defined analogously.

The final equations,

\[a_1^{(2)} = g(z_1^{(2)})\] \[a_2^{(2)} = g(z_2^{(2)})\] \[a_3^{(2)} = g(z_3^{(2)})\]
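Putting the pieces together, a minimal end-to-end forward pass through both layers; the sigmoid activation and random weights are assumptions for illustration:

```python
import numpy as np

def g(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.5, -1.2, 0.8])   # input with bias x_0 = 1
theta1 = np.random.randn(3, 4)         # Theta^(1): Layer 1 -> Layer 2
theta2 = np.random.randn(1, 4)         # Theta^(2): Layer 2 -> output

z2 = theta1 @ x                        # z^(2)
a2 = g(z2)                             # a^(2) = g(z^(2))
a2 = np.insert(a2, 0, 1.0)             # prepend bias activation a^(2)_0 = 1
h = g(theta2 @ a2)                     # hypothesis h_theta(x)
print(h)
```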