Deep Learning with TensorFlow 2.0. A quick introduction to the new TensorFlow 2.0 way of doing Deep Learning using Keras. You’ll walk away with a better understanding of how you can get started building machine learning models in Python with TensorFlow 2.0 as well as the other exciting available features!

So, Let’s begin 😈…

Linear AlgebraLinear algebra is the branch of mathematics concerning linear equations and linear functions and their representations through matrices and vector spaces.

Machine Learning relies heavily on Linear Algebra, so it is essential to understand what vectors and matrices are, what operations you can perform with them, and how they can be useful.

If you are already familiar with linear algebra, feel free to skip this chapter but note that the implementation of certain functions are different between Tensorflow 1.0 and Tensorflow 2.0 so you should atleast skim through the code.

If you have had no exposure at all to linear algebra, this chapter will teach you enough to read this book.

```
# Installs
!pip install -q tensorflow==2.0.0-alpha0
# Imports
import tensorflow as tf
import sys
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
"""
If you are running this notebook in Google colab, make sure to upload the helpers.py file to your
session before running it, but if you are running this in Binder, then you
don't have to worry about it. The helpers.py file will be in the notebook
folder in GitHub.
"""
from helpers import vector_plot, plot_transform
```

Scalars, Vectors, Matrices and Tensors
**Scalars:** are just a single number. For example temperature, which is denoted by just one number.

**Vectors:** are an array of numbers. The numbers are arranged in order and we can identify each individual number by its index in that ordering. We can think of vectors as identifying points in space, with each element giving the coordinate along a different axis. In simple terms, a vector is an arrow representing a quantity that has both magnitude and direction wherein the length of the arrow represents the magnitude and the orientation tells you the direction. For example wind, which has a direction and magnitude.

**Matrices:** A matrix is a 2D-array of numbers, so each element is identified by two indices instead of just one. If a real valued matrix A has a height of *m* and a width of *n*, then we say that A∈Rm×n. We identify the elements of the matrix as Am,n where *m* represents the row and *n* represents the column.

**Tensors:** In the general case, are an array of numbers arranged on a regular grid with a variable number of axes is knows as a tensor. We identify the elements of a tensor A at coordinates(*i, j, k*) by writing Ai,j,k. But to truly understand tensors, we need to expand the way we think of vectors as only arrows with a magnitude and direction. Remember that a vector can be represented by three components, namely the x, y and z components (basis vectors). If you have a pen and a paper, let’s do a small experiment, place the pen vertically on the paper and slant it by some angle and now shine a light from top such that the shadow of the pen falls on the paper, this shadow, represents the x component of the vector “pen” and the height from the paper to the tip of the pen is the y component. Now, let’s take these components to describe tensors, imagine, you are Indiana Jones or a treasure hunter and you are trapped in a cube and there are three arrows flying towards you from the three faces (to represent x, y, z axis) of the cube 😬, I know this will be the last thing you would think in such a situation but you can think of those three arrows as vectors pointing towards you from the three faces of the cube and you can represent those vectors (arrows) in x, y and z components, now that is a rank 2 tensor (matrix) with 9 components. Remember that this is a very very simple explanation of tensors. Following is a representation of a tensor:

We can add matrices to each other as long as they have the same shape, just by adding their corresponding elements:

In tensorflow a:

- Rank 0 Tensor is a Scalar
- Rank 1 Tensor is a Vector
- Rank 2 Tensor is a Matrix
- Rank 3 Tensor is a 3-Tensor
- Rank n Tensor is a n-Tensor

```
# let's create a ones 3x3 rank 2 tensor
rank_2_tensor_A = tf.ones([3, 3], name='MatrixA')
print("3x3 Rank 2 Tensor A: \n{}\n".format(rank_2_tensor_A))
# let's manually create a 3x3 rank two tensor and specify the data type as float
rank_2_tensor_B = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]], name='MatrixB', dtype=tf.float32)
print("3x3 Rank 2 Tensor B: \n{}\n".format(rank_2_tensor_B))
# addition of the two tensors
rank_2_tensor_C = tf.add(rank_2_tensor_A, rank_2_tensor_B, name='MatrixC')
print("Rank 2 Tensor C with shape={} and elements: \n{}".format(rank_2_tensor_C.shape, rank_2_tensor_C))
```

```
# Let's see what happens if the shapes are not the same
two_by_three = tf.ones([2, 3])
try:
incompatible_tensor = tf.add(two_by_three, rank_2_tensor_B)
except:
print("""Incompatible shapes to add with two_by_three of shape {0} and 3x3 Rank 2 Tensor B of shape {1}
""".format(two_by_three.shape, rank_2_tensor_B.shape))
```

We can also add a scalar to a matrix or multiply a matrix by a scalar, just by performing that operation on each element of a matrix:

```
# Create scalar a, c and Matrix B
rank_0_tensor_a = tf.constant(2, name="scalar_a", dtype=tf.float32)
rank_2_tensor_B = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]], name='MatrixB', dtype=tf.float32)
rank_0_tensor_c = tf.constant(3, name="scalar_c", dtype=tf.float32)
# multiplying aB
multiply_scalar = tf.multiply(rank_0_tensor_a, rank_2_tensor_B)
# adding aB + c
rank_2_tensor_D = tf.add(multiply_scalar, rank_0_tensor_c, name="MatrixD")
print("""Original Rank 2 Tensor B: \n{0} \n\nScalar a: {1}
Rank 2 Tensor for aB: \n{2} \n\nScalar c: {3} \nRank 2 Tensor D = aB + c: \n{4}
""".format(rank_2_tensor_B, rank_0_tensor_a, multiply_scalar, rank_0_tensor_c, rank_2_tensor_D))
```

One important operation on matrices is the **transpose**. The transpose of a matrix is the mirror image of the martrix across a diagonal line, called the **main diagonal**. We denote the transpose of a matrix A as A⊤ and is defined as such: (A⊤)i,j=Aj,i

```
# Creating a Matrix E
rank_2_tensor_E = tf.constant([[1, 2, 3], [4, 5, 6]])
# Transposing Matrix E
transpose_E = tf.transpose(rank_2_tensor_E, name="transposeE")
print("""Rank 2 Tensor E of shape: {0} and elements: \n{1}\n
Transpose of Rank 2 Tensor E of shape: {2} and elements: \n{3}""".format(rank_2_tensor_E.shape, rank_2_tensor_E, transpose_E.shape, transpose_E))
```

In deep learning we allow the addition of matrix and a vector, yielding another matrix where Ci,j=Ai,j+bj. In other words, the vector b is added to each row of the matrix. This implicit copying of b to many locations is called **broadcasting**

```
# Creating a vector b
rank_1_tensor_b = tf.constant([[4.], [5.], [6.]])
# Broadcasting a vector b to a matrix A such that it yields a matrix F = A + b
rank_2_tensor_F = tf.add(rank_2_tensor_A, rank_1_tensor_b, name="broadcastF")
print("""Rank 2 tensor A: \n{0}\n \nRank 1 Tensor b: \n{1}
\nRank 2 tensor F = A + b:\n{2}""".format(rank_2_tensor_A, rank_1_tensor_b, rank_2_tensor_F))
```

To define the matrix product of matrices A and B, A must have the same number of columns as B. If A is of shape *m x n* and B is of shape *n x p*, then C is of shape *m x p*.

If you do not recall how matrix multiplication is performed, take a look at:

```
# Matrix A and B with shapes (2, 3) and (3, 4)
mmv_matrix_A = tf.ones([2, 3], name="matrix_A")
mmv_matrix_B = tf.constant([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]], name="matrix_B", dtype=tf.float32)
# Matrix Multiplication: C = AB with C shape (2, 4)
matrix_multiply_C = tf.matmul(mmv_matrix_A, mmv_matrix_B, name="matrix_multiply_C")
print("""Matrix A: shape {0} \nelements: \n{1} \n\nMatrix B: shape {2} \nelements: \n{3}
\nMatrix C: shape {4} \nelements: \n{5}""".format(mmv_matrix_A.shape, mmv_matrix_A, mmv_matrix_B.shape, mmv_matrix_B, matrix_multiply_C.shape, matrix_multiply_C))
```

To get a matrix containing the product of the individual elements, we use **element wise product** or **Hadamard product** and is denoted as A⊙B.

```
"""
Note that we use multiply to do element wise matrix multiplication and matmul
to do matrix multiplication
"""
# Creating new Matrix A and B with shapes (3, 3)
element_matrix_A = tf.ones([3, 3], name="element_matrix_A")
element_matrix_B = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]], name="element_matrix_B", dtype=tf.float32)
# Element wise multiplication of Matrix A and B
element_wise_C = tf.multiply(element_matrix_A, element_matrix_B, name="element_wise_C")
print("""Matrix A: shape {0} \nelements: \n{1} \n\nMatrix A: shape {2} \nelements: \n{3}\n
Matrix C: shape {4} \nelements: \n{5}""".format(element_matrix_A.shape, element_matrix_A, element_matrix_B.shape, element_matrix_B, element_wise_C.shape, element_wise_C))
```

To compute the dot product between A and B we compute Ci,j as the dot product between row *i* of A and column *j* of B.

```
# Creating Matrix A and B with shapes (3, 3)
dot_matrix_A = tf.ones([3, 3], name="dot_matrix_A")
dot_matrix_B = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]], name="dot_matrix_B", dtype=tf.float32)
# Dot product of A and B
dot_product_C = tf.tensordot(dot_matrix_A, dot_matrix_B, axes=1, name="dot_product_C")
print("""Matrix A: shape {0} \nelements: \n{1} \n\nMatrix B: shape {2} \nelements: \n{3}\n
Matrix C: shape {4} \nelements: \n{5}""".format(dot_matrix_A.shape, dot_matrix_A, dot_matrix_B.shape, dot_matrix_B, dot_product_C.shape, dot_product_C))
```

Some properties of matrix multiplication (Distributive Property):

```
# Common Matrices to check all the matrix Properties
matrix_A = tf.constant([[1, 2], [3, 4]], name="matrix_a")
matrix_B = tf.constant([[5, 6], [7, 8]], name="matrix_b")
matrix_C = tf.constant([[9, 1], [2, 3]], name="matrix_c")
```

```
# Distributive Property
print("Matrix A: \n{} \n\nMatrix B: \n{} \n\nMatrix C: \n{}\n".format(matrix_A, matrix_B, matrix_C))
# AB + AC
distributive_RHS = tf.add(tf.matmul(matrix_A, matrix_B), tf.matmul(matrix_A, matrix_C), name="RHS")
# A(B+C)
distributive_LHS = tf.matmul(matrix_A, (tf.add(matrix_B, matrix_C)), name="LHS")
"""
Following is another way a conditional statement can be implemented from tensorflow
This might not seem very useful now but I want to introduce it here so you can
figure out how it works for a simple example.
"""
# To compare each element in the matrix, you need to reduce it first and check if it's equal
predictor = tf.reduce_all(tf.equal(distributive_RHS, distributive_LHS))
# condition to act on if predictor is True
def true_print(): print("""Distributive property is valid
RHS: AB + AC: \n{} \n\nLHS: A(B+C): \n{}""".format(distributive_RHS, distributive_LHS))
# condition to act on if predictor is False
def false_print(): print("""You Broke the Distributive Property of Matrix
RHS: AB + AC: \n{} \n\nis NOT Equal to LHS: A(B+C): \n{}""".format(distributive_RHS, distributive_LHS))
tf.cond(predictor, true_print, false_print)
```

Some properties of matrix multiplication (Associative property):

```
# Associative property
print("Matrix A: \n{} \n\nMatrix B: \n{} \n\nMatrix C: \n{}\n".format(matrix_A, matrix_B, matrix_C))
# (AB)C
associative_RHS = tf.matmul(tf.matmul(matrix_A, matrix_B), matrix_C)
# A(BC)
associative_LHS = tf.matmul(matrix_A, tf.matmul(matrix_B, matrix_C))
# To compare each element in the matrix, you need to reduce it first and check if it's equal
predictor = tf.reduce_all(tf.equal(associative_RHS, associative_LHS))
# condition to act on if predictor is True
def true_print(): print("""Associative property is valid
RHS: (AB)C: \n{} \n\nLHS: A(BC): \n{}""".format(associative_RHS, associative_LHS))
# condition to act on if predictor is False
def false_print(): print("""You Broke the Associative Property of Matrix
RHS: (AB)C: \n{} \n\nLHS: A(BC): \n{}""".format(associative_RHS, associative_LHS))
tf.cond(predictor, true_print, false_print)
```

Some properties of matrix multiplication (Matrix multiplication is not commutative):

```
# Matrix multiplication is not commutative
print("Matrix A: \n{} \n\nMatrix B: \n{}\n".format(matrix_A, matrix_B))
# Matrix A times B
commutative_RHS = tf.matmul(matrix_A, matrix_B)
# Matrix B times A
commutative_LHS = tf.matmul(matrix_B, matrix_A)
predictor = tf.logical_not(tf.reduce_all(tf.equal(commutative_RHS, commutative_LHS)))
def true_print(): print("""Matrix Multiplication is not commutative
RHS: (AB): \n{} \n\nLHS: (BA): \n{}""".format(commutative_RHS, commutative_LHS))
def false_print(): print("""You made Matrix Multiplication commutative
RHS: (AB): \n{} \n\nLHS: (BA): \n{}""".format(commutative_RHS, commutative_LHS))
tf.cond(predictor, true_print, false_print)
```

Some properties of matrix multiplication (Transpose):

```
# Transpose of a matrix
print("Matrix A: \n{} \n\nMatrix B: \n{}\n".format(matrix_A, matrix_B))
# Tensorflow transpose function
transpose_RHS = tf.transpose(tf.matmul(matrix_A, matrix_B))
# If you are doing matrix multiplication tf.matmul has a parameter to take the tranpose and then matrix multiply
transpose_LHS = tf.matmul(matrix_B, matrix_A, transpose_a=True, transpose_b=True)
predictor = tf.reduce_all(tf.equal(transpose_RHS, transpose_LHS))
def true_print(): print("""Transpose property is valid
RHS: (AB):^T \n{} \n\nLHS: (B^T A^T): \n{}""".format(transpose_RHS, transpose_LHS))
def false_print(): print("""You Broke the Transpose Property of Matrix
RHS: (AB):^T \n{} \n\nLHS: (B^T A^T): \n{}""".format(transpose_RHS, transpose_LHS))
tf.cond(predictor, true_print, false_print)
```

Linear algebra offers a powerful tool called **matrix inversion** that enables us to analytically solve Ax=b for many values of A To describe matrix inversion, we first need to define the concept of an **identity matrix**. An identity matrix is a matrix that does not change any vector when we multiply that vector by that matrix.

Such that:

The structure of the identity matrix is simple: all the entries along the main diagonal are 1, while all the other entries are zero.

```
# let's create a identity matrix I
identity_matrix_I = tf.eye(3, 3, dtype=tf.float32, name='IdentityMatrixI')
print("Identity matrix I: \n{}\n".format(identity_matrix_I))
# let's create a 3x1 vector x
iim_vector_x = tf.constant([[4], [5], [6]], name='Vector_x', dtype=tf.float32)
print("Vector x: \n{}\n".format(iim_vector_x))
# Ix will result in x
iim_matrix_C = tf.matmul(identity_matrix_I, iim_vector_x, name='MatrixC')
print("Matrix C from Ix: \n{}".format(iim_matrix_C))
```

The **matrix inverse** of A is denoted as A−1, and it is defined as the matrix such that:

```
iim_matrix_A = tf.constant([[2, 3], [2, 2]], name='MatrixA', dtype=tf.float32)
try:
# Tensorflow function to take the inverse
inverse_matrix_A = tf.linalg.inv(iim_matrix_A)
# Creating a identity matrix using tf.eye
identity_matrix = tf.eye(2, 2, dtype=tf.float32, name="identity")
iim_RHS = identity_matrix
iim_LHS = tf.matmul(inverse_matrix_A, iim_matrix_A, name="LHS")
predictor = tf.reduce_all(tf.equal(iim_RHS, iim_LHS))
def true_print(): print("""A^-1 times A equals the Identity Matrix
Matrix A: \n{0} \n\nInverse of Matrix A: \n{1} \n\nRHS: I: \n{2} \n\nLHS: A^(-1) A: \n{3}""".format(iim_matrix_A, inverse_matrix_A, iim_RHS, iim_LHS))
def false_print(): print("Condition Failed")
tf.cond(predictor, true_print, false_print)
except:
print("""A^-1 doesnt exist
Matrix A: \n{} \n\nInverse of Matrix A: \n{} \n\nRHS: I: \n{}
\nLHS: (A^(-1) A): \n{}""".format(iim_matrix_A, inverse_matrix_A, iim_RHS, iim_LHS))
```

If you try different values for Matrix A, you will see that, not all A has an inverse and we will discuss the conditions for the existence of A−1 in the following section.

We can then solve the equation Ax=b as:

This process depends on it being possible to find A−1.

We can calculate the inverse of a matrix by:

Lets see how we can solve a simple linear equation: 2x + 3y = 6 and 4x + 9y = 15

```
# The above system of equation can be written in the matrix format as:
sys_matrix_A = tf.constant([[2, 3], [4, 9]], dtype=tf.float32)
sys_vector_B = tf.constant([[6], [15]], dtype=tf.float32)
print("Matrix A: \n{} \n\nVector B: \n{}\n".format(sys_matrix_A, sys_vector_B))
# now to solve for x: x = A^(-1)b
sys_x = tf.matmul(tf.linalg.inv(sys_matrix_A), sys_vector_B)
print("Vector x is: \n{} \nWhere x = {} and y = {}".format(sys_x, sys_x[0], sys_x[1]))
```

Linear Dependence and Span
For A−1 to exits, Ax=b must have exactly one solution for every value of b. It is also possible for the system of equations to have no solutions or infinitely many solutions for some values of b. This is simply because we are dealing with linear systems and two lines can’t cross more than once. So, they can either cross once, cross never, or have infinite crossing, meaning the two lines are superimposed.

Hence if both x and y are solutions then:

z=αx+(1−α)y is also a solution for any real α

The span of a set of vectors is the set of all linear combinations of the vectors. Formally, a **linear combination** of some set of vectors v1,⋯,vn is given by multiplying each vecor v(i) by a corresponding scalar coefficient and adding the results:

Determining whether Ax=b has a solution thus amounts to testing whether b is in the span of the columns of A. This particular span is known as the **column space** or the **range**, of A .

In order for the system Ax=b to have a solution for all values of b∈Rm, we require that the column space of A be all of Rm.

A set of vectors v1,⋯,vn is **linearly independent** if the only solution to the vector equation λ1v1+⋯λnvn=0 is λi=0 ∀i. If a set of vectors is not linearly independent, then it is **linearly dependent**.

For the matrix to have an inverse, the matrix must be **square**, that is, we require that *m = n* and that all the columns be linearly independent. A square matrix with linearly dependent columns is known as **singular**.

If A is not square or is square but singular, solving the equation is still possible, but we cannot use the method of matrix inversion to find the solution.

So far we have discussed matrix inverses as being multiplied on the left. It is also possible to define an inverse that is multiplied on the right. For square matrixes, the left inverse and right inverse are equal.

```
# Lets start by finding for some value of A and x, what the result of x is
lds_matrix_A = tf.constant([[3, 1], [1, 2]], name='MatrixA', dtype=tf.float32)
lds_vector_x = tf.constant([2, 3], name='vectorX', dtype=tf.float32)
lds_b = tf.multiply(lds_matrix_A, lds_vector_x, name="b")
# Now let's see if an inverse for Matrix A exists
try:
inverse_A = tf.linalg.inv(lds_matrix_A)
print("Matrix A is successfully inverted: \n{}".format(inverse_A))
except:
print("Inverse of Matrix A: \n{} \ndoesn't exist. ".format(lds_matrix_A))
# Let's find the value of x using x = A^(-1)b
verify_x = tf.matmul(inverse_A, lds_b, name="verifyX")
predictor = tf.equal(lds_vector_x[0], verify_x[0][0]) and tf.equal(lds_vector_x[1], verify_x[1][1])
def true_print(): print("""\nThe two x values match, we proved that if a matrix A is invertible
Then x = A^T b, \nwhere x: {}, \n\nA^T: \n{}, \n\nb: \n{}""".format(lds_vector_x, inverse_A, lds_b))
def false_print(): print("""\nThe two x values don't match.
Vector x: {} \n\nA^(-1)b: \n{}""".format(lds_vector_x, verify_x))
tf.cond(predictor, true_print, false_print)
```

Note that, finding inverses can be a challenging process if you want to calculate it, but using tensorflow or any other library, you can easily check if the inverse of the matrix exists. If you know the conditions and know how to solve matrix equations using tensorflow, you should be good, but for the reader who wants to go deeper, check Linear Dependence and Span for further examples and definitions.

In machine learning if we need to measure the size of vectors, we use a function called a **norm**. And norm is what is generally used to evaluate the error of a model. Formally, the LP norm is given by:

for p∈R,p≥1On an intuitive level, the norm of a vector x measures the distance from the origin to the point x

More rigorously, a norm is any function f that satisfies the following properties:

The L2 norm with *p=2* is known as the **Euclidean norm**. Which is simply the Euclidean distance from the origin to the point identified by x. It is also common to measure the size of a vector using the squared L2 norm, which can be calculated simply as x⊤x

```
# Euclidean distance between square root(3^2 + 4^2) calculated by setting ord='euclidean'
dist_euclidean = tf.norm([3., 4.], ord='euclidean')
print("Euclidean Distance: {}".format(dist_euclidean))
# Size of the vector [3., 4.]
vector_size = tf.multiply(tf.transpose([3., 4.]), [3., 4.])
print("Vector Size: {}".format(vector_size))
```

In many contexts, the squared L2 norm may be undesirable, because it increases very slowly near the origin. In many machine learning applications, it is important to discriminate between elements that are exactly zero and elements that are small but nonzero. In these cases, we turn to a function that grows at the same rate in all locations, but retains mathematical simplicity: the L1 norm, which can be simplified to:

```
def SE(x,y,intc,beta):
return (1./len(x))*(0.5)*sum(y - beta * x - intc)**2
def L1(intc,beta,lam):
return lam*(tf.abs(intc) + tf.abs(beta))
def L2(intc,beta,lam):
return lam*(intc**2 + beta**2)
N = 100
x = np.random.randn(N)
y = 2 * x + np.random.randn(N)
beta_N = 100
beta = tf.linspace(-100., 100., beta_N)
intc = 0.0
SE_array = np.array([SE(x,y,intc,i) for i in beta])
L1_array = np.array([L1(intc,i,lam=30) for i in beta])
L2_array = np.array([L2(intc,i,lam=1) for i in beta])
fig1 = plt.figure()
ax1 = fig1.add_subplot(1,1,1)
ax1.plot(beta, SE_array, label='Squared L2 Norm')
ax1.plot(beta, L1_array, label='L1 norm')
ax1.plot(beta, L2_array, label='L2 norm')
plt.rc_context({'axes.edgecolor':'orange', 'xtick.color':'red', 'ytick.color':'red'})
plt.title('The graph of each of the norms', color='w')
plt.legend()
fig1.show()
```

One other norm that commonly arises in machine learning is the L∞ norm, also known as the **max norm**. This norm simplifies to the absolute value of the element with the largest magnitude in the vector,

If we wish to measure the size of a matrix, in context of deep learning, the most common way to do this is with the **Frobenius norm**:

```
n_matrix_A = tf.constant([[2, -1, 5], [0, 2, 1], [3, 1, 1]], name="matrix_A", dtype=tf.float32)
# Frobenius norm for matrix calculated by setting ord='fro'
frobenius_norm = tf.norm(n_matrix_A, ord='fro', axis=(0, 1))
print("Frobenius norm: {}".format(frobenius_norm))
```

The dot product of two vectors can be rewritten in terms of norms as:

where θ is the angle between x and y.

```
# for x(0, 2) and y(2, 2) cos theta = 45 degrees
n_vector_x = tf.constant([[0], [2]], dtype=tf.float32, name="vectorX")
n_vector_y = tf.constant([[2], [2]], dtype=tf.float32, name="vectorY")
# Due to pi being in, we won't get an exact value so we are rounding our final value
prod_RHS = tf.round(tf.multiply(tf.multiply(tf.norm(n_vector_x), tf.norm(n_vector_y)), tf.cos(np.pi/4)))
prod_LHS = tf.tensordot(tf.transpose(n_vector_x), n_vector_y, axes=1, name="LHS")
predictor = tf.equal(prod_RHS, prod_LHS)
def true_print(): print("""Dot Product can be rewritten in terms of norms, where \n
RHS: {} \nLHS: {}""".format(prod_RHS, prod_LHS))
def false_print(): print("""Dot Product can not be rewritten in terms of norms, where \n
RHS: {} \nLHS: {}""".format(prod_RHS, prod_LHS))
tf.cond(predictor, true_print, false_print)
origin=[0,0]
plt.rc_context({'axes.edgecolor':'orange', 'xtick.color':'red', 'ytick.color':'red'})
plt.xlim(-2, 10)
plt.ylim(-1, 10)
plt.axvline(x=0, color='grey', zorder=0)
plt.axhline(y=0, color='grey', zorder=0)
plt.text(-1, 2, r'$\vec{x}$', size=18)
plt.text(2, 1.5, r'$\vec{y}$', size=18)
plt.quiver(*origin, n_vector_x, n_vector_y, color=['#FF9A13','#1190FF'], scale=8)
plt.show()
```

Special Kinds of Matrices and Vectors
**Diagonal** matrices consist mostly of zeros and have nonzero entries only along the main diagonal. Identity matrix is an example of diagonal matrix. We write diag(v) to denote a square diagonal matrix whose diagonal entries are given by the entries of the vector *v*. To compute diag(v)x we only need to scale each element xi by vi. In other words:

```
# create vector v and x
sp_vector_v = tf.random.uniform([5], minval=0, maxval=10, dtype = tf.int32, seed = 0, name="vector_v")
sp_vector_x = tf.random.uniform([5], minval=0, maxval=10, dtype = tf.int32, seed = 0, name="vector_x")
print("Vector v: {} \nVector x: {}\n".format(sp_vector_v, sp_vector_x))
# RHS diagonal vector v dot diagonal vector x. The linalg.diag converts a vector to a diagonal matrix
sp_RHS = tf.tensordot(tf.linalg.diag(sp_vector_v), tf.linalg.diag(sp_vector_x), axes=1)
# LHS diag(v)x
sp_LHS = tf.multiply(tf.linalg.diag(sp_vector_v), sp_vector_x)
predictor = tf.reduce_all(tf.equal(sp_RHS, sp_LHS))
def true_print(): print("Diagonal of v times x: \n{} \n\nis equal to vector v dot vector x: \n{}".format(sp_RHS, sp_LHS))
def false_print(): print("Diagonal of v times x: \n{} \n\nis NOT equal to vector v dot vector x: \n{}".format(sp_RHS, sp_LHS))
tf.cond(predictor, true_print, false_print)
```

Inverting a square diagonal matrix is also efficient. The inverse exists only if every diagonal entry is nonzero, and in that case:

```
try:
# try creating a vector_v with zero elements and see what happens
d_vector_v = tf.random.uniform([5], minval=1, maxval=10, dtype = tf.float32, seed = 0, name="vector_v")
print("Vector v: {}".format(d_vector_v))
# linalg.diag converts a vector to a diagonal matrix
diag_RHS = tf.linalg.diag(tf.transpose(1. / d_vector_v))
# we convert the vector to diagonal matrix and take it's inverse
inv_LHS = tf.linalg.inv(tf.linalg.diag(d_vector_v))
predictor = tf.reduce_all(tf.equal(diag_RHS, inv_LHS))
def true_print(): print("The inverse of LHS: \n{} \n\nMatch the inverse of RHS: \n{}".format(diag_RHS, inv_LHS))
def false_print(): print("The inverse of LHS: \n{} \n\n Does not match the inverse of RHS: \n{}".format(diag_RHS, inv_LHS))
tf.cond(predictor, true_print, false_print)
except:
print("The inverse exists only if every diagonal is nonzero, your vector looks: \n{}".format(d_vector_v))
```

Not all diagonal matrices need be square. It is possible to construct a rectangular diagonal matrix. Nonsquare diagonal matrices do not have inverses, but we can still multiply by them cheaply. For a nonsquare diagonal matrix D, the product Dx will involve scaling each element of x and either concatenating some zeros to the result, if D is taller than it is wide, or discarding some of the last elements of the vector, if D is wider than it is tall.

A **symmetric** matrix is any matrix that is equal to its own transpose: A=A⊤

Symmetric matrices often arise when the entries are generated by some function of two arguments that does not depend on the order of the arguments. For example, if A is a matrix of distance measurements, with Ai,j giving the distance from point *i* to point *j*, then Ai,j=Aj,i because distance functions are symmetric.

```
# create a symmetric matrix
sp_matrix_A = tf.constant([[0, 1, 3], [1, 2, 4], [3, 4, 5]], name="matrix_a", dtype=tf.int32)
# get the transpose of matrix A
sp_transpose_a = tf.transpose(sp_matrix_A)
predictor = tf.reduce_all(tf.equal(sp_matrix_A, sp_transpose_a))
def true_print(): print("Matrix A: \n{} \n\nMatches the the transpose of Matrix A: \n{}".format(sp_matrix_A, sp_transpose_a))
def false_print(): print("Matrix A: \n{} \n\nDoes Not match the the transpose of Matrix A: \n{}".format(sp_matrix_A, sp_transpose_a))
tf.cond(predictor, true_print, false_print)
```

A vector x and a vector y are ** orthogonal** to each other if x⊤y=0. If both vectors have nonzero norm, this means that they are at a 90 degree angle to each other.

```
# Lets create two vectors
ortho_vector_x = tf.constant([2, 2], dtype=tf.float32, name="vector_x")
ortho_vector_y = tf.constant([2, -2], dtype=tf.float32, name="vector_y")
print("Vector x: {} \nVector y: {}\n".format(ortho_vector_x, ortho_vector_y))
# lets verify if x transpose dot y is zero
ortho_LHS = tf.tensordot(tf.transpose(ortho_vector_x), ortho_vector_y, axes=1)
print("X transpose times y = {}\n".format(ortho_LHS))
# let's see what their norms are
ortho_norm_x = tf.norm(ortho_vector_x)
ortho_norm_y = tf.norm(ortho_vector_y)
print("Norm x: {} \nNorm y: {}\n".format(ortho_norm_x, ortho_norm_y))
# If they have non zero norm, let's see what angle they are to each other
if tf.logical_and(ortho_norm_x > 0, ortho_norm_y > 0):
# from the equation cos theta = (x dot y)/(norm of x times norm y)
cosine_angle = (tf.divide(tf.tensordot(ortho_vector_x, ortho_vector_y, axes=1), tf.multiply(ortho_norm_x, ortho_norm_y)))
print("Angle between vector x and vector y is: {} degrees".format(tf.acos(cosine_angle) * 180 /np.pi))
origin=[0,0]
plt.rc_context({'axes.edgecolor':'orange', 'xtick.color':'red', 'ytick.color':'red'})
plt.xlim(-1, 10)
plt.ylim(-10, 10)
plt.axvline(x=0, color='grey', zorder=0)
plt.axhline(y=0, color='grey', zorder=0)
plt.text(1, 4, r'$\vec{x}$', size=18)
plt.text(1, -6, r'$\vec{y}$', size=18)
plt.quiver(*origin, ortho_vector_x, ortho_vector_y, color=['#FF9A13','#1190FF'], scale=8)
plt.show()
```

A **unit vector** is a vector with a **unit norm**: ∥x∥2=1.

If two vectors are not only are orthogonal but also have unit norm, we call them **orthonormal**.

A **orthogonal matrix** is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal:

which implies A−1=A⊤so orthogonal matrices are of interest because their inverse is very cheap to compute.

```
# Lets use sine and cosine to create orthogonal matrix
ortho_matrix_A = tf.Variable([[tf.cos(.5), -tf.sin(.5)], [tf.sin(.5), tf.cos(.5)]], name="matrixA")
print("Matrix A: \n{}\n".format(ortho_matrix_A))
# extract columns from the matrix to verify if they are orthogonal
col_0 = tf.reshape(ortho_matrix_A[:, 0], [2, 1])
col_1 = tf.reshape(ortho_matrix_A[:, 1], [2, 1])
row_0 = tf.reshape(ortho_matrix_A[0, :], [2, 1])
row_1 = tf.reshape(ortho_matrix_A[1, :], [2, 1])
# Verifying if the columns are orthogonal
ortho_column = tf.tensordot(tf.transpose(col_0), col_1, axes=2)
print("Columns are orthogonal: {}".format(ortho_column))
plt.rc_context({'axes.edgecolor':'orange', 'xtick.color':'red', 'ytick.color':'red'})
origin = [0, 0]
plt.xlim(-2, 2)
plt.ylim(-3, 4)
plt.axvline(x=0, color='grey', zorder=0)
plt.axhline(y=0, color='grey', zorder=0)
plt.text(0, 2, r'$\vec{col_0}$', size=18)
plt.text(0.5, -2, r'$\vec{col_1}$', size=18)
plt.quiver(*origin, col_0, col_1, color=['#FF9A13','#FF9A13'], scale=3)
# Verifying if the rows are orthogonal
ortho_row = tf.tensordot(tf.transpose(row_0), row_1, axes=2)
print("Rows are orthogonal: {}\n".format(ortho_row))
plt.text(-1, 2, r'$\vec{row_0}$', size=18)
plt.text(1, 0.5, r'$\vec{row_1}$', size=18)
plt.quiver(*origin, row_0, row_1, color=['r','r'], scale=3)
plt.show()
# inverse of matrix A
ortho_inverse_A = tf.linalg.inv(ortho_matrix_A)
# Transpose of matrix A
ortho_transpose_A = tf.transpose(ortho_matrix_A)
predictor = tf.reduce_all(tf.equal(ortho_inverse_A, ortho_transpose_A))
def true_print(): print("Inverse of Matrix A: \n{} \n\nEquals the transpose of Matrix A: \n{}".format(ortho_inverse_A, ortho_transpose_A))
def false_print(): print("Inverse of Matrix A: \n{} \n\nDoes not equal the transpose of Matrix A: \n{}".format(ortho_inverse_A, ortho_transpose_A))
tf.cond(predictor, true_print, false_print)
```

We can represent a number, for example 12 as 12 = 2 x 2 x 3. The representation will change depending on whether we write it in base ten or in binary but the above representation will always be true and from that we can conclude that 12 is not divisible by 5 and that any integer multiple of 12 will be divisible by 3.

Similarly, we can also decompose matrices in ways that show us information about their functional properties that is not obvious from the representation of the matrix as an array of elements. One of the most widely used kinds of matrix decomposition is called **eigen decomposition**, in which we decompose a matrix into a set of eigenvectors and eigenvalues.

An **eigenvector** of a square matrix A is a nonzero vector v such that multiplication by A alters only the scale of v, in short this is a special vector that doesn’t change the direction of the matrix when applied to it :

The scale λ is known as the **eigenvalue** corresponding to this eigenvector.

```
# Let's see how we can compute the eigen vectors and values from a matrix
e_matrix_A = tf.random.uniform([2, 2], minval=3, maxval=10, dtype=tf.float32, name="matrixA")
print("Matrix A: \n{}\n\n".format(e_matrix_A))
# Calculating the eigen values and vectors using tf.linalg.eigh, if you only want the values you can use eigvalsh
eigen_values_A, eigen_vectors_A = tf.linalg.eigh(e_matrix_A)
print("Eigen Vectors: \n{} \n\nEigen Values: \n{}\n".format(eigen_vectors_A, eigen_values_A))
# Now lets plot our Matrix with the Eigen vector and see how it looks
Av = tf.tensordot(e_matrix_A, eigen_vectors_A, axes=0)
vector_plot([tf.reshape(Av, [-1]), tf.reshape(eigen_vectors_A, [-1])], 10, 10)
```

If v is an eigenvector of A, then so is any rescaled vector sv for s∈R,s≠0.

```
# Lets us multiply our eigen vector by a random value s and plot the above graph again to see the rescaling
sv = tf.multiply(5, eigen_vectors_A)
vector_plot([tf.reshape(Av, [-1]), tf.reshape(sv, [-1])], 10, 10)
```

Suppose that a matrix A has n linearly independent eigenvectors v(1),⋯,v(n) with corresponding eigenvalues λ(1),⋯,λ(n). We may concatenate all the eigenvectors to form a matrix V with one eigenvector per column: V=[v(1),⋯,v(n)]. Likewise, we can concatenate the eigenvalues to form a vector λ=[λ(1),⋯,λ(n)]⊤. The **eigendecomposition** of A is then given by

```
# Creating a matrix A to find it's decomposition
eig_matrix_A = tf.constant([[5, 1], [3, 3]], dtype=tf.float32)
new_eigen_values_A, new_eigen_vectors_A = tf.linalg.eigh(eig_matrix_A)
print("Eigen Values of Matrix A: {} \n\nEigen Vector of Matrix A: \n{}\n".format(new_eigen_values_A, new_eigen_vectors_A))
# calculate the diag(lamda)
diag_lambda = tf.linalg.diag(new_eigen_values_A)
print("Diagonal of Lambda: \n{}\n".format(diag_lambda))
# Find the eigendecomposition of matrix A
decomp_A = tf.tensordot(tf.tensordot(eigen_vectors_A, diag_lambda, axes=1), tf.linalg.inv(new_eigen_vectors_A), axes=1)
print("The decomposition Matrix A: \n{}".format(decomp_A))
```

Not every matrix can be decomposed into eigenvalues and eigenvectors. In some cases, the decomposition exists but involves complex rather than real numbers.

In this book, we usually need to decompose only a specific class of matrices that have a simple decomposition. Specifically, every real symmetric matrix can be decomposed into an expression using only real-valued eigenvectors and eigenvalues:

where Q is an orthogonal matrix composed of eigenvectors of A and Λ is a diagonal matrix. The eigenvalue Λi,i is associated with the eigenvector in column *i* of Q, denoted as Q:,i. Because Q is an orthogonal matrix, we can think of A as scaling space by Λi in direction v(i).

```
# In section 2.6 we manually created a matrix to verify if it is symmetric, but what if we don't know the exact values and want to create a random symmetric matrix
new_matrix_A = tf.Variable(tf.random.uniform([2,2], minval=1, maxval=10, dtype=tf.float32))
# to create an upper triangular matrix from a square one
X_upper = tf.linalg.band_part(new_matrix_A, 0, -1)
sym_matrix_A = tf.multiply(0.5, (X_upper + tf.transpose(X_upper)))
print("Symmetric Matrix A: \n{}\n".format(sym_matrix_A))
# create orthogonal matrix Q from eigen vectors of A
eigen_values_Q, eigen_vectors_Q = tf.linalg.eigh(sym_matrix_A)
print("Matrix Q: \n{}\n".format(eigen_vectors_Q))
# putting eigen values in a diagonal matrix
new_diag_lambda = tf.linalg.diag(eigen_values_Q)
print("Matrix Lambda: \n{}\n".format(new_diag_lambda))
sym_RHS = tf.tensordot(tf.tensordot(eigen_vectors_Q, new_diag_lambda, axes=1), tf.transpose(eigen_vectors_Q), axes=1)
predictor = tf.reduce_all(tf.equal(tf.round(sym_RHS), tf.round(sym_matrix_A)))
def true_print(): print("It WORKS. \nRHS: \n{} \n\nLHS: \n{}".format(sym_RHS, sym_matrix_A))
def false_print(): print("Condition FAILED. \nRHS: \n{} \n\nLHS: \n{}".format(sym_RHS, sym_matrix_A))
tf.cond(predictor, true_print, false_print)
```

The eigendecomposition of a matrix tells us many useful facts about the matrix. The matrix is singular if and only if any of the eigenvalues are zero. The eigendecomposition of a real symmetric matrix can also be used to optimize quadratic expressions of the formf(x)=x⊤Ax subject to ∥x∥2=1.

The above equation can be solved as following, we know that if x

is an Eigenvector of A and λ is the corresponding eigenvalue, then Ax=λx, therefore f(x)=x⊤Ax=x⊤λx=x⊤xλ and since ∥x∥2=1 and x⊤x=1, the above equation boils down to f(x)=λWhenever x is equal to an eigenvector of A, f takes on the value of the corresponding eigenvalue and its minimum value within the constraint region is the minimum eigenvalue.

A matrix whose eigenvalues are all positive is called **positive definite**. A matrix whose eigenvalues are all positive or zero valued is called **positive semidefinite**. Likewise, if all eigenvalues are negative, the matrix is **negative definite**, and if all eigenvalues are negative or zero valued, it is **negative semidefinite**. Positive semidefinite matrces are interesting because they guarantee that ∀x,x⊤Ax≥0. Positive definite matrices additionally guarantee that xTAx=0⟹x=0.

The **singular value decomposition (SVD)** provides another way to factorize a matrix into **singular vectors** and **singular values**. The SVD enables us to discover some of the same kind of information as the eigendecomposition reveals, however, the SVD is more generally applicable. Every real matrix has a singular value decomposition, but the same is not true of the eigenvalue decomposition. SVD can be written as:

Suppose A is an *m x n* matrix, then U is defined to be an *m x m* rotation matrix, D to be an *m x n* matrix scaling & projecting matrix, and V to be an *n x n* rotation matrix. Each of these matrices is defined to have a special structure. The matrices U and V are both defined to be orthogonal matrices (U⊤=U−1 and V⊤=V−1). The matrix D is defined to be a diagonal matrix.

The elements along the diagonal of D are known as the **singular values** of the matrix A. The columns of U are known as the **left-singular vectors**. The columns of V are known as as the **right-singular vectors**.

```
# mxn matrix A
svd_matrix_A = tf.constant([[2, 3], [4, 5], [6, 7]], dtype=tf.float32)
print("Matrix A: \n{}\n".format(svd_matrix_A))
# Using tf.linalg.svd to calculate the singular value decomposition where d: Matrix D, u: Matrix U and v: Matrix V
d, u, v = tf.linalg.svd(svd_matrix_A, full_matrices=True, compute_uv=True)
print("Diagonal D: \n{} \n\nMatrix U: \n{} \n\nMatrix V^T: \n{}".format(d, u, v))
```

```
# Lets see if we can bring back the original matrix from the values we have
# mxm orthogonal matrix U
svd_matrix_U = tf.constant([[0.30449855, -0.86058956, 0.40824753], [0.54340035, -0.19506174, -0.81649673], [0.78230214, 0.47046405, 0.40824872]])
print("Orthogonal Matrix U: \n{}\n".format(svd_matrix_U))
# mxn diagonal matrix D
svd_matrix_D = tf.constant([[11.782492, 0], [0, 0.41578525], [0, 0]], dtype=tf.float32)
print("Diagonal Matrix D: \n{}\n".format(svd_matrix_D))
# nxn transpose of matrix V
svd_matrix_V_trans = tf.constant([[0.63453555, 0.7728936], [0.7728936, -0.63453555]], dtype=tf.float32)
print("Transpose Matrix V: \n{}\n".format(svd_matrix_V_trans))
# UDV(^T)
svd_RHS = tf.tensordot(tf.tensordot(svd_matrix_U, svd_matrix_D, axes=1), svd_matrix_V_trans, axes=1)
predictor = tf.reduce_all(tf.equal(tf.round(svd_RHS), svd_matrix_A))
def true_print(): print("It WORKS. \nRHS: \n{} \n\nLHS: \n{}".format(tf.round(svd_RHS), svd_matrix_A))
def false_print(): print("Condition FAILED. \nRHS: \n{} \n\nLHS: \n{}".format(tf.round(svd_RHS), svd_matrix_A))
tf.cond(predictor, true_print, false_print)
```

Matrix A can be seen as a linear transformation. This transformation can be decomposed in to three sub-transformations:

- Rotation,
- Re-scaling and projecting,
- Rotation.

These three steps correspond to the three matrices U,D and V

Let’s see how these transformations are taking place in order

```
# Let's define a unit square
svd_square = tf.constant([[0, 0, 1, 1],[0, 1, 1, 0]], dtype=tf.float32)
# a new 2x2 matrix
svd_new_matrix = tf.constant([[1, 1.5], [0, 1]])
# SVD for the new matrix
new_d, new_u, new_v = tf.linalg.svd(svd_new_matrix, full_matrices=True, compute_uv=True)
# lets' change d into a diagonal matrix
new_d_marix = tf.linalg.diag(new_d)
# Rotation: V^T for a unit square
plot_transform(svd_square, tf.tensordot(new_v, svd_square, axes=1), "$Square$", "$V^T \cdot Square$", "Rotation", axis=[-0.5, 3.5 , -1.5, 1.5])
plt.show()
# Scaling and Projecting: DV^(T)
plot_transform(tf.tensordot(new_v, svd_square, axes=1), tf.tensordot(new_d_marix, tf.tensordot(new_v, svd_square, axes=1), axes=1), "$V^T \cdot Square$", "$D \cdot V^T \cdot Square$", "Scaling and Projecting", axis=[-0.5, 3.5 , -1.5, 1.5])
plt.show()
# Second Rotation: UDV^(T)
trans_1 = tf.tensordot(tf.tensordot(new_d_marix, new_v, axes=1), svd_square, axes=1)
trans_2 = tf.tensordot(tf.tensordot(tf.tensordot(new_u, new_d_marix, axes=1), new_v, axes=1), svd_square, axes=1)
plot_transform(trans_1, trans_2,"$U \cdot D \cdot V^T \cdot Square$", "$D \cdot V^T \cdot Square$", "Second Rotation", color=['#1190FF', '#FF9A13'], axis=[-0.5, 3.5 , -1.5, 1.5])
plt.show()
```

The above sub transformations can be found for each matrix as follows:

- U corresponds to the eigenvectors of AA⊤
- V corresponds to to the eigenvectors of A⊤A
- D corresponds to the eigenvalues AA⊤ or A⊤ A which are the same.

As an exercise try proving this is the case.

Perhaps the most useful feature of the SVD is that we can use it to partially generalize matrix inversion to nonsquare matrices, as we will see in the next section.

The Moore-Penrose PseudoinverseMatrix inversion is not defined for matrices that are not square. Suppose we want to make a left-inverse B of a matrix A so that we can solve a linear equation Ax=Y by left multiplying each side to obtain x=By .

Depending on the structure of the problem, it may not be possible to design a unique mapping from A to B.

The **Moore-Penrose pseudoinverse** enables use to make some headway in these cases. The pseudoinverse of A is defined as a matrix:

Practical algorithms for computing the pseudoinverse are based not on this definition, but rather on the formula:

where U,D and V are the singular decomposition of A and the pseudoinverse of D+ of a diagonal matrix D is obtained by taking the reciprocal of its nonzero elements then taking the transpose of the resulting matrix.

```
# Matrix A
mpp_matrix_A = tf.random.uniform([3, 2], minval=1, maxval=10, dtype=tf.float32)
print("Matrix A: \n{}\n".format(mpp_matrix_A))
# Singular Value decomposition of matrix A
mpp_d, mpp_u, mpp_v = tf.linalg.svd(mpp_matrix_A, full_matrices=True, compute_uv=True)
print("Matrix U: \n{} \n\nMatrix V: \n{}\n".format(mpp_u, mpp_v))
# pseudo inverse of matrix D
d_plus = tf.concat([tf.transpose(tf.linalg.diag(tf.math.reciprocal(mpp_d))), tf.zeros([2, 1])], axis=1)
print("D plus: \n{}\n".format(d_plus))
# moore-penrose pseudoinverse of matrix A
matrix_A_star = tf.matmul(tf.matmul(mpp_v, d_plus, transpose_a=True), mpp_u, transpose_b=True)
print("The Moore-Penrose pseudoinverse of Matrix A: \n{}".format(matrix_A_star))
```

When A has more columns than rows, then solving a linear equation using the pseudoinverse provides one of the many possible solutions. Specifically, it provides the solution x=A+y with minimal Euclidean norm ∥x∥2 among all possible solutions.

```
mpp_vector_y = tf.constant([[2], [3], [4]], dtype=tf.float32)
print("Vector y: \n{}\n".format(mpp_vector_y))
mpp_vector_x = tf.matmul(matrix_A_star, mpp_vector_y)
print("Vector x: \n{}".format(mpp_vector_x))
```

When A has more rows than columns, it is possible for there to be no solution. In this case, using the pseudoinverse gives us the x for which Ax is as close as possible to y in terms of Euclidean norm ∥Ax−y∥2

The trace operator gives the sum of all the diagonal entries of a matrix:

```
# random 3x3 matrix A
to_matrix_A = tf.random.uniform([3, 3], minval=0, maxval=10, dtype=tf.float32)
# trace of matrix A using tf.linalg.trace
trace_matrix_A = tf.linalg.trace(to_matrix_A)
print("Trace of Matrix A: \n{} \nis: {}".format(to_matrix_A, trace_matrix_A))
```

The trace operator is useful for a variety of reasons. Some operations that are difficult to specify without resorting to summation notation can be specified using matrix products and the trace operator. For example, the trace operator provides an alternative way of writing the Frobenius norm of a matrix:

```
# Frobenius Norm of A
frobenius_A = tf.norm(to_matrix_A)
# sqrt(Tr(A times A^T))
trace_rhs = tf.sqrt(tf.linalg.trace(tf.matmul(to_matrix_A, to_matrix_A, transpose_b=True)))
predictor = tf.equal(tf.round(frobenius_A), tf.round(trace_rhs))
def true_print(): print("It WORKS. \nRHS: {} \nLHS: {}".format(frobenius_A, trace_rhs))
def false_print(): print("Condition FAILED. \nRHS: {} \nLHS: {}".format(frobenius_A, trace_rhs))
tf.cond(predictor, true_print, false_print)
```

Writing an expression in terms of the trace operator opens up opportunities to manipulate the expression using many useful identities. For example, the trace operator is invariant to the transpose operator:

```
# Transpose of Matrix A
trans_matrix_A = tf.transpose(to_matrix_A)
#Trace of the transpose Matrix A
trace_trans_A = tf.linalg.trace(trans_matrix_A)
predictor = tf.equal(trace_matrix_A, trace_trans_A)
def true_print(): print("It WORKS. \nRHS: {} \nLHS: {}".format(trace_matrix_A, trace_trans_A))
def false_print(): print("Condition FAILED. \nRHS: {} \nLHS: {}".format(trace_matrix_A, trace_trans_A))
tf.cond(predictor, true_print, false_print)
```

The trace of a square matrix composed of many factors is also invariant to moving the last factor into the first position, if the shapes of the corresponding matrices allow the resulting product to be defined:

```
# random 3x3 matrix B and matrix C
to_matrix_B = tf.random.uniform([3, 3], minval=0, maxval=10, dtype=tf.float32)
to_matrix_C = tf.random.uniform([3, 3], minval=0, maxval=10, dtype=tf.float32)
# ABC
abc = tf.tensordot((tf.tensordot(to_matrix_A, to_matrix_B, axes=1)), to_matrix_C, axes=1)
# CAB
cab = tf.tensordot((tf.tensordot(to_matrix_C, to_matrix_A, axes=1)), to_matrix_B, axes=1)
# BCA
bca = tf.tensordot((tf.tensordot(to_matrix_B, to_matrix_C, axes=1)), to_matrix_A, axes=1)
# trace of matrix ABC, CAB and matrix BCA
trace_matrix_abc = tf.linalg.trace(abc)
trace_matrix_cab = tf.linalg.trace(cab)
trace_matrix_bca = tf.linalg.trace(bca)
predictor = tf.equal(tf.round(trace_matrix_abc), tf.round(trace_matrix_cab)) and tf.equal(tf.round(trace_matrix_cab), tf.round(trace_matrix_bca))
def true_print(): print("It WORKS. \nABC: {} \nCAB: {} \nBCA: {}".format(trace_matrix_abc, trace_matrix_cab, trace_matrix_bca))
def false_print(): print("Condition FAILED. \nABC: {} \nCAB: {} \nBCA: {}".format(trace_matrix_abc, trace_matrix_cab, trace_matrix_bca))
tf.cond(predictor, true_print, false_print)
```

This invariance to cyclic permutation holds even if the resulting product has a different shape. For example, for A∈Rm×n and B∈Rn×m, we have Tr(AB)=Tr(BA) even though AB∈Rm×m and BA∈Rn×n

```
# mxn matrix A
to_new_matrix_A = tf.random.uniform([3, 2], minval=0, maxval=10, dtype=tf.float32)
print(" 3x2 Matrix A: \n{}\n".format(to_new_matrix_A))
# mxn matrix B
to_new_matrix_B = tf.random.uniform([2, 3], minval=0, maxval=10, dtype=tf.float32)
print(" 3x2 Matrix B: \n{}\n".format(to_new_matrix_B))
# trace of matrix AB and BA
ab = tf.linalg.trace(tf.matmul(to_new_matrix_A, to_new_matrix_B))
ba = tf.linalg.trace(tf.matmul(to_new_matrix_B, to_new_matrix_A))
predictor = tf.equal(tf.round(ab), tf.round(ba))
def true_print(): print("It WORKS. \nAB: {} \nBA: {}".format(ab, ba))
def false_print(): print("Condition FAILED. \nAB: {} \nBA: {}".format(ab, ba))
tf.cond(predictor, true_print, false_print)
```

The determinant of a square matrix, denoted det(A), is a function that maps matrices to real scalars. You can calculate the determinant of a 2 x 2 matrix as:

For a 3 x 3 matrix:

```
# calculate det of a matrix
det_matrix_A = tf.constant([[3,1], [0,3]], dtype=tf.float32)
det_A = tf.linalg.det(det_matrix_A)
print("Matrix A: \n{} \nDeterminant of Matrix A: \n{}".format(det_matrix_A, det_A))
vector_plot(det_matrix_A, 5, 5)
```

The determinant is equal to the product of all the eigenvalues of the matrix.

```
# Let's find the eigen values of matrix A
d_eigen_values = tf.linalg.eigvalsh(det_matrix_A)
eigvalsh_product = tf.multiply(d_eigen_values[0], d_eigen_values[1])
# lets validate if the product of the eigen values is equal to the determinant
predictor = tf.equal(eigvalsh_product, det_A)
def true_print(): print("It WORKS. \nRHS: \n{} \n\nLHS: \n{}".format(eigvalsh_product, det_A))
def false_print(): print("Condition FAILED. \nRHS: \n{} \n\nLHS: \n{}".format(eigvalsh_product, det_A))
tf.cond(predictor, true_print, false_print)
```

The absolute value of the determinant can be thought of as a measure of how much multiplication by the matrix expands or contracts space. If the determinant is 0, then space is contracted completely along at least one dimension, causing it to lose all its volume. If the determinant is 1, then the transformation preserves volume.

```
# If you see the following plot, you can see the vectors are expanded
vector_plot(tf.multiply(tf.abs(det_A), det_matrix_A), 50, 50)
```

Example: Principal Components Analysis
PCA is a complexity reduction technique that tries to reduce a set of variables down to a smaller set of components that represent most of the information in the variables. This can be though of as for a collection of data points applying lossy compression, meaning storing the points in a way that require less memory by trading some precision. At a conceptual level, PCA works by identifying sets of variables that share variance, and creating a component to represent that variance.

Earlier, when we were doing transpose or the matrix inverse, we relied on using Tensorflow’s built in functions but for PCA, there is no such function, except one in the Tensorflow Extended (tft).

There are multiple ways you can implement a PCA in Tensorflow but since this algorithm is such an important one in the machine learning world, we will take the long route.

The reason of having PCA under Linear Algebra is to show that PCA could be implemented using the theorems we studied in this Chapter.

```
# To start working with PCA, let's start by creating a 2D data set
x_data = tf.multiply(5, tf.random.uniform([100], minval=0, maxval=100, dtype = tf.float32, seed = 0))
y_data = tf.multiply(2, x_data) + 1 + tf.random.uniform([100], minval=0, maxval=100, dtype = tf.float32, seed = 0)
X = tf.stack([x_data, y_data], axis=1)
plt.rc_context({'axes.edgecolor':'orange', 'xtick.color':'red', 'ytick.color':'red'})
plt.plot(X[:,0], X[:,1], '+', color='b')
plt.grid()
```

We start by standardizing the data. Even though the data we created are on the same scales, its always a good practice to start by standardizing the data because most of the time the data you will be working with will be in different scales.

```
def normalize(data):
# creates a copy of data
X = tf.identity(data)
# calculates the mean
X -=tf.reduce_mean(data, axis=0)
return X
normalized_data = normalize(X)
plt.plot(normalized_data[:,0], normalized_data[:,1], '+', color='b')
plt.grid()
```

Recall that PCA can be thought of as applying lossy compression to a collection of x data points. The way we can minimize the loss of precision is by finding some decoding function f(x)≈c where c will be the corresponding vector.

PCA is defined by our choice of this decoding function. Specifically, to make the decoder very simple, we chose to use matrix multiplication to map c and define g(c)=Dc. Our goal is to minimize the distance between the input point x to its reconstruction and to do that we use L2 norm. Which boils down to our encoding function c=D⊤x.

Finally, to reconstruct the PCA we use the same matrix D to decode all the points and to solve this optimization problem, we use eigendecomposition. Please note that the following equation is the final version of a lot of matrix tranformations. I don’t provide the derivatives because the goal is to focus on the mathematical implementation, rather than the derivation.

To find d we can calculate the eigenvectors XTX.

```
# Finding the Eigne Values and Vectors for the data
eigen_values, eigen_vectors = tf.linalg.eigh(tf.tensordot(tf.transpose(normalized_data), normalized_data, axes=1))
print("Eigen Vectors: \n{} \nEigen Values: \n{}".format(eigen_vectors, eigen_values))
```

The eigenvectors (principal components) determine the directions of the new feature space, and the eigenvalues determine their magnitude.

Now, lets use these Eingen vectors to rotate our data. The goal of the rotation is to end up with a new coordinate system where data is uncorrelated and thus where the basis axes gather all the variance. Thereby reducing the dimension.

Recall our encoding function c=D⊤x , where D is the matrix containing the eigenvectors that we have calculated before.

```
X_new = tf.tensordot(tf.transpose(eigen_vectors), tf.transpose(normalized_data), axes=1)
plt.plot(X_new[0, :], X_new[1, :], '+', color='b')
plt.xlim(-500, 500)
plt.ylim(-700, 700)
plt.grid()
```

That is the transformed data and that’s it folks for our chapter on Linear Algebra 😉.

CongratulationsYou have successfully completed Chapter 2 Linear algebra of Deep Learning with Tensorflow 2.0. To recap, we went through the following concepts:

- Scalars, Vectors, Matrices and Tensors
- Multiplying Matrices and Vectors
- Identity and Inverse Matrices
- Linear Dependence and Span
- Norms
- Special Kinds of Matrices and Vectors
- Eigendecomposition
- Singular Value Decomposition
- The Moore-Penrose Pseudoinverse
- The Trace Operator
- The Determinant
- Example: Principal Components Analysis

We covered a lot of content in one notebook, like I mentioned in the begining, this is not meant to be an absolute beginner or a comprehensive chapter on Linear Algebra, our focus is Deep Learning with Tensorflow, so we only went through the material we need to understand Deep Learning.

🔥 You can access the Code for this notebook in GitHub or launch an executable versions of this notebook using Google Colab. 🔥

**Thanks for reading** ❤

If you liked this post, share it with all of your programming buddies!

Follow us on **Facebook** | **Twitter**

☞ The Data Science Course 2019: Complete Data Science Bootcamp

☞ Machine Learning A-Z™: Hands-On Python & R In Data Science

☞ Tableau 10 A-Z: Hands-On Tableau Training For Data Science!

☞ R Programming A-Z™: R For Data Science With Real Exercises!

☞ Machine Learning, Data Science and Deep Learning with Python

A step-by-step guide to setting up Python for Deep Learning and Data Science for a complete beginner

A step-by-step guide to setting up Python for Deep Learning and Data Science for a complete beginner

You can code your own Data Science or Deep Learning project in just a couple of lines of code these days. This is not an exaggeration; many programmers out there have done the hard work of writing tons of code for us to use, so that all we need to do is plug-and-play rather than write code from scratch.

You may have seen some of this code on Data Science / Deep Learning blog posts. Perhaps you might have thought: “Well, if it’s really that easy, then why don’t I try it out myself?”

If you’re a beginner to Python and you want to embark on this journey, then this post will guide you through your first steps. A common complaint I hear from complete beginners is that it’s pretty difficult to set up Python. How do we get everything started in the first place so that we can plug-and-play Data Science or Deep Learning code?

This post will guide you through in a step-by-step manner how to set up Python for your Data Science and Deep Learning projects. We will:

- Set up Anaconda and Jupyter Notebook
- Create Anaconda environments and install packages (code that others have written to make our lives tremendously easy) like tensorflow, keras, pandas, scikit-learn and matplotlib.

Once you’ve set up the above, you can build your first neural network to predict house prices in this tutorial here:

Build your first Neural Network to predict house prices with Keras

The main programming language we are going to use is called Python, which is the most common programming language used by Deep Learning practitioners.

The first step is to download Anaconda, which you can think of as a platform for you to use Python “out of the box”.

Visit this page: https://www.anaconda.com/distribution/ and scroll down to see this:

This tutorial is written specifically for Windows users, but the instructions for users of other Operating Systems are not all that different. Be sure to click on “Windows” as your Operating System (or whatever OS that you are on) to make sure that you are downloading the correct version.

This tutorial will be using Python 3, so click the green Download button under “Python 3.7 version”. A pop up should appear for you to click “Save” into whatever directory you wish.

Once it has finished downloading, just go through the setup step by step as follows:

Click Next

Click “I Agree”

Click Next

Choose a destination folder and click Next

Click Install with the default options, and wait for a few moments as Anaconda installs

Click Skip as we will not be using Microsoft VSCode in our tutorials

Click Finish, and the installation is done!

Once the installation is done, go to your Start Menu and you should see some newly installed software:

You should see this on your start menu

Click on Anaconda Navigator, which is a one-stop hub to navigate the apps we need. You should see a front page like this:

Anaconda Navigator Home Screen

Click on ‘Launch’ under Jupyter Notebook, which is the second panel on my screen above. Jupyter Notebook allows us to run Python code interactively on the web browser, and it’s where we will be writing most of our code.

A browser window should open up with your directory listing. I’m going to create a folder on my Desktop called “Intuitive Deep Learning Tutorial”. If you navigate to the folder, your browser should look something like this:

Navigating to a folder called Intuitive Deep Learning Tutorial on my Desktop

On the top right, click on New and select “Python 3”:

Click on New and select Python 3

A new browser window should pop up like this.

Browser window pop-up

Congratulations — you’ve created your first Jupyter notebook! Now it’s time to write some code. Jupyter notebooks allow us to write snippets of code and then run those snippets without running the full program. This helps us perhaps look at any intermediate output from our program.

To begin, let’s write code that will display some words when we run it. This function is called *print*. Copy and paste the code below into the grey box on your Jupyter notebook:

```
print("Hello World!")
```

Your notebook should look like this:

Entering in code into our Jupyter Notebook

Now, press Alt-Enter on your keyboard to run that snippet of code:

Press Alt-Enter to run that snippet of code

You can see that Jupyter notebook has displayed the words “Hello World!” on the display panel below the code snippet! The number 1 has also filled in the square brackets, meaning that this is the first code snippet that we’ve run thus far. This will help us to track the order in which we have run our code snippets.

Instead of Alt-Enter, note that you can also click Run when the code snippet is highlighted:

Click Run on the panel

If you wish to create new grey blocks to write more snippets of code, you can do so under Insert.

Jupyter Notebook also allows you to write normal text instead of code. Click on the drop-down menu that currently says “Code” and select “Markdown”:

Now, our grey box that is tagged as markdown will not have square brackets beside it. If you write some text in this grey box now and press Alt-Enter, the text will render it as plain text like this:

If we write text in our grey box tagged as markdown, pressing Alt-Enter will render it as plain text.

There are some other features that you can explore. But now we’ve got Jupyter notebook set up for us to start writing some code!

Now we’ve got our coding platform set up. But are we going to write Deep Learning code from scratch? That seems like an extremely difficult thing to do!

The good news is that many others have written code and made it available to us! With the contribution of others’ code, we can play around with Deep Learning models at a very high level without having to worry about implementing all of it from scratch. This makes it extremely easy for us to get started with coding Deep Learning models.

For this tutorial, we will be downloading five packages that Deep Learning practitioners commonly use:

- Set up Anaconda and Jupyter Notebook
- Create Anaconda environments and install packages (code that others have written to make our lives tremendously easy) like tensorflow, keras, pandas, scikit-learn and matplotlib.

The first thing we will do is to create a Python environment. An environment is like an isolated working copy of Python, so that whatever you do in your environment (such as installing new packages) will not affect other environments. It’s good practice to create an environment for your projects.

Click on Environments on the left panel and you should see a screen like this:

Anaconda environments

Click on the button “Create” at the bottom of the list. A pop-up like this should appear:

A pop-up like this should appear.

Name your environment and select Python 3.7 and then click Create. This might take a few moments.

Once that is done, your screen should look something like this:

Notice that we have created an environment ‘intuitive-deep-learning’. We can see what packages we have installed in this environment and their respective versions.

Now let’s install some packages we need into our environment!

The first two packages we will install are called Tensorflow and Keras, which help us plug-and-play code for Deep Learning.

On Anaconda Navigator, click on the drop down menu where it currently says “Installed” and select “Not Installed”:

A whole list of packages that you have not installed will appear like this:

Search for “tensorflow”, and click the checkbox for both “keras” and “tensorflow”. Then, click “Apply” on the bottom right of your screen:

A pop up should appear like this:

Click Apply and wait for a few moments. Once that’s done, we will have Keras and Tensorflow installed in our environment!

Using the same method, let’s install the packages ‘pandas’, ‘scikit-learn’ and ‘matplotlib’. These are common packages that data scientists use to process the data as well as to visualize nice graphs in Jupyter notebook.

This is what you should see on your Anaconda Navigator for each of the packages.

**Pandas:**

Installing pandas into your environment

**Scikit-learn:**

Installing scikit-learn into your environment

**Matplotlib:**

Installing matplotlib into your environment

Once it’s done, go back to “Home” on the left panel of Anaconda Navigator. You should see a screen like this, where it says “Applications on intuitive-deep-learning” at the top:

Now, we have to install Jupyter notebook in this environment. So click the green button “Install” under the Jupyter notebook logo. It will take a few moments (again). Once it’s done installing, the Jupyter notebook panel should look like this:

Click on Launch, and the Jupyter notebook app should open.

Create a notebook and type in these five snippets of code and click Alt-Enter. This code tells the notebook that we will be using the five packages that you installed with Anaconda Navigator earlier in the tutorial.

```
import tensorflow as tf
import keras
import pandas
import sklearn
import matplotlib
```

If there are no errors, then congratulations — you’ve got everything installed correctly:

A sign that everything works!

If you have had any trouble with any of the steps above, please feel free to comment below and I’ll help you out!

*Originally published by ** Joseph Lee Wei En** at *

===================================================================

Thanks for reading :heart: If you liked this post, share it with all of your programming buddies! Follow me on **Facebook** | **Twitter**

☞ A Complete Machine Learning Project Walk-Through in Python

☞ Machine Learning In Node.js With TensorFlow.js

☞ An A-Z of useful Python tricks

☞ Top 10 Algorithms for Machine Learning Newbies

☞ Automated Machine Learning on the Cloud in Python

☞ Introduction to PyTorch and Machine Learning

☞ Python Tutorial for Beginners (2019) - Learn Python for Machine Learning and Web Development

☞ Machine Learning A-Z™: Hands-On Python & R In Data Science

☞ Python for Data Science and Machine Learning Bootcamp

☞ Data Science, Deep Learning, & Machine Learning with Python

In this article, we will learn how deep learning works and get familiar with its terminology — such as backpropagation and batch size

Originally published by Milad Toutounchian at https://towardsdatascience.com

Deep learning is one of the most popular models currently being used in real-world, Data Science applications. It’s been an effective model in areas that range from image to text to voice/music. With the increase in its use, the ability to quickly and scalably implement deep learning becomes paramount. The rise of deep learning platforms such as Tensorflow, help developers implement what they need to in easier ways.

In this article, we will learn how deep learning works and get familiar with its terminology — such as backpropagation and batch size. We will implement a simple deep learning model — from theory to scratch implementation — for a predefined input and output in Python, and then do the same using deep learning platforms such as Keras and Tensorflow. We have written this simple deep learning model using Keras and Tensorflow version 1.x and version 2.0 with three different levels of complexity and ease of coding.

Deep Learning Implementation from ScratchConsider a simple multi-layer-perceptron with four input neurons, one hidden layer with three neurons and an output layer with one neuron. We have three data-samples for the input denoted as X, and three data-samples for the desired output denoted as yt. So, each input data-sample has four features.

```
# Inputs and outputs of the neural net:
import numpy as np
X=np.array([[1.0, 0.0, 1.0, 0.0],[1.0, 0.0, 1.0, 1.0],[0.0, 1.0, 0.0, 1.0]])
yt=np.array([[1.0],[1.0],[0.0]])
```

The * x**(m)

The goal of a neural net (NN) is to obtain weights and biases such that for a given input, the NN provides the desired output. But, we do not know the appropriate weights and biases in advance, so we update the weights and biases such that the error between the output of NN, *yp(m)*, and desired ones, *yt(m)*, is minimized. This iterative minimization process is called the NN training.

Assume the activation functions for both hidden and output layers are sigmoid functions. Therefore,

The size of weights, biases and the relationships between input and outputs of the neural net

Where activation function is the sigmoid, *m* is the *m*th data-sample and *yp(m)* is the NN output.

The error function, which measures the difference between the output of NN with the desired one, can be expressed mathematically as:

The Error defined for the neural net which is squared error

The pseudocode for the above NN has been summarized below:

pseudocode for the neural net training

From our pseudocode, we realize that the partial derivative of Error (E) with respect to parameters (weights and biases) should be computed. Using the chain rule from calculus we can write:

We have two options here for updating the weights and biases in backward path (backward path means updating weights and biases such that error is minimized):

- Use all *N * samples of the training data
- Use one sample (or a couple of samples)

For the first one, we say the batch size is *N*. For the second one, we say batch size is 1, if use one sample to updates the parameters. So batch size means how many data samples are being used for updating the weights and biases.

You can find the implementation of the above neural net, in which the gradient of the error with respect to parameters is calculated Symbolically, with different batch sizes here.

As you can see with the above example, creating a simple deep learning model from scratch involves methods that are very complex. In the next section, we will see how deep learning frameworks can assist in introducing scalability and greater ease of implementation to our model.

Deep Learning implementation using Keras, Tensorflow 1.x and 2.0In the previous section, we computed the gradient of Error w.r.t. parameters from using the chain rule. We saw first-hand that it is not an easy or scalable approach. Also, keep in mind that we evaluate the partial derivatives at each iteration, and as a result, the Symbolic Gradient is not needed although its value is important. This is where deep-learning frameworks such as Keras and Tensorflow can play their role. The deep-learning frameworks use an AutoDiff method for numerical calculations of partial gradients. If you’re not familiar with AutoDiff, StackExchange has a great example to walk through.

The AutoDiff decomposes the complex expression into a set of primitive ones, i.e. expressions consisting of at most a single function call. As the differentiation rules for each separate expression are already known, the final results can be computed in an efficient way.

We have implemented the NN model with three different levels in Keras, Tensorflow 1.x and Tensorflow 2.0:

**1- High-Level (Keras and Tensorflow 2.0): **High-Level Tensorflow 2.0 with Batch Size 1

**2- Medium-Level (Tensorflow 1.x and 2.0): **Medium-Level Tensorflow 1.x with Batch Size 1 , Medium-Level Tensorflow 1.x with Batch Size N, Medium-Level Tensorflow 2.0 with Batch Size 1, Medium-Level Tensorflow v 2.0 with Batch Size N

**3- Low-Level (Tensorflow 1.x): **Low-Level Tensorflow 1.x with Batch Size N

**Code Snippets:**

For the High-Level, we have accomplished the implementation using Keras and Tensorflow v 2.0 with *model.train_on_batch*:

```
# High-Level implementation of the neural net in Tensorflow:
model.compile(loss=mse, optimizer=optimizer)
for _ in range(2000):
for step, (x, y) in enumerate(zip(X_data, y_data)):
model.train_on_batch(np.array([x]), np.array([y]))
```

In the Medium-Level using Tensorflow 1.x, we have defined:

```
E = tf.reduce_sum(tf.pow(ypred - Y, 2))
optimizer = tf.train.GradientDescentOptimizer(0.1)
grads = optimizer.compute_gradients(E, [W_h, b_h, W_o, b_o])
updates = optimizer.apply_gradients(grads)
```

This ensures that in the *for loop*, the updates variable will be updated. For Medium-Level, the gradients and their updates are defined outside the for_loop and inside the for_loop updates is iteratively updated. In the Medium-Level using Tensorflow v 2.x, we have used:

```
# Medium-Level implementation of the neural net in Tensorflow
# In for_loop
with tf.GradientTape() as tape:
x = tf.convert_to_tensor(np.array([x]), dtype=tf.float64)
y = tf.convert_to_tensor(np.array([y]), dtype=tf.float64)
ypred = model(x)
loss = mse(y, ypred)
gradients = tape.gradient(loss, model.trainable_weights)
optimizer.apply_gradients(zip(gradients, model.trainable_weights))
```

In Low-Level implementation, each weight and bias is updated separately. In the Low-Level using Tensorflow v 1.x, we have defined:

```
# Low-Level implementation of the neural net in Tensorflow:
E = tf.reduce_sum(tf.pow(ypred - Y, 2))
dE_dW_h = tf.gradients(E, [W_h])[0]
dE_db_h = tf.gradients(E, [b_h])[0]
dE_dW_o = tf.gradients(E, [W_o])[0]
dE_db_o = tf.gradients(E, [b_o])[0]
# In for_loop:
evaluated_dE_dW_h = sess.run(dE_dW_h,
feed_dict={W_h: W_h_i, b_h: b_h_i, W_o: W_o_i, b_o: b_o_i, X: X_data.T, Y: y_data.T})
W_h_i = W_h_i - 0.1 * evaluated_dE_dW_h
evaluated_dE_db_h = sess.run(dE_db_h,
feed_dict={W_h: W_h_i, b_h: b_h_i, W_o: W_o_i, b_o: b_o_i, X: X_data.T, Y: y_data.T})
b_h_i = b_h_i - 0.1 * evaluated_dE_db_h
evaluated_dE_dW_o = sess.run(dE_dW_o,
feed_dict={W_h: W_h_i, b_h: b_h_i, W_o: W_o_i, b_o: b_o_i, X: X_data.T, Y: y_data.T})
W_o_i = W_o_i - 0.1 * evaluated_dE_dW_o
evaluated_dE_db_o = sess.run(dE_db_o,
feed_dict={W_h: W_h_i, b_h: b_h_i, W_o: W_o_i, b_o: b_o_i, X: X_data.T, Y: y_data.T})
b_o_i = b_o_i - 0.1 * evaluated_dE_db_o
```

As you can see with the above low level implementation, the developer has more control over every single step of numerical operations and calculations.

ConclusionWe have now shown that implementing from scratch even a simple deep learning model by using Symbolic gradient computation for weight and bias updates is not an easy or scalable approach. Using deep learning frameworks accelerates this process as a result of using AutoDiff, which is basically a stable numerical gradient computation for updating weights and biases.

**Thanks for reading** ❤

If you liked this post, share it with all of your programming buddies!

Follow us on **Facebook** | **Twitter**

☞ Machine Learning A-Z™: Hands-On Python & R In Data Science

☞ Python for Data Science and Machine Learning Bootcamp

☞ Machine Learning, Data Science and Deep Learning with Python

☞ Deep Learning A-Z™: Hands-On Artificial Neural Networks

☞ Artificial Intelligence A-Z™: Learn How To Build An AI

☞ A Complete Machine Learning Project Walk-Through in Python

☞ Machine Learning: how to go from Zero to Hero

☞ Top 18 Machine Learning Platforms For Developers

☞ 10 Amazing Articles On Python Programming And Machine Learning

☞ 100+ Basic Machine Learning Interview Questions and Answers

Complete hands-on Machine Learning tutorial with Data Science, Tensorflow, Artificial Intelligence, and Neural Networks. Introducing Tensorflow, Using Tensorflow, Introducing Keras, Using Keras, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Learning Deep Learning, Machine Learning with Neural Networks, Deep Learning Tutorial with Python

Machine Learning, Data Science and Deep Learning with PythonExplore the full course on Udemy (special discount included in the link): http://learnstartup.net/p/BkS5nEmZg

In less than 3 hours, you can understand the theory behind modern artificial intelligence, and apply it with several hands-on examples. This is machine learning on steroids! Find out why everyone’s so excited about it and how it really works – and what modern AI can and cannot really do.

In this course, we will cover:

• Deep Learning Pre-requistes (gradient descent, autodiff, softmax)

• The History of Artificial Neural Networks

• Deep Learning in the Tensorflow Playground

• Deep Learning Details

• Introducing Tensorflow

• Using Tensorflow

• Introducing Keras

• Using Keras to Predict Political Parties

• Convolutional Neural Networks (CNNs)

• Using CNNs for Handwriting Recognition

• Recurrent Neural Networks (RNNs)

• Using a RNN for Sentiment Analysis

• The Ethics of Deep Learning

• Learning More about Deep Learning

At the end, you will have a final challenge to create your own deep learning / machine learning system to predict whether real mammogram results are benign or malignant, using your own artificial neural network you have learned to code from scratch with Python.

Separate the reality of modern AI from the hype – by learning about deep learning, well, deeply. You will need some familiarity with Python and linear algebra to follow along, but if you have that experience, you will find that neural networks are not as complicated as they sound. And how they actually work is quite elegant!

This is hands-on tutorial with real code you can download, study, and run yourself.