# Calculus in Action: Neural Networks

An artificial neural network is a computational model that approximates a mapping between inputs and outputs.

It is inspired by the structure of the human brain, in that it is similarly composed of a network of interconnected neurons that propagate information upon receiving sets of stimuli from neighbouring neurons.

Training a neural network involves a process that employs the backpropagation and gradient descent algorithms in tandem. As we will see, both of these algorithms make extensive use of calculus.

In this tutorial, you will discover how aspects of calculus are applied in neural networks.

After completing this tutorial, you will know:

• An artificial neural network is organized into layers of neurons and connections, where the latter are each attributed a weight value.
• Each neuron implements a nonlinear function that maps a set of inputs to an output activation.
• In training a neural network, calculus is used extensively by the backpropagation and gradient descent algorithms.

Let’s get started.

Calculus in Action: Neural Networks
Photo by Tomoe Steineck, some rights reserved.

## Tutorial Overview

This tutorial is divided into three parts; they are:

• An Introduction to the Neural Network
• The Mathematics of a Neuron
• Training the Network

## Prerequisites

For this tutorial, we assume that you already know the calculus concepts that it builds upon, namely function approximation, rate of change, partial derivatives, the chain rule, and gradient descent.

## An Introduction to the Neural Network

Artificial neural networks can be considered as function approximation algorithms.

In a supervised learning setting, when presented with many input observations representing the problem of interest, together with their respective target outputs, the artificial neural network will seek to approximate the mapping that exists between the two.

A neural network is a computational model that is inspired by the structure of the human brain.

– Page 65, Deep Learning, 2019.

The human brain consists of a massive network of interconnected neurons (around one hundred billion of them), with each comprising a cell body, a set of fibres called dendrites, and an axon:

A Neuron in the Human Brain

The dendrites act as the input channels to a neuron, whereas the axon acts as the output channel. Therefore, a neuron would receive input signals through its dendrites, which in turn would be connected to the (output) axons of other neighbouring neurons. In this manner, a sufficiently strong electrical pulse (also called an action potential) can be transmitted along the axon of one neuron, to all the other neurons that are connected to it. This permits signals to be propagated along the structure of the human brain.

So, a neuron acts as an all-or-none switch, that takes in a set of inputs and either outputs an action potential or no output.

– Page 66, Deep Learning, 2019.

An artificial neural network is analogous to the structure of the human brain, because (1) it is similarly composed of a large number of interconnected neurons that, (2) seek to propagate information across the network by, (3) receiving sets of stimuli from neighbouring neurons and mapping these to outputs, to be fed to the next layer of neurons.

The structure of an artificial neural network is typically organised into layers of neurons (recall the depiction of a tree diagram). For example, the following diagram illustrates a fully-connected neural network, where all the neurons in one layer are connected to all the neurons in the next layer:

A Fully-Connected, Feedforward Neural Network

The inputs are presented on the left hand side of the network, and the information propagates (or flows) rightward towards the outputs at the opposite end. Since the information is, hereby, propagating in the forward direction through the network, then we would also refer to such a network as a feedforward neural network.

The layers of neurons in between the input and output layers are called hidden layers, because they are not directly accessible.

Each connection (represented by an arrow in the diagram) between two neurons is attributed a weight, which acts on the data flowing through the network, as we will see shortly.
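As a rough illustration of how the number of weights follows from the layer structure, the sketch below counts one weight per connection between consecutive layers of a fully-connected network. The layer sizes and the function name are hypothetical, chosen only for this example, and bias terms are not counted.

```python
def connection_counts(layer_sizes):
    """Number of weighted connections between each pair of consecutive layers."""
    # In a fully-connected network, every neuron in one layer connects
    # to every neuron in the next, giving n_in * n_out connections.
    return [n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


# Hypothetical network: 3 inputs, one hidden layer of 4 neurons, 2 outputs.
layer_sizes = [3, 4, 2]
counts = connection_counts(layer_sizes)
total = sum(counts)

print(counts)  # [12, 8]
print(total)   # 20
```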

## The Mathematics of a Neuron

More specifically, let’s say that a particular artificial neuron (or a perceptron, as Frank Rosenblatt had initially named it) receives n inputs, [x1, …, xn], where each connection is attributed a respective weight, [w1, …, wn].

The first operation that is carried out multiplies the input values by their respective weight, and adds a bias term, b, to their sum, producing an output, z:

$$z = ((x_1 \times w_1) + (x_2 \times w_2) + \dots + (x_n \times w_n)) + b$$

We can, alternatively, represent this operation in a more compact form as follows:

$$z = \sum_{i=1}^{n} x_i w_i + b$$

This weighted sum calculation that we have performed so far is a linear operation. If every neuron had to implement this particular calculation alone, then the neural network would be restricted to learning only linear input-output mappings.
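The weighted sum can be sketched in a few lines of Python. The input, weight, and bias values below are made up for illustration, and `weighted_sum` is a hypothetical helper name rather than part of any library.

```python
def weighted_sum(inputs, weights, bias):
    """Compute z: each input times its weight, summed, plus the bias."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Illustrative values only.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]
b = 0.5

z = weighted_sum(x, w, b)
print(z)  # (0.5 - 2.0 + 6.0) + 0.5 = 5.0
```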

However, many of the relationships in the world that we might want to model are nonlinear, and if we attempt to model these relationships using a linear model, then the model will be very inaccurate.

– Page 77, Deep Learning, 2019.

Hence, a second operation is performed by each neuron that transforms the weighted sum by the application of a nonlinear activation function, a(.):

$$a(z) = a\left(\sum_{i=1}^{n} x_i w_i + b\right)$$

We can represent the operations performed by each neuron even more compactly, if we had to integrate the bias term into the sum as another weight, w0, acting on a constant input, x0 = 1 (notice that the sum now starts from 0):

$$a(z) = a\left(\sum_{i=0}^{n} x_i w_i\right)$$

The operations performed by each neuron can be illustrated as follows:

Nonlinear Function Implemented by a Neuron

Therefore, each neuron can be considered to implement a nonlinear function that maps a set of inputs to an output activation.
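A minimal sketch of a single neuron follows, assuming the logistic function as the activation (one possible choice among several). It computes the activation in two ways, with an explicit bias and with the bias folded in as a weight w0 on a constant input of 1, to show that the two forms agree. All names and values are illustrative.

```python
import math


def logistic(z):
    """Logistic activation, assumed here as the nonlinear function a(.)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum plus an explicit bias, followed by the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return logistic(z)


def neuron_folded(inputs, weights_with_bias):
    """Same neuron, with the bias stored as w0 and paired with x0 = 1."""
    extended_inputs = [1.0] + list(inputs)
    z = sum(x * w for x, w in zip(extended_inputs, weights_with_bias))
    return logistic(z)


# Illustrative values only.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]
b = 0.5

a_explicit = neuron(x, w, b)
a_folded = neuron_folded(x, [b] + w)
print(a_explicit, a_folded)  # both are logistic(5.0), roughly 0.9933
```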

## Training the Network

Training an artificial neural network involves the process of searching for the set of weights that best model the patterns in the data. It is a process that employs the backpropagation and gradient descent algorithms in tandem. Both of these algorithms make extensive use of calculus.

Each time that the network is traversed in the forward (or rightward) direction, the error of the network can be calculated as the difference between the output produced by the network and the expected ground truth, by means of a loss function (such as the sum of squared errors (SSE)). The backpropagation algorithm, then, calculates the gradient (or the rate of change) of this error with respect to changes in the weights. In order to do so, it requires the use of the chain rule and partial derivatives.
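For instance, the sum of squared errors mentioned above could be computed as in the following sketch. The target and output values are invented for illustration.

```python
def sum_of_squared_errors(targets, outputs):
    """SSE loss: sum of squared differences between targets and network outputs."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))


# Hypothetical ground truth and network outputs for two output neurons.
targets = [1.0, 0.0]
outputs = [0.8, 0.3]

sse = sum_of_squared_errors(targets, outputs)
print(sse)  # approximately 0.04 + 0.09 = 0.13
```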

For simplicity, consider a network made up of two neurons connected by a single path of activation. If we had to break them open, we would find that the neurons perform the following operations in cascade:

Operations Performed by Two Neurons in Cascade

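The first application of the chain rule connects the overall error of the network to the input, z2, of the activation function a2 of the second neuron, and subsequently to the weight, w2, as follows:

$$\frac{\partial \text{error}}{\partial w_2} = \frac{\partial \text{error}}{\partial a_2} \times \frac{\partial a_2}{\partial z_2} \times \frac{\partial z_2}{\partial w_2} = \delta_2 \times \frac{\partial z_2}{\partial w_2}$$

Here, 𝛿2 denotes the product of the first two terms, that is, the rate of change of the error with respect to z2.

You may notice that the application of the chain rule involves, among other terms, a multiplication by the partial derivative of the neuron’s activation function with respect to its input, z2. There are different activation functions to choose from, such as the logistic function, which is a sigmoid (S-shaped) function. If we had to take the logistic function as an example, then its partial derivative would be computed as follows:

$$\frac{\partial a_2}{\partial z_2} = \text{logistic}(z_2) \times (1 - \text{logistic}(z_2))$$

Hence, if we take the error to be half the squared difference between the expected and generated activations (a common convention, under which ∂error/∂a2 = a2 − t2), we can compute 𝛿2 as follows:

$$\delta_2 = (a_2 - t_2) \times \text{logistic}(z_2) \times (1 - \text{logistic}(z_2))$$

Here, t2 is the expected activation, and in finding the difference between t2 and a2 we are, therefore, computing the error between the activation generated by the network and the expected ground truth.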
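The logistic derivative and the 𝛿2 term can be checked numerically. The sketch below assumes that the error is half the squared difference between t2 and a2 (an assumption made here so that the derivative carries no factor of 2), and compares the analytic 𝛿2 against a central finite difference. The values of z2 and t2 are arbitrary.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_derivative(z):
    """Derivative of the logistic function: logistic(z) * (1 - logistic(z))."""
    a = logistic(z)
    return a * (1.0 - a)


def error(z2, t2):
    """Assumed error: half the squared difference between target and activation."""
    return 0.5 * (t2 - logistic(z2)) ** 2


def delta_output(z2, t2):
    """Analytic rate of change of the assumed error with respect to z2."""
    a2 = logistic(z2)
    return (a2 - t2) * logistic_derivative(z2)


# Arbitrary illustrative values.
z2, t2 = 0.3, 1.0
h = 1e-6

analytic = delta_output(z2, t2)
numeric = (error(z2 + h, t2) - error(z2 - h, t2)) / (2 * h)
print(analytic, numeric)  # the two values should agree closely
```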

Since we are computing the derivative of the activation function, it should, therefore, be continuous and differentiable over the entire space of real numbers. In the case of deep neural networks, the error gradient is propagated backwards over a large number of hidden layers. This can cause the error signal to rapidly diminish to zero, especially if the maximum value of the derivative function is already small to begin with (for instance, the derivative of the logistic function has a maximum value of 0.25). This is known as the vanishing gradient problem. The ReLU function has become popular in deep learning because it alleviates this problem: its derivative in the positive portion of its domain is equal to 1.
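To get a feel for how quickly the signal can shrink, the sketch below multiplies the largest possible logistic derivative across a hypothetical depth of 10 layers, and contrasts it with the ReLU derivative for a positive input. This is a simplification that ignores the weight terms that also appear in the chain.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_derivative(z):
    a = logistic(z)
    return a * (1.0 - a)


def relu_derivative(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


# Scan a grid of inputs; the logistic derivative peaks at z = 0.
grid = [i / 10.0 for i in range(-100, 101)]
max_derivative = max(logistic_derivative(z) for z in grid)

depth = 10  # hypothetical number of layers the gradient passes through
logistic_factor = max_derivative ** depth
relu_factor = relu_derivative(1.0) ** depth

print(max_derivative)   # 0.25
print(logistic_factor)  # 0.25 ** 10, below one millionth
print(relu_factor)      # 1.0
```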

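The next weight backwards is deeper into the network and, hence, the application of the chain rule can similarly be extended to connect the overall error to the weight, w1, as follows:

$$\frac{\partial \text{error}}{\partial w_1} = \frac{\partial \text{error}}{\partial a_2} \times \frac{\partial a_2}{\partial z_2} \times \frac{\partial z_2}{\partial a_1} \times \frac{\partial a_1}{\partial z_1} \times \frac{\partial z_1}{\partial w_1} = \delta_1 \times \frac{\partial z_1}{\partial w_1}$$

If we take the logistic function then as the activation function of choice, and note that ∂z2/∂a1 = w2 because the second neuron weights its input, a1, by w2, then we would compute 𝛿1 as follows:

$$\delta_1 = \delta_2 \times w_2 \times \text{logistic}(z_1) \times (1 - \text{logistic}(z_1))$$

Here, 𝛿2 = ∂error/∂a2 × ∂a2/∂z2 is the corresponding term for the second neuron.

Once we have computed the gradient of the network error with respect to each weight, then the gradient descent algorithm can be applied to update each weight for the next forward propagation at time, t + 1. For the weight, w1, the weight update rule using gradient descent would be specified as follows:

$$w_1^{t+1} = w_1^{t} - \eta \times \frac{\partial \text{error}}{\partial w_1} = w_1^{t} - \eta \times \delta_1 \times \frac{\partial z_1}{\partial w_1}$$

Here, η is the learning rate, which sets the size of the update step, and ∂z1/∂w1 is simply the input received by the first neuron.

Even though we have hereby considered a simple network, the process that we have gone through can be extended to evaluate more complex and deeper ones, such as convolutional neural networks (CNNs).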
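Putting the pieces together, the following sketch runs one forward pass, one backward pass, and one gradient descent step on a two-neuron cascade. It is only an illustration under stated assumptions: logistic activations, an error equal to half the squared difference between target and output, a scalar input `x`, and made-up starting weights and learning rate. The biases are held fixed to keep the example short.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, b1, w2, b2):
    """Two neurons in cascade: x -> (z1, a1) -> (z2, a2)."""
    a1 = logistic(w1 * x + b1)
    a2 = logistic(w2 * a1 + b2)
    return a1, a2


def error(a2, t2):
    """Assumed error: half the squared difference."""
    return 0.5 * (t2 - a2) ** 2


def gradients(x, t2, w1, b1, w2, b2):
    """Chain-rule gradients of the error with respect to w1 and w2."""
    a1, a2 = forward(x, w1, b1, w2, b2)
    delta2 = (a2 - t2) * a2 * (1.0 - a2)    # d error / d z2
    delta1 = delta2 * w2 * a1 * (1.0 - a1)  # d error / d z1
    return delta1 * x, delta2 * a1          # dz1/dw1 = x, dz2/dw2 = a1


# Illustrative values only.
x, t2 = 1.5, 1.0
w1, b1, w2, b2 = 0.4, 0.1, -0.3, 0.2
eta = 0.5  # learning rate

grad_w1, grad_w2 = gradients(x, t2, w1, b1, w2, b2)
error_before = error(forward(x, w1, b1, w2, b2)[1], t2)

# Gradient descent: step each weight against its gradient.
w1_new = w1 - eta * grad_w1
w2_new = w2 - eta * grad_w2
error_after = error(forward(x, w1_new, b1, w2_new, b2)[1], t2)

print(error_before, error_after)  # the error should decrease after the step
```

Repeating the forward pass, backward pass, and update in a loop is what drives the error down over the course of training.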

If the network under consideration is characterised by multiple branches coming from multiple inputs (and possibly flowing towards multiple outputs), then its evaluation would involve the summation of different derivative chains for each path, similarly to how we have previously derived the generalized chain rule.
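As a small illustration of summing derivative chains over paths, the sketch below uses a hypothetical network in which one input feeds two hidden neurons, both of which feed a single output neuron. The derivative of the output with respect to the input is the sum of one chain per path, which is checked against a finite difference. Logistic activations are assumed and biases are omitted for brevity.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def network(x, wa, wb, va, vb):
    """One input, two hidden neurons (a and b), one output neuron."""
    ha = logistic(wa * x)
    hb = logistic(wb * x)
    out = logistic(va * ha + vb * hb)
    return ha, hb, out


def d_output_d_input(x, wa, wb, va, vb):
    """Sum the chain-rule product along each of the two paths."""
    ha, hb, out = network(x, wa, wb, va, vb)
    d_out_dz = out * (1.0 - out)
    path_a = d_out_dz * va * ha * (1.0 - ha) * wa
    path_b = d_out_dz * vb * hb * (1.0 - hb) * wb
    return path_a + path_b


# Illustrative values only.
x, wa, wb, va, vb = 0.7, 0.5, -1.2, 0.8, 0.3
h = 1e-6

analytic = d_output_d_input(x, wa, wb, va, vb)
numeric = (network(x + h, wa, wb, va, vb)[2]
           - network(x - h, wa, wb, va, vb)[2]) / (2 * h)
print(analytic, numeric)  # the two values should agree closely
```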

## Summary

In this tutorial, you discovered how aspects of calculus are applied in neural networks.

Specifically, you learned:

• An artificial neural network is organized into layers of neurons and connections, where the latter are each attributed a weight value.
• Each neuron implements a nonlinear function that maps a set of inputs to an output activation.
• In training a neural network, calculus is used extensively by the backpropagation and gradient descent algorithms.

Do you have any questions? 