Whether you implement a neural network yourself or you use a built-in library for neural network learning, it is of paramount importance to understand the significance of a sigmoid function. The sigmoid function is the key to understanding how a neural network learns complex problems. This function also served as a basis for discovering other functions that lead to efficient and good solutions for supervised learning in deep learning architectures.

In this tutorial, you will discover the sigmoid function and its role in learning from examples in neural networks.

After completing this tutorial, you will know:

  • The sigmoid function
  • Linear vs. non-linear separability
  • Why a neural network can make complex decision boundaries if a sigmoid unit is used

Let’s get started.

A Gentle Introduction to sigmoid function. Photo by Mehreen Saeed, some rights reserved.


Tutorial Overview

This tutorial is divided into 3 parts; they are:

  1. The sigmoid function
    1. The sigmoid function and its properties
  2. Linear vs. non-linearly separable problems
  3. Using a sigmoid as an activation function in neural networks

Sigmoid Function

The sigmoid function is a special form of the logistic function and is usually denoted by σ(x) or sig(x). It is given by:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Properties and Identities Of Sigmoid Function

The graph of the sigmoid function is an S-shaped curve, as shown by the green line in the graph below. The figure also shows the graph of the derivative in pink. The expression for the derivative, along with some important properties, is shown on the right.

Graph of the sigmoid function and its derivative. Some important properties are also shown.

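The derivative of the sigmoid can be written in terms of the function itself. A short derivation, using only the definition of σ(x) given earlier and the fact that 1 − σ(x) = e^{−x}/(1 + e^{−x}):

$$\sigma'(x) = \frac{d}{dx}\left(1 + e^{-x}\right)^{-1} = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$

Two consequences follow directly: σ(−x) = 1 − σ(x), and the derivative reaches its maximum value σ'(0) = 0.5 · 0.5 = 0.25 at x = 0.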

A few other properties include:

  1. Domain: (-∞, +∞)
  2. Range: (0, 1)
  3. σ(0) = 0.5
  4. The function is monotonically increasing.
  5. The function is continuous everywhere.
  6. The function is differentiable everywhere in its domain.
  7. Numerically, it is enough to compute this function’s value over a small range of numbers, e.g., [-10, 10]. For values less than -10, the function’s value is almost zero. For values greater than 10, the function’s values are almost one.
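The properties above can be checked numerically. The following is a minimal Python sketch; the helper names `sigmoid` and `sigmoid_derivative` are illustrative, not from any library. It uses a two-branch formulation so that `math.exp` is only called on non-positive arguments and never overflows for large |x|, and it compares the closed-form derivative σ(x)(1 − σ(x)) against a central finite difference.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, sigma(x) = 1 / (1 + exp(-x)).

    Two branches keep the argument of exp() non-positive,
    which avoids OverflowError for inputs of large magnitude.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z lies in (0, 1)
    return z / (1.0 + z)

def sigmoid_derivative(x: float) -> float:
    """Derivative expressed through the function itself: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

if __name__ == "__main__":
    print(sigmoid(0.0))    # 0.5
    print(sigmoid(-10.0))  # about 4.54e-05, i.e. almost zero
    print(sigmoid(10.0))   # almost one
    # Compare the closed-form derivative with a central finite difference.
    h = 1e-5
    x = 1.3
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    print(abs(numeric - sigmoid_derivative(x)) < 1e-8)  # True
```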

The Sigmoid As A Squashing Function

The sigmoid function is also called a squashing function, as its domain is the set of all real numbers and its range is (0, 1). Hence, whether the input to the function is a very large negative number or a very large positive number, the output is always between 0 and 1. The same goes for any number between -∞ and ∞.
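The squashing behaviour is easy to see by evaluating the function over inputs of very different magnitudes. Below is a minimal NumPy sketch, assuming NumPy is available; it rewrites σ(x) in terms of exp(-|x|) so that no intermediate value overflows. One caveat worth knowing: in floating point, extreme inputs round to exactly 0.0 or 1.0, so the computed outputs lie in the closed interval [0, 1] even though the mathematical range is (0, 1).

```python
import numpy as np

def sigmoid(x):
    """Overflow-safe logistic sigmoid for NumPy scalars and arrays."""
    x = np.asarray(x, dtype=float)
    e = np.exp(-np.abs(x))  # always in [0, 1], so exp() never overflows
    # For x >= 0 use 1 / (1 + e^-x); for x < 0 use e^x / (1 + e^x). Both equal sigma(x).
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

inputs = np.array([-1e6, -100.0, -10.0, -1.0, 0.0, 1.0, 10.0, 100.0, 1e6])
outputs = sigmoid(inputs)
for x, y in zip(inputs, outputs):
    print(f"sigma({x:g}) = {y:.6f}")
```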

Sigmoid As An Activation Function In Neural Networks

The sigmoid function is used as an activation function in neural networks. To review what an activation function is, the figure below shows its role in one layer of a neural network. A weighted sum of inputs is passed through an activation function, and this output serves as an input to the next layer.

A sigmoid unit in a neural network


When the activation function for a neuron is a sigmoid function, the output of this unit is guaranteed to always be between 0 and 1. Also, as the sigmoid is a non-linear function, the output of this unit is a non-linear function of the weighted sum of inputs. Such a neuron that employs a sigmoid function as an activation function is termed a sigmoid unit.
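A single sigmoid unit can be sketched in a few lines of NumPy, assuming a weight vector `w`, a bias `b` and an input vector `x`. The numbers below are hypothetical and chosen only to illustrate the computation σ(w · x + b); a whole layer of sigmoid units is the same computation with a weight matrix in place of the vector.

```python
import numpy as np

def sigmoid(z):
    """Overflow-safe logistic sigmoid; works on scalars and arrays."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def sigmoid_unit(x, w, b):
    """Output of one sigmoid unit: sigma(w . x + b), always between 0 and 1."""
    z = np.dot(w, x) + b  # weighted sum of the inputs plus a bias
    return float(sigmoid(z))

def sigmoid_layer(x, W, b):
    """A layer of sigmoid units: row i of W holds the weights of unit i."""
    return sigmoid(W @ x + b)

# Hypothetical values, chosen only to illustrate the computation.
x = np.array([1.0, 2.0, -0.5])
w = np.array([0.8, -1.2, 0.5])
b = 0.1
print(sigmoid_unit(x, w, b))  # weighted sum is -1.75, output is about 0.148

W = np.array([[0.8, -1.2, 0.5],
              [-0.3, 0.4, 2.0]])
b_layer = np.array([0.1, -0.2])
print(sigmoid_layer(x, W, b_layer))
```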

Linear Vs. Non-Linear Separability

Suppose we have a typical classification problem, where we have a set of points in space and each point is assigned a class label. If a straight line (or a hyperplane in an n-dimensional space) can divide the two classes, then we have a linearly separable problem. On the other hand, if a straight line cannot divide the two classes, then we have a non-linearly separable problem. The figure below shows data in the 2-dimensional space. Each point is assigned a red or blue class label. The left figure shows a linearly separable problem that requires a linear boundary to distinguish between the two classes. The right figure shows a non-linearly separable problem, where a non-linear decision boundary is required.

Linear Vs. Non-Linearly separable problems


In three-dimensional space, a linear decision boundary can be described by the equation of a plane. In an n-dimensional space, the linear decision boundary is described by the equation of a hyperplane, w · x + b = 0.
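In code, a linear decision boundary amounts to taking the sign of w · x + b. The sketch below assumes NumPy and uses the four corners of the unit square. With AND-style labels these points are linearly separable, and a hand-picked (hypothetical) line x1 + x2 − 1.5 = 0 separates them. With XOR-style labels they form a standard example of a non-linearly separable problem. The random search over lines at the end does not prove this; it only illustrates that no sampled line reproduces the XOR labels.

```python
import numpy as np

def linear_decision(X, w, b):
    """Assign class 1 to points on the positive side of the hyperplane w . x + b = 0."""
    return (X @ w + b > 0).astype(int)

# The four corners of the unit square.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# AND labels: linearly separable. Hypothetical line x1 + x2 - 1.5 = 0.
w = np.array([1.0, 1.0])
b = -1.5
print(linear_decision(X, w, b))  # [0 0 0 1]

# XOR labels: no straight line can produce them.
y_xor = np.array([0, 1, 1, 0])
rng = np.random.default_rng(0)
found = False
for _ in range(10_000):
    w_try = rng.normal(size=2)
    b_try = rng.normal()
    if np.array_equal(linear_decision(X, w_try, b_try), y_xor):
        found = True
        break
print(found)  # False
```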

Why The Sigmoid Function Is Important In Neural Networks?

If we use a linear activation function in a neural network, then this model can only learn linearly separable problems. However, with the addition of just one hidden layer and a sigmoid activation function in the hidden layer, the neural network can easily learn a non-linearly separable problem. Using a non-linear function produces non-linear boundaries, and hence the sigmoid function can be used in neural networks for learning complex decision functions.
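To illustrate the role of a hidden sigmoid layer, the sketch below (assuming NumPy) solves the XOR problem, a standard non-linearly separable example, with two hidden sigmoid units and one sigmoid output unit. The weights are hand-picked for illustration, not learned: the hidden units behave roughly like OR and NAND, and the output unit like AND of the two. Replacing the sigmoid by the identity shows why a linear activation does not help: the two layers collapse into a single affine map, which can only describe a linear boundary.

```python
import numpy as np

def sigmoid(z):
    """Overflow-safe logistic sigmoid."""
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def identity(z):
    """A linear activation, used for comparison."""
    return z

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hypothetical hand-picked weights: hidden unit 1 ~ OR, hidden unit 2 ~ NAND,
# output unit ~ AND of the two hidden units.
W1 = np.array([[20.0, 20.0],
               [-20.0, -20.0]])
b1 = np.array([-10.0, 30.0])
W2 = np.array([20.0, 20.0])
b2 = -30.0

def forward(X, activation):
    hidden = activation(X @ W1.T + b1)  # one hidden layer with 2 units
    return activation(hidden @ W2 + b2)  # one output unit

out = forward(X, sigmoid)
print((out > 0.5).astype(int))  # [0 1 1 0], the XOR labels

# With the identity activation, the two layers reduce to one affine map.
lin_out = forward(X, identity)
W_eff = W2 @ W1
b_eff = W2 @ b1 + b2
print(np.allclose(lin_out, X @ W_eff + b_eff))  # True
```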

A non-linear function used as an activation function in this setting is usually required to be monotonically increasing, so, for example, sin(x) or cos(x) would not be used. The activation function should also be defined everywhere and continuous everywhere on the real numbers, and it is required to be differentiable over the entire real line.

Typically, a back propagation algorithm uses gradient descent to learn the weights of a neural network. To derive this algorithm, the derivative of the activation function is required.

The fact that the sigmoid function is monotonic, continuous and differentiable everywhere, coupled with the property that its derivative can be expressed in terms of itself, makes it easy to derive the update equations for learning the weights in a neural network when using the back propagation algorithm.
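The convenience of σ'(z) = σ(z)(1 − σ(z)) shows up in a minimal sketch that trains a single sigmoid unit by gradient descent. It assumes NumPy, a mean squared error loss, and the four AND-labelled corners of the unit square as training data; the learning rate and epoch count are arbitrary choices for this toy problem. The chain rule needs σ'(z), and the update reuses the already-computed output p through p(1 − p), so no extra call to exp() is needed.

```python
import numpy as np

def sigmoid(z):
    """Overflow-safe logistic sigmoid."""
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 0.0, 0.0, 1.0])  # AND targets (linearly separable)

def train_sigmoid_unit(X, y, lr=2.0, epochs=5000):
    """Batch gradient descent on L = 0.5 * mean((p - y)^2) with p = sigma(X w + b)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    losses = []
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                       # forward pass
        losses.append(0.5 * np.mean((p - y) ** 2))   # record the current loss
        # Chain rule: dL/dz = (p - y) * sigma'(z), and sigma'(z) = p * (1 - p).
        delta = (p - y) * p * (1.0 - p)
        grad_w = X.T @ delta / len(y)
        grad_b = np.mean(delta)
        w -= lr * grad_w                             # gradient descent step
        b -= lr * grad_b
    return w, b, losses

w, b, losses = train_sigmoid_unit(X, y)
print(losses[0], losses[-1])
print((sigmoid(X @ w + b) > 0.5).astype(int))  # [0 0 0 1]
```

Squared error is used here only so that the σ'(z) factor appears explicitly in the update; with a binary cross-entropy loss that factor cancels and dL/dz reduces to p − y.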








Summary

In this tutorial, you discovered what a sigmoid function is. Specifically, you learned:

  • The sigmoid function and its properties
  • Linear vs. non-linear decision boundaries
  • Why adding a sigmoid function at the hidden layer enables a neural network to learn complex non-linear boundaries

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.