Activation Functions

We often talk about activation functions when working with artificial neural networks. In this post, let's look at what an activation function is and at its main varieties.


Activation functions prevent a neural network from collapsing into a single linear transformation. Without them, a network is just a chain of linear operations with limited learning power, no matter how many layers it has. When we ask a network to learn from complex real-world data such as images, sound, or video, linear mappings are not enough; we need nonlinear functions. Activation functions regulate the outputs of nodes and add a level of complexity that a purely linear network cannot achieve. Thanks to this added complexity, the network becomes more expressive and learns better.

1. Sigmoid Function

The sigmoid function compresses the values it receives into the range 0 to 1. Here is the mathematical expression for the sigmoid function:

f(x) = 1/(1+e^(-x))

When a large positive value arrives, the output approaches one and produces a stronger signal; when a negative value arrives, the output approaches zero and produces a weaker signal.

Figure 1: Graph of sigmoid function

The sigmoid function is nonlinear, so the network becomes more expressive and we can use it for harder tasks. But if we look carefully at the graph, we can see that at the tails the y values react very little to changes in x. In these regions the derivative becomes very small and approaches 0. This is called the vanishing gradient problem, and learning slows to a minimum. When learning is this slow, the optimization algorithm that minimizes the error can get stuck in local minima, and we cannot get the maximum performance out of the neural network model.
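
To make the vanishing gradient concrete, here is a minimal sketch in plain Python (the function names are our own): the derivative of the sigmoid peaks at 0.25 and shrinks toward zero at the tails.

```python
import math

def sigmoid(x):
    # Squashes any real input into the (0, 1) range.
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # f'(x) = f(x) * (1 - f(x)); its maximum is 0.25, at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0))              # 0.5
print(sigmoid_derivative(0))   # 0.25, the maximum slope
print(sigmoid_derivative(10))  # ~0.000045 -- the gradient has nearly vanished
```

In a deep network these small derivatives get multiplied across layers during backpropagation, which is why early layers barely learn.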

2. Tanh Function

Figure 2: Graph of Tanh function

The tanh function compresses the values it receives into the range -1 to 1. Here is the mathematical expression for the tanh function:

f(x) = 2/(1+e^(-2x)) - 1

The derivative of the tanh function is steeper than the sigmoid function's derivative, so it can take larger values. This makes it more efficient for classification, because it covers a wider output range. However, the vanishing gradient problem at the tails of the function remains.
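
A small sketch (function names are our own) shows the formula above in code and confirms the steeper derivative: tanh's slope peaks at 1.0, four times sigmoid's maximum of 0.25.

```python
import math

def tanh(x):
    # The formula above: 2 / (1 + e^(-2x)) - 1.
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def tanh_derivative(x):
    # f'(x) = 1 - tanh(x)^2, which peaks at 1.0 when x = 0.
    t = tanh(x)
    return 1.0 - t * t

print(tanh(0))             # 0.0
print(tanh_derivative(0))  # 1.0 -- four times sigmoid's peak slope of 0.25
print(tanh_derivative(5))  # tiny: the vanishing gradient at the tails remains
```

As a sanity check, this hand-written version agrees with `math.tanh` from the standard library.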

3. ReLU Function

ReLU is commonly used in deep learning networks for speech recognition and computer vision. The function separates incoming values according to their sign: the output is 0 if the input is negative, and the input is returned unchanged if it is positive, so the computer can calculate faster. The problem with ReLU is that the derivative in this zero-valued region, which gives us the processing speed, is also zero, so no learning can occur there.

f(x) = max(0, x)

Figure 3:  Graph of ReLU function
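
A minimal sketch (function names are our own) makes the zero-gradient region visible: for any negative input, both the output and the derivative are zero, so no signal flows back during training.

```python
def relu(x):
    # Pass positive inputs through unchanged, clamp the rest to zero.
    return max(0.0, x)

def relu_derivative(x):
    # Gradient is 1 for positive inputs and 0 for negative ones;
    # a neuron stuck on the negative side stops learning.
    return 1.0 if x > 0 else 0.0

print(relu(3.5))              # 3.5
print(relu(-2.0))             # 0.0
print(relu_derivative(-2.0))  # 0.0 -- no gradient flows back
```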

4. Leaky ReLU

The Leaky ReLU function was developed to address the dead neuron problem of the ReLU function.

f(x) = max(0.01x, x)

Figure 4:  Graph of Leaky ReLU function

As shown in the figure, the problem is solved by giving the negative side of the x axis a small slope of 0.01. This value is close to 0 but not 0, so the vanishing gradients that ReLU suffers from in the negative region are avoided and learning also takes place for negative inputs.
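
The one-line change from ReLU can be sketched as follows (the function name and the `slope` parameter are our own; 0.01 is the leak from the formula above):

```python
def leaky_relu(x, slope=0.01):
    # Identical to ReLU for positive inputs, but negative inputs
    # keep a small slope instead of being clamped to zero.
    return max(slope * x, x)

print(leaky_relu(3.5))   # 3.5, same as ReLU
print(leaky_relu(-2.0))  # -0.02 -- small but nonzero, so the gradient survives
```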

5. Swish Function

The Swish function, like Leaky ReLU, produces nonzero values in the negative region, but Swish's values there are not linear.

f(x) = x × 1/(1+e^(-x))

Figure 5: Graph of Swish function
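
Since Swish is just the input multiplied by its sigmoid, it is a one-liner to sketch (the function name is our own):

```python
import math

def swish(x):
    # x * sigmoid(x): smooth and nonlinear in the negative region,
    # unlike Leaky ReLU's straight 0.01x line.
    return x / (1.0 + math.exp(-x))

print(swish(0))   # 0.0
print(swish(5))   # ~4.97 -- close to the identity for large positive inputs
print(swish(-1))  # ~-0.27 -- negative inputs still produce a (nonlinear) signal
```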

Thus, we have seen that activation functions play a key role in artificial neural networks. We hope to see you in our next article…


