# Activation Functions

We often talk about activation functions when using artificial neural networks. Let's consider together what an activation function is and what varieties exist.


Activation functions prevent a neural network from being a purely linear transformation. Without them, a neural network acts as a chain of linear operations with limited learning power, because stacked linear layers collapse into a single linear map. When we feed the network complex real-world data such as images, sound, or video, it must model relationships that a straight line cannot capture, so we need nonlinear functions. Activation functions regulate the outputs of nodes and add a level of expressiveness that neural networks without activation functions cannot achieve. Thus, despite the added complexity, the network becomes stronger and learns better.
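The collapse of stacked linear layers can be checked in a few lines. The sketch below (using NumPy, with arbitrary random weight shapes chosen for illustration) shows that two linear layers applied in sequence give exactly the same output as a single merged linear layer:

```python
import numpy as np

# Two linear layers with no activation in between: W2 @ (W1 @ x)
# collapses to a single linear map (W2 @ W1) @ x.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)

two_layers = W2 @ (W1 @ x)
one_layer = (W2 @ W1) @ x

# Both paths give the same output, so depth adds no expressive power
# without a nonlinear activation in between.
print(np.allclose(two_layers, one_layer))  # True
```

This is why inserting a nonlinearity between layers is what actually buys the network extra representational power.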

1. Sigmoid Function

The sigmoid function compresses the values it receives into the range 0 to 1. Here is the mathematical expression for the sigmoid function.

f(x) = 1 / (1 + e^(-x))

When a large positive value arrives, the output approaches one and produces a stronger signal; when a negative value arrives, it approaches zero and produces a weaker signal.

The sigmoid function is not linear, so the network becomes more complex and we can use it for more difficult tasks. However, if we look carefully at the graph, we can see that in the saturated regions the y values react very little to changes in x. There, the derivative values become very small and approach 0. This is called the vanishing gradient problem, and learning proceeds only at a minimal level. When learning becomes this slow, the optimization algorithm that minimizes the error can get stuck in local minima, and we cannot reach the maximum performance the artificial neural network model could otherwise achieve.
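The vanishing gradient described above is easy to see numerically. A minimal sketch of the sigmoid and its derivative (using the identity f'(x) = f(x)(1 - f(x))):

```python
import numpy as np

def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x)): squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """f'(x) = f(x) * (1 - f(x)); at most 0.25, near 0 for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative peaks at x = 0 and vanishes in the saturated regions,
# which is exactly the vanishing-gradient behaviour described above.
print(sigmoid_derivative(0.0))   # 0.25
print(sigmoid_derivative(10.0))  # ~4.5e-05
```

Gradients flowing backwards through many saturated sigmoid units are multiplied by numbers this small at every layer, which is why deep sigmoid networks learn so slowly.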

2. Tanh Function

The tanh function compresses the values it receives into the range -1 to 1. Here is the mathematical expression for the tanh function.

f(x) = 2 / (1 + e^(-2x)) - 1

The derivative of the tanh function is steeper than the sigmoid function's derivative, so it can take larger values. This makes it more efficient for classification, because it has a wider output range. However, the vanishing gradient problem at the ends of the function remains.
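The "steeper derivative" claim can be verified directly: tanh's derivative, 1 - tanh(x)^2, peaks at 1.0, four times the sigmoid's maximum of 0.25, yet both still vanish for large inputs. A small comparison sketch:

```python
import numpy as np

def tanh_derivative(x):
    """d/dx tanh(x) = 1 - tanh(x)^2; peaks at 1.0 when x = 0."""
    return 1.0 - np.tanh(x) ** 2

def sigmoid_derivative(x):
    """d/dx sigmoid(x); peaks at only 0.25 when x = 0."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

# tanh passes up to 4x larger gradients near zero than sigmoid...
print(tanh_derivative(0.0), sigmoid_derivative(0.0))  # 1.0 0.25
# ...but both derivatives still vanish in the saturated tails.
print(tanh_derivative(5.0) < 1e-3)  # True
```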

3. ReLU Function

ReLU is commonly used in deep learning networks for speech recognition and computer vision. The function separates incoming values according to whether they are positive or negative: the output is 0 if the input is negative, and the input is returned unchanged if it is positive, so the computer can calculate faster. The problem with ReLU is that the derivative in the zero-valued region, the very region that gives us this processing speed, is also zero, so no learning can occur there; neurons stuck in this region are said to "die".

f(x) = max(0, x)
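A minimal sketch of ReLU and its derivative makes both halves of the description above concrete, including the zero gradient on the negative side that causes dead neurons:

```python
import numpy as np

def relu(x):
    """f(x) = max(0, x): passes positives through, zeroes out negatives."""
    return np.maximum(0.0, x)

def relu_derivative(x):
    """Gradient is 1 for positive inputs and 0 for negative inputs,
    so neurons whose inputs stay negative receive no learning signal."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(x))             # [0.  0.  0.5 2. ]
print(relu_derivative(x))  # [0. 0. 1. 1.]
```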

4. Leaky ReLU

The Leaky ReLU function was developed to address the dead-neuron problem of the ReLU function.

f(x) = max(0.01x, x)

As shown in the figure, the problem is addressed by giving the function a small slope of 0.01 below the x axis. This slope is close to 0 but not exactly 0, so the gradients that vanish in ReLU survive, and learning also takes place for values in the negative region.
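The formula above translates to a one-line function. This sketch keeps the leak coefficient as a parameter (0.01 is the conventional default, but other small values are used in practice):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """f(x) = max(alpha * x, x): a small slope alpha keeps negative
    inputs (and their gradients) alive instead of zeroing them out."""
    return np.maximum(alpha * x, x)

x = np.array([-100.0, -1.0, 0.0, 2.0])
# Negative inputs are scaled by 0.01 rather than clamped to zero.
print(leaky_relu(x))  # [-1.   -0.01  0.    2.  ]
```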

5. Swish Function

The Swish function, like the Leaky ReLU function, takes nonzero values in the negative region, but its values there are not linear.

f(x) = x × 1 / (1 + e^(-x))
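In other words, Swish is the input multiplied by its own sigmoid. A minimal sketch showing the curved (non-linear) negative branch that distinguishes it from Leaky ReLU:

```python
import numpy as np

def swish(x):
    """f(x) = x * sigmoid(x): smooth and nonzero (but not linear)
    in the negative region."""
    return x / (1.0 + np.exp(-x))

# Unlike Leaky ReLU's straight line, the negative branch is curved:
# it dips below zero and then flattens back toward 0 as x -> -inf.
print(round(swish(-1.0), 4))  # -0.2689
print(round(swish(0.0), 4))   # 0.0
```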

Thus, we have seen that activation functions play a key role in artificial neural networks. Hope to see you in our next article…


