Effects of Activation Functions

In deep learning, activation functions play a key role and have a direct impact on model accuracy. The main reason is that, before a neural network produces its final values in the output layer, the activation function determines how changes in the learned weights and coefficients translate into changes in the output. Activation functions are generally either linear or nonlinear. You can find out which one suits a given task, such as clustering or regression, by trying them on models, or you can read more through the links I have left in the references section. Each activation function has its own formula, and the code that implements it must be written carefully. The formula for a neuron's computation consists of the weights and a bias value. Statistical knowledge is one of the most important points in these processes. Writing code may seem like the critical part, but what really matters is knowing what you are doing. Mathematics and statistics cannot be ignored: they play an important role in all data science, deep learning, and machine learning processes.
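To make the weights-and-bias idea concrete, here is a minimal sketch of a single neuron in plain Python. The function names (neuron_output, sigmoid) and the example numbers are my own illustrative assumptions; they are not taken from any particular library.

```python
import math

def sigmoid(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Illustrative values only (assumed for this sketch).
inputs = [1.0, 2.0]
weights = [0.5, -0.25]
bias = 0.0

# z = 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
print(neuron_output(inputs, weights, bias))
```

Swapping the activation argument (for example, passing lambda z: z for a linear neuron) changes the output without touching the weights or the bias, which is one way to see why the choice of activation matters.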



As you can see in the formula above, the additional parameter known as beta is actually the bias. Bias is one of the most important concepts taught in statistics education, and in the neural networks we build it is extremely valuable and cannot be ignored. The choice of activation function strongly affects the result at both the input and output sides of a neural network. The shape of these functions, which contribute to learning, varies with the input and with the parameters and coefficients. The image I present below shows the inputs and outputs of the activation functions. I will leave a link at the end for those who want to access the code. How well each activation function performs depends on where it is used. The two best-known names are softmax and sigmoid; most people starting a career in this field hear about these two early on. Softmax is mostly used in the output layer, because its outputs can be read as class probabilities. Data scientists working on neural networks often try ReLU as a first step.



The curves of activation functions differ along the x and y axes. The goal is for the success rate to approach its peak along the y-axis as the amount of data increases. The parameter values, the coefficient adjustments, and the activation function selected all affect whether this is achieved. Through forward propagation and backpropagation, the coefficients are repeatedly re-determined and kept at their optimum level, which is central to how neural networks work. These operations are entirely mathematical. You need to understand derivatives, and if you work in this field, the important thing is not writing code but knowing exactly what you are doing. As you can observe in the image I left below, backpropagation always involves taking derivatives. Neural networks are built on a mathematical foundation. After these steps, we can compare activation functions and find the most suitable one for the output layer. In this way we can find the optimum function for the model and see its use cases on a project basis. As a result, success varies from case to case.
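The role of derivatives in backpropagation can be sketched with a single sigmoid neuron trained by gradient descent. This is a simplified illustration under my own assumptions (one input, one weight, a squared-error loss, a learning rate of 0.5, and the hypothetical helper train_step); real networks apply the same chain rule layer by layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    """d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def train_step(w, b, x, target, lr):
    """One forward pass and one backward pass for a single sigmoid neuron."""
    # Forward propagation: compute the prediction and the loss.
    z = w * x + b
    y = sigmoid(z)
    loss = 0.5 * (y - target) ** 2

    # Backpropagation: chain rule, dL/dw = dL/dy * dy/dz * dz/dw.
    dloss_dy = y - target
    dy_dz = sigmoid_derivative(z)
    grad_w = dloss_dy * dy_dz * x
    grad_b = dloss_dy * dy_dz

    # Gradient descent update of the weight and the bias.
    return w - lr * grad_w, b - lr * grad_b, loss

w, b = 0.0, 0.0
losses = []
for _ in range(100):
    w, b, loss = train_step(w, b, x=1.0, target=1.0, lr=0.5)
    losses.append(loss)

# The loss at the last step is lower than at the first step.
print(losses[0], losses[-1])
```

Replacing sigmoid and sigmoid_derivative with another activation and its derivative changes how quickly the loss falls, which is how the choice of activation function feeds into training.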



In this last part, I will briefly explain the activation functions and what they do. Examples of activation functions include the step function, linear function, sigmoid function, hyperbolic tangent function, ReLU function, Leaky ReLU function, Swish function, and softmax function.

Step Function: It performs binary classification using a threshold value.

Linear Function: It produces a continuous range of activation values, but its derivative is constant.

Sigmoid Function: It is known by almost everyone and produces outputs in the range (0, 1).

Hyperbolic Tangent Function: It is a nonlinear function that produces outputs in the range (-1, 1).

ReLU Function: It is essentially a nonlinear function. It outputs 0 for negative inputs and returns positive inputs unchanged, so its output range is [0, +∞).

Leaky ReLU Function: Like ReLU, it passes through the origin, but for negative inputs it has a small nonzero slope that keeps its output close to 0. This preserves the gradients that ReLU loses in the negative region.

Swish Function: Its output is the product of the input and the sigmoid of that input.

Softmax Function: Used for multi-class classification problems, it produces outputs between 0 and 1 that sum to 1, each giving the probability that the input belongs to a particular class.
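The definitions above can be sketched in a few lines of plain Python. The parameter names and defaults (threshold=0.0, slope=1.0, alpha=0.01) are my own assumptions for illustration; libraries may use different defaults.

```python
import math

def step(x, threshold=0.0):
    """Binary output: 1 at or above the threshold, otherwise 0."""
    return 1.0 if x >= threshold else 0.0

def linear(x, slope=1.0):
    """Output proportional to the input; the derivative is the constant slope."""
    return slope * x

def sigmoid(x):
    """Output in (0, 1); written in two branches to avoid overflow in exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def tanh(x):
    """Output in (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """0 for negative inputs, the input itself otherwise."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but with a small slope alpha for negative inputs."""
    return x if x > 0 else alpha * x

def swish(x):
    """The input multiplied by its sigmoid."""
    return x * sigmoid(x)

def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(values)  # subtracting the max keeps exp from overflowing
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

for x in (-2.0, 0.0, 2.0):
    print(x, step(x), relu(x), leaky_relu(x),
          round(sigmoid(x), 4), round(tanh(x), 4), round(swish(x), 4))

print(softmax([1.0, 2.0, 3.0]))
```

Note that softmax works on a whole list of scores at once, while the other functions are applied to each value independently. That is why softmax fits the output layer of a multi-class classifier.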


I have listed the sources of the images and definitions I used in the references section. If you liked my article, I would appreciate your feedback.


References :