In this blog post, I would like to talk about some of the factors that affect the success of convolutional neural networks (CNNs), a topic I have been working on in my undergraduate thesis. Let's start with the definition of the loss function.

#### Definition of Loss Function

The loss function measures how far the values predicted by a neural network model are from the actual values. By quantifying the model's error, it lets the model "see" its mistakes and correct them through an optimization method. We expect the value of this function to decrease toward 0 during training: if the model misclassifies the training data, the loss is high; if it classifies the data well, the loss is low. We usually use gradient descent to find the minimum of the loss function. Two of the most commonly used loss functions are the L2 loss function and the cross-entropy function.

###### L2 Loss Function

The L2 loss function is the sum of the squared differences between the true values and the predicted values; training aims to minimize this error.
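As a small illustration, here is a minimal sketch of the L2 loss in plain Python. The function name `l2_loss` and the sample values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def l2_loss(y_true, y_pred):
    # Sum of squared differences between true and predicted values
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(l2_loss(y_true, y_pred))  # 0.25 + 0.25 + 0.0 + 1.0 = 1.5
```

Note that many libraries average the squared differences (mean squared error) instead of summing them; the two differ only by a constant factor, so they share the same minimum.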

###### Cross Entropy Function

Cross-entropy measures the difference between two probability distributions for a given random variable or set of events. In classification, it compares the predicted class probabilities with the true labels, and its value gets closer to zero as the predicted probability of the correct class approaches 1.
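The sketch below, assuming a three-class problem with a one-hot target, shows the loss shrinking toward zero as the predicted probability of the true class grows. The helper name `cross_entropy` and the `eps` guard against `log(0)` are illustrative choices, not taken from a specific library.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    # target: true distribution (e.g. one-hot); predicted: model probabilities
    # eps clamps probabilities away from 0 so math.log never fails
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


target = [0.0, 1.0, 0.0]  # the true class is index 1
for p in (0.2, 0.5, 0.9, 0.99):
    rest = (1.0 - p) / 2  # spread the remaining probability over the other classes
    print(p, round(cross_entropy(target, [rest, p, rest]), 4))
# loss shrinks: 1.6094, 0.6931, 0.1054, 0.0101
```

With a one-hot target, only the true class contributes, so the loss reduces to the negative log of the probability the model assigns to that class.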

#### Gradient Descent

Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model. Parameters refer to coefficients in Linear Regression and weights in neural networks.

The aim is to find the point where the loss is lowest. To get there, we move down the loss surface in the direction of the negative gradient, one step at a time, and repeat until we reach a minimum, where the gradient is close to zero and further steps no longer lower the loss.
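To make the procedure concrete, here is a minimal sketch of batch gradient descent fitting the two coefficients of a linear regression, `y ≈ w * x + b`, by minimizing the mean squared error. The function name, the toy data generated from `y = 2x + 1`, the learning rate, and the stopping tolerance are all hypothetical choices for illustration.

```python
def gradient_descent_linear(xs, ys, learning_rate=0.05, max_iters=10_000, tol=1e-9):
    # Fit y ~ w * x + b by minimizing mean squared error with batch gradient descent
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(max_iters):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)**2) with respect to w and b
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move in the direction of the negative gradient
        step_w = learning_rate * grad_w
        step_b = learning_rate * grad_b
        w -= step_w
        b -= step_b
        # Stop once the steps are negligibly small, i.e. we are near the minimum
        if abs(step_w) < tol and abs(step_b) < tol:
            break
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = gradient_descent_linear(xs, ys)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

Stopping when the step size falls below a tolerance is one common criterion; a fixed number of iterations or a threshold on the loss itself would also work.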

##### Learning Rate

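The size of these steps is controlled by the learning rate, a hyperparameter that we choose ourselves and that is usually a small number. If it is too small, training is slow; if it is too large, the steps can overshoot the minimum.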
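A toy experiment can show how much the learning rate matters. This sketch, assuming the simple loss `f(x) = x**2` (minimum at 0, gradient `2x`) and a hypothetical helper `minimize_quadratic`, runs 50 gradient descent steps with three different learning rates.

```python
def minimize_quadratic(learning_rate, x0=5.0, steps=50):
    # Minimize f(x) = x**2, whose minimum is at x = 0; the gradient is 2 * x
    x = x0
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x


for lr in (0.01, 0.1, 1.1):
    print(lr, minimize_quadratic(lr))
```

With `0.01` the value is still around 1.8 after 50 steps (too slow), with `0.1` it lands very close to 0, and with `1.1` every step overshoots the minimum by more than it corrects, so `x` grows instead of shrinking.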

##### Optimization Algorithms

Optimization methods are used to find optimal values when solving nonlinear problems. Commonly used optimization algorithms in deep learning applications include stochastic gradient descent (SGD), Adagrad, Adadelta, Adam, and Adamax. These algorithms differ in performance and speed.
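To show how these optimizers differ in their update rules, here is a minimal sketch comparing a plain gradient step with the Adam update, using the standard Adam defaults (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`) on a hypothetical one-parameter loss `f(theta) = (theta - 3)**2`. Because the gradient here is exact rather than estimated from a mini-batch, the "SGD" update reduces to ordinary gradient descent; the learning rate and step count are illustrative assumptions.

```python
import math


def grad(theta):
    # Gradient of the toy loss f(theta) = (theta - 3)**2, minimum at theta = 3
    return 2.0 * (theta - 3.0)


def run_sgd(lr=0.1, steps=1000, theta=0.0):
    # Plain gradient update: theta <- theta - lr * g
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta


def run_adam(lr=0.1, steps=1000, theta=0.0, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam keeps running averages of the gradient (m) and squared gradient (v)
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: m and v start at 0, so early averages are scaled up
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Step is normalized by the recent gradient magnitude (about lr when gradients agree)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


print(run_sgd(), run_adam())  # both end up near 3.0
```

The key design difference is that Adam adapts the effective step size per parameter from the gradient history, while plain SGD uses the same fixed learning rate for every step.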

In conclusion, to train a good CNN we can use gradient descent, or one of its variants listed above, which is the standard way to minimize the loss function during training. Hope to see you in our next blog post…
