## Support Vector Machines Part 1

Hello everyone. Image classification is among the most common application areas of artificial intelligence. There are many ways to classify images, but in this blog I want to talk about support vector machines.

In machine learning, support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Since the algorithm does not require any information about the joint distribution function of the data, SVMs are distribution-independent learning algorithms. SVMs can be used for both classification and regression problems; however, they are mostly used for classification.

How to solve the classification problem with SVM?

In this algorithm, we plot each data item as a point in n-dimensional space, where n is the number of features. Next, we classify by finding the hyperplane that best separates the two classes. The separating boundary is drawn so that it passes as far as possible from the nearest elements of both classes. SVM is a nonparametric classifier. It can classify both linearly and nonlinearly separable data, but it generally tries to separate the data linearly.
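To make the idea of "points on either side of a hyperplane" concrete, here is a minimal sketch in plain Python. The weight vector `w`, the offset `b` and the three sample points are made-up values for illustration only; they do not come from a trained model.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 for one side of the hyperplane and -1 for the other."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical hyperplane x1 + x2 + x3 - 3 = 0 in 3-dimensional space.
w = [1.0, 1.0, 1.0]
b = -3.0

# Each data item is a point (vector) with one coordinate per feature.
points = [[2.0, 2.0, 2.0], [0.0, 0.5, 1.0], [3.0, 1.0, 0.5]]
labels = [classify(w, b, p) for p in points]
print(labels)  # [1, -1, 1]
```

The same two functions work for any number of dimensions, because the only operation involved is a dot product.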

SVMs apply a classification strategy that uses a margin-based geometric criterion instead of a purely statistical criterion. In other words, SVMs do not need estimates of the statistical distributions of the classes in order to perform the classification task; they define the classification model using the concept of margin maximization.

In SVM literature, a predictor variable is called an attribute, and a transformed attribute that is used to define the hyperplane is called a feature. The task of choosing the most appropriate representation is known as feature selection. A set of features that describes one case is called a vector.

Thus, the goal of SVM modeling is to find the optimal hyperplane that separates the sets of vectors, with the cases belonging to one category of the target variable on one side of the plane and the cases belonging to the other category on the other side.
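One simple way to search for such a hyperplane is sketched below. It is not the solver used by real SVM packages (those typically solve a quadratic program), but a plain subgradient descent on the hinge-loss form of the linear SVM objective. The function names, the toy data, the penalty weight `C`, the learning rate and the number of epochs are all assumptions chosen for illustration.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    """Minimise 0.5*||w||^2 + C * sum(max(0, 1 - y_i*(w.x_i + b)))
    with full-batch subgradient descent (illustrative, not production code)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # point is inside the margin or misclassified
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Class label (+1 or -1) given by the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy two-class data: labels are +1 and -1.
X = [[2.0, 2.0], [3.0, 3.0], [3.0, 2.0], [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

On this small, well-separated data set the learned hyperplane puts every training point on the side of its own class.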

Classification with SVM

The mathematical algorithms behind SVM were originally designed for the classification of two-class linear data and were later generalized to multi-class and nonlinear data. The working principle of SVM is based on estimating the most appropriate decision function that can distinguish the two classes, in other words, on defining the hyperplane that separates the two classes from each other in the most appropriate way (Vapnik, 1995; Vapnik, 2000). SVMs are used successfully in many areas, and in recent years intensive studies have been carried out on their use in remote sensing (Foody et al., 2004; Melgani et al., 2004; Pal et al., 2005; Kavzoglu et al., 2009). In order to determine the optimum hyperplane, two hyperplanes that are parallel to it and mark the boundaries of the margin must be determined. The points that lie on these boundary hyperplanes are called support vectors.
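The sketch below shows how the support vectors can be picked out once a hyperplane is known. It assumes the usual scaling in which the two boundary hyperplanes are w.x + b = +1 and w.x + b = -1, so the distance between them is 2/||w||. The data and the values of `w` and `b` are illustrative assumptions; `w = (0.5, 0.5)`, `b = -1` was worked out by hand for this toy set.

```python
import math


def functional_margin(w, b, x, y):
    """y * (w.x + b): equals 1 on a boundary hyperplane, larger further away."""
    return y * (sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data with labels +1 / -1 (hypothetical values).
X = [[2.0, 2.0], [3.0, 3.0], [3.0, 2.0], [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]]
y = [1, 1, 1, -1, -1, -1]

# Assumed separating hyperplane for this data: 0.5*x1 + 0.5*x2 - 1 = 0.
w, b = [0.5, 0.5], -1.0

# Support vectors are the points lying exactly on the two boundary hyperplanes.
support_vectors = [
    x for x, yi in zip(X, y)
    if math.isclose(functional_margin(w, b, x, yi), 1.0)
]

# Distance between the two boundary hyperplanes.
margin_width = 2.0 / math.sqrt(sum(wj * wj for wj in w))

print(support_vectors)
print(margin_width)
```

Only the points `[2.0, 2.0]` and `[0.0, 0.0]` sit on the boundaries here; moving any of the other points slightly would not change the hyperplane.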

How to Identify the Correct Hyperplane?

It is quite easy to find the correct hyperplane with software packages in R or Python, but we can also find it manually with simple methods. Let’s consider a few simple examples. In the first one, we have three different hyperplanes: A, B and C. Now let’s identify the correct hyperplane for classifying the stars and the circles. Hyperplane B is chosen because it is the one that correctly separates the stars from the circles in this graph.

If all of our hyperplanes separate the classes well, how can we find the correct one? Here, maximizing the distance between the hyperplane and the nearest data point of either class helps us decide. This distance is called the margin. In this example, the margin of hyperplane C is larger than the margins of both A and B. Hence, we choose C as the correct hyperplane.
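The comparison can also be done numerically. The following sketch computes the margin of three candidate hyperplanes and picks the one with the largest value. The coordinates of the stars and circles and the three candidate lines are invented for this example, since the original figures are not reproduced here.

```python
import math


def geometric_margin(w, b, X, y):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.
    The value is negative if at least one point is on the wrong side."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return min(
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) / norm
        for xi, yi in zip(X, y)
    )


# Hypothetical data: stars are labelled +1, circles -1.
X = [[4.0, 4.0], [5.0, 5.0], [4.0, 5.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
y = [1, 1, 1, -1, -1, -1]

# Three candidate hyperplanes (lines in 2D), each given as (w, b).
candidates = {
    "A": ([1.0, 0.0], -3.5),  # vertical line x1 = 3.5
    "B": ([0.0, 1.0], -2.5),  # horizontal line x2 = 2.5
    "C": ([1.0, 1.0], -5.5),  # diagonal line x1 + x2 = 5.5
}

margins = {name: geometric_margin(w, b, X, y) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)

print(margins)
print(best)  # C
```

All three candidates separate the two classes, but A and B pass only 0.5 units away from their nearest point, while C keeps a distance of about 1.77 from both classes.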

SVM for linearly inseparable data

In many problems, such as the classification of satellite images, it is not possible to separate the data linearly. In this case, the problem arising from the fact that some of the training data remains on the wrong side of the optimum hyperplane is solved by defining a positive slack variable. The balance between maximizing the margin and minimizing misclassification errors can be controlled by a regularization parameter that takes positive values and is denoted by C (0 < C < ∞) (Cortes et al., 1995). In this way, the data can be treated as linearly separable and the hyperplane between the classes can be determined. Support vector machines can also apply nonlinear transformations mathematically with the help of a kernel function, which allows the data to be separated linearly in a higher-dimensional space.
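To see how C shifts this balance, here is a small sketch that evaluates the commonly used soft-margin objective, 0.5*||w||^2 + C * (sum of the positive variables, usually called slack variables), for two hand-picked hyperplanes. The data, the two candidate hyperplanes and the two values of C are assumptions made up for this illustration.

```python
def slacks(w, b, X, y):
    """Slack of each point: max(0, 1 - y_i*(w.x_i + b)).
    It is zero for points outside the margin and on the correct side."""
    return [
        max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    ]


def soft_margin_objective(w, b, X, y, C):
    """Margin term plus C times the total slack."""
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slacks(w, b, X, y))


# Hypothetical data: the third positive point lies close to the negative class.
X = [[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
y = [1, 1, 1, -1, -1, -1]

# Two candidate hyperplanes (w, b), chosen by hand.
hyperplanes = {
    "wide": ([0.5, 0.5], -2.0),    # large margin, but misclassifies [1, 1]
    "narrow": ([2.0, 2.0], -3.0),  # separates every point, small margin
}


def preferred(C):
    """Name of the candidate with the lower objective for a given C."""
    return min(
        hyperplanes,
        key=lambda name: soft_margin_objective(*hyperplanes[name], X, y, C),
    )


print(slacks(*hyperplanes["wide"], X, y))
print(preferred(0.1), preferred(10.0))
```

With a small C the wide-margin hyperplane has the lower objective even though it leaves one point on the wrong side; with a large C the penalty for that point dominates and the narrow-margin hyperplane is preferred.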

For a classification to be performed with support vector machines, it is essential to choose the kernel function to be used and the optimum parameters of that function. The most commonly used kernel functions in the literature are the polynomial, radial basis function (RBF), Pearson VII universal kernel (PUK) and normalized polynomial kernels.
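The four kernels named above can be written in a few lines each. This is a sketch under stated assumptions: the parameter names (`degree`, `coef0`, `gamma`, `sigma`, `omega`) and their default values are my own choices, the PUK formula follows the commonly cited Pearson VII form, and individual software packages may parametrize these kernels differently.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def squared_distance(x, z):
    return sum((a - b) ** 2 for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """(x.z + coef0) ** degree"""
    return (dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """exp(-gamma * ||x - z||^2)"""
    return math.exp(-gamma * squared_distance(x, z))


def normalized_polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel rescaled so that K(x, x) = 1 for every x."""
    k_xz = polynomial_kernel(x, z, degree, coef0)
    k_xx = polynomial_kernel(x, x, degree, coef0)
    k_zz = polynomial_kernel(z, z, degree, coef0)
    return k_xz / math.sqrt(k_xx * k_zz)


def puk_kernel(x, z, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel (assumed form):
    1 / (1 + (2*||x - z||*sqrt(2**(1/omega) - 1) / sigma)**2) ** omega"""
    dist = math.sqrt(squared_distance(x, z))
    base = 1.0 + (2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma) ** 2
    return 1.0 / base ** omega


x = [1.0, 2.0]
z = [3.0, 4.0]
for kernel in (polynomial_kernel, rbf_kernel, normalized_polynomial_kernel, puk_kernel):
    print(kernel.__name__, kernel(x, z))
```

Each function takes two vectors and returns a single similarity value; the choice of kernel and of its parameters is exactly what has to be tuned for a given classification task.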

SVM is used for tasks such as disease recognition in medicine, setting consumer loan limits in banking, and face recognition in artificial intelligence. In the next blog, I will try to talk about how to apply SVMs with software packages. Goodbye until we meet again …

REFERENCES