Basic Information About Feature Selection

Artificial learning, deep learning and artificial intelligence, which we actively come across in all parts of our lives, is a situation where everyone is working on it, and the predictions are measured with the success score. In business processes, the subject of artificial learning has a critical importance. The data that is in your hands or collected by the company personally and comes to the Feature Engineering phase, is carefully examined from many issues and prepared for the final situation and taken to the person working as a Data Scientist. He can make inferences for the firm by making sense of the data. Thus, if the product or service developed is tested by offering it to the customer and meets the necessary success parameters, we can make the performance of the product sustainable. One of the most important steps here is the scalability of the product produced and the rapid adjustment of the adaptation phase to business processes. Another event is to obtain the significance levels of the features determined by correlation from the data set, to make this meaningful and to determine by the Feature Engineer before the modeling phase. We can think of Feature Engineers as an additional power that accelerates and facilitates the Data Scientist’s business process.



In the case of job search, we may encounter a ‘Feature Engineer’ announcement, which may appear frequently. We can obtain the critical information we learn from the data during the feature selection process during the data preparation phase. Feature selection methods are intended to reduce the number of input variables to those believed to be most useful for a model to predict the target feature. Feature selection processes provide great convenience to employees by reducing the workload as much as possible, if they are determined logically while involved in data pre-processing processes. I mentioned that there is a special business area for this. Feature Selection situations affect the success of the data in modeling and directly affect the success of the values ​​to be predicted. For this reason, the most important part of the events from the first data to the product stage is the right decision of the working person to choose the feature. If the progress is positive, the product will come to life in a short time. Making statistical inferences from the data is as important as determining which data is and how important it is through algorithms. Statistics science should play a role in data science processes in general.



There are also feature selection methods to be determined by statistical filter. We can give examples of scales that differ in their choice of features. Unfortunately, most people working in this field do not care enough about statistical significance. Among some people working on Data Science and Artificial Intelligence, writing code is seen as the basis of this work. I can give examples of categorical and numerical variables for the data set. In addition, these variables are divided into two within themselves. While the feature we see numerically is known as integer and float, variables we see categorically are; known as nominal, ordinal and boolean. You can find this basically in the image I put below. These variables are literally vital to feature selection. In line with the operations performed, these variables can be decided with a statistician during the evaluation phase, and the analysis of the selected features should be made on a solid basis. One of the most necessary features of those working in this field is their ability to interpret and analyze well. In this way, they can easily present the data they prepare in the form of products, with the basics matching the logic.



There is almost no exact method available. Feature selection for each data set is evaluated with a good analysis. Because the operations performed may vary for each feature. That is, while one data set contains too many integers or float values, another data set you are working on may be boolean. Therefore, there may be cases where feature selection methods differ for each data set. The important issue may be to adapt quickly, understand what the data set offers us and produce solutions accordingly. With this method, it is possible for the decisions taken during the transactions to continue in a healthier way. Categorical variables can be determined by methods such as the chi-square test, even this method is more powerful and the rate of efficiency can reach higher points. The choice of features throughout the product or service development stages is the most important step that contributes to the success criteria of a model.