Application of CNN

Hello everyone! In this latest blog post, I want to walk through a simple application of my favorite topic, CNNs. I chose the MNIST data set, which is one of the easiest data sets for this task.

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. It contains 60,000 training images and 10,000 test images. Half of the training set and half of the test set were taken from NIST’s training dataset, while the other halves were taken from NIST’s testing dataset. Each image is 28 pixels wide and 28 pixels high.

Figure: Sample images from the MNIST test dataset

Application

The data set is imported from the TensorFlow library. A Session is used to run the code, and global_variables_initializer() is run so that the variables are ready to use. The data should be fed to the model piece by piece during training, so the batch size is set to 128. A function called training_step was created to carry out the training; inside it, a for loop performs the training iterations. MNIST images are fetched with x_batch, y_batch = mnist.train.next_batch(batch_size), so the images are fed to the model in batches. feed_dict_train is defined to assign the images and labels in the data set to our placeholders. The optimizer and the loss are run in a single line so that the model is optimized and the change in the loss value is observed at the same time. An if statement is used to monitor the state of training: the training accuracy and training loss are printed every 100 iterations. Finally, a test_accuracy function is defined to see how the model predicts data it has not encountered before. A minimal sketch of this loop is given below.
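
The sketch below reconstructs the loop from the description above in TensorFlow 1.x style; it is not the course's exact code. The tensor names (x, y_true, optimizer, loss, accuracy) are assumptions and refer to the architecture sketched after the next paragraph.

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load MNIST with one-hot labels; the local path is an assumption.
mnist = input_data.read_data_sets("data/MNIST/", one_hot=True)

batch_size = 128

session = tf.Session()
session.run(tf.global_variables_initializer())

def training_step(iterations):
    for i in range(iterations):
        # Fetch the next batch of images and labels.
        x_batch, y_batch = mnist.train.next_batch(batch_size)
        # Assign the batch images and labels to our placeholders.
        feed_dict_train = {x: x_batch, y_true: y_batch}
        # Optimize the model and read the loss value in a single line.
        _, loss_value = session.run([optimizer, loss], feed_dict=feed_dict_train)
        # Print training accuracy and training loss every 100 iterations.
        if i % 100 == 0:
            acc = session.run(accuracy, feed_dict=feed_dict_train)
            print("Iteration {0}: training accuracy {1:.4f}, training loss {2:.4f}"
                  .format(i, acc, loss_value))

def test_accuracy():
    # Evaluate on data the model has not encountered during training.
    feed_dict_test = {x: mnist.test.images, y_true: mnist.test.labels}
    return session.run(accuracy, feed_dict=feed_dict_test)
```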

Two convolutional layers were used to classify the MNIST data set. In our trials, the accuracy increased as the number of convolutional layers, the number of training steps, and the filter sizes were increased. The first convolutional layer has 16 filters, all of size 5×5; the second convolutional layer has 32 filters, also of size 5×5. The layers are connected through max-pooling operations. ReLU and softmax were used as the activation functions, and Adam was used as the optimization algorithm with a very small learning rate of 0.0005. The batch size was set to 128 to improve training. The training accuracy and training loss are printed every 100 iterations to check the model's progress. When the code was executed for 10,000 iterations, a test accuracy of 0.9922 was obtained. A sketch of this architecture is given below.
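
Here is one way the network described above could be defined, assuming the TensorFlow 1.x layers API; the filter counts, kernel sizes, activations, and learning rate follow the text, while the layer names are my own.

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])      # flattened 28x28 images
y_true = tf.placeholder(tf.float32, [None, 10])  # one-hot labels
x_image = tf.reshape(x, [-1, 28, 28, 1])

# First convolutional layer: 16 filters of size 5x5, ReLU, then max pooling.
conv1 = tf.layers.conv2d(x_image, filters=16, kernel_size=5,
                         padding="same", activation=tf.nn.relu)
pool1 = tf.layers.max_pooling2d(conv1, pool_size=2, strides=2)

# Second convolutional layer: 32 filters of size 5x5, ReLU, then max pooling.
conv2 = tf.layers.conv2d(pool1, filters=32, kernel_size=5,
                         padding="same", activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(conv2, pool_size=2, strides=2)

# Flatten and classify; softmax produces the class probabilities.
logits = tf.layers.dense(tf.layers.flatten(pool2), units=10)
y_pred = tf.nn.softmax(logits)

# Cross-entropy loss and the Adam optimizer with learning rate 0.0005.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true, logits=logits))
optimizer = tf.train.AdamOptimizer(learning_rate=0.0005).minimize(loss)

# Accuracy: fraction of predictions that match the true labels.
correct = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y_true, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
```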

Figure: Prediction mistakes made by the model

The figure above shows some examples that our model predicted incorrectly. The model can sometimes make wrong predictions, which may be because the digit is faint or unclear. In the first example, we see that our model predicts the digit 4 as a 2.

Figure: Graph of the loss function

The loss graph is a visualization of the loss values we observed during training. As shown in the figure, the loss decreases over time; our goal is to bring the loss value as close to zero as possible. The loss graph also shows whether the learning rate is appropriate: since the decrease in the graph does not slow down, we can say that our learning rate is well chosen.
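
A graph like this can be produced with matplotlib, assuming the loss value returned at each iteration was appended to a list; loss_history is a hypothetical name for that list.

```python
import matplotlib.pyplot as plt

# Hypothetical list, filled with loss_value inside training_step.
loss_history = []

plt.plot(loss_history)
plt.xlabel("Iteration")
plt.ylabel("Loss")
plt.title("Training loss")
plt.show()
```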

In this blog post, I built an application in Python using a CNN on the MNIST data set. Thank you to everyone who has followed my blog posts closely; goodbye until we meet again…

REFERENCES

  1. https://en.wikipedia.org/wiki/MNIST_database
  2. https://www.udemy.com/course/yapayzeka/learn/lecture/8976370?start=135#que

Python Data Science Libraries 1 – Pandas Methodology

I am turning the topics I have been working on into a series that I will present one by one. For this reason, I will explain the methodology and usage of almost all the libraries I actively work with. I'll start with pandas, which handles functional operations such as data preprocessing, beginning with reading the data in. With this library, we can easily perform preprocessing steps that are vital for data science, such as detecting missing data and removing it from the data set. It also lets you handle data types, and the preliminary part of the numerical or categorical operations you will perform on them, up front, which provides significant convenience before proceeding. Each library in Python has its own specialty, but speaking for pandas, it is responsible for all the early modifications to the data that form the basis of the data science steps. Data classification processes in pandas can be designed and run quickly with a few functional lines of code. This is the most critical point of the data preprocessing stage, which comes before data modeling. A small illustration of these steps follows.
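
A minimal sketch of the preprocessing steps mentioned above; the columns and values here are made up for illustration.

```python
import pandas as pd
import numpy as np

# A small, made-up data set with missing observations.
df = pd.DataFrame({"age": [25, 32, np.nan, 41],
                   "city": ["Ankara", "Izmir", "Istanbul", None]})

print(df.dtypes)          # inspect the data type of each column
print(df.isnull().sum())  # count missing observations per column

df_clean = df.dropna()    # remove rows that contain missing data
```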

We can store data as a DataFrame or a Series and perform operations on it. The fact that the pandas library performs every operation on data in a functional, easy, and fast way reduces the workload of data scientists in data science processes. This way, the most difficult early part of the process, data preprocessing, can be handled, leaving room to focus on the later steps of the job. By reading data prepared in different formats such as .csv, .xlsx, .json, and .txt, it brings data that has been entered or collected through data mining into Python for processing. With its DataFrame structure, pandas is more logical, and even more sustainable, than other libraries in terms of making data more functional and scalable. Those who will work in this field should study the methodology of the pandas library, which rests on the basic, robust structure of the Python programming language, rather than jumping straight into writing code, because it covers operations such as new assignments on the data, renaming columns, grouping variables, removing empty observations from the data, or filling empty observations in a specific way (mean, 0, or median assignment). A short sketch of these operations is shown below.
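
The sketch below illustrates these operations on a made-up frame; reading real files works the same way with pd.read_csv("data.csv"), pd.read_excel("data.xlsx"), or pd.read_json("data.json").

```python
import pandas as pd
import numpy as np

# Made-up data standing in for a file read with pd.read_csv.
df = pd.DataFrame({"region": ["east", "west", "east", "west"],
                   "amount": [100.0, np.nan, 250.0, 80.0]})

df = df.rename(columns={"amount": "sales"})   # assign a new column name
totals = df.groupby("region")["sales"].sum()  # group by a variable

# Fill empty observations in a specific way: the mean here; 0 or the
# median (df["sales"].median()) work the same way.
df["sales"] = df["sales"].fillna(df["sales"].mean())

df_dropped = df.dropna()                      # or remove them entirely
```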

Data can hardly be processed or analyzed without knowing the pandas library. To be blunt, pandas can be called the heart of data science. Specially designed functions such as apply(), drop(), and sort_values(), and accessors such as iloc and dtypes, are the most important features that make this library special. It is an indispensable library for these operations, even if this was not the focus of its original starting point. It offers tremendous features for the steps ahead and a simpler syntax. Results produced in loops can be collected and converted into a DataFrame or Series. The speed-up it brings is a great practical advantage when the project runs on a schedule, which is usually the case. Looking at its other capabilities, it is one of the most efficient Python libraries, and its suitability for many areas can be considered a great bonus. Pandas ranks among the top three data processing libraries in a vote among software developers who use the Python programming language; you can find this survey, which I cite from Dataquest, in the references section. A small example of these functions follows.
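
A minimal example of the functions named above, on made-up data.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "score": [3, 1, 2]})

df["score_sq"] = df["score"].apply(lambda s: s ** 2)  # apply()
trimmed = df.drop(columns=["name"])                   # drop()
first_row = df.iloc[0]                                # iloc
print(df.dtypes)                                      # dtypes
ordered = df.sort_values("score")                     # sort_values()

# Results collected in a loop can be converted to a Series or DataFrame.
results = [s * 10 for s in df["score"]]
series = pd.Series(results)
```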

The concept of “data science,” which has been developing since 2015, has brought the pandas library to the forefront, and this library, which had been developing quietly for years, has come to light. After pandas, I will explain NumPy and talk about numerical and matrix operations. In general, pandas is a library with high-level features for basic data analysis and data processing. If you let me know the topics you would like me to cover, I can plan the series more efficiently. I hope these articles, which I will publish as a series, help people who will work in this field. In the future, I will add the cheat-sheet style content I am preparing on GitHub to the bibliography section; my GitHub account is in the references section, and you can easily access these notes there.

References:

https://www.geeksforgeeks.org/python-pandas-dataframe/

https://medium.com/deep-learning-turkiye/adan-z-ye-pandas-tutoriali-ba%C5%9Flang%C4%B1%C3%A7-ve-orta-seviye-4edf0094e0d5#:~:text=Pandas%2C%20Python%20programlama%20dili%20i%C3%A7in,sonuca%20kolayca%20ula%C5%9Fmak%20i%C3%A7in%20kullan%C4%B1lmaktad%C4%B1r.

https://www.dataquest.io/blog/15-python-libraries-for-data-science/

https://github.com/tanersekmen/

https://www.edureka.co/blog/python-pandas-tutorial/

https://globalaihub.com/importance-of-data-quality-and-data-processing/

https://globalaihub.com/hareketli-ortalama-algoritmasiyla-al-sat-tavsiyeleri/

https://www.dataquest.io/course/pandas-fundamentals/