Application of CNN

Hello everyone. In this, my last blog post, I want to walk through a simple application of my favorite topic, CNNs. I chose the MNIST data set, which is one of the easiest data sets for this topic.

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The MNIST database contains 60,000 training images and 10,000 testing images. Half of the training set and half of the test set were taken from NIST’s training dataset, while the other half of the training set and the other half of the test set were taken from NIST’s testing dataset. The images contained within have a width of 28 pixels and a height of 28 pixels.

Figure: Sample images from the MNIST test dataset


The data set is imported from the TensorFlow library. A Session is used to run the code, and global_variables_initializer is called so that the variables are initialized before training starts. The data should be given to the model piece by piece, so batch_size is set to 128. A function called “training step” is defined to carry out the training, and a for loop inside this function performs the training iterations. The MNIST images are fetched with x_batch, y_batch = mnist.train.next_batch(batch_size), so the images are fed to the model in batches. feed_dict_train is defined to assign the images and labels in the data set to our placeholders. The optimizer and the loss are evaluated in a single line, so that the model is optimized and the change in the loss value can be observed at the same time. An if statement is used to monitor the training: it prints the training accuracy and training loss every 100 iterations. The test_accuracy function is defined to see how our model predicts data that it has not encountered before.
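
The structure of this loop can be sketched without TensorFlow. The following is a minimal, framework-free illustration of the same pattern: draw a batch of 128, take one optimization step, and report the loss every 100 iterations. The helper names, the toy one-parameter model, and the synthetic data are assumptions made for illustration; this is not the original TensorFlow code.

```python
import random

def next_batch(data, batch_size, rng):
    """Return a random mini-batch, standing in for mnist.train.next_batch."""
    return rng.sample(data, batch_size)

def training_step(data, iterations, batch_size=128, learning_rate=0.05, seed=0):
    """Fit y = w * x by mini-batch gradient descent; log every 100 iterations."""
    rng = random.Random(seed)
    w = 0.0
    history = []  # (iteration, batch loss) pairs recorded at each report
    for i in range(iterations):
        batch = next_batch(data, batch_size, rng)
        # Mean squared error and its gradient with respect to w on this batch.
        loss = sum((w * x - y) ** 2 for x, y in batch) / batch_size
        grad = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
        w -= learning_rate * grad
        if i % 100 == 0:
            history.append((i, loss))
            print(f"iteration {i}: training loss {loss:.4f}")
    return w, history

# Synthetic, noise-free data: y = 3x, so the learned weight should approach 3.
data_rng = random.Random(1)
data = [(x, 3.0 * x) for x in (data_rng.uniform(-1, 1) for _ in range(1000))]
w, history = training_step(data, iterations=500)
```

In the real model, the gradient step would be replaced by running the optimizer on the current batch, and the reported numbers would be the training accuracy and loss of the CNN.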

Two convolutional layers are used to classify the MNIST data set. In my trials, accuracy increased as the number of convolutional layers, the number of training steps, and the filter sizes increased. The first convolutional layer has 16 filters of size 5×5, and the second convolutional layer has 32 filters of size 5×5. The layers are connected through max pooling, with the necessary reshaping in between. ReLU and softmax are used as activation functions, and Adam is used as the optimization algorithm. A very small value of 0.0005 is taken as the learning rate, and the batch size is set to 128. Training accuracy and training loss are printed every 100 iterations to monitor the model. Running the code for 10,000 iterations gave a test accuracy of 0.9922.
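
To make the architecture more concrete, the sketch below tracks how the image shape changes through the two convolutional layers and counts their parameters. It assumes "same" padding with stride 1 for the convolutions and 2×2 max pooling with stride 2 after each one; the post does not state these settings, so treat them as illustrative assumptions.

```python
def conv_same(shape, filters):
    """Output shape of a stride-1 convolution with 'same' padding."""
    height, width, _ = shape
    return (height, width, filters)

def max_pool_2x2(shape):
    """Output shape of 2x2 max pooling with stride 2."""
    height, width, channels = shape
    return (height // 2, width // 2, channels)

def conv_params(kernel, in_channels, filters):
    """Number of weights plus biases in a convolutional layer."""
    return kernel * kernel * in_channels * filters + filters

shape = (28, 28, 1)                          # one grayscale MNIST image
after_first = max_pool_2x2(conv_same(shape, 16))         # 16 filters of 5x5
after_second = max_pool_2x2(conv_same(after_first, 32))  # 32 filters of 5x5
flattened = after_second[0] * after_second[1] * after_second[2]

print(after_first, after_second, flattened)   # (14, 14, 16) (7, 7, 32) 1568
print(conv_params(5, 1, 16), conv_params(5, 16, 32))  # 416 12832
```

Under these assumptions, the vector passed on after the second pooling step has 1568 values per image.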

Figure: Prediction mistakes made by the model

The figure above shows some examples that our model predicted incorrectly. The model sometimes makes wrong predictions, which may be because the handwriting is faint or unclear. In the first example, we see that our model predicts the digit 4 as 2.

Figure: Graph of the loss function

The loss graph visualizes the loss values we observed during training. As shown in the figure, the loss decreases over time. Our goal is to bring the loss value closer to zero. The loss graph also lets us judge whether the learning rate is appropriate. Looking at the figure, we can say that our learning rate is good, because the decrease in the graph does not slow down.

In this blog, I built an application in Python using a CNN with the MNIST data set. Thank you to everyone who has followed my blogs closely until today. Goodbye until we meet again …




Support Vector Machines Part 1

Hello everyone. Image classification is among the most common uses of artificial intelligence. There are many ways to classify images, but in this blog I want to talk about support vector machines.

In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Since the algorithm does not require any information about the joint distribution function of the data, it is a distribution-independent learning algorithm. A Support Vector Machine (SVM) can be used for both classification and regression problems. However, it is mostly used for classification.

How to solve the classification problem with SVM?

In this algorithm, we plot each data item as a point in n-dimensional space. Next, we classify by finding the hyperplane that separates the two classes as well as possible. The separating line is drawn so that it lies as far as possible from the nearest elements of both classes. SVM is a nonparametric classifier. It can classify both linear and nonlinear data, but in general it tries to separate the data linearly.

SVMs apply a classification strategy that uses a margin-based geometric criterion instead of a purely statistical criterion. In other words, SVMs do not need estimates of the statistical distributions of the classes in order to carry out the classification task; they define the classification model using the concept of margin maximization.

In SVM literature, a predictor variable is called an attribute, and a transformed attribute used to define the hyperplane is called a feature. The task of choosing the most appropriate representation is known as feature selection. A set of features that describes one case is called a vector.

Thus, the goal of SVM modeling is to find the optimal hyperplane separating the sets of vectors, with the cases belonging to one category of the target variable on one side of the plane and the cases belonging to the other category on the other side.
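
Once a hyperplane is known, deciding which side a point falls on is a one-line calculation. The sketch below shows this for a hyperplane written as w · x + b = 0; the weights, the offset, and the two test points are made-up values for illustration only.

```python
def decision_value(w, b, x):
    """Signed score w . x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical 2-D hyperplane x1 + x2 - 3 = 0.
w, b = (1.0, 1.0), -3.0
print(classify(w, b, (3.0, 2.0)))   # 1: the point lies above the line
print(classify(w, b, (0.5, 1.0)))   # -1: the point lies below the line
```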

Classification with SVM

The mathematical algorithms behind SVM were originally designed for the two-class classification of linear data and were later generalized to multi-class and nonlinear data. The working principle of SVM is based on estimating the most appropriate decision function that can distinguish the two classes, in other words, on defining the hyperplane that separates the two classes from each other in the most appropriate way (Vapnik, 1995; Vapnik, 2000). SVMs are used successfully in many areas, and in recent years intensive studies have been carried out on their use in remote sensing (Foody et al., 2004; Melgani et al., 2004; Pal et al., 2005; Kavzoglu et al., 2009). In order to determine the optimum hyperplane, the two hyperplanes that are parallel to it and form its boundaries must be determined. The points that lie on these boundary hyperplanes are called support vectors.

How to Identify the Correct Hyper Plane?

It is quite easy to find the correct hyperplane with software packages in R or Python, but we can also find it by hand with simple methods. Let’s consider a few simple examples.

Here we have 3 different hyperplanes: A, B and C. Now let’s identify the correct hyperplane to classify the stars and the circles. Hyperplane B is chosen because it correctly separates the stars and the circles in this graph.

If all of our hyperplanes separate classes well, how can we detect the correct hyperplane?

Here, maximizing the distance between the hyperplane and the nearest data point of either class will help us decide on the correct hyperplane. This distance is called the margin.

We can see that the margin of hyperplane C is larger than the margins of both A and B. Hence, we choose C as the correct hyperplane.
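
The margin comparison described above can be reproduced with a few lines of code. The sketch below computes, for each candidate hyperplane, the distance to the closest data point and picks the candidate with the largest value. The coordinates of the stars and circles and the three candidate lines are hypothetical, chosen so that all three lines separate the classes.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w . x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

def margin(w, b, points):
    """Margin of a hyperplane: distance to the closest point of either class."""
    return min(distance_to_hyperplane(w, b, x) for x in points)

# Hypothetical data: stars on the left, circles on the right.
stars = [(1.0, 1.0), (1.5, 2.0), (2.0, 0.5)]
circles = [(5.0, 1.0), (5.5, 2.5), (6.0, 1.5)]
points = stars + circles

# Three vertical candidate lines: x1 = 2.5 (A), x1 = 4.5 (B) and x1 = 3.5 (C).
candidates = {
    "A": ((1.0, 0.0), -2.5),
    "B": ((1.0, 0.0), -4.5),
    "C": ((1.0, 0.0), -3.5),
}
margins = {name: margin(w, b, points) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins)  # {'A': 0.5, 'B': 0.5, 'C': 1.5}
print(best)     # C
```

Line C sits midway between the two groups, so its closest point is farther away than for A or B.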

SVM for linearly inseparable data

In many problems, such as the classification of satellite images, it is not possible to separate the data linearly. In this case, the problem that some of the training data ends up on the wrong side of the optimum hyperplane is solved by defining a positive slack variable. The balance between maximizing the margin and minimizing misclassification errors is controlled by a regularization parameter that takes positive values and is denoted by C (0 < C < ∞) (Cortes et al., 1995). In this way, the data can be separated linearly and the hyperplane between the classes can be determined. Support vector machines can also apply nonlinear transformations with the help of a kernel function, which allows the data to be separated linearly in a higher-dimensional space.
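
The role of the positive variable and of the parameter C can be illustrated with the standard soft-margin formulation, in which each training point gets the value max(0, 1 − y(w · x + b)) and the quantity to minimize is ½‖w‖² plus C times the sum of these values. The data, the weights, and the function names below are assumptions for illustration.

```python
def slack(w, b, x, y):
    """Slack max(0, 1 - y * (w . x + b)); zero when x is outside the margin."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

def soft_margin_objective(w, b, samples, C):
    """0.5 * ||w||^2 + C * (sum of slacks), the soft-margin SVM objective."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    total_slack = sum(slack(w, b, x, y) for x, y in samples)
    return regularizer + C * total_slack

# Hypothetical 1-D data with one point on the wrong side of the boundary x = 0.
samples = [((2.0,), 1), ((-2.0,), -1), ((-0.5,), 1)]
w, b = (1.0,), 0.0
slacks = [slack(w, b, x, y) for x, y in samples]
print(slacks)                                          # [0.0, 0.0, 1.5]
print(soft_margin_objective(w, b, samples, C=1.0))     # 2.0
print(soft_margin_objective(w, b, samples, C=10.0))    # 15.5
```

A larger C makes the misclassified point more expensive, so the optimizer is pushed toward fewer training errors at the cost of a narrower margin.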

For a classification with support vector machines (SVM), it is essential to choose the kernel function to be used and the optimum parameters of this function. The most commonly used kernel functions in the literature are the polynomial, radial basis function (RBF), PUK, and normalized polynomial kernels.
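
As a small illustration, the two most familiar of these kernels can be written in a few lines. The parameter names (degree, coef0, gamma) and their default values are assumptions chosen for the example; in practice they are the parameters that have to be tuned.

```python
import math

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

x, y = (1.0, 2.0), (2.0, 0.0)
print(polynomial_kernel(x, y))   # 9.0
print(rbf_kernel(x, x))          # 1.0, a point is maximally similar to itself
print(rbf_kernel(x, y))          # exp(-2.5), about 0.082
```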

SVM is used for tasks such as disease recognition in medicine, setting consumer loan limits in banking, and face recognition in artificial intelligence. In the next blog, I will try to talk about its applications in software packages. Goodbye until we meet again …





Hello everybody. In this blog I want to talk about TensorFlow, a free, open-source deep learning library and one of the most widely used. So why do we call it open source? Open source means that users can view and edit the software's source code and follow its development. You can easily create models with TensorFlow, build machine learning pipelines with TensorFlow Extended (TFX), and train and deploy models in JavaScript environments with TensorFlow.js. You can also create complex topologies with features such as the Functional API and the Model Subclassing API.

What is TensorFlow?

TensorFlow was initially developed by the Google Brain team to conduct machine learning and deep neural network research, and in 2015 its code was made available to everyone. TensorFlow is a library for numerical computation using data flow graphs. The word tensor literally refers to a geometric object that can represent multidimensional data.

As you can see above, tensors are multidimensional arrays that let you represent higher-dimensional data. In deep learning, we deal with high-dimensional data sets, where the dimensions refer to the different properties found in the data set.
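
The idea of rank and shape can be shown with plain nested Python lists. This is only an illustration of the concept, not TensorFlow code; the helper name shape is hypothetical and assumes regular (non-ragged, non-empty) nesting.

```python
def shape(tensor):
    """Shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)

scalar = 7                                   # rank 0
vector = [1, 2, 3]                           # rank 1
matrix = [[1, 2, 3], [4, 5, 6]]              # rank 2
# Rank 3: height x width x color channels, like a small RGB image.
image = [[[0, 0, 0] for _ in range(28)] for _ in range(28)]

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("image", image)]:
    print(name, "rank", len(shape(t)), "shape", shape(t))
```

A batch of such images would add one more dimension in front, giving a rank-4 tensor.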

Usage examples of TensorFlow

1) TensorFlow can be used efficiently in sound-based applications with artificial neural networks, such as voice recognition, voice search, emotion analysis, and flaw detection.

2) Other popular uses of TensorFlow are text-based applications such as sentiment analysis (CRM, social media), threat detection (social media, government), and fraud detection (insurance, finance). For example, PayPal uses TensorFlow for fraud detection.

3) It can also be used in face recognition, image search, image classification, motion detection, machine vision, and photo clustering, as well as in the automotive, aviation, and healthcare industries. For example, Airbnb uses TensorFlow to categorize images and improve the guest experience.

4) TensorFlow time series algorithms are used to analyze time series data in order to extract meaningful statistics. For example, Naver automatically classifies shopping product categories with TensorFlow.

5) TensorFlow neural networks also work on video data. This is mainly used in motion detection and real-time threat detection in gaming, security, airports, and UX/UI fields. For example, Airbus uses TensorFlow to extract information from satellite imagery and provide insights to customers.

Where can I learn TensorFlow?

You can take the courses “Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning” on Coursera and “Intro to TensorFlow for Deep Learning” on Udacity for free. Tutorials for beginners and experts are available on TensorFlow’s official site, where you can also find the MNIST data set and other “Hello World” examples, which I have applied before.

To sum up, we talked about the meaning of the word TensorFlow, what TensorFlow is, where it is used, and how we can learn it. As this blog shows, world-leading companies prefer TensorFlow for many tasks such as image classification, voice recognition, and disease detection. Step into this magical world without wasting time! Hope to see you in our next blog…





R Programming

When artificial intelligence and machine learning are mentioned, the first programming languages that come to mind are Java, C, and Python. R, which I use as a statistician and which data scientists also frequently prefer, is a programming language and software environment used for statistical data analysis, graphical display, and statistical software development.
R offers statistical techniques such as linear and nonlinear modeling, classical statistical tests, time series analysis, classification, and clustering, as well as graphical techniques. R:

  • Is an effective data handling and storage facility.
  • Includes a suite of operators for calculations on arrays, in particular matrices.
  • Includes a large, coherent, integrated collection of intermediate tools for data analysis.
  • Offers graphical facilities for data analysis and display, either on screen or on hard copy, and is a well-developed, simple, and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities.

History of R
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is still developed today by the R Development Core Team. It is an implementation of the S programming language. It is free software supported by the R Foundation and part of the GNU Project.
Advantages of R Programming

  1. R makes it easy to produce well-designed, publication-quality plots, including mathematical symbols and formulas where needed.
  2. It is open source and free. It offers more than 15,000 packages on topics such as data mining and statistics. It is also highly extensible: users can create their own packages or use packages written for very specialized research areas.
  3. Being cross-platform, it runs on different operating systems such as GNU/Linux and Microsoft Windows.
  4. It integrates with many tools such as Microsoft Excel, Microsoft Access, Oracle, MySQL, SQLite, Hadoop, SAS, and SPSS, which makes data import and export easy.
  5. It offers extensive graphical facilities for displaying data on screen or in print.

Looking at where R is used: many data scientists around the world use it in fields such as healthcare, finance, and automotive. For example, Ford Motor Company uses R for the statistical analysis of customer opinions about its products, which helps it improve its business strategy and future designs.
R is counted among the best programming languages for artificial intelligence engineers and data scientists. It includes libraries (such as dplyr, magrittr, caTools, and caret) that provide approaches such as forecasting, estimation, and classification, along with the algorithms needed for machine learning. We have talked briefly about R programming. If you are on the path to becoming a data scientist, you can learn the R programming language in a short time and apply it to your machine learning problems. Goodbye until our next post…


Data Mining and Being a Data Miner

Hello everyone. As a statistician, I can say that most statisticians dream of becoming a data miner, but the road to get there is long and bumpy. According to Google Trends data, “Data mining” and “Data Miner” searches in Google Web Search are very popular around the world. So what makes data mining so attractive?
Today, the sheer volume of data and the difficulty of extracting the required information from it have increased the need for data mining.
Data mining is an automatic or semi-automatic technical process used to analyze and interpret large amounts of dispersed data and turn it into information. Data mining is frequently used in marketing, retail, banking, healthcare, and e-commerce.
Stages of Data Mining

We can break the data mining process down into the following basic steps:

  1. Obtain and secure the data stack
  2. Smoothing
  3. Damy-Optimization
  4. Data Reduction
  5. Normalization
  6. Applying Related Data Mining Algorithms
  7. Training and testing in the relevant programming languages (R, Python, Java)
  8. Evaluation and presentation of results
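
As one concrete example of these steps, the normalization stage is often done with min-max scaling, which rescales a column of values to the range [0, 1]. This is only one possible normalization method, chosen here for illustration, and the sample values are made up.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column: nothing to rescale
    return [(v - low) / (high - low) for v in values]

ages = [18, 30, 42, 66]
normalized = min_max_normalize(ages)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```

After this step, variables measured on very different scales contribute comparably to distance-based algorithms.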

Becoming a data miner requires programming, mathematics, statistics, machine learning, and some personal skills. Let’s examine these requirements in a little more detail together.

1) Programming

  • Algorithmic approach
  • Programming logic
  • Big data technologies(Spark, Hive, Impala, DBS, etc.)
  • SQL(databases), NoSQL, Bash Script, R, Python, Scala, SPSS, SAS, MATLAB, etc.
  • Cloud technologies (AWS, Google Cloud, Microsoft Azure, IBM, etc.)

2) Statistical Learning (SL)

  • Tidy data process and data preprocessing
  • Regression Models
  • Linearity and causality
  • Inference Statistics
  • Multivariate Statistical Methods

3) Machine Learning (ML)

  • Classification
  • Clustering
  • Association Rule Learning
  • Text Mining, NLP
  • Reinforcement Learning
  • Deep Learning

4) Personal Skills

  • Being Able To Ask The Right Questions
  • Analytical Perspective
  • Problem Solving Ability
  • Storytelling and presentation ability

To sum up, in this blog we talked briefly about the definition, stages, and requirements of data mining. Hope to see you in our next blog.