Hate Speech and AI: Issues in Detection

Hate speech is a form of expression that attacks people based on attributes such as their race, gender, ethnicity or sexual orientation. Hate speech has a long history; however, with the expansion of the internet and social media, it has spread faster than ever. According to a Pew Research Center report, 41% of Americans have experienced some form of online harassment. In addition, the strong correlation between suicide rates and verbal harassment in migrant groups shows how important it is to detect and prevent the spread of hate speech. As a recent example, after the mass shooting at a synagogue in Pittsburgh, it emerged that the perpetrator had been constantly posting hateful messages about Jews before the attack.



Retrieved from: https://www.kqed.org/news/11702239/why-its-so-hard-to-scrub-hate-speech-off-social-media


Furthermore, the Pew Research Center's report also finds that 79% of Americans think online service providers are responsible for detecting and addressing hate speech and online harassment. Hence, many online service providers are aware of the importance of the issue and work closely with AI engineers to solve it.

Hate speech detection involves many complex issues. First, much of the complexity comes from the limited ability of current AI technologies to understand the context of human language. For instance, current systems either miss hate speech or produce false positives when the context changes. Along these lines, researchers from Carnegie Mellon University suggested that the toxicity of a piece of speech may differ depending on the race, gender and ethnicity of the people involved. According to the researchers, identifying the characteristics of the author alongside the hate speech and its toxicity level is therefore important for improving the quality of the data and of the detection. Such identification can also reduce the bias that current algorithms have.

Retrieved from: https://www.pewresearch.org/internet/2017/07/11/online-harassment-2017/pi_2017-07-11_online-harassment_0-01/


However, current AI technologies have difficulty detecting such characteristics. First, it is hard to identify the demographics and characteristics of authors, since in most cases such information is not available online, which makes distinguishing hate speech harder. Second, even when the author clearly states such information, detection can become more difficult because of the cultural nuances of the context. The dynamics of countries, or even of regions within countries, change over time and are closely tied to their culture and language. These differences and constantly changing factors strongly affect the outcomes: a system may fail to detect hate speech, or may flag false positives, because of cultural differences that statistics do not capture.



Language is one of the most complex and most significant faculties of humankind. There are many ways and contexts of communicating through language that even neuroscientists have not yet fully mapped. However, artificial intelligence is helping scientists take a step forward in describing the patterns and mechanisms of language. In this sense, hate speech detection, a crucial subject in the age of the internet, also benefits, since machine learning algorithms make it much easier to detect online harassment. Nevertheless, given the issues faced in detection, today's technology cannot yet take humans out of the detection loop.







Support Vector Machines Part 1

Hello everyone. Image classification is among the most common applications of artificial intelligence. There are many ways to classify images, but in this blog post I want to talk about support vector machines.

In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Since the algorithm does not require any information about the joint distribution function of the data, SVMs are distribution-independent learning algorithms. A support vector machine (SVM) can be used for both classification and regression problems; however, it is mostly used for classification.

How to solve the classification problem with SVM?

In this algorithm, we plot each data item as a point in n-dimensional space, where n is the number of features. Next, we classify the data by finding the hyperplane that best separates the two classes. The separating line is placed so that it is as far as possible from the nearest elements of both classes. SVM is a nonparametric classifier. It can classify both linearly and nonlinearly separable data, but it fundamentally tries to separate the data linearly.
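A minimal sketch of the idea above, assuming scikit-learn is installed: each sample is a point in 2-dimensional space, and a linear SVM finds the separating line. The toy points are made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: class 0 in the lower left, class 1 in the upper right (illustrative only).
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [6.0, 5.0], [7.0, 7.0], [6.5, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel means the decision boundary is a straight line (a hyperplane in 2-D).
clf = SVC(kernel="linear")
clf.fit(X, y)

# Classify two new points, one near each group.
predictions = clf.predict([[1.0, 2.0], [7.0, 6.0]])
print(predictions)  # expected: [0 1]
```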

SVMs apply a classification strategy that uses a margin-based geometric criterion instead of a purely statistical one. In other words, SVMs do not need estimates of the statistical distributions of the classes to perform classification; they define the classification model using the concept of margin maximization.

In SVM literature, a predictor variable is called an attribute, and a transformed attribute used to define the hyperplane is called a feature. The task of choosing the most suitable representation is known as feature selection. A set of features that describes one case is called a vector.

Thus, the goal of SVM modeling is to find the optimal hyperplane that separates the sets of vectors so that cases belonging to one category of the target variable lie on one side of the plane and cases belonging to the other category lie on the other side.

Classification with SVM

The mathematical algorithms of SVM were originally designed for classifying two-class linear data and were later generalized to multi-class and nonlinear data. The working principle of SVM is based on estimating the most appropriate decision function that can distinguish the two classes, in other words, defining the hyperplane that separates the two classes from each other in the best way (Vapnik, 1995; Vapnik, 2000). In recent years, intensive studies have been carried out on the use of SVMs in remote sensing, and they have been used successfully in many areas (Foody et al., 2004; Melgani et al., 2004; Pal et al., 2005; Kavzoglu et al., 2009). To determine the optimal hyperplane, two hyperplanes parallel to it, which mark the boundaries of the margin, must be determined. The points that lie on these boundary hyperplanes are called support vectors.
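For the linear, two-class case, the hyperplane and its two parallel boundary hyperplanes are usually written as follows (standard notation, assuming a weight vector \(\mathbf{w}\), a bias \(b\) and class labels \(y_i \in \{-1, +1\}\)); the support vectors are the training points that lie exactly on the two boundaries:

```latex
\begin{aligned}
&\text{separating hyperplane:} && \mathbf{w}^\top \mathbf{x} + b = 0 \\
&\text{boundary hyperplanes:}  && \mathbf{w}^\top \mathbf{x} + b = +1, \qquad \mathbf{w}^\top \mathbf{x} + b = -1 \\
&\text{constraints:}           && y_i \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \ge 1 \quad \text{for all } i \\
&\text{margin width:}          && \frac{2}{\lVert \mathbf{w} \rVert}
\end{aligned}
```

Maximizing the margin \(2 / \lVert \mathbf{w} \rVert\) is therefore equivalent to minimizing \(\tfrac{1}{2}\lVert \mathbf{w} \rVert^2\) subject to these constraints.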

How to Identify the Correct Hyperplane?

It is quite easy to find the correct hyperplane with software packages in R or Python, but we can also identify it manually with simple reasoning. Let's consider a few simple examples.

Here we have three different hyperplanes: A, B and C. Now let's identify the correct hyperplane for classifying the stars and circles. Hyperplane B is chosen because it is the one that correctly separates the stars and circles in this graph.

If all of our hyperplanes separate the classes well, how can we identify the correct one?

Here, maximizing the distance between the hyperplane and the nearest data point of either class will help us choose the correct hyperplane. This distance is called the margin.

We can see that the margin of hyperplane C is larger than those of both A and B. Hence, we choose C as the correct hyperplane.
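The margin comparison above can be sketched without any SVM library: for each candidate line w·x + b = 0, compute the distance |w·x + b| / ||w|| from every point, keep the smallest one, and pick the candidate whose smallest distance is largest (among those that separate the classes). The points and the three candidate lines below are hypothetical, not the ones in the graph.

```python
import numpy as np

# Hypothetical points: label +1 (stars) and -1 (circles).
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.5],
              [5.0, 5.0], [6.0, 4.5], [5.5, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Candidate hyperplanes (lines) given as (w, b) for w . x + b = 0.
candidates = {
    "A": (np.array([1.0, 0.0]), -3.0),   # vertical line x1 = 3
    "B": (np.array([0.0, 1.0]), -3.5),   # horizontal line x2 = 3.5
    "C": (np.array([1.0, 1.0]), -7.0),   # diagonal line x1 + x2 = 7
}

def margin(w, b, X, y):
    """Smallest point-to-line distance, or -inf if any point is on the wrong side."""
    signed = y * (X @ w + b)  # positive when a point lies on its correct side
    if np.any(signed <= 0):
        return -np.inf
    return float(np.min(signed) / np.linalg.norm(w))

margins = {name: margin(w, b, X, y) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, "->", best)
```

All three lines separate these points, but C keeps the nearest points furthest away, so it has the largest margin.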

SVM for linearly inseparable data

In many problems, such as the classification of satellite images, it is not possible to separate the data linearly. In this case, the problem that some training data fall on the wrong side of the optimal hyperplane is solved by introducing positive slack variables. The balance between maximizing the margin and minimizing misclassification errors can be controlled by a positive regularization parameter, denoted by C (0 < C < ∞) (Cortes et al., 1995). In this way, a hyperplane between the classes can still be determined even though the data are not perfectly linearly separable. In addition, support vector machines can perform nonlinear transformations with the help of a kernel function, which allows the data to be separated linearly in a higher-dimensional space.
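A short sketch of both ideas, assuming scikit-learn: `make_circles` generates two concentric rings that no straight line can separate. The `C` argument of `SVC` is the regularization parameter described above (smaller values tolerate more misclassified training points), and `kernel="rbf"` applies a kernel function. The dataset and parameter values are illustrative choices.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: linearly inseparable in the original 2-D space.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

# Soft-margin linear SVM: C controls the trade-off between margin size and training errors.
linear_clf = SVC(kernel="linear", C=1.0).fit(X, y)

# RBF kernel: implicitly maps the data to a higher-dimensional space where it becomes separable.
rbf_clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

linear_acc = linear_clf.score(X, y)  # training accuracy
rbf_acc = rbf_clf.score(X, y)
print(f"linear: {linear_acc:.2f}, rbf: {rbf_acc:.2f}")
```

The linear model cannot do much better than chance on the rings, while the RBF model fits them almost perfectly.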

For a classification performed with support vector machines (SVM), it is essential to choose the kernel function and to determine the optimal parameters of that function. The most commonly used kernel functions in the literature are the polynomial, radial basis function (RBF), Pearson VII universal kernel (PUK) and normalized polynomial kernels.
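The four kernels named above can be written as functions that return a Gram matrix. scikit-learn's `SVC` has built-in `"poly"` and `"rbf"` options but no PUK or normalized polynomial kernel, so this sketch implements all four with NumPy and passes one of them to `SVC` as a custom callable. The default parameter values (`degree`, `coef0`, `gamma`, `sigma`, `omega`) are illustrative assumptions, not tuned values.

```python
import numpy as np
from sklearn.svm import SVC

def _squared_distances(X, Y):
    """Pairwise squared Euclidean distances between the rows of X and the rows of Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.maximum(sq, 0.0)  # clip tiny negative values caused by rounding

def polynomial_kernel(X, Y, degree=2, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree"""
    return (X @ Y.T + coef0) ** degree

def rbf_kernel(X, Y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    return np.exp(-gamma * _squared_distances(X, Y))

def puk_kernel(X, Y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel:
    K(x, y) = 1 / (1 + (2 * ||x - y|| * sqrt(2 ** (1 / omega) - 1) / sigma) ** 2) ** omega
    """
    dist = np.sqrt(_squared_distances(X, Y))
    scaled = 2.0 * dist * np.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled**2) ** omega

def normalized_polynomial_kernel(X, Y, degree=2, coef0=1.0):
    """K(x, y) / sqrt(K(x, x) * K(y, y)) for the polynomial kernel above."""
    k_xy = polynomial_kernel(X, Y, degree, coef0)
    k_xx = (np.sum(X**2, axis=1) + coef0) ** degree  # diagonal of K(X, X)
    k_yy = (np.sum(Y**2, axis=1) + coef0) ** degree  # diagonal of K(Y, Y)
    return k_xy / np.sqrt(np.outer(k_xx, k_yy))

# SVC accepts any callable that returns the (n_samples_X, n_samples_Y) Gram matrix.
X_demo = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
y_demo = np.array([0, 0, 1, 1])
clf = SVC(kernel=puk_kernel).fit(X_demo, y_demo)
print(clf.predict(np.array([[0.1, 0.0], [3.0, 3.1]])))
```

In practice, the kernel and its parameters would be chosen by comparing validation scores, for example with a grid search.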

SVMs are used in applications such as disease recognition in medicine, setting consumer loan limits in banking, and face recognition in artificial intelligence. In the next post, I will talk about applying SVMs with software packages. Goodbye until we meet again…





A Quick Start to Keras and TensorFlow

Keras is a deep learning library written in Python. If you have worked on a deep learning project or are familiar with this area, you have almost certainly encountered Keras. It offers many options for building deep learning models and provides an environment for training them on our data.

Keras was originally developed to let researchers run experiments faster.

Indeed, Keras is designed to make data preprocessing and model training as fast as possible. If you want to get to know Keras better, you can consult its official documentation.

Prominent Advantages of Keras

🔹 Allows you to run operations on both CPU and GPU.

🔹 Contains predefined modules for convolutional and recurrent networks.

🔹 Keras is a deep learning API written in Python that runs on top of the machine learning platforms Theano and TensorFlow.

🔹 Keras supports Python versions starting from 2.7.

Keras, Tensorflow, Theano and CNTK

Keras is a library that offers high-level building blocks for deep learning models. In this article, we will introduce the backend engines that we use frequently in our projects. Of the backend engines below, we will use TensorFlow.

Keras Installation

Activation Function

🔹 We can choose the backend we want to use, as shown below. There are three backend implementations that we use: TensorFlow, Theano, and Microsoft Cognitive Toolkit (CNTK).

Installing Libraries

The platforms you see below are the ones we encounter most often in deep learning. As a side note, I recommend using a GPU when working with TensorFlow: in terms of performance, training with a GPU is typically much faster.

In summary, Keras works in harmony with these three libraries. Moreover, you can switch the backend engine between them without making any changes to your code. Let's take a closer look at TensorFlow, which we will use together with Keras.
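With multi-backend Keras (the standalone `keras` package, 2.x releases), the backend is read from a `keras.json` configuration file, by default under `~/.keras/`. A minimal sketch that writes such a file; the helper name `write_keras_config` is hypothetical, and the target directory is a parameter so you can try it on a temporary folder first. `tf.keras` always runs on TensorFlow, so this setting only matters for the standalone multi-backend package.

```python
import json
from pathlib import Path

def write_keras_config(config_dir, backend="tensorflow"):
    """Write a keras.json file that selects the given backend (hypothetical helper)."""
    allowed = {"tensorflow", "theano", "cntk"}
    if backend not in allowed:
        raise ValueError(f"unknown backend {backend!r}, expected one of {sorted(allowed)}")
    config = {
        "floatx": "float32",
        "epsilon": 1e-07,
        "backend": backend,
        "image_data_format": "channels_last",
    }
    path = Path(config_dir) / "keras.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4))
    return path
```

For example, `write_keras_config(Path.home() / ".keras", "tensorflow")` would select TensorFlow. Alternatively, setting the `KERAS_BACKEND` environment variable before importing Keras overrides the file without touching it.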


➡️ Let's check that Python and pip are installed, and which versions, for the project you are going to work on.

Version Check
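From a terminal, `python --version` and `pip --version` do this check. The same check can be sketched from inside Python, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is hypothetical.

```python
import sys
from importlib import metadata

def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

print("Python:", sys.version.split()[0])
print("pip:", installed_version("pip"))
print("tensorflow:", installed_version("tensorflow"))
```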

➡️ I am continuing with my Mask R-CNN project, which I am actively working on. You can create any project you like, or a segmentation project like mine. If you want to follow the same project, you can find the list of required libraries via the link.

Collecting Requirements

If you want, you can also install these libraries one by one, but I prefer installing them all at once from a requirements.txt file because it is faster.

➡️ Let's return to Keras and TensorFlow without straying from our goal; we can cover my Mask R-CNN project in another article. Now let's make a quick introduction to TensorFlow. Let's import it into our project and print the version we are using.


➡️ As you can see in the output, I am using TensorFlow version 2.3.1. As I said, you can run it on either a CPU or a GPU.

Version Output
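A minimal sketch of the import-and-print step: `tf.__version__` holds the installed version, and `tf.config.list_physical_devices("GPU")` (available in recent TensorFlow 2.x releases) lists the GPUs TensorFlow can see, which is a quick way to confirm whether you are running on CPU or GPU. The import is guarded so the snippet still runs where TensorFlow is not installed.

```python
import importlib.util

# Import TensorFlow only if it is installed, so the snippet fails gracefully otherwise.
if importlib.util.find_spec("tensorflow") is not None:
    import tensorflow as tf
    tf_version = tf.__version__
    # Physical GPUs visible to TensorFlow; an empty list means it will run on the CPU.
    gpus = tf.config.list_physical_devices("GPU")
else:
    tf_version, gpus = None, []

print("TensorFlow:", tf_version, "| GPUs found:", len(gpus))
```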

➡️ When pre-processing data with TensorFlow, we can include the keras.preprocessing module and continue our operations with it. The import appears greyed out because I am not using it yet, but once we write the method we will use, it will be highlighted automatically.

TensorFlow Preprocessing

➡️ As an example, we can perform pre-processing with TensorFlow as follows. We split our dataset into training and held-out data: with the validation_split parameter, 20% of the data is set aside for validation.
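A framework-free sketch of what a 20% `validation_split` means. In Keras `Model.fit`, the validation data is taken from the last samples of the arrays before any shuffling; the directory-loading utilities in `tf.keras.preprocessing` (for example `text_dataset_from_directory`) also accept `validation_split`, together with `subset` and `seed` arguments. The helper name below is hypothetical and only mimics the `Model.fit` behaviour.

```python
import math
import numpy as np

def validation_split_like_keras_fit(x, y, validation_split=0.2):
    """Hold out the last fraction of samples, as Keras Model.fit does with validation_split."""
    split_at = int(math.floor(len(x) * (1.0 - validation_split)))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.arange(10).reshape(10, 1)  # 10 toy samples with one feature each
y = np.arange(10)
(x_train, y_train), (x_val, y_val) = validation_split_like_keras_fit(x, y, 0.2)
print(len(x_train), len(x_val))  # prints: 8 2
```

Because the held-out part is simply the tail of the arrays, shuffle your data beforehand if it is ordered (for example, sorted by class).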

In this way, we have made a quick start with Keras and TensorFlow together. I hope to see you in my next post. Stay healthy ✨





Basic Information About Feature Selection

Machine learning, deep learning and artificial intelligence, which we encounter in all areas of our lives, are fields that everyone is working on, and their predictions are measured by a success score. Machine learning is of critical importance in business processes. Data that you already have, or that the company collects itself, reaches the feature engineering phase, where it is examined carefully from many angles, prepared for its final form and handed to the data scientist, who can then make inferences for the company by making sense of it. If the resulting product or service is tested with customers and meets the required success criteria, we can keep its performance sustainable. One of the most important steps here is making the product scalable and quickly adapting it to business processes. Another is obtaining, from the dataset, the significance levels of the features identified through correlation, making them meaningful, and having the feature engineer determine them before the modeling phase. We can think of feature engineers as an additional force that speeds up and eases the data scientist's work.



When searching for jobs, we may frequently come across "Feature Engineer" postings. The critical information we learn from the data during feature selection is obtained in the data preparation phase. Feature selection methods aim to reduce the number of input variables to those believed to be most useful for a model to predict the target. When feature selection is done logically as part of data pre-processing, it greatly reduces the workload; as I mentioned, there is a dedicated job role for this. Feature selection affects how well the data performs in modeling and directly affects the accuracy of the predicted values. For this reason, the most important part of the journey from raw data to product is the right choice of features by the person doing the work. If this goes well, the product can come to life in a short time. Determining, through algorithms, which data matter and how much is as important as making statistical inferences from the data. In general, statistics should play a role in data science processes.
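One simple statistical filter, sketched here with pandas as an illustration (not necessarily the method the author uses): rank the numeric features by the absolute value of their Pearson correlation with the target and keep the top k. The column names and values are hypothetical, and `top_k_by_correlation` is a hypothetical helper.

```python
import pandas as pd

def top_k_by_correlation(df, target, k=2):
    """Rank numeric features by absolute Pearson correlation with the target; keep the top k."""
    numeric = df.select_dtypes(include="number").drop(columns=[target])
    scores = numeric.corrwith(df[target]).abs().sort_values(ascending=False)
    return list(scores.index[:k]), scores

df = pd.DataFrame({
    "income": [2, 4, 6, 8, 10],    # rises with the target
    "age": [30, 25, 35, 28, 32],   # only weakly related to the target
    "debt": [10, 8, 6, 4, 2],      # falls with the target (strong negative correlation)
    "target": [1, 2, 3, 4, 5],
})
selected, scores = top_k_by_correlation(df, "target", k=2)
print(scores.round(2).to_dict(), selected)
```

Taking the absolute value keeps strongly negative relationships such as `debt`, which are just as informative as positive ones.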



There are also feature selection methods based on statistical filters, and the appropriate statistical measure differs depending on the kind of feature being selected. Unfortunately, many people working in this field do not pay enough attention to statistical significance; among some data science and AI practitioners, writing code is seen as the essence of the work. The variables in a dataset can be categorical or numerical, and each group is further divided: numerical features are known as integer and float, while categorical features are known as nominal, ordinal and boolean. You can find this summarized in the image I put below. These variable types are vital to feature selection. Depending on the operations performed, these variables can be chosen together with a statistician during the evaluation phase, and the analysis of the selected features should rest on a solid basis. One of the most necessary skills for people in this field is the ability to interpret and analyze well. In this way, they can turn the data they prepare into products whose foundations follow sound logic.
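The variable types listed above can be separated in pandas before choosing a selection method. A minimal sketch, assuming the ordinal column is stored as an ordered `Categorical`; the column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47],                    # numerical: integer
    "salary": [3500.0, 4200.5, 5100.0],     # numerical: float
    "city": ["Ankara", "Izmir", "Ankara"],  # categorical: nominal
    "education": pd.Categorical(            # categorical: ordinal (ordered categories)
        ["BSc", "MSc", "PhD"], categories=["BSc", "MSc", "PhD"], ordered=True),
    "has_car": [True, False, True],         # categorical: boolean
})

numerical = df.select_dtypes(include="number").columns.tolist()  # bool is not counted as number
boolean = df.select_dtypes(include="bool").columns.tolist()
categorical = [c for c in df.columns if c not in numerical + boolean]

# Ordinal columns are the categoricals whose categories carry an order.
ordinal = [c for c in categorical
           if isinstance(df[c].dtype, pd.CategoricalDtype) and df[c].cat.ordered]
nominal = [c for c in categorical if c not in ordinal]

print(numerical, boolean, ordinal, nominal)
```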



There is almost no single exact method. Feature selection for each dataset should be based on careful analysis, because the appropriate operations may vary for each feature: one dataset may contain many integer or float values, while another may consist mostly of boolean values. Therefore, feature selection methods may differ from one dataset to another. The key is to adapt quickly, understand what the dataset offers and produce solutions accordingly; this way, the decisions taken during processing rest on a healthier basis. For categorical variables, methods such as the chi-square test can be used, and this can be a powerful and efficient approach. Feature selection throughout the product or service development stages is the most important step contributing to a model's success.
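A minimal chi-square filter for categorical features, assuming scikit-learn's `SelectKBest` with the `chi2` score function. Because `chi2` expects non-negative values, the categories are one-hot encoded first. The data below is hypothetical: `color` is built to track the target, while `size` is built to be uninformative.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical categorical data and a binary target.
df = pd.DataFrame({
    "color": ["red", "red", "red", "blue", "blue", "blue"],
    "size": ["S", "M", "S", "M", "S", "M"],
})
y = [0, 0, 0, 1, 1, 1]

# One-hot encode: columns become color_blue, color_red, size_M, size_S.
X = pd.get_dummies(df, dtype=int)

# Keep the two columns with the highest chi-square statistic.
selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
selected = list(X.columns[selector.get_support()])
print(dict(zip(X.columns, selector.scores_.round(3))))
print(selected)
```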








Credit Scoring / Credit Analysis

Companies invest in, or support the financial growth of, certain start-ups. Based on its analyses, the investing company decides which company to invest in or acquire, and calculates in advance how much to contribute in proportion to the expected return, taking the start-up's growth into account. Data scientists at banks have developed a similar kind of analysis for their customers. In short, credit scoring is carried out between the bank and the customer during a loan application. Its purpose is to test whether applicants will actually be able to repay the loan they receive; in machine learning, this is called credit scoring. After the evaluation, the loan applicant receives a positive or negative answer. Many metrics are used in this evaluation, for example the applicant's salary, career history, previous loan status and many other features that are examined in more detail. The evaluation produces a value of 1 or 0, which gives a positive or negative result.

Banks research this subject extensively, as they do most subjects, and after analyzing the data they hold, they feed it into machine learning processes. The final model is prepared by performing a few optimization steps on the logic testing steps. The model is then used to evaluate almost every loan application quickly. The outputs take the values 0 and 1: an output of 0 suggests not giving credit to the applicant, while an output of 1 says "you can give credit to this person", which effectively segments the customers for us. Once the data science staff complete this step, the final step is to pass this information to the relevant departments, finalize the applications according to the results and respond to the applicants. Analysis is critical for a bank, because the smallest mistakes can cause large losses. For this reason, every credit scoring decision should work in the bank's favor.
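A minimal sketch of the 0/1 decision described above, assuming scikit-learn. Logistic regression is used here only as an example classifier (the post does not name a model), and the two features and all applicant values are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical applicants: [monthly income (thousands), number of previous defaults]
X_train = np.array([[8, 0], [7, 0], [9, 0], [6, 0],
                    [2, 2], [1, 3], [3, 1], [2, 1]])
# 1 = repaid the loan, 0 = did not repay (made-up labels)
y_train = np.array([1, 1, 1, 1, 0, 0, 0, 0])

model = LogisticRegression().fit(X_train, y_train)

applicants = np.array([[10, 0], [1, 4]])
decisions = model.predict(applicants)                  # 0/1 class labels
probabilities = model.predict_proba(applicants)[:, 1]  # estimated probability of repaying

for applicant, decision, p in zip(applicants, decisions, probabilities):
    verdict = "can be given credit" if decision == 1 else "credit not recommended"
    print(applicant, f"p(repay)={p:.2f}", verdict)
```

`predict` applies a 0.5 threshold to the probability; a bank could instead pick its own threshold on `predict_proba` to reflect how costly a wrong approval is.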

Credit scoring is of great importance for every bank. If money leaves the bank and the borrower fails to fully meet their obligations, this causes major financial problems. Therefore, the data science team behind the process should be experts in this field and evaluate the measures for every circumstance. In addition, applicants' personal information should be analyzed thoroughly and their applications answered on a logical basis. After arranging the data pre-processing steps and transforming the necessary variables, the data become ready for modeling. Another critical issue in credit scoring is the data pre-processing steps and the analysis that follows. The data science team should engineer the variables themselves and correctly analyze the variables' effects and correlations. After these steps, a sound result should follow. Minimizing the margin of error comes down to preparing the data almost perfectly and evaluating the necessary parameters.

At the very start of the credit scoring process, the machine learning approach should be established, and the variables should be checked once more before modeling, because the results depend entirely on the variables; categorical and numerical variables affect the model differently. The model must also be tuned carefully. When working in Python, candidate parameter values can be tested with scikit-learn's GridSearchCV class, and the best-performing parameters are then used in the model. This helps the credit scoring proceed more successfully. It also raises the level of service, so that customers' expectations are met and they receive a personalized service that suits them. Customers with a high level of satisfaction strengthen their bond with the bank and also feel more confident psychologically. One of the most basic human needs is to feel that we belong or are connected somewhere, and meeting it can increase a bank's customer potential. If you want customers to advertise you by word of mouth, keep a good bond with them and increase their loyalty. One of the things that directly affects this is undoubtedly credit scoring.
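A minimal sketch of parameter tuning with scikit-learn's `GridSearchCV`, assuming a scaled logistic regression and synthetic data from `make_classification` in place of real applicant records. The parameter grid and the `roc_auc` scoring choice are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for applicant data: 300 rows, 8 numeric features, binary target.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=42)

# Scaling and the classifier are tuned together so each CV fold is scaled on its own training part.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}  # candidate regularization strengths

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="roc_auc")
search.fit(X, y)

best_model = search.best_estimator_  # refit on all data with the best parameters
print(search.best_params_, round(search.best_score_, 3))
```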


Data Labeling Tools For Machine Learning

Labeling data is a crucial step in any supervised machine learning project. For images, labeling (tagging) is the process of marking regions in an image and describing which object each region belongs to. By labeling the data, we prepare it for ML projects and make it more readable. In most of the projects I have worked on, I created my own datasets, labeled the images myself and trained my models on the labeled images. In this article, I will introduce the data labeling tools I encounter most often and share my experience in this field with you.
Labeling Image


Colabeler is a program that supports labeling for localization and classification problems. It is a labeling program frequently used in computer vision, natural language processing, artificial intelligence and speech recognition [2]. The example below shows the labeling of an image; most of the objects labeled here belong to the car class. With the tools on the left side, you can mark objects with curves, polygons or rectangles. Which one you choose depends on the boundaries of the objects you want to label.
Labeling Colabeler
Then, in the "Label Info" section, you type the names of the objects you want to label. When you have finished all the labels, you save them by confirming with the blue tick button, and you can move on to the next image with Next. Note that every image you save is listed to the left of this blue button, so you can also review the images you have saved. One of the things I like most about Colabeler is that it can also use artificial intelligence algorithms.
📌 I used Colabeler for labeling in a previous project, and its interface is incredibly easy to use.
📽 The video on Colabeler's official website shows how to do the labeling.
Localization of Bone Age
Above is a sample image from a project I worked on earlier. Because this project is a localization project in the machine learning sense, the labeling followed that requirement. Localization means isolating the subregion of the image where a feature is located. For example, locating bone regions in this project simply means drawing rectangles around the bone regions in the image [3]. In this way, I labeled the regions to be extracted from the bone images as ROI zones. I then exported these labels as XML/JSON using Colabeler's export feature. Many machine learning practitioners will like this part; it worked very well for me!

♻️ Exporting Labels

Exporting XML Output
At this stage, I saved the labels as JSON output because I will be working with JSON data, but you can save your data in other formats as well. In the image below, you can see where the classes I created appear in the JSON output. In this way, your data is prepared in labeled form.
JSON Format
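Once exported, the labels can be read back for training. A minimal sketch using Python's `json` module, assuming a Colabeler-style layout in which each object has a `name` and a `bndbox` with `xmin`/`ymin`/`xmax`/`ymax`; these field names are an assumption, so check them against your own export. The file name and coordinates in the sample are hypothetical.

```python
import json

def read_boxes(json_text):
    """Return (label, xmin, ymin, xmax, ymax) tuples from one exported annotation.

    Assumes a layout like {"outputs": {"object": [{"name": ..., "bndbox": {...}}]}}.
    """
    data = json.loads(json_text)
    boxes = []
    for obj in data.get("outputs", {}).get("object", []):
        box = obj.get("bndbox")
        if box is None:  # skip objects without a rectangle in this assumed layout
            continue
        boxes.append((obj["name"], box["xmin"], box["ymin"], box["xmax"], box["ymax"]))
    return boxes

sample = """{"path": "hand_001.png",
  "outputs": {"object": [{"name": "bone",
                          "bndbox": {"xmin": 40, "ymin": 60, "xmax": 120, "ymax": 200}}]},
  "labeled": true,
  "size": {"width": 512, "height": 512, "depth": 3}}"""
print(read_boxes(sample))
```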


ImageJ is a Java-based image processing program developed at the National Institutes of Health and the Laboratory for Optical and Computational Instrumentation (LOCI, University of Wisconsin). ImageJ’s plugin architecture and built-in development environment have made it a popular platform for teaching image processing [3].

As listed above, you can see a screenshot of ImageJ from Wikipedia. As can be seen, the software is not overly complex, and it is used in many fields regardless of profession. 📝 The user guide on the official ImageJ website describes how to perform labeling and how to use the tool.
📌 I have also used the Fiji (ImageJ) software for images I needed to label in a machine learning project. I find its interface much older-looking than the other labeling programs I have worked with. Of course, it lets you do what you need from a functional point of view, but for me, software also needs to satisfy the user from a design point of view.
The image above is a screenshot I took on my personal computer during that project. To use the data on the Matlab platform, I first had to update the software, so after updating I continued labeling the images. Below is the package installed along with the Matlab plugin for ImageJ users.
ImageJ Matlab

📍Matlab Image Labeler

The Image Labeler app provides an easy way to mark rectangular region of interest (ROI) labels, polyline ROI labels, pixel ROI labels and scene labels in a video or image sequence. For example, this app shows you how to [4]:

  • Manually label an image frame from an image collection
  • Automatically label across image frames using an automation algorithm
  • Export the labeled ground truth data

Image Toolbox Matlab
In the image above, we can perform segmentation with the Matlab Image Labeler. More precisely, labeling is done by dividing the data into ROI regions. In addition, you can use existing algorithms, or test and run your own algorithm on the data.
Selection ROI
In this image, taken from Matlab's official documentation, the label names of the regions you selected are entered in the left menu, and a label color is assigned according to the object's class. We can easily create our labels this way too. In the next article, I will talk about other labeling tools. Hope to see you ✨

  1. https://medium.com/@abelling/comparison-of-different-labelling-tools-for-computer-vision-f3afd678da76.
  2. http://www.colabeler.com.
  3. From Wikipedia, The Free Encyclopedia, ImageJ, https://en.wikipedia.org/wiki/ImageJ.
  4. MathWorks, Get Started with the Image Labeler, https://www.mathworks.com/help/vision/ug/get-started-with-the-image-labeler.html.
  5. https://chatbotslife.com/how-to-organize-data-labeling-for-machine-learning-approaches-and-tools-5ede48aeb8e8.
  6. https://blog.cloudera.com/learning-with-limited-labeled-data/.

Scientists Develop “Mini Brains” to Help Robots Recognize Pain and Repair Themselves

Scientists at Nanyang Technological University (NTU, Singapore) are working on a brain-inspired approach to give robots the artificial intelligence (AI) needed to recognize pain and to repair themselves when damaged. Robots produced by NTU will soon take their place in our lives.

The system has AI-enabled sensor nodes that detect when a physical force is applied and that process and respond to "pain" arising from pressure. It also allows the robot to detect and repair its own damage when it is slightly "injured", without the need for human intervention, and to repair itself quickly.

Designed by stories / Freepik

Today, robots use a network of sensors to generate information about their immediate environment. For example, a disaster rescue robot uses camera and microphone sensors to locate a survivor under rubble and then pulls the person out with guidance from touch sensors on its arms. An industrial robot on a factory assembly line uses vision to guide its arm to the right position and touch sensors to determine whether an object is slipping when it is picked up. In other words, today's sensors typically do not process information themselves; instead, they send it to a single large, powerful, central processing unit where learning takes place. This delays response times. It also raises the risk of damage that requires maintenance and repair, which can be long and costly.

The NTU scientists' new approach embeds AI into a network of sensor nodes connected to many small, less powerful processing units that act like "mini brains" distributed across the robotic skin. According to the scientists, this means that learning happens locally, and that the robot's wiring requirements and response time are reduced five to ten times compared with conventional robots.

Designed by stories / Freepik

Associate Professor Arindam Basu from the School of Electrical and Electronic Engineering, co-author of the project, said: “For robots to work together with humans one day, one concern is how to ensure they will interact with us safely. For that reason, scientists around the world have been finding ways to bring a sense of awareness to robots, such as being able to ‘feel’ pain, to react to it, and to withstand harsh operating conditions. However, the complexity of putting together the multitude of sensors required and the resulting fragility of such a system is a major barrier to widespread adoption.”

Rohit Abraham John, first author of the study and a Research Fellow at the NTU School of Materials Science and Engineering, said: “The self-healing properties of these novel devices help the robotic system to repeatedly stitch itself together when ‘injured’ with a cut or scratch, even at room temperature. This mimics how our biological system works, much like the way human skin heals on its own after a cut.”

Designed by stories / Freepik

The NTU research team is developing these “mini-brains” to help robots sense pain and repair themselves, building on its earlier work on neuromorphic electronics, such as using light-activated devices to recognise objects. The team now plans to collaborate with industry partners and government research laboratories to enhance the system for larger-scale applications. Robots like those being developed at NTU may well become part of our lives.


  1. https://www.sciencedaily.com/releases/2020/10/201015101812.htm
  2. http://www.freepik.com
  3. https://globalaihub.com/cahit-arf-makineler-dusunebilir-mi/

SQL and Pandas Code Every Data Scientist Should Know

Fifteen to twenty years ago, having a handful of core skills and knowing a few programs made work in the software world considerably easier and was enough to be considered competent. Today, what makes someone stand out is having strengths in many different areas and knowing several programming languages, rather than a single program or skill.

Although the SQL (Structured Query Language) query language has kept its importance from the past to the present day, the Pandas library for Python has come to the fore, especially as fields such as artificial intelligence and machine learning have become widespread.

In this article, we will look at how the commands used in basic exploratory data analysis are written in SQL and in Pandas. Throughout the analysis we will use the Airports dataset. Once you have loaded the dataset's CSV file into SQL and into Pandas, you can start the analysis.
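If you do not have a SQL server at hand, one way to follow along is to load the same CSV into a Pandas DataFrame and into an in-memory SQLite table, so that each SQL query and its Pandas counterpart can be run side by side. The sketch below assumes this setup: the helper name `load_airports` and the three placeholder rows are hypothetical (with the real file you would pass its path to `pd.read_csv`), and SQLite's dialect differs slightly from SQL Server's (for example, it uses `limit` instead of `top`).

```python
import io
import sqlite3

import pandas as pd

# A tiny CSV with airports-style column names. The rows are hypothetical
# placeholders, not real airport records.
CSV_TEXT = """id,ident,type,name,iso_country,iso_region,municipality
1,AA01,small_airport,Alpha Field,TR,TR-34,Alpha
2,AA02,closed,Beta Strip,TR,TR-16,Beta
3,BB01,large_airport,Gamma Intl,US,US-CA,Gamma
"""


def load_airports(source):
    """Hypothetical helper: load one CSV into Pandas and into SQLite."""
    airports = pd.read_csv(source)              # Pandas side
    conn = sqlite3.connect(":memory:")          # SQL side (SQLite in memory)
    airports.to_sql("airports", conn, index=False)
    return airports, conn


airports, conn = load_airports(io.StringIO(CSV_TEXT))
print(airports.shape)  # (3, 7)
print(pd.read_sql_query("select count(*) as n from airports", conn))
```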

1. Select, Where, Distinct Commands

SQL | Pandas | Description
select * from airports | airports | Returns the whole table.
select top(10) * from airports | airports.head(10) | Returns the first 10 rows. (top is SQL Server syntax; MySQL, PostgreSQL and SQLite use limit 10.)
select id from airports where iso_country = 'TR' | airports.id[airports.iso_country == 'TR'] | Returns the ids of the rows whose iso_country is TR.
select distinct iso_region from airports | airports.iso_region.unique() | Returns the unique values.
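To check that the queries in this table return the same data, the sketch below runs them against an in-memory SQLite copy of a small DataFrame. The four rows are hypothetical toy data, and `limit` stands in for `top` because SQLite has no `top`. Since SQL guarantees no row order without `order by`, the distinct values are compared after sorting.

```python
import sqlite3

import pandas as pd

# Hypothetical toy rows with airports-style column names (not real records).
airports = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "iso_country": ["TR", "TR", "US", "US"],
    "iso_region": ["TR-34", "TR-16", "US-CA", "US-CA"],
})
conn = sqlite3.connect(":memory:")
airports.to_sql("airports", conn, index=False)

# WHERE: ids of the TR rows, once through SQL and once through Pandas.
sql_ids = pd.read_sql_query(
    "select id from airports where iso_country = 'TR'", conn
)["id"].tolist()
pd_ids = airports.id[airports.iso_country == "TR"].tolist()

# SQLite has no TOP; LIMIT plays the same role as head().
sql_first = pd.read_sql_query("select * from airports limit 2", conn)
pd_first = airports.head(2)

# DISTINCT vs unique(): sort both, because SQL gives no order guarantee here.
sql_regions = sorted(pd.read_sql_query(
    "select distinct iso_region from airports", conn
)["iso_region"].tolist())
pd_regions = sorted(airports.iso_region.unique())

print(sql_ids, pd_ids)
print(sql_regions == pd_regions)  # True
```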

2. Selecting with Multiple Conditions

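In SQL, multiple conditions are combined with and and or; in Pandas, the & and | operators are used instead. In Pandas, each condition must be wrapped in its own parentheses, because & and | bind more tightly than == in Python.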

SQL | Pandas | Description
select * from airports where iso_country = 'TR' and type = 'closed' | airports[(airports.iso_country == 'TR') & (airports.type == 'closed')] | Returns the rows whose iso_country is TR and whose type is closed.
select ident, name, municipality from airports where iso_region = 'US-CA' and type = 'large_airport' | airports[(airports.iso_region == 'US-CA') & (airports.type == 'large_airport')][['ident', 'name', 'municipality']] | Returns the ident, name and municipality columns of the rows whose iso_region is US-CA and whose type is large_airport.
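The sketch below applies the same and/or patterns to a few hypothetical toy rows (the idents, names and towns are made up). It shows & for and, | for or, and column selection after a combined filter, with every comparison in its own parentheses.

```python
import pandas as pd

# Hypothetical toy rows, not real airport data.
airports = pd.DataFrame({
    "ident": ["AA01", "AA02", "BB01", "BB02"],
    "name": ["Alpha", "Beta", "Gamma", "Delta"],
    "municipality": ["A-town", "B-town", "C-town", "D-town"],
    "iso_country": ["TR", "TR", "US", "US"],
    "iso_region": ["TR-34", "TR-16", "US-CA", "US-CA"],
    "type": ["closed", "small_airport", "large_airport", "heliport"],
})

# AND -> &. Parentheses are required: & binds more tightly than ==.
tr_closed = airports[(airports.iso_country == "TR") & (airports.type == "closed")]

# OR -> |, with the same parenthesization rule.
tr_or_heliport = airports[(airports.iso_country == "TR") | (airports.type == "heliport")]

# Filter first, then keep only the listed columns (the SELECT list).
ca_large = airports[(airports.iso_region == "US-CA") & (airports.type == "large_airport")][
    ["ident", "name", "municipality"]
]

print(tr_closed.ident.tolist())       # ['AA01']
print(tr_or_heliport.ident.tolist())  # ['AA01', 'AA02', 'BB02']
print(ca_large)
```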

3. Order By Sorting Command

In SQL, sorting is done with the order by command; in Pandas, the sort_values method is used, and its ascending parameter sets the direction.

SQL | Pandas | Description
select * from airports where iso_country = 'TR' order by id | airports[airports.iso_country == 'TR'].sort_values('id') | Sorts the rows whose iso_country is TR by id, from smallest to largest.
select * from airports where iso_country = 'TR' order by id desc | airports[airports.iso_country == 'TR'].sort_values('id', ascending=False) | Sorts the rows whose iso_country is TR by id, from largest to smallest.
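Sorting on several columns in different directions works by passing one ascending flag per column. The sketch below (hypothetical toy rows, checked against an in-memory SQLite copy) compares order by type, id desc with its sort_values counterpart.

```python
import sqlite3

import pandas as pd

# Hypothetical toy rows, not real airport data.
airports = pd.DataFrame({
    "id": [10, 30, 20, 40],
    "iso_country": ["TR", "TR", "TR", "US"],
    "type": ["closed", "heliport", "closed", "heliport"],
})
conn = sqlite3.connect(":memory:")
airports.to_sql("airports", conn, index=False)

tr = airports[airports.iso_country == "TR"]

asc = tr.sort_values("id")                     # order by id
desc = tr.sort_values("id", ascending=False)   # order by id desc
# order by type, id desc: one ascending flag per sort column
mixed = tr.sort_values(["type", "id"], ascending=[True, False])

sql_mixed = pd.read_sql_query(
    "select * from airports where iso_country = 'TR' order by type, id desc", conn
)

print(asc.id.tolist())    # [10, 20, 30]
print(desc.id.tolist())   # [30, 20, 10]
print(mixed.id.tolist())  # [20, 10, 30]
```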

4. In, Not In Commands

In SQL, in lets a where clause test a column against several values at once. In Pandas, the isin method does the same job. To get everything except those values (not in), don't forget the ~ operator, which negates the boolean mask.

SQL | Pandas | Description
select * from airports where type in ('small_airport', 'closed') | airports[airports.type.isin(['small_airport', 'closed'])] | Returns the rows whose type is small_airport or closed.
select * from airports where type not in ('small_airport', 'closed') | airports[~airports.type.isin(['small_airport', 'closed'])] | Returns the rows whose type is neither small_airport nor closed.
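One difference is worth knowing when translating not in: SQL's not in never matches a NULL, whereas ~isin() keeps rows with missing values. The sketch below uses hypothetical toy rows, one of them with a missing type, to show both behaviours and how adding notna() reproduces the SQL result.

```python
import pandas as pd

# Hypothetical toy rows; the last one has a missing type on purpose.
airports = pd.DataFrame({
    "ident": ["AA01", "AA02", "AA03", "AA04", "AA05"],
    "type": ["small_airport", "closed", "heliport", "large_airport", None],
})

wanted = ["small_airport", "closed"]
in_mask = airports.type.isin(wanted)   # missing values give False

in_rows = airports[in_mask]            # where type in (...)
not_in_rows = airports[~in_mask]       # ~ also keeps the missing-type row

# SQL-like NOT IN: additionally drop rows whose type is missing (NULL).
sql_like_not_in = airports[~in_mask & airports.type.notna()]

print(in_rows.ident.tolist())          # ['AA01', 'AA02']
print(not_in_rows.ident.tolist())      # ['AA03', 'AA04', 'AA05']
print(sql_like_not_in.ident.tolist())  # ['AA03', 'AA04']
```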

5. Group By, Count Commands

Grouping works the same way in SQL and Pandas (group by and groupby). Counting is where they differ: SQL's count(*) corresponds to Pandas' size(), which counts every row in a group. Pandas' count() skips missing values, like SQL's count(column).

SQL | Pandas | Description
select iso_country, type, count(*) from airports group by iso_country, type order by iso_country, type | airports.groupby(['iso_country', 'type']).size() | Groups by iso_country and type and sorts by iso_country, then type (groupby sorts the group keys by default).
select iso_country, type, count(*) from airports group by iso_country, type order by iso_country, count(*) desc | airports.groupby(['iso_country', 'type']).size().to_frame('size').reset_index().sort_values(['iso_country', 'size'], ascending=[True, False]) | Groups by iso_country and type, then sorts by iso_country ascending and by count descending.
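The difference between size() and count() only shows up when a column has missing values. In the sketch below, built on hypothetical toy rows with one missing municipality, size() plays the role of count(*) and count() on a column plays the role of count(municipality). The last step repeats the descending-count ordering from the table.

```python
import pandas as pd

# Hypothetical toy rows; one municipality is missing on purpose.
airports = pd.DataFrame({
    "iso_country": ["TR", "TR", "TR", "US"],
    "type": ["closed", "closed", "heliport", "heliport"],
    "municipality": ["A-town", None, "C-town", "D-town"],
})

# count(*): every row in the group, missing values included
sizes = airports.groupby(["iso_country", "type"]).size()
# count(municipality): only the non-missing values of that column
non_null = airports.groupby(["iso_country", "type"])["municipality"].count()

print(sizes[("TR", "closed")])     # 2
print(non_null[("TR", "closed")])  # 1

# order by iso_country, count(*) desc
ordered = (
    sizes.to_frame("size")
    .reset_index()
    .sort_values(["iso_country", "size"], ascending=[True, False])
)
print(ordered)
```

Note that the count column is read as ordered["size"]: ordered.size would return DataFrame.size (the number of cells), not the column.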

6. Having

In SQL, grouped data can be filtered with the having command. In Pandas, the filter method of a groupby does this job.

SQL | Pandas | Description
select type, count(*) from airports where iso_country = 'US' group by type having count(*) > 1000 order by count(*) desc | airports[airports.iso_country == 'US'].groupby('type').filter(lambda g: len(g) > 1000).groupby('type').size().sort_values(ascending=False) | Groups the rows whose iso_country is US by type and returns the types with more than 1000 rows, sorted by count in descending order.
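Because filter() returns the original rows of the qualifying groups, the Pandas version above has to group a second time to count them. A possibly simpler alternative, sketched below, is to count once and then filter the counts, which mirrors having count(*) > N more directly. The toy rows are hypothetical, and the threshold is 1 instead of 1000 only because the sample is tiny.

```python
import pandas as pd

# Hypothetical toy rows, not real airport data.
airports = pd.DataFrame({
    "iso_country": ["US"] * 6 + ["TR"],
    "type": ["heliport", "heliport", "heliport",
             "small_airport", "small_airport", "closed", "heliport"],
})
us = airports[airports.iso_country == "US"]
threshold = 1  # stands in for 1000 on this tiny sample

# Approach from the table: filter groups, then group again to count.
via_filter = (
    us.groupby("type").filter(lambda g: len(g) > threshold)
    .groupby("type").size()
    .sort_values(ascending=False)
)

# Alternative: count once, then keep the counts above the threshold.
counts = us.groupby("type").size()
via_counts = counts[counts > threshold].sort_values(ascending=False)

print(via_filter.to_dict())  # {'heliport': 3, 'small_airport': 2}
print(via_counts.to_dict())  # {'heliport': 3, 'small_airport': 2}
```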

7. Union All and Union

SQL | Pandas | Description
select name, municipality, ident from airports where ident = 'LTBE' union all select name, municipality, ident from airports where ident = 'LTFM' | pd.concat([airports[airports.ident == 'LTBE'][['name', 'municipality', 'ident']], airports[airports.ident == 'LTFM'][['name', 'municipality', 'ident']]]) | Returns the rows whose ident is LTBE or LTFM.
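The heading also mentions union, which differs from union all only in removing duplicate rows. In Pandas, that corresponds to pd.concat followed by drop_duplicates(). The sketch below uses hypothetical idents (not the LTBE/LTFM records) and two selections that deliberately overlap.

```python
import pandas as pd

# Hypothetical toy rows, not real airport data.
airports = pd.DataFrame({
    "ident": ["AA01", "AA02", "AA03"],
    "name": ["Alpha", "Beta", "Gamma"],
    "municipality": ["A-town", "B-town", "C-town"],
})
cols = ["name", "municipality", "ident"]

first = airports[airports.ident == "AA01"][cols]
second = airports[airports.ident.isin(["AA01", "AA02"])][cols]  # overlaps with first

# union all: keep every row, duplicates included
union_all = pd.concat([first, second], ignore_index=True)
# union: same, but duplicate rows are removed
union = pd.concat([first, second]).drop_duplicates().reset_index(drop=True)

print(len(union_all), len(union))  # 3 2
```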

8. Insert into

In SQL, the create table command creates a new table, and the insert into command adds new records to it. In Pandas, a new table is created with pd.DataFrame; new records can be written into it directly or combined from another table with pd.concat. (In the examples, yenitablo and isim are Turkish for “new table” and “name”.)

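SQL | Pandas | Description
create table yenitablo (id integer, isim text); | df1 = pd.DataFrame({'id': [1, 2], 'isim': ['Mert', 'Furkan']}) | Creates a new table (in Pandas, rows can be written in when the table is created).
insert into yenitablo values (3, 'Hermione Granger'); | df2 = pd.DataFrame({'id': [3], 'isim': ['Hermione Granger']}); pd.concat([df1, df2]).reset_index(drop=True) | Adds a new record to the table. The new row must use the same column names, otherwise concat creates extra columns filled with NaN.

Two tables with the same columns can also be stacked with pd.concat. (Older tutorials write this as data.append(data2, ignore_index=True), but DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.)

data = pd.DataFrame([[9, 7], [8, 5]], columns=list('PD'))

data2 = pd.DataFrame([[6, 4], [1, 2]], columns=list('PD'))

pd.concat([data, data2], ignore_index=True)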
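Putting the insert steps together, the sketch below adds one record with pd.concat and another with label-based assignment via .loc, which works here because the frame has a default 0..n-1 index, so len(df1) is the next unused label. The extra record (4, 'Ada') is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "isim": ["Mert", "Furkan"]})

# insert into yenitablo values (3, 'Hermione Granger'):
# a one-row frame with the SAME column names, then concatenate.
new_row = pd.DataFrame({"id": [3], "isim": ["Hermione Granger"]})
df1 = pd.concat([df1, new_row], ignore_index=True)

# Single-row alternative: assign to the next index label with .loc.
# (4, 'Ada') is a hypothetical record used only for illustration.
df1.loc[len(df1)] = [4, "Ada"]

print(df1)
```

For many rows, collecting them first and calling pd.concat once is usually faster than enlarging the frame row by row.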

9. Delete

In SQL, rows are deleted with the delete command; in Pandas, the drop method is used.

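SQL | Pandas | Description
delete from airports where type = 'heliport' | airports = airports.drop(airports[airports.type == 'heliport'].index) | Deletes the rows whose type is heliport. (drop returns a new DataFrame, so the result is assigned back to airports.)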
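The sketch below, on hypothetical toy rows, shows that drop() leaves the original DataFrame unchanged unless the result is assigned back, and that keeping the rows that should survive with a boolean mask gives the same result.

```python
import pandas as pd

# Hypothetical toy rows, not real airport data.
airports = pd.DataFrame({
    "ident": ["AA01", "AA02", "AA03"],
    "type": ["heliport", "small_airport", "heliport"],
})

# drop() returns a new DataFrame; airports itself is not modified.
dropped = airports.drop(airports[airports.type == "heliport"].index)

# Equivalent: keep the rows that should survive.
kept = airports[airports.type != "heliport"]

print(dropped.ident.tolist(), kept.ident.tolist())  # ['AA02'] ['AA02']
# The original frame still has all three rows:
print(len(airports))  # 3
```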

In this article, I compared basic SQL and Pandas commands. In future posts, I plan to prepare a series on the Pandas library. For questions and suggestions, you can reach me on LinkedIn.

* This post was inspired by this article.

What's the difference? Artificial Intelligence, Machine Learning and Deep Learning

Welcome to the world of artificial intelligence! By the end of this article, you'll understand three of the most important concepts in technology: artificial intelligence, machine learning and deep learning. Even though most people use these terms interchangeably, they don't mean the same thing. Let's dig deeper.