Hate Speech and AI: Issues in Detection

Hate speech is a form of expression that attacks a person, most often on the basis of race, gender, ethnicity, or sexual orientation. Hate speech itself is nothing new; however, the expansion of the internet and social media has accelerated its spread dramatically. According to a Pew Research Center report, 41% of Americans have experienced some form of online harassment. The strong correlation between suicide rates and verbal harassment in migrant groups further shows how crucial it is to detect and prevent the spread of hate speech. As a recent example, after the mass shooting at the Pittsburgh synagogue, it emerged that the murderer had constantly been posting hateful messages about Jews before the attack.



Image source: https://www.kqed.org/news/11702239/why-its-so-hard-to-scrub-hate-speech-off-social-media


Furthermore, the same Pew Research Center report suggests that 79% of Americans think that detecting hate speech and online harassment is the responsibility of online service providers. Accordingly, many providers are aware of the importance of the issue and work closely with AI engineers to address it.

When it comes to the logic of hate speech detection, there are many complexities. First, current AI technologies are limited in their ability to understand the context of human language: they often miss hate speech, or produce false positives, when the context shifts. Researchers from Carnegie Mellon University, for instance, have suggested that the perceived toxicity of speech can vary with the race, gender, and ethnicity of the people involved. They argue that identifying the characteristics of the author alongside the speech itself improves both the quality of the data and the accuracy of toxicity detection, and can also reduce the bias that current algorithms exhibit.

Image source: https://www.pewresearch.org/internet/2017/07/11/online-harassment-2017/pi_2017-07-11_online-harassment_0-01/


However, current AI technologies have difficulty detecting such characteristics. First, identifying the demographics and characteristics of authors is hard because, in most cases, that information simply is not available online, which makes distinguishing hate speech harder. Second, even when an author clearly states this information, detection can still be complicated by the cultural nuances of the given context. The dynamics of countries, and even of regions within countries, change over time and are closely tied to culture and language. These differences and ongoing changes are crucial for detection outcomes: some systems fail to detect hate speech, or flag false positives, because of cultural differences that statistics alone do not capture.



Language is one of the most complicated and most significant faculties of humankind. There are many ways and contexts of communicating through language that even neuroscientists have not yet fully mapped. With artificial intelligence, however, scientists are a step closer to describing the patterns and mechanisms of language. In that sense, hate speech detection, a crucially important task in the internet age, also benefits: machine learning algorithms make it much easier to detect online harassment at scale. Nevertheless, given the issues detection systems still face, there is no way to take humans out of the detection loop with today's technology.
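To make the machine learning side concrete, a minimal, illustrative toxicity classifier can be sketched with scikit-learn. The tiny inline texts and labels here are hypothetical placeholders, not a real hate speech corpus; real systems train on large labeled datasets and still need human review, as discussed above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, hand-labeled examples (1 = abusive, 0 = benign) -- placeholders only.
texts = [
    "I hate you and everyone like you",
    "you people should disappear",
    "have a wonderful day",
    "thanks for sharing this article",
]
labels = [1, 1, 0, 0]

# TF-IDF features + logistic regression: a common baseline for text classification.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

pred = model.predict(["I hate you"])[0]
print(pred)
```

A production detector would add far more data, context features, and the author/cultural signals the researchers above describe; this sketch only shows the basic pipeline shape.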







Effects of Activation Functions

In deep learning, activation functions play a key role and have a direct impact on model accuracy. The reason is that the activation function determines how each neuron transforms its weighted input before the signal reaches the output layer, so changing it changes how the chosen coefficients and weights affect the final result. Activation functions are broadly either linear or nonlinear. Which one suits a given task, such as clustering or regression, can be found by experimenting on models, or through the links I have left in the resources section. Each activation function has its own formula, and the code implementing it must be written carefully. The formula for a neuron's operation consists of weights and a bias value, and statistical knowledge is one of the most important elements of these processes. Writing the code may seem like the critical part, but what really matters is knowing what you are doing: mathematics and statistics cannot be ignored, and they play a central role in all of data science, deep learning, and machine learning.
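As a concrete sketch of the neuron formula described above — a weighted sum of inputs plus a bias term, passed through an activation function — with arbitrary illustrative numbers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A single neuron: weighted sum of inputs plus bias, passed through an activation.
x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.1, -0.6])   # weights
b = 0.2                          # bias (the intercept / "beta" error term)

z = np.dot(w, x) + b             # z = w·x + b
y = sigmoid(z)                   # activation output, in (0, 1)
print(y)
```

Training consists of adjusting `w` and `b` so that outputs like `y` match the targets, which is exactly where the choice of activation function matters.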



In the formula above, the additional error parameter known as beta is actually the bias. Bias is one of the most important structures taught in statistics education, and in the neural networks we build and run it is an extremely valuable quantity that cannot be ignored. The choice of activation function strongly affects the result at both the input and output sides of a neural network. How these functions contribute to learning varies with the inputs, parameters, and coefficients. The image below illustrates what counts as the inputs and outputs of the activation functions; I will leave a link at the end for those who want to access the code. Success criteria vary from one activation function to another. Softmax is mostly used in the output layer, where it is the more sensible and successful choice. The two most commonly known examples are softmax and sigmoid; most people starting a career in this field hear about these two first. Data scientists working on neural networks typically experiment with ReLU as an initial step.



Activation functions differ in how their success parameters behave along the x and y axes. The goal is for accuracy to rise toward its peak as the data flows through; the parameter values, the coefficient tuning, and the activation function chosen during the operations all influence whether that happens. Backpropagation and forward propagation, through which the coefficients are repeatedly re-estimated and kept near their optimum, are enormously important throughout neural networks. These operations are entirely mathematical: you need a working knowledge of derivatives, and if you work in this field, the important thing is not writing code but knowing exactly what you are doing. As you can observe below, the backward pass always involves derivative operations; neural networks are built on this mathematical background. After these steps, we can compare activation functions by finding the one most suitable for the output layer, and thereby easily find the optimum function for a model and see how its use varies from project to project.
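A minimal illustration of the derivative operations used in the backward pass: one gradient-descent step for a single sigmoid neuron, with arbitrary example numbers. This is a sketch of the chain rule in action, not a full training loop.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, target = 1.5, 1.0
w, b, lr = 0.3, 0.0, 0.1   # weight, bias, learning rate

# Forward pass
z = w * x + b
y = sigmoid(z)
loss = 0.5 * (y - target) ** 2

# Backward pass: chain rule, using sigmoid'(z) = y * (1 - y)
dloss_dy = y - target
dy_dz = y * (1.0 - y)
dz_dw = x
grad_w = dloss_dy * dy_dz * dz_dw
grad_b = dloss_dy * dy_dz

# Gradient-descent update: move the coefficients toward the optimum
w -= lr * grad_w
b -= lr * grad_b
print(w, b)
```

Since the prediction starts below the target here, both gradients are negative and the update nudges the weight and bias upward, exactly the re-estimation of coefficients described above.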



In the last part, I will list the activation functions and briefly describe what they do. The Step, Linear, Sigmoid, Hyperbolic Tangent, ReLU, Leaky ReLU, Swish, and Softmax functions can be given as examples of activation functions.

Step Function: Performs binary classification by comparing the input against a threshold value.

Linear Function: Its output is proportional to its input, so it can produce a range of activation values, but its derivative is a constant.

Sigmoid Function: A function known by almost everyone; it gives output in the range [0, 1].

Hyperbolic Tangent Function: A nonlinear function that gives output in the range [-1, 1].

ReLU Function: An essentially nonlinear function. It outputs 0 for negative inputs and passes positive inputs through unchanged, so its range is [0, +∞).

Leaky ReLU Function: Its distinctive feature is that, instead of clipping negative inputs to 0, it applies a small slope close to 0 in the negative region (still passing through the origin), which preserves the gradients that plain ReLU loses there.

Swish Function: Produces as output the input multiplied by the sigmoid of the input.

Softmax Function: Used for multi-class classification problems; it produces outputs in the range [0, 1] that represent the probability that a given input belongs to each class.
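The definitions above can be written out directly; here is a minimal NumPy sketch of these activation functions (the input values at the end are arbitrary examples):

```python
import numpy as np

def step(x, threshold=0.0):
    return np.where(x >= threshold, 1.0, 0.0)   # binary output via a threshold

def linear(x, a=1.0):
    return a * x                                # derivative is the constant a

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))             # output in (0, 1)

def tanh(x):
    return np.tanh(x)                           # output in (-1, 1)

def relu(x):
    return np.maximum(0.0, x)                   # 0 for negatives, identity otherwise

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0.0, x, alpha * x)     # small slope in the negative region

def swish(x):
    return x * sigmoid(x)                       # input times its sigmoid

def softmax(x):
    e = np.exp(x - np.max(x))                   # shift for numerical stability
    return e / e.sum()                          # probabilities summing to 1

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(softmax(x))
```

Plotting each of these over a range of inputs reproduces the familiar activation-function curves discussed throughout this article.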


I noted the sources of the images and definitions I used in the references section. If you liked my article, I would appreciate your feedback.


References:









A Quick Start to Keras and TensorFlow

Keras is a deep learning library designed in the Python language. If you have worked on a deep learning project or are familiar with this area, you have definitely encountered Keras. It offers many options for creating deep learning models and provides an environment in which to train our data.

Keras was originally developed to allow researchers to conduct faster trials.

Indeed, Keras makes data training and pre-processing as fast as possible. If you want to get to know Keras better, you can access its documentation via this link.

Prominent Advantages of Keras

🔹Allows you to perform operations on both the CPU and GPU.

🔹It contains predefined modules for convolutional and recurrent networks.

Keras is a deep learning API written in Python that runs on top of machine learning platforms such as TensorFlow and Theano.

🔹Keras supports all versions starting with Python 2.7.

Keras, TensorFlow, Theano and CNTK

Keras is the library that offers structures for building high-level deep learning models. In this article, we will look at the backend engines we use in many of our projects. These engines run in the background; below, we also cover the use of TensorFlow.

Keras Upload

Activation Function

🔹 We can select and load the libraries we want to use as shown below. There are three backend implementations in use: TensorFlow, Theano, and the Microsoft Cognitive Toolkit (CNTK).

Uploading Library
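A minimal sketch of how the backend can be selected in multi-backend Keras, either through the `KERAS_BACKEND` environment variable or the `~/.keras/keras.json` configuration file. With the TensorFlow-bundled Keras imported here, the backend is TensorFlow in any case, so the environment variable is illustrative:

```python
import os

# Option 1: set the backend before Keras is imported for the first time.
os.environ["KERAS_BACKEND"] = "tensorflow"   # multi-backend Keras also accepted "theano" / "cntk"

# Option 2: edit ~/.keras/keras.json, which looks like:
# {
#     "image_data_format": "channels_last",
#     "backend": "tensorflow"
# }

from tensorflow import keras
print(keras.backend.backend())   # reports which engine is active
```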

The platforms below are ones we encounter frequently in deep learning. As a footnote, I recommend GPU-based work when using TensorFlow: in terms of performance, you will find that GPU usage gives you faster results.

In summary, Keras works in harmony with these three libraries, and it can switch between them as its backend engine without any changes to your code. Let's take a closer look at TensorFlow, which we can use together with Keras.


➡️ Let’s run a version check to confirm that Python and pip are installed for the project you are going to work on.

Version Control
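The check can be done from the command line as below (a minimal sketch; on some systems the commands are `python` and `pip` rather than `python3`, and the commented install step assumes a requirements.txt exists in the project directory):

```shell
# Confirm the interpreter and the package manager are installed and on PATH.
python3 --version
python3 -m pip --version

# Later, all project dependencies can be installed in one step:
# python3 -m pip install -r requirements.txt
```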

➡️ I am continuing with my Mask R-CNN project, which I am actively working on. You can create any project, or a segmentation project like mine. If you want to follow along with the same project, you can access the list of required libraries by clicking the link.

Collecting Requirements

If you want, you can also install these libraries one by one, but for speed I install them all at once from a requirements.txt file.

➡️ Let’s return to Keras and TensorFlow without losing sight of our goal; we can meet in another article for my Mask R-CNN project. Now let’s make a quick introduction to TensorFlow: we import it into our project and print the version we are using.


➡️ As you can see in the output, I am using TensorFlow version 2.3.1. As I said, you can use it CPU- or GPU-based.

Output Version

➡️ TensorFlow handles data pre-processing as follows: we can continue our operations by including the keras.preprocessing module. It appears inactive because I am not calling any of its methods yet; once we write the method we will use, its color will change automatically.

Tensorflow Preprocessing

➡️ As an example, we can perform pre-processing with TensorFlow as follows. We divide our dataset into training and validation parts; with the validation_split variable, 20% of the data is held out.
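A runnable sketch of the idea, using randomly generated stand-in data instead of a real dataset (the shapes and the tiny model are arbitrary illustrations): `validation_split=0.2` tells Keras to hold out the last 20% of the samples for validation during `fit`.

```python
import numpy as np
import tensorflow as tf

print(tf.__version__)  # e.g. 2.3.1 in this post

# Stand-in data: 100 samples with 8 features, binary labels.
x = np.random.rand(100, 8).astype("float32")
y = np.random.randint(0, 2, size=(100,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# 80 samples are used for training, 20 for validation.
history = model.fit(x, y, epochs=1, validation_split=0.2, verbose=0)
print(sorted(history.history.keys()))
```

The presence of `val_loss` in the training history confirms that the 20% split was applied.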

In this way, we have made a fast start to Keras and TensorFlow with you. I hope to see you in my next post. Stay healthy ✨


  1. https://keras.io/about/.
  2. Wikipedia, The free encyclopedia, https://en.wikipedia.org/wiki/Keras.
  3. https://keras.rstudio.com/articles/backend.html.
  4. Francois Chollet, Deep Learning with Python, Publishing Buzdagi.
  5. https://www.tensorflow.org.
  6. https://www.tensorflow.org/tutorials/keras/text_classification.



Article Review: Multi-Category Classification with CNN

Classification of Multi-Category Images Using Deep Learning: A Convolutional Neural Network Model

This post reviews the article ‘Classifying multi-category images using Deep Learning: A Convolutional Neural Network Model’, presented in India in 2017 by Ardhendu Bandhu and Sanjiban Sekhar Roy. The paper presents an image classification model built with TensorFlow, a popular open-source library for machine learning and deep neural networks, using a convolutional neural network, and considers a multi-category image dataset for classification. A traditional backpropagation neural network has an input layer, hidden layers, and an output layer, while a convolutional neural network additionally has convolutional and max-pooling layers. The proposed classifier is trained to compute the decision boundary of the image dataset. Real-world data is mostly unlabeled and unstructured; this unstructured data can be images, audio, or text. Useful information cannot easily be derived from shallow neural networks, meaning those with fewer hidden layers. The paper therefore proposes a deep neural network-based CNN classifier, which has many hidden layers and can extract meaningful information from images.

Keywords: Image, Classification, Convolutional Neural Network, TensorFlow, Deep Neural Network.

First of all, let’s examine what classification is so that we can understand the steps laid out in the project. Image classification refers to the task of assigning images to classes from a multi-class set of images. To classify an image dataset into multiple classes or categories, the relationship between the dataset and the classes must be well understood.

In this article;

  1. A deep learning-based Convolutional Neural Network (CNN) is proposed to classify images.
  2. The proposed model achieves high accuracy after 10,000 training iterations on a dataset containing 20,000 images of dogs and cats; training and validating on the dataset takes about 300 minutes.

In this project, a convolutional neural network consisting of a convolutional layer, the ReLU function, a pooling layer, and a fully connected layer is used. A convolutional neural network is the natural choice when it comes to image recognition using deep learning.

Convolutional Neural Network

For classification purposes, the convolutional network has the architecture [INPUT-CONV-RELU-POOL-FC].

INPUT- Raw pixel values as images.

CONV- Computes the outputs of neurons connected to local regions of the input.

RELU- It applies the activation function.

POOL- Performs downsampling.

FC- Calculates the class score.
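This [INPUT-CONV-RELU-POOL-FC] pipeline can be sketched in a few lines of Keras. The 64×64 input size and the two classes (cat vs. dog) follow the paper's setup; the filter count and kernel sizes are illustrative assumptions, not the authors' exact configuration:

```python
import tensorflow as tf

# INPUT (raw 64x64 RGB pixels) -> CONV + RELU -> POOL -> FC class scores.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                           input_shape=(64, 64, 3)),   # CONV + RELU on raw pixels
    tf.keras.layers.MaxPooling2D((2, 2)),              # POOL: downsampling
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),    # FC: class scores (cat vs dog)
])
print(model.output_shape)
```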

In this publication, a multi-level deep learning system for image classification is designed and implemented. In particular, the proposed structure shows:

1) How to find local regions of the image that are discriminative and informative for the classification problem.

2) Given these regions, how to build the final-level classifier.


A dataset of 20,000 dog and cat images from Kaggle was used; the full Kaggle database contains 25,000 images. The images are divided into a training set of 12,000 images and a test set of 8,000 images. Splitting the data into training and test sets enables cross-validation and provides a check on errors: cross-validation verifies whether the proposed classifier labels cat and dog images correctly.

The following experimental setup is done on Spyder, a scientific Python development environment.

  1. First of all, SciPy, NumPy, and TensorFlow should be imported as needed.
  2. A start time, a training path, and a test path must be defined as constants. Image height and width were set to 64 pixels. The image dataset containing 20,000 images is then loaded; because of the large number of dimensions, it is resized and iterated over. This stage takes approximately 5-10 minutes.
  3. This data is fed into TensorFlow. In TensorFlow, all data is passed between operations in a computation graph. Features and labels must be in matrix form so that the tensors can easily interpret them.
  4. TensorFlow prediction: To feed data into the model, we start the session with an additional argument in which the name of each placeholder is matched with the corresponding data. Because data in TensorFlow is held in variables, these must be initialized before a graph can be run in a session. To update the value of a variable, we define an update operation that can then be run.
  5. After the variables are initialized, we print the initial value of the state variable and run the update operation. Then comes the activation function: its choice has a great influence on the behavior of the network, since a node's activation function maps that node's set of inputs to its output.
  6. Next, we define the hyperparameters we will need to train our model; more complex neural networks bring more hyperparameters. One of them is the learning rate. Another is the number of iterations for which we train our data, and the next is the batch size, which sets how many images are sent for classification at one time.
  7. Finally, after all that, we start the TensorFlow session, which makes TensorFlow actually run, because without starting a session TensorFlow will not work. After that, our model will start the training process.
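Steps 3-5 follow the classic TensorFlow 1.x graph-and-session pattern. A minimal sketch, written with the `tf.compat.v1` API so it also runs under TensorFlow 2, and using a toy counter variable as the "state" being initialized, printed, and updated:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # use the TF1-style graph/session mode

state = tf.compat.v1.Variable(0, name="state")     # variable to update
one = tf.constant(1)
update = tf.compat.v1.assign(state, state + one)   # update operation
init = tf.compat.v1.global_variables_initializer() # variables must be initialized first

with tf.compat.v1.Session() as sess:
    sess.run(init)
    values = [sess.run(state)]           # initial value of the state variable
    for _ in range(3):
        values.append(sess.run(update))  # run the update op repeatedly
print(values)
```

The paper's actual graph feeds image placeholders into the CNN instead of a counter, but the session, initialization, and update mechanics are the same.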


🖇 As the deep architecture, a convolutional neural network was used, implemented with the TensorFlow deep learning library. The experimental results below were produced in Spyder, a scientific Python development environment; 20,000 images were used and the batch size was fixed at 100.

🖇 It is essential to measure model accuracy on test data rather than training data. The convolutional neural network was run with the CPU version of TensorFlow on a Windows 10 machine with 8 GB of RAM.

📌 As the number of iterations increases, training accuracy increases, but so does training time. Table 1 lists the iteration counts together with the accuracy obtained.

Number of iterations vs Accuracy

The graph becomes almost constant after several thousand iterations. Different batch size values can lead to different results; here the batch size was set to 100 images.

✨ The article obtains a high accuracy rate in classifying images with the proposed method. The CNN was implemented using TensorFlow, and the classifier was observed to perform well in terms of accuracy. However, a CPU-based system was used, so the experiment took extra training time; with a GPU-based system, training time would be shortened. The CNN model can also be applied to complex image classification problems in medical imaging and other fields.


  1. https://www.researchgate.net/figure/Artificial-neural-network-architecture-ANN-i-h-1-h-2-h-n-o_fig1_321259051.
  2. Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. science, 313(5786), 504- 507.
  3. Yoshua Bengio, “Learning Deep Architectures for AI”, Dept. IRO, Universite de Montreal C.P. 6128, Montreal, Qc, H3C 3J7, Canada, Technical Report 1312.
  4. Yann LeCun, Yoshua Bengio & Geoffrey Hinton, “Deep learning “, NATURE |vol 521 | 28 May 2015
  5. Yicong Zhou and Yantao Wei, “Learning Hierarchical Spectral–Spatial Features for Hyperspectral Image Classification”, IEEE Transactions on cybernetics, Vol. 46, No.7, July 2016.


Hello everybody! In this blog post I want to talk about TensorFlow, one of the free and most widely used open-source deep learning libraries. So why do we call it open source? Open source means users can view and edit the software's code and stay informed about its development. With TensorFlow you can easily create models, build machine learning pipelines with TensorFlow Extended (TFX), and train and deploy models in JavaScript environments with TensorFlow.js. You can also create complex topologies with features such as the Functional API and the Model Subclassing API.

What is TensorFlow?

TensorFlow was developed by the Google Brain team, initially to conduct machine learning and deep neural network research, and in 2015 its code was made available to everyone. TensorFlow is a library for numerical computation using data flow graphs; the literal meaning of "tensor" is a geometric object in which multidimensional data can be represented.

As you see above, tensors are multidimensional arrays that allow you to represent higher-dimensional data. In deep learning, we deal with high-dimensional datasets, where the dimensions correspond to the different features found in the dataset.

Usage examples of TensorFlow

1) TensorFlow can be used efficiently in sound-based applications with artificial neural networks, such as voice recognition, voice search, sentiment analysis, and flaw detection.

2) Other popular uses of TensorFlow are text-based applications such as sentiment analysis (CRM, social media), threat detection (social media, government), and fraud detection (insurance, finance). As an example, PayPal uses TensorFlow for fraud detection.

3) It can also be used in face recognition, image search, image classification, motion detection, machine vision, and photo clustering, as well as in the automotive, aviation, and healthcare industries. As an example, Airbnb uses TensorFlow to categorize images and improve the guest experience.

4) TensorFlow time series algorithms are used for analyzing time series data in order to extract meaningful statistics. As an example, Naver automatically classifies shopping product categories with TensorFlow.

5) TensorFlow neural networks also work on video data. This is mainly used for motion detection and real-time threat detection in gaming, security, airports, and UX/UI fields. As an example, Airbus uses TensorFlow to extract information from satellite imagery and provide insights to customers.

Where can I learn TensorFlow?

You can join the courses “Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning” on Coursera and “Intro to TensorFlow for Deep Learning” on Udacity for free. Tutorials for beginners and experts are available on TensorFlow’s official site, where you can find the MNIST dataset and other “Hello World” examples that I have also worked through before.
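For reference, the "Hello World" of TensorFlow usually looks something like the sketch below. To keep it self-contained and offline-friendly, random MNIST-shaped arrays stand in for the real dataset; with the real data you would call `tf.keras.datasets.mnist.load_data()` instead:

```python
import numpy as np
import tensorflow as tf

# Stand-in for MNIST: 28x28 grayscale "images" with digit labels 0-9.
x_train = np.random.rand(64, 28, 28).astype("float32")
y_train = np.random.randint(0, 10, size=(64,))

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one probability per digit
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, verbose=0)

probs = model.predict(x_train[:1], verbose=0)
print(probs.shape)
```

On the real MNIST data this same model reaches high accuracy after a few epochs; with the random stand-in data it only demonstrates the API flow.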

To sum up, we talked about the meaning of the word TensorFlow, what TensorFlow is, where it is used, and how we can learn it. As the post makes clear, world-leading companies prefer TensorFlow for many tasks such as image classification, voice recognition, and disease detection. Step into this magical world without wasting time! Hope to see you in our next blog…












A Step-By-Step Journey To Artificial Intelligence

Machine learning (ML) is the study of computer algorithms that improve automatically through experience [1]. According to Wikipedia, machine learning involves computers discovering how to perform tasks without being explicitly programmed [2]. For most of you, the first thing that comes to mind with artificial intelligence is undoubtedly robots, as in the visual above. Today I have researched courses at the fundamentals level of machine learning and artificial intelligence for you, and here I will list the DataCamp and Coursera courses I was most pleased with.

DataCamp Courses

💠 Image Processing with Keras in Python: This course teaches how to build, train, and evaluate CNNs, how to develop their ability to learn from data, and how to interpret the results of training.
Click to go to the course 🔗
💠 Preprocessing for Machine Learning in Python: You’ll learn how to standardize your data into the right format for your model, create new features to make the most of the information in your dataset, and choose the best features to improve your model’s performance.
Click to go to the course  🔗
💠 Advanced Deep Learning with Keras: It shows you how to solve various problems using the versatile Keras functional API by training a network that performs both classification and regression.
Click to go to the course 🔗
💠 Introduction to TensorFlow in Python: In this course, you will use TensorFlow 2.3 to develop, train, and make predictions with recommendation systems, image classification, and models that power significant advances in fintech. You will learn both high-level APIs that let you design and train deep learning models in 15 lines of code, and low-level APIs that let you go beyond ready-made routines.
Click to go to the course 🔗
💠 Introduction to Deep Learning with PyTorch: PyTorch is one of the leading deep learning frameworks, both powerful and easy to use. In this course, you will use PyTorch to learn the basic concepts of neural networks before creating your first network to classify handwritten digits from the MNIST dataset. You will then learn about CNNs and use them to build more powerful models that deliver more accurate results, evaluating those results and applying different techniques to improve them.
Click to go to the course 🔗
💠 Supervised Learning with scikit-learn: 

  • Classification
  • Regression
  • Fine-tuning your model
  • Preprocessing and pipelines

Click to go to the course 🔗

💠 AI Fundamentals:

  • Introduction to AI
  • Supervised Learning
  • Unsupervised Learning
  • Deep Learning & Beyond

Click to go to the course 🔗

Coursera Courses

💠 Machine Learning: Classification, University of Washington: 

  • Solving both binary and multi-class classification problems
  • Improving the performance of any model using boosting
  • Scaling methods with stochastic gradient ascent
  • Using techniques for handling missing data
  • Evaluating models using precision-recall metrics

Click to go to the course 🔗

💠 AI For Everyone, deeplearning.ai:  

  • What AI can realistically do, and what it cannot
  • How to identify opportunities to apply artificial intelligence to problems in your own organization
  • What it is like to build machine learning and data science projects
  • How to work with an AI team and build an AI strategy in your company
  • How to navigate ethical and social discussions about artificial intelligence

Click to go to the course  🔗

💠 AI for Medical Diagnosis, deeplearning.ai: 

  • In Lesson 1, you will create convolutional neural network image classification and segmentation models to diagnose lung and brain disorders.
  • In Lesson 2, you will create risk models and survival predictors for heart disease using statistical methods and a random forest predictor to determine patient prognosis.
  • In Lesson 3, you will create a treatment effect predictor, apply model interpretation techniques, and use natural language processing to extract information from radiology reports.

Click to go to the course 🔗
As a first step in learning artificial intelligence, I took Artificial Neural Networks and Pattern Recognition courses during my Master’s degree, developed projects in these areas, and had the opportunity to present them. That is how I realized I learn even more by passing on what I know. In this article, I summarized the DataCamp and Coursera courses you should take. Before all of these, I strongly recommend that you also finish the Machine Learning Crash Course.


  1. Mitchell, Tom (1997). Machine Learning. New York: McGraw Hill. ISBN 0-07-042807-7. OCLC 36417892.
  2. From Wikipedia, The free encyclopedia, Machine learning, 19 November 2020.
  3. DataCamp, https://learn.datacamp.com.
  4. Coursera, https://www.coursera.org.

HTC (Hybrid Task Cascade) Network Architecture

In my recent literature research on image segmentation, I have come across very different segmentation architectures. In a previous article, I described the Mask R-CNN architecture. Much like Mask R-CNN, the Cascade Mask R-CNN structure has appeared in the literature. I will try to shed light on it here, using information collected from the original academic papers and the research I have read.

Cascade is a classic yet powerful architecture that improves performance on a variety of tasks. However, how to introduce cascades into instance segmentation remains an open question: a simple combination of Cascade R-CNN and Mask R-CNN provides only limited gains. In exploring a more effective approach, the authors found that the key to a successful instance segmentation cascade is to take full advantage of the mutual relationship between detection and segmentation.
“Hybrid Task Cascade for Instance Segmentation” proposes a new Hybrid Task Cascade (HTC) framework that differs in two important respects:

  1. Instead of cascading the two tasks separately, it interleaves them for joint multi-stage processing.
  2. It adopts a fully convolutional branch to provide spatial context, which helps distinguish the hard foreground from the cluttered background.

The basic idea is to improve the information flow by incorporating cascades and multi-tasking at each stage, and to leverage spatial context to further improve accuracy. In particular, a cascading pipeline is designed for progressive refinement: at each stage, bounding box regression and mask prediction are combined in a multi-task manner.

Innovations ✨

The main innovation of the HTC architecture is a cascading framework that interleaves object detection and segmentation, providing better performance. The information flow is also changed through direct branches between the mask heads of consecutive stages. The architecture additionally includes a fully convolutional branch that provides spatial context, which can improve performance by better distinguishing instances from cluttered backgrounds.
2017 Winner

Hybrid Task Cascade: An Instance Segmentation Framework
  • It combines bounding box regression and mask prediction instead of executing them in parallel.
  • It creates a direct path that strengthens the information flow between mask branches by feeding the mask features from the previous stage into the current one.
  • It aims to gain more contextual information by adding an additional semantic segmentation branch and fusing it with the box and mask branches.
  • In general, these changes to the framework architecture effectively improve the information flow, not only between stages but also between tasks.

Table 1 compares the HTC network’s instance segmentation approach with state-of-the-art methods on the COCO dataset. In addition, the Cascade Mask R-CNN described in Section 1 is taken as a strong baseline for the method used in the article. Compared to Mask R-CNN, this naive cascading baseline brings gains of 3.5% in box AP and 1.2% in mask AP, and it is noted to outperform PANet, the most advanced instance segmentation method at the time. HTC makes consistent improvements on different backbones, proving its effectiveness: it provides gains of 1.5%, 1.3%, and 1.1% on ResNet-50, ResNet-101, and ResNeXt-101, respectively.
📌 Note: Cascade Mask R-CNN extends Cascade R-CNN to instance segmentation by adding a mask head to the cascade [3].


The image below shows the segmentation results on the COCO dataset.
In the results section of the article, the advantages of the HTC model over other models are discussed.

We propose Hybrid Task Cascade (HTC), a new cascade architecture for instance segmentation. It interweaves box and mask branches for joint multi-stage processing and uses a semantic segmentation branch to provide spatial context. This framework progressively refines mask predictions and integrates complementary features at each stage. Without bells and whistles, the proposed method achieves a 1.5% improvement over a strong Cascade Mask R-CNN baseline on the MS COCO dataset. Notably, our overall system reaches 48.6 mask AP on the test-challenge split and 49.0 mask AP on test-dev.

📌 Finally, to help you interpret the variables in the table, I leave you a table of MS COCO metrics as a note.


  1. Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Hybrid Task Cascade for Instance Segmentation, April 2019.
  2. Zhaowei Cai and Nuno Vasconcelos, Cascade R-CNN: Delving into High Quality Object Detection, In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
  3. https://paperswithcode.com/method/cascade-mask-r-cnn.
  4. https://cocodataset.org/#home

SSD (Single Shot MultiBox Detector) Model from A to Z

In this article, we will learn the SSD MultiBox object detection technique from A to Z, with full explanations. Because the SSD model runs much faster than the R-CNN or even the Faster R-CNN architecture, it is often preferred for object detection.
This model, introduced by Liu and his colleagues in 2016, detects objects using surrounding context information [2]. With the Single Shot MultiBox Detector (SSD), fast and simple modeling can be done. And what is meant by “single shot”? As you can understand from the name, it offers us the ability to detect objects in a single pass.

I’ve collated a lot of documents and videos to give you accurate information, and now I will walk you through the whole alphabet of the subject. In R-CNN networks, regions likely to contain objects are first identified, and these regions are then classified with fully connected layers. While object detection with R-CNN is performed in two separate stages, SSD performs these operations in a single step.
As a first step, let’s examine the SSD architecture closely. If the image looks a little small, you can zoom in to see the contents and dimensions of the convolution layers.

As usual, an image is given as input to the architecture. The image is then passed through convolutional layers. If you have noticed, the dimensions of these layers differ, so different feature maps are extracted by the model. This is a desirable property. A fixed number of bounding boxes is then obtained by applying a 3×3 convolutional filter on the feature maps.
Because these boxes are predicted on activation maps of different scales, they are extremely good at detecting objects of different sizes. In the first image I showed, a 300×300 image was sent as input. If you notice, the spatial dimensions shrink as you progress through the network; in the last convolutional layer, the size is reduced to 1×1. During training, the predicted boxes are compared with the ground-truth boxes, and a 50% overlap criterion is used to find the best among these predictions: only predictions whose overlap is greater than 50% are selected. You can think of it like the thresholding used in logistic regression.
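The 50% rule above is an intersection-over-union (IoU) check, which can be sketched in a few lines. The `(x1, y1, x2, y2)` box format and the sample boxes are illustrative assumptions, not values from the SSD paper:

```python
# Sketch of the 50% overlap rule: a predicted box is kept only if its
# IoU (intersection over union) with a ground-truth box exceeds 0.5.
# Boxes are assumed to be (x1, y1, x2, y2) corner coordinates.
def iou(box_a, box_b):
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

pred, truth = (0, 0, 100, 100), (50, 0, 150, 100)
print(iou(pred, truth))          # overlap ratio of the two boxes
print(iou(pred, truth) > 0.5)    # kept only above the 50% threshold
```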
For example, the feature map dimensions at Conv8_2 are 10×10×512. When a 3×3 convolution is applied with 4 bounding boxes per cell, each box produces (classes + 4) outputs. Thus, in Conv8_2, the output is 10×10×4×(c+4). Assume there are 10 object classes for detection plus an additional background class; the output then contains 10×10×4×(11+4) = 6,000 values, and the number of bounding boxes reaches 10×10×4 = 400. The network turns the image it receives as input into a sizeable output tensor. In a video I researched, I heard a descriptive comment about this region selection:
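The Conv8_2 arithmetic above can be verified with a tiny helper (a sketch; the function name is mine, not from the SSD paper):

```python
# Output count for one SSD detection head: an H×W feature map with
# k default boxes per cell and c classes gives H×W×k×(c+4) values.
def ssd_head_outputs(fmap_h, fmap_w, boxes_per_cell, num_classes):
    per_box = num_classes + 4              # class scores + 4 box offsets
    total = fmap_h * fmap_w * boxes_per_cell * per_box
    num_boxes = fmap_h * fmap_w * boxes_per_cell
    return total, num_boxes

total, num_boxes = ssd_head_outputs(10, 10, 4, 11)  # 10 classes + background
print(total)      # 6000 output values
print(num_boxes)  # 400 default boxes
```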

Instead of performing different operations for each region, we perform all predictions on the CNN at once.

The image seen on the left above is the original [3], while on the right, 4 bounding boxes are estimated for each cell of the grid. With the bounding rectangles in these grid structures, the network attempts to estimate the actual region in which the object is located.
Among the documents I researched, the example above really struck me, and I wanted to share it with you because it is a great resource for understanding the SSD architecture. Notice that the model assigns a confidence score to each object likely to be in the image. For example, it gives the car a 50% score, but only the predictions above the 50% threshold win out. So in this image, a person and a bicycle are judged more likely than a car. I hope the SSD structure is clear now. In my next article, I will show you how to code the SSD model. Hope you stay healthy ✨


  1. Face and Object Recognition with computer vision | R-CNN, SSD, GANs, Udemy.
  2. Dive into Deep Learning, 13.7. Single Shot Multibox Detection (SSD), https://d2l.ai/chapter_computer-vision/ssd.html.
  3. https://jonathan-hui.medium.com/ssd-object-detection-single-shot-multibox-detector-for-real-time-processing-9bd8deac0e06.
  4. https://towardsdatascience.com/review-ssd-single-shot-detector-object-detection-851a94607d11.
  5. https://towardsdatascience.com/understanding-ssd-multibox-real-time-object-detection-in-deep-learning-495ef744fab.
  6. Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection, https://www.groundai.com/project/single-shot-bidirectional-pyramid-networks-for-high-quality-object-detection/1.

Featured Image

Article Review – Tooth Detection with Mask RCNN

In this article, I will review the paper ‘Tooth Detection and Segmentation with Mask R-CNN [1]’, published at the Second International Conference on Artificial Intelligence in Information and Communication. The paper describes the implementation of automatic tooth detection and segmentation with Mask R-CNN on dental images. The aim of the paper is to identify only the teeth and divide them into segments.

It should be noted that Mask R-CNN achieves a good segmentation effect even in complex and crowded dental structures ⚠️

If you work in this area like me, the first thing to pay attention to when reviewing a paper is the keywords. The keywords of this paper are Mask R-CNN, Object Detection, Semantic Segmentation, and Tooth. We continue our research around these keywords.

One-stage networks such as the Fully Convolutional Network (FCN), You Only Look Once (YOLO) and the Single Shot MultiBox Detector (SSD) are 100–1000 times faster than region-proposal-based algorithms [3], [4], [5].

Technical Approaches

❇️ Data Collection

Since there is no public dataset, 100 images were collected from a hospital to train on. Of these, 80 images make up the training data, 10 images the validation data, and the remaining 10 images the test data. Images taken at different distances, under different lighting, and of people of different sexes and ages were selected for the project, which makes the task more challenging for the network.

❇️ Image Annotation

Labelme is an image annotation tool developed by MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) [6]. It provides tools for labeling object boundaries: when annotating images, polygons are drawn around the teeth, and the corner coordinates are saved in a JSON file for each image. An example of this tool can be seen in Figure 1. Since annotation is a manual operation, small errors occur, but they do not affect the overall evaluation of the model. Since there is only one category, tooth pixels are labeled 1 and the background is labeled 0.
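To make the annotation format concrete, here is a minimal sketch of reading Labelme-style polygon output. The JSON shown is a simplified stand-in of my own: real Labelme files contain additional fields, but the polygons are stored as corner coordinates under a `shapes` list like this:

```python
import json

# Simplified Labelme-style annotation: each shape is a labeled polygon
# given by its corner coordinates (the actual values here are made up).
annotation = json.loads("""
{
  "shapes": [
    {"label": "tooth", "points": [[12, 30], [48, 28], [50, 70], [14, 72]]},
    {"label": "tooth", "points": [[60, 31], [95, 30], [96, 69], [61, 71]]}
  ]
}
""")

# Binary labelling as in the paper: tooth pixels are class 1,
# everything outside the polygons is background (class 0).
CLASS_IDS = {"tooth": 1}

for shape in annotation["shapes"]:
    class_id = CLASS_IDS.get(shape["label"], 0)
    print(shape["label"], class_id, len(shape["points"]), "corners")
```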

❇️ Deep Network Architecture Details


Mask RCNN Workflow

                                                           Mask R-CNN Architecture

You can see the Mask R-CNN architecture in the figure above. Mask R-CNN consists of several modules. An extension of Faster R-CNN, Mask R-CNN adds a convolutional network branch to perform the instance segmentation task. The backbone is a standard convolutional neural network that serves as a feature extractor; in principle, it can be any network that extracts image features, such as ResNet-50 or ResNet-101. In addition, to perform multi-scale detection, a Feature Pyramid Network (FPN) is used on top of the backbone. FPN improves the standard feature-extraction pyramid by adding a second pyramid that takes the top-level features from the first pyramid and passes them down to the lower layers. A deeper ResNet-101 + FPN backbone was used in this project.
Step by Step Detection

                                                                   Mask R-CNN Working Structure

🔍 Details Of Architecture

The RoIAlign method was proposed to replace RoI pooling. RoIAlign preserves the approximate spatial position of each region. RPN regression results are usually fractional, and RoI pooling quantizes them: the boxes obtained by the RPN must be rounded to the pooling grid before entering the fully connected layer. RoIAlign eliminates this rounding and preserves the decimal coordinates, which makes detection and segmentation more accurate. The loss combines the classification, RoI regression and segmentation losses; the classification and RoI regression losses are no different from those of normal object detection networks. The mask branch is a convolutional neural network that takes an RoI as input and outputs a small 28×28 mask.
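The decimal-preserving sampling described above can be illustrated with plain bilinear interpolation. This is a toy sketch of my own, not the paper's implementation (real RoIAlign samples several points per output bin and then pools them):

```python
# RoIAlign avoids rounding fractional RoI coordinates by sampling the
# feature map with bilinear interpolation. Minimal single-point sketch
# on a tiny 2-D feature map, in pure Python:
def bilinear_sample(fmap, y, x):
    y0, x0 = int(y), int(x)  # surrounding integer grid points
    y1 = min(y0 + 1, len(fmap) - 1)
    x1 = min(x0 + 1, len(fmap[0]) - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

fmap = [[0.0, 1.0],
        [2.0, 3.0]]
# RoI pooling would round (0.5, 0.5) to one grid cell; RoIAlign keeps
# the fractional position and interpolates between all four neighbors:
print(bilinear_sample(fmap, 0.5, 0.5))  # 1.5, the true interpolated value
```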

✅ Results

The data is trained for 50 epochs in total: the first 20 epochs are trained to start with, and the remaining 30 epochs fine-tune all layers. The total loss value is 0.3093, consisting of the bounding box loss, class loss, mask loss, and RPN loss. The total loss curve is shown in Figure 4. The final test results are also shown: (a) the best result and (b) the worst.

                                                                         Total loss curve

The Pixel Accuracy (PA) method is the simplest and most effective way to evaluate the results. The best result was 97.4% PA and the worst 90.1%. Since there are only a few prosthetic samples among the dental samples in the project, the accuracy of prosthesis detection was low.
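Pixel accuracy as described can be sketched in a few lines; the toy masks below are illustrative values, not data from the paper:

```python
# Pixel Accuracy (PA): the fraction of pixels whose predicted class
# matches the ground truth. Masks are flattened to 1-D lists here.
def pixel_accuracy(pred, truth):
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth)

truth = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0]   # 1 = tooth, 0 = background
pred  = [0, 0, 1, 1, 0, 0, 1, 1, 0, 1]   # two of ten pixels are wrong
print(pixel_accuracy(pred, truth))  # 0.8
```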
Final Test Results

              Final test results. (a) best individual result example, (b) worst individual result example 


  1. Guohua Zhu, Zewen Piao, Suk Chan Kim, Department of Electronics Engineering, Pusan National University, Tooth Detection and Segmentation with Mask R-CNN, ICAIIC 2020.
  2. https://github.com/fcsiba/DentAid.
  3. Shelhamer, E., Long, J., and Darrell, T., Fully Convolutional Networks for Semantic Segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 39, 4 (Apr. 2017), 640–651.
  4. Redmon, J., and Farhadi, A., YOLOv3: An Incremental Improvement, arXiv (2018).
  5. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C., SSD: Single Shot MultiBox Detector.
  6. B. Russell, A. Torralba, and W. T. Freeman, Labelme, The Open Annotation Tool MIT, Computer Science, and Artificial Intelligence Laboratory [Online]. Available: http://labelme.csail.mit.ed.
  7. Zhiming Cui, Changjian Li, Wenping Wang, The University of Hong Kong, ToothNet: Automatic Tooth Instance Segmentation and Identification from Cone Beam CT Images.

The Future of Environmental Sustainability: AI and Greenhouse Emissions

Climate change continues to be one of the most important issues humankind faces today. One of its main drivers is the greenhouse effect: simply put, the increase in the Earth’s temperature caused by emissions of gases such as carbon dioxide, nitrous oxide, methane and ozone, known collectively as greenhouse gases. The emission of these gases, and the resulting increase in the greenhouse effect, is strongly correlated with human activity. However, environmental sustainability studies suggest that AI-based activities could make a difference in these processes. PwC forecasts that using AI for environmental sustainability can lower worldwide greenhouse emissions by 4% by 2030. That percentage corresponds to 2.4 Gt, the combined annual greenhouse gas emissions of Australia, Canada and Japan. The anticipation is that such quantities will lead many institutions to develop their sustainability models with the help of AI.
Considering AI’s ability to process data more efficiently than ever before, the report suggests that this ability can be used to analyze data linked to environmental issues. Such analyses would assist environmental sustainability by identifying patterns and making forecasts. As a current practice, IBM developed AI systems that process extensive weather-model data in order to make weather forecasts more reliable. The company states that the system increased forecast accuracy by 30%. In terms of sustainability, such accuracy may help large institutions manage their energy use and minimize greenhouse emissions.
Moreover, AI can help reduce greenhouse emissions through its applications in transportation. Autonomous vehicles can have a promising impact on this reduction, since they use less fossil fuel thanks to fuel-efficient systems. Furthermore, if AI-based systems are used to calculate efficient routes for car-sharing services, autonomous vehicles may change passenger habits: with the benefit of efficient route calculation, many passengers would prefer car-sharing services or public transportation over individual vehicle use. Autonomous vehicles would also reduce traffic, since the vehicles would be informed of each other, and such reductions in traffic, along with communicating systems, may help vehicles use energy more efficiently. These shifts in transportation may have a significant effect on environmental sustainability, since the sector accounts for a remarkable share of emissions.
On the sectoral side, AI can also be used to manage companies’ emissions. The electric utility Xcel Energy’s practice with AI is one instance of such management. Previously, after producing electricity by burning coal, the Xcel plant released greenhouse gases such as nitrous oxide into the atmosphere, like many other plants in the sector. To limit these emissions, the company upgraded the smokestacks of its Texas plant with artificial neural networks. The upgrade helped the plant develop a more efficient system and, most significantly, limit its emissions. The International Energy Agency forecasts that such systems may reduce nitrous oxide emissions by 20%. Today, hundreds of other plants in addition to Xcel Energy’s are using such systems for environmental sustainability.
However, alongside these significant developments, AI systems have carbon footprints of their own, since training on large datasets requires a considerable amount of energy. Some sources even suggest that these energy costs could outweigh AI’s benefits for energy efficiency. On the other hand, it is also suggested that as AI’s own energy efficiency improves, this cost could become a minor factor compared with AI’s contributions to energy efficiency and to limiting greenhouse emissions.
Such intersections of AI with social and scientific issues are likely to be crucial to society’s future. According to the study “The Role of Artificial Intelligence in Achieving Sustainable Development Goals”, AI can help resolve 95% of the issues directly related to the environmental SDGs: climate action, life below water and life on land. Given this, AI can be the tool used to take a step forward in environmental sustainability.
Vinuesa, R., Azizpour, H., Leite, I. et al. The role of artificial intelligence in achieving the Sustainable Development Goals. Nat Commun 11, 233 (2020). https://doi.org/10.1038/s41467-019-14108-y

Artificial Intelligence In The Healthcare Industry

Artificial intelligence in healthcare is the use of complex algorithms, in other words artificial intelligence (AI), to mimic human cognition in the analysis, interpretation and understanding of complex medical and health data. Although I am a computer engineer, the applications I have implemented before often also touch healthcare.

Introduction to RNN and LSTM With Keras

🔮 We continue our journey into artificial intelligence with the Keras library, which is highly important in the field of deep learning. Artificial neural networks with one or more layers come in many different models. We will examine the RNN model, and then the LSTM model, by solving a small problem with recurrent neural networks.


🧷 The RNN (Recurrent Neural Network) model has a different structure from other artificial neural networks. One of the biggest differences is that it has a temporary memory, and this memory is short: RNNs cannot remember the very distant past, only the recent past. What about a plain ANN? Not at all — an ANN has no memory. Feed-forward networks forget, but recurrent neural networks can remember the recent past thanks to this memory. And what good is a neural network that can remember the past? It can make inferences and predictions about the future by using its knowledge of the past. Say we are classifying an image: it can perform the classification by matching a feature in the image in front of it to a feature it has already learned. This is why it works more effectively than other algorithms 🌟.


The next image shows the structure of an RNN network. In these models, the output of the previous step is fed as input to the current step. Remembering information is realized through the hidden layers, and thus a repeating hierarchy forms.

If a good example of this model is needed: the translation we see intensively in daily life, text mining — in short, the field of natural language processing — works with RNN logic. How? Imagine you are translating, just as a machine does. To predict the next word of a sentence according to each language’s own semantic structure, you need the previous words: you have to remember them. Here the RNN comes to your aid. But the RNN has its drawbacks, of course, which is why the LSTM structure was introduced. First, let’s examine the memory structure of the RNN properly, and then we’ll look at the LSTM together.


There are nodes in the hidden layers you see in the image. These nodes have temporal loops, which we call self-feeding loops. In this way, they use their internal memory for short periods of time to activate the recall mechanism. So what is this self-feeding business? Through these connections, neurons communicate among themselves. Even after a neuron passes on the input it receives, it feeds itself with that input so that it does not forget it. When it passes this information on to the next neuron, what it has learned stays in its memory, and when the time comes, it recalls it and makes it available.

📝 Let’s consider word prediction, a simple natural language processing task, with a character-level RNN on the word “artificial”. We give the network the first nine letters {a, r, t, i, f, i, c, i, a} as inputs. The vocabulary here consists of only seven letters {a, r, t, i, f, c, l}, since repeated letters are ignored. The neural network is expected to predict the last letter, “l”. As each letter is fed in, the network combines it with the information carried over from the previous letters, and this happens in all the recurrent neurons. In this way, the desired prediction is performed with a recurrent neural network.
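The character-level setup above can be reproduced in a few lines of plain Python:

```python
# The word-prediction setup: the first nine letters of "artificial"
# are the inputs, the final "l" is the target, and the vocabulary
# keeps each distinct letter once (repeats ignored).
word = "artificial"
inputs, target = list(word[:-1]), word[-1]

vocab = []
for ch in word:              # preserve first-seen order, drop repeats
    if ch not in vocab:
        vocab.append(ch)

print(inputs)   # the nine input letters a, r, t, i, f, i, c, i, a
print(target)   # the target letter l
print(vocab)    # the seven-letter vocabulary a, r, t, i, f, c, l
```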


Let’s consider another example to understand the structure of the RNN. Based on the image above, recall that the alphabet your teacher taught you in the first grade of primary school is what you later use to write sentences and even read books. Even as you read this paragraph right now, you remember the letters of the alphabet you hold in your memory; in a way, you are running a recurrent neural network architecture. In fact, since the RNN has only short-term memory, the LSTM architecture, which has long-term memory, would be the more appropriate fit for this example.

Structure of RNN
  • One to One: A neural network model with one input and one output.
  • One to Many: A neural network with one input and more than one output. For example, suppose we have an image as input; the expected output is a description of the action in that image.

Output : The Baby who Plays the Piano 🎹

By detecting the features in the image — the piano, the baby and the music book — the network recalls the features it has already learned and predicts the output.

  • Many to One: A neural network with more than one input is expected to produce one output. For example, the inputs can be several sentences or the words within a sentence, and the output can be the sentiment conveyed by that sentence.
  • Many to Many: A neural network with more than one input is expected to produce more than one output. Translation programs are the best example of this.

Each word in the sentence “recurring models tackle sequence … healthcare” actually counts as an input to the neural network. As shown, the English-to-Turkish translation here uses the neural network structure to produce an output consisting of more than one element.

💡 Let’s do a little coding together! You can download the New York City Airbnb Open Data for free from Kaggle; this is the dataset we will use with the RNN network. Shall we start with the dataset?


🗺️ After importing the necessary libraries, we read the data from our dataset with pandas, ignoring the warnings along the way. Now we print the first 5 rows to the screen with the head() command as a sanity check.
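A sketch of this loading step; the inline CSV sample below is my own stand-in for the Kaggle file (whose exact column set is not reproduced here) so the snippet runs anywhere:

```python
import io
import warnings

import pandas as pd

warnings.filterwarnings("ignore")  # silence warnings, as in the article

# Tiny inline sample standing in for the Kaggle Airbnb CSV;
# the rows and columns here are illustrative, not the real data.
csv_data = io.StringIO(
    "id,name,neighbourhood,price\n"
    "2539,Clean quiet apt,Kensington,149\n"
    "2595,Skylit Midtown Castle,Midtown,225\n"
    "3647,Village of Harlem,Harlem,150\n"
)
data = pd.read_csv(csv_data)
print(data.head())  # first rows, for a quick sanity check
```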


After loading the data, we need to decide what we will predict from this dataset. For example, I will be interested in estimating based on the listing prices. That’s why I assigned the price information to the training data 🏷️.


At this stage, as pre-processing, we perform min–max normalization by scaling the values into the 0–1 range.
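Min–max scaling can be sketched by hand (scikit-learn's MinMaxScaler computes the equivalent); the prices below are made-up values:

```python
# Min-max normalization: scale each value into [0, 1] with
# (x - min) / (max - min).
def minmax_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

prices = [149, 225, 150, 89, 80]       # illustrative price column
print(minmax_scale(prices))            # smallest -> 0.0, largest -> 1.0
```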


📎 We then create empty X and Y training sequences to form the samples and their targets. We take a fixed number of samples from the data you see in the chart above: for example, we specify 100 samples for X_train, and the 101st value, following those 100 samples, is assigned to the y_train set as the value to predict.


🧐 So, does this happen for just 100 data points? No. We slide this operation one step at a time over all the examples. You can review the sliding process in the for loop.

By performing a reshape, you can print the shape of the X_train dataset with shape[0] and shape[1]. You can also print the generated y_train target set to the screen.
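The windowing and reshaping described above might look like this; the series here is synthetic, standing in for the scaled price column:

```python
import numpy as np

# Each X sample is 100 consecutive scaled values and the matching y
# value is the 101st, sliding the window one step at a time.
scaled = np.linspace(0.0, 1.0, 500)    # stand-in for the scaled prices

X_train, y_train = [], []
for i in range(100, len(scaled)):
    X_train.append(scaled[i - 100:i])  # 100 past values
    y_train.append(scaled[i])          # the value to predict

X_train, y_train = np.array(X_train), np.array(y_train)
# Reshape to (samples, timesteps, features), as RNN layers expect:
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
print(X_train.shape)  # (400, 100, 1)
print(y_train.shape)  # (400,)
```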

Creation of the RNN Model

So far, we have prepared the data. Now it’s time to build the RNN model we are going to use. Sequential is the structure that contains the whole architecture; a Layer is a layer we use in the architecture; Dropout is a regularization method that helps us overcome the problem of overfitting; and SimpleRNN is the layer with which we will build the RNN architecture 🌲.

Then we instantiate the Sequential() model and declare that we will use RNN layers in the architecture.

🔎 Let’s examine the parameters that SimpleRNN takes in this project together.

  • units: Size of output area
  • activation: Activation function to be used
  • use_bias: Parameter that specifies whether to use bias
  • return_sequences: Parameter specifying whether the output contains the full sequence or only the last output
  • input_shape: Parameter that specifies the shape of the data to be used


The input_shape parameter must be specified only on the first SimpleRNN layer, because the subsequent layers infer their input shape from the previous one.


A total of 4 SimpleRNN layers were created in the RNN architecture above. The hyperbolic tangent is used as the activation function; you can test it with other functions such as ReLU or sigmoid instead. As I said before, Dropout is a method that prevents the neural network from memorizing: it removes neurons and their synaptic connections from the network with a value of 2%. After the layers are created, the ‘Adam’ optimization method is selected, and mean squared error is selected as the loss function. Training is then carried out with fit for up to 100 epochs. Below you can examine the first 5 epochs and their loss values 🏋️.
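A sketch of this architecture in Keras; units=50 and the (100, 1) input shape are my own assumptions, since the original code figures are not reproduced here:

```python
from tensorflow.keras.layers import Dense, Dropout, SimpleRNN
from tensorflow.keras.models import Sequential

# Four SimpleRNN layers with tanh activations, Dropout between them,
# and a single-unit Dense output, as described above. timesteps=100
# matches the 100-sample windows; units=50 is an assumed layer size.
model = Sequential()
model.add(SimpleRNN(units=50, activation="tanh", return_sequences=True,
                    input_shape=(100, 1)))  # only the first layer needs input_shape
model.add(Dropout(0.2))
model.add(SimpleRNN(units=50, activation="tanh", return_sequences=True))
model.add(Dropout(0.2))
model.add(SimpleRNN(units=50, activation="tanh", return_sequences=True))
model.add(Dropout(0.2))
model.add(SimpleRNN(units=50, activation="tanh"))  # last layer returns one vector
model.add(Dropout(0.2))
model.add(Dense(units=1))                  # predicted value for the next step

model.compile(optimizer="adam", loss="mean_squared_error")
# model.fit(X_train, y_train, epochs=100, batch_size=32)  # train as in the article
```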


🧮 After training, you can create a test set, or you can predict using the test split already created from the dataset. We talked about RNN structures; as you may recall, they have short-term memory, so they cannot remember the very distant past. When you run this training for 100 epochs, you will notice that the test results are not very good. To avoid this problem, the Long Short-Term Memory (LSTM) structure, which has both short-term and long-term memory, was created. In this way, the network can remember both nearby information and information from the very distant past. The LSTM is a specialized form of the RNN architecture.


×: The scaling (multiplication) junction; it decides whether the connections arriving at it are let through and merged.

+: The addition junction; it decides whether the information coming from × is added to the cell state.

σ: The sigmoid layer; it squashes the data arriving at it to values between 0 and 1, so at the extremes it acts as a gate that fully passes (1) or fully blocks (0) information.

tanh: The tanh activation function, used against the slow-learning (vanishing gradient) problem that results from very small gradient values.

h(t-1): The output value from the previous step

h(t): The current output value

C(t): The cell state passed on to the next step

C(t-1): The cell state coming from the previous step
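The symbols above can be tied together in a toy single-unit LSTM step; all the weights here are arbitrary illustrative numbers, not learned values:

```python
import math

# One LSTM step for a single unit: sigmoid gates decide what to forget
# and what to add, tanh squashes values, C is the cell state carried
# forward, and h is the output.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev):
    f = sigmoid(0.5 * x_t + 0.5 * h_prev)          # forget gate (sigma)
    i = sigmoid(0.6 * x_t + 0.4 * h_prev)          # input gate (sigma)
    c_tilde = math.tanh(0.9 * x_t + 0.1 * h_prev)  # candidate values (tanh)
    c_t = f * c_prev + i * c_tilde                 # the "+" junction: C(t)
    o = sigmoid(0.7 * x_t + 0.3 * h_prev)          # output gate (sigma)
    h_t = o * math.tanh(c_t)                       # the output value h(t)
    return h_t, c_t

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:       # feed a short input sequence
    h, c = lstm_step(x, h, c)
print(h, c)                       # final output and cell state
```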

In this way, we have also learned the structure of the LSTM memory. After creating the LSTM architecture with Keras, you can perform the training and testing process just as with the RNN code above 🤸‍♀️.

Creation of the LSTM Model

Now let’s quickly create the Keras model in the same way and move on to building the LSTM architecture.


Let’s start building the general structure with Sequential(). You can add LSTM layers and a Dense layer and compile just as with the RNN.
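A minimal sketch of the LSTM model; the layer sizes and input shape are assumptions carried over from the RNN sketch, since the original figures are not shown:

```python
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

# The LSTM counterpart of the RNN model: swap SimpleRNN for LSTM.
model = Sequential()
model.add(LSTM(units=50, activation="tanh", return_sequences=True,
               input_shape=(100, 1)))      # 100 timesteps, 1 feature
model.add(Dropout(0.2))
model.add(LSTM(units=50, activation="tanh"))
model.add(Dropout(0.2))
model.add(Dense(units=1))                  # single predicted value

model.compile(optimizer="adam", loss="mean_squared_error")
# model.fit(X_train, y_train, epochs=100, batch_size=32)  # train as before
```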


When you perform the training and testing process, you will see more successful results than with the RNN. We have now learned the RNN and LSTM architectures in depth. I wish you plenty of coding, and a good day 😊.



  1. DATAI TEAM, Deep Learning and Python: Deep Learning Course from A to Z.
  2. Keras Recurrent Layers, https://keras.io/layers/recurrent/
  3. Retrieved from https://becominghuman.ai/a-noobs-guide-to-implementing-rnn-lstm-
  4. Retrieved from https://en.wikipedia.org/wiki/Long_short-term_memory
  5. Retrieved from https://medium.com/@ishakdolek/lstm-d2c281b92aac
  6. Retrieved from https://nvestlabs.com/recurrent-layers-of-keras-8/
  7. Retrieved from https://vinodsblog.com/2018/12/31/how-neural-network-algorithms-
  8. Retrieved from https://www.analyticsvidhya.com/blog/2017/12/introduction-to-
  9. Retrieved from http://www.crvoices.com/archives/2779
  10. Georgia Tech, CS7650 Spring, Introduction to Deep Learning.
  11. Retrieved from https://onedio.com/haber/agzinizin-acik-kalmasina-sebep-olacak-bilimin-
  12. Retrieved from https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data

Meet Benjamin: The First AI Screenplay Writer

The algorithm behind Benjamin is a deep learning language model: a long short-term memory (LSTM) recurrent neural network (RNN). Goodwin describes it as “a lot like a more sophisticated version of the auto-complete on your phone, … at each step, you predict the next word, letter, or space… .”