Hate Speech and AI: Issues in Detection

Hate speech is a form of expression that attacks people based on characteristics such as race, gender, ethnicity, or sexual orientation. Hate speech has a long history; however, with the expansion of the internet and social media, it has spread faster than ever before. According to a Pew Research Center report, 41% of Americans have experienced some form of online harassment. The high correlation between suicide rates and verbal harassment in migrant groups also shows how important it is to detect hate speech and prevent its spread. As a recent example, after the mass murder at a Pittsburgh synagogue, it emerged that the perpetrator had repeatedly posted hateful messages about Jews before the attack.



Retrieved from: https://www.kqed.org/news/11702239/why-its-so-hard-to-scrub-hate-speech-off-social-media


Furthermore, the Pew Research Center report also finds that 79% of Americans think online service providers are responsible for detecting hate speech and online harassment. Accordingly, many online service providers recognize the importance of the issue and work closely with AI engineers to address it.

Hate speech detection is complex for several reasons. First, current AI technologies have limited ability to understand the context of human language. For instance, they may miss hate speech or produce false positives when the context changes. Researchers from Carnegie Mellon University suggested that how toxic a statement is may depend on the race, gender, and ethnicity of the people involved. Hence, according to the researchers, improving the quality of the data and of detection requires identifying the characteristics of the author while identifying hate speech and its toxicity. Such identification can also reduce the bias of current algorithms.
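
To see why context matters, consider a deliberately naive detector that flags any post containing a blocklisted word. This is a toy illustration, not how current AI systems or the Carnegie Mellon researchers' models work, and the blocklist term `slurword` is a hypothetical placeholder. Matching words without context produces both errors described above: a victim reporting abuse is flagged (false positive), while hateful text that avoids listed words is missed (false negative).

```python
import re

# Hypothetical placeholder standing in for a real slur.
BLOCKLIST = {"slurword"}


def keyword_flag(text: str) -> bool:
    """Flag text if any lowercase word token is on the blocklist, ignoring context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in BLOCKLIST for token in tokens)


examples = [
    "You are a slurword and should leave.",           # abusive: flagged (true positive)
    "He called me a slurword and I reported him.",    # victim reporting abuse: flagged (false positive)
    "People like you do not belong in this country.", # hateful, no listed word: missed (false negative)
]

for text in examples:
    print(keyword_flag(text), "|", text)
```

Learned classifiers replace the fixed word list with statistical patterns, but the same two failure modes appear whenever the model lacks the context needed to tell these cases apart.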

Retrieved from: https://www.pewresearch.org/internet/2017/07/11/online-harassment-2017/pi_2017-07-11_online-harassment_0-01/


However, current AI technologies have difficulty detecting such characteristics. First, it is difficult to identify the demographics and characteristics of authors, since in most cases this information is not available online. This makes distinguishing hate speech harder. Second, even when an author clearly states such information, detection can still be difficult because of the cultural context. Social dynamics vary between countries, and even between regions within a country, and are closely tied to local culture and language. These differences, and the fact that they keep changing, strongly affect detection outcomes: a system may miss hate speech or produce false positives because of cultural differences that statistics do not capture.



Language is one of the most complicated and most significant functions of humankind. People communicate through language in many different ways and contexts, which even neuroscientists have not yet fully mapped. However, artificial intelligence has brought scientists one step closer to describing the patterns and mechanisms of language. In this sense, hate speech detection, a crucially important subject in the age of the internet, also benefits, since machine learning algorithms make it much easier to detect online harassment. Nevertheless, given the problems described above, today's technology cannot yet take humans out of the detection loop.
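
One common way to keep humans in the loop is to automate only confident decisions and send uncertain cases to human moderators. This is a hypothetical sketch: the function name `triage` and the thresholds 0.9 and 0.2 are assumptions, and `score` stands for the output probability of some hate speech classifier that is not shown here.

```python
def triage(score: float, remove_above: float = 0.9, allow_below: float = 0.2) -> str:
    """Route a post based on a classifier's hate-speech probability.

    Only confident predictions are handled automatically; everything in the
    uncertain middle band is sent to human moderators.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score >= remove_above:
        return "remove"
    if score <= allow_below:
        return "allow"
    return "human_review"


for s in (0.95, 0.50, 0.05):
    print(s, triage(s))
```

Widening the middle band sends more posts to people and fewer mistakes to users; narrowing it does the opposite, so the thresholds encode how much the platform trusts the model.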








Article Review – Tooth Detection with Mask R-CNN

In this article, I will review the paper ‘Tooth Detection and Segmentation with Mask R-CNN’ [1], published at the Second International Conference on Artificial Intelligence in Information and Communication (ICAIIC 2020). The paper describes an implementation of automatic tooth detection and segmentation on dental images using Mask R-CNN. Its aim is to detect only the teeth and segment them.

It should be noted that Mask R-CNN segments well even in complex and crowded dental structures. ⚠️

If you work in this area like me, the first thing to look at when reviewing an article is its keywords. The keywords of this paper are Mask R-CNN, Object Detection, Semantic Segmentation, and Tooth, and we continue our research from these keywords.

One-stage networks such as the Fully Convolutional Network (FCN), You Only Look Once (YOLO), and the Single Shot MultiBox Detector (SSD) are 100-1000 times faster than region-proposal-based algorithms [3], [4], [5].

Technical Approaches

❇️ Data Collection

Since no public dataset is available, 100 images were collected from a hospital to build the dataset. Of these, 80 images are used as training data, 10 as validation data, and 10 as test data. The images were taken at different distances and under different lighting conditions, from people of different sexes and ages, which makes the task more challenging for the network.
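
The 80/10/10 split above can be reproduced with a seeded shuffle so that every run uses the same partition. This is a minimal sketch: the paper does not say how the images were assigned to each subset, and the file names and the `split_dataset` helper are hypothetical.

```python
import random


def split_dataset(image_paths, n_train=80, n_val=10, n_test=10, seed=0):
    """Shuffle deterministically, then cut into disjoint train/val/test lists."""
    if n_train + n_val + n_test != len(image_paths):
        raise ValueError("split sizes must add up to the number of images")
    shuffled = list(image_paths)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


images = [f"tooth_{i:03d}.jpg" for i in range(100)]  # hypothetical file names
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 80 10 10
```

With only 10 validation and 10 test images, a single unlucky split can noticeably shift the reported accuracy, which is worth keeping in mind when reading the results.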

❇️ Image Annotation

Labelme is an image annotation tool developed by MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) [6]. It provides tools for outlining object boundaries: when annotating an image, a polygon is drawn around each tooth. An example of this can be seen in Figure 1. For each image, the polygon vertex coordinates are saved in a JSON file. Because annotation is manual, small errors are inevitable; however, they do not affect the overall evaluation of the model. Since there is only one category, tooth pixels are labeled 1 and everything else, treated as background, is labeled 0.
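
Turning such polygon annotations into the 0/1 masks described above amounts to filling each polygon. In this sketch the JSON keys (`shapes`, `label`, `points`, `imageHeight`, `imageWidth`) follow the format written by the open-source labelme Python tool, which is an assumption about the files used in the paper, and the label name `tooth` is hypothetical. Rasterization is done in pure Python with an even-odd ray-casting test on pixel centers; in practice `PIL.ImageDraw.Draw(img).polygon(points, fill=1)` would be faster.

```python
import json


def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: True if point (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_to_mask(annotation, label="tooth"):
    """Rasterize every polygon with the given label: tooth = 1, background = 0."""
    h, w = annotation["imageHeight"], annotation["imageWidth"]
    mask = [[0] * w for _ in range(h)]
    for shape in annotation["shapes"]:
        if shape["label"] != label:
            continue
        for row in range(h):
            for col in range(w):
                # Test the pixel centre against the polygon.
                if point_in_polygon(col + 0.5, row + 0.5, shape["points"]):
                    mask[row][col] = 1
    return mask


# Hypothetical 6x6 annotation containing one square "tooth" polygon.
annotation = json.loads(
    '{"imageHeight": 6, "imageWidth": 6, "shapes": '
    '[{"label": "tooth", "points": [[1, 1], [4, 1], [4, 4], [1, 4]]}]}'
)
mask = polygons_to_mask(annotation)
for row in mask:
    print(row)
```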

❇️ Deep Network Architecture Details


Figure: Mask R-CNN architecture (workflow)

You can see the Mask R-CNN architecture in the figure above. Mask R-CNN consists of several modules. As an extension of Faster R-CNN, it adds a convolutional branch that performs instance segmentation. Its backbone is a standard convolutional neural network that serves as a feature extractor; in principle, this backbone can be any network that extracts image features, such as ResNet-50 or ResNet-101. In addition, to perform multi-scale detection, a feature pyramid network (FPN) is used in the backbone. FPN improves the standard feature extraction pyramid by adding a second pyramid that takes the high-level features from the top of the first pyramid and passes them down to the lower layers. This project uses the deeper ResNet-101 + FPN backbone.
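
The top-down pathway of FPN can be sketched on toy single-channel feature maps. This is a heavily simplified illustration, not the paper's code: each 1x1 lateral convolution is reduced to a hypothetical scalar weight (which is what a 1x1 convolution is on a one-channel map), each level is assumed to be exactly half the size of the one below it, and the 3x3 smoothing convolution FPN applies after each merge is omitted.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized feature maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(bottom_up, lateral_weights):
    """bottom_up: feature maps ordered from finest to coarsest level.

    Returns the merged pyramid in the same order. The coarsest level is just its
    lateral projection; every finer level adds the upsampled level above it.
    """
    laterals = [[[w * v for v in row] for row in fmap]
                for fmap, w in zip(bottom_up, lateral_weights)]
    pyramid = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        pyramid.append(add_maps(lateral, upsample2x(pyramid[-1])))
    return list(reversed(pyramid))


c3 = [[1, 2], [3, 4]]  # finer level (2x2)
c4 = [[10]]            # coarser, more semantic level (1x1)
p3, p4 = fpn_top_down([c3, c4], lateral_weights=[1.0, 1.0])
print(p3)  # [[11.0, 12.0], [13.0, 14.0]]
```

The fine map `p3` now carries the coarse level's information in every position, which is what lets small objects such as individual teeth benefit from high-level features.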

Figure: Mask R-CNN working structure (step-by-step detection)

🔍 Architecture Details

RoIAlign was proposed as a replacement for RoI pooling because it better preserves the spatial position of each region. The box coordinates regressed by the RPN are usually fractional, and RoI pooling rounds them to integers. The boxes obtained from the RPN must also be pooled to the same fixed size before entering the fully connected layer, which again requires rounding. RoIAlign eliminates this rounding and keeps the fractional coordinates, sampling feature values by bilinear interpolation, which makes it more accurate for both detection and segmentation. The loss combines the classification, RoI regression, and segmentation losses. The classification and RoI regression losses are the same as in standard object detection networks. The mask branch is a convolutional neural network that takes an RoI as input and outputs a small 28×28 mask.
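
The key idea of RoIAlign, sampling at fractional coordinates instead of rounding them, can be shown on a single-channel feature map. This is a simplified sketch, not the paper's implementation: it treats `fmap[r][c]` as the value at integer coordinate `(r, c)`, clamps samples to the map borders, and averages `samples x samples` bilinear samples per output bin; the function names are illustrative.

```python
import math


def bilinear(fmap, y, x):
    """Sample a 2-D feature map at fractional (y, x) by bilinear interpolation."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp to the map borders
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(fmap, box, out_size, samples=2):
    """box = (y1, x1, y2, x2) in feature-map coordinates, kept fractional (no rounding).

    Each of the out_size x out_size bins averages samples x samples bilinear samples.
    """
    by1, bx1, by2, bx2 = box
    bin_h = (by2 - by1) / out_size
    bin_w = (bx2 - bx1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for si in range(samples):
                for sj in range(samples):
                    y = by1 + (i + (si + 0.5) / samples) * bin_h
                    x = bx1 + (j + (sj + 0.5) / samples) * bin_w
                    total += bilinear(fmap, y, x)
            row.append(total / samples ** 2)
        out.append(row)
    return out


fmap = [[10 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 map, f(r, c) = 10r + c
print(bilinear(fmap, 1.5, 2.25))                 # 17.25, between grid points
print(roi_align(fmap, (0.5, 0.5, 2.5, 2.5), 1))  # [[16.5]]
```

RoI pooling would first round the box `(0.5, 0.5, 2.5, 2.5)` to integer cells, shifting it by half a cell; here the box is used exactly as regressed, which is why the masks line up better with object boundaries.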

✅ Results

The network was trained for 50 epochs in total: the first 20 epochs formed the initial training stage, and the remaining 30 epochs fine-tuned all layers. The final total loss is 0.3093 and combines the bounding box loss, class loss, mask loss, and RPN loss. The total loss curve is shown in Figure 4. The final test results are also shown, with (a) the best and (b) the worst individual result.
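
The total loss reported above can be written as the sum of Mask R-CNN's task losses. This is a sketch based on the original Mask R-CNN formulation (He et al., 2017), not on equations given in this paper; equal weighting of the terms is an assumption. Here $\hat{y}_{ij}$ is the predicted probability that pixel $(i, j)$ of the $m \times m$ mask ($m = 28$) belongs to a tooth, and $y_{ij} \in \{0, 1\}$ is the ground-truth label.

```latex
L_{\text{total}} = L_{\text{RPN}} + L_{\text{cls}} + L_{\text{box}} + L_{\text{mask}},
\qquad
L_{\text{RPN}} = L^{\text{RPN}}_{\text{cls}} + L^{\text{RPN}}_{\text{box}},

L_{\text{mask}} = -\frac{1}{m^{2}} \sum_{i=1}^{m} \sum_{j=1}^{m}
  \left[ y_{ij} \log \hat{y}_{ij} + \left(1 - y_{ij}\right) \log\left(1 - \hat{y}_{ij}\right) \right]
```

With a single tooth class, the mask loss is simply the average per-pixel binary cross-entropy over the 28×28 mask of each RoI.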

Figure 4: Total loss curve

Pixel Accuracy (PA), the proportion of pixels that are classified correctly, is the simplest and most effective metric for evaluating the results. The best result was 97.4% PA and the worst was 90.1%. Because the dataset contains few teeth with prostheses, detection accuracy on prostheses was low.
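
Pixel accuracy is straightforward to compute from a predicted mask and its ground truth. This is a minimal sketch using nested lists for the 0/1 masks; the function name is illustrative and the paper does not show its evaluation code.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("pred and target must have the same shape")
    correct = total = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            correct += int(p == t)
            total += 1
    return correct / total


pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
print(pixel_accuracy(pred, target))  # 0.75
```

Because background pixels count as much as tooth pixels, PA can stay high even when small structures such as prostheses are missed, which is consistent with the weak prosthesis results noted above.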

Final test results: (a) best individual result, (b) worst individual result


  1. Guohua Zhu, Zewen Piao, Suk Chan Kim, Department of Electronics Engineering, Pusan National University, Tooth Detection and Segmentation with Mask R-CNN, ICAIIC 2020.
  2. https://github.com/fcsiba/DentAid.
  3. Shelhamer, E., Long, J., and Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 4 (Apr. 2017), 640–651.
  4. Redmon, J., and Farhadi, A. YOLOv3: An incremental improvement. arXiv (2018).
  5. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. SSD: Single shot multibox detector. ECCV (2016).
  6. B. Russell, A. Torralba, and W. T. Freeman, LabelMe: The Open Annotation Tool, MIT Computer Science and Artificial Intelligence Laboratory [Online]. Available: http://labelme.csail.mit.ed.
  7. Zhiming Cui, Changjian Li, Wenping Wang, The University of Hong Kong, ToothNet: Automatic Tooth Instance Segmentation and Identification from Cone Beam CT Images.