Fact Verification with Stylometry: Fake News and AI

In the 21st century, fake news has become a worldwide phenomenon and a major source of distortion in collective decision-making. The spread of false information through such news has had socially devastating consequences, including polarizing societies and undermining democratic order.

The expansion of the internet and social media is the main reason fake news has become more visible: most fake news is now written and read online. Fake news has also evolved in form, misleading mass audiences through written and visual content that is increasingly hard to distinguish from genuine reporting. The sheer volume of misleading information compounds the problem; it has been reported that 20% to 38% of the news shared on social media platforms contains misleading information and false facts. Furthermore, advances in neural language models (LMs) have also played a significant role by making it possible to generate such fake news quickly and at scale.

(Image source: https://www.demdigest.org/artificial-intelligence-a-tool-in-fight-against-fake-news/)

Since fake news is now strongly tied to developments in computer science, AI researchers have begun working on new approaches to identify the misleading disinformation spreading throughout the internet. A central role in this progress has been played by computational adaptations of stylometry, a method introduced by the Polish philosopher Wincenty Lutosławski in 1890 and originally used to establish the authenticity and authorship of literary works by statistically analyzing their style. After this convergence, stylometry found two main applications: identifying authors and detecting misinformation. Using computational stylometry, researchers from MIT's CSAIL examined how far the method can go in identifying fake news.
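To make the method concrete, here is a minimal sketch of computational stylometry: each text is reduced to simple style statistics (function-word rates, sentence and word lengths, punctuation) and a toy classifier is trained on them. The feature set, the placeholder texts, and the classifier choice are illustrative assumptions, not the pipeline used in the CSAIL study.

```python
# A minimal stylometry sketch: represent each text by simple style
# features and train a toy classifier. Features and data here are
# illustrative assumptions, not the CSAIL study's classifiers.
import re
import numpy as np
from sklearn.linear_model import LogisticRegression

FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for"]

def style_features(text: str) -> np.ndarray:
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = max(len(words), 1)
    feats = [words.count(w) / n for w in FUNCTION_WORDS]  # function-word rates
    feats.append(n / max(len(sentences), 1))              # mean sentence length
    feats.append(sum(map(len, words)) / n)                # mean word length
    feats.append(text.count(",") / n)                     # comma rate
    return np.array(feats)

# Toy training data: label 1 = author A, 0 = author B (placeholder texts).
texts = ["The quick report, written in haste, is clear.",
         "Results show gains. Method works well. See table."]
labels = [1, 0]
X = np.stack([style_features(t) for t in texts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict([style_features("The draft, in short, is fine.")]))
```

The same feature-vector idea carries over from authorship attribution to misinformation detection: the classifier only ever sees style statistics, never the facts themselves, which is exactly the limitation the study probes.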

The researchers' datasets contained a wide variety of machine-generated (LM) news, built around an insight from human psychology: a liar diverges from the truth as little as possible. In the first dataset, created for the extension scenario, an LM was used to extend already existing news articles with relevant facts. Stylometry-based detectors turned out to be highly successful at spotting sentences that were fully generated by an LM. However, they failed to decide whether those sentences were misleading or not; worse still, they placed all LM-generated sentences, whether produced by malicious or responsible authors, in the same category. Also, in zero-shot settings, where detectors are not trained on labeled examples beforehand, the stylometry-based detectors labeled LM-generated texts as human-written even when the whole text contained just one or two human-written sentences. On this point, it was suggested that the stylometry-based detectors were less successful than the human participants of the study at detecting the stylistic differences between fully LM-generated and human-written sentences.
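One common family of such detectors scores how predictable each token of a text is under a language model, on the assumption that LM-generated text looks unusually predictable to an LM. The GLTR-style perplexity check below is a hedged sketch of that idea, not the exact detector evaluated in the study; the model choice (GPT-2 via the transformers library) and the sample sentence are assumptions.

```python
# Sketch of one common detection signal: text that a language model
# itself finds highly predictable is more likely to be LM-generated.
# Illustrative only; not the detector evaluated in the paper.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood
    return float(torch.exp(loss))

# Lower perplexity = more predictable = more "machine-like". Note that
# this is a purely stylistic cue; it says nothing about truthfulness.
print(perplexity("The committee announced its decision on Tuesday."))
```

This illustrates the study's core finding in miniature: the signal distinguishes *who wrote* the text, not *whether it is true*, so a truthful LM-extended article and a malicious one score alike.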

The second dataset, generated for the modification scenario, worked differently: the facts in news articles were made misleading through small alterations that left the article's style unchanged. In simple terms, the edits worked like an autocorrection tool, except that some true facts were changed to false ones. Such alterations made the misleading information harder to distinguish for both humans and stylometry-based detectors. As mentioned before, what makes a false story more believable according to human psychology is that it contains as many true facts as possible, and this is exactly what made fact verification difficult for humans and machines alike. Stylometry-based detectors, however, faced an additional difficulty: since the method judges veracity by stylistic differences in the text, it has nothing to work with in the modification scenario. Hence the researchers suggested that stylometry-based detectors are also not useful when fake news consists of LM-"autofalsified" sentences with no stylistic differences.
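The sketch below illustrates how small such an "autofalsification" edit can be: a single number is swapped while every other token, and hence every measurable style feature, stays intact. The substitution rule and the example sentence are hypothetical illustrations, not the paper's generation procedure.

```python
# Illustration of the modification scenario: flip one fact while
# leaving the surrounding style untouched. The substitution rule
# below is a hypothetical example, not the paper's procedure.
import re

def falsify(sentence: str) -> str:
    # Swap the first number (e.g., a date or statistic) for a nearby
    # wrong one; every other token, and thus the style, is preserved.
    def bump(match: re.Match) -> str:
        return str(int(match.group()) + 3)
    return re.sub(r"\d+", bump, sentence, count=1)

original = "The vaccine was approved in 2015 after 3 trials."
print(falsify(original))  # -> "The vaccine was approved in 2018 after 3 trials."
```

Because the edited sentence is token-for-token identical except for one digit, any detector built purely on style statistics sees the true and false versions as the same text.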

Given these scenarios, the researchers reasoned that comparing human participants' performance under the same conditions would be informative for improving stylometry-based systems. In this experiment, one group of human participants had to judge veracity the way a stylometry-based detector does, from the text alone, while the other group was allowed to consult external sources to verify the facts. The first group scored 0.68, only slightly lower than the stylometry-based detectors; the group that verified against external sources, on the other hand, scored noticeably higher, at 0.84. This experiment on human participants made clear that using external sources can strongly improve the fact verification process. At the same time, the results renewed the researchers' concerns about whether stylistic cues are a reliable basis for fact verification at all.
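The gap between the two groups points toward evidence-based verification: checking a claim against retrieved text rather than against its style. Below is a minimal sketch of that idea using an off-the-shelf natural-language-inference model; the model choice and the hard-coded "evidence" string stand in for a real retrieval step and are assumptions for illustration only.

```python
# Sketch of evidence-based verification: test whether retrieved
# evidence entails or contradicts a claim with an NLI model. The
# model choice and hard-coded evidence are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli").eval()

def verdict(evidence: str, claim: str) -> str:
    inputs = tokenizer(evidence, claim, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    labels = ["contradiction", "neutral", "entailment"]  # this checkpoint's label order
    return labels[int(probs.argmax())]

evidence = "The report was published in 2015."  # stand-in for a retrieved source
print(verdict(evidence, "The report was published in 2018."))  # -> contradiction
```

Unlike the stylometric features above, this check compares content against content, so it can catch exactly the single-fact flips that the modification scenario showed style-based detectors miss.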

Considering these findings, the researchers repeatedly ran up against the limitations of stylometry for fact verification. They concluded that non-stylometry-based systems are needed to take a step forward in distinguishing fake news, and the CSAIL researchers are accordingly continuing their work with new, non-stylometry-based AI models. As their research shows, this direction is promising: AI's potential for distinguishing fake news could be an important advancement for society's future, helping it reconnect after the many divisions caused by online disinformation.

References:

Schuster, T., Schuster, R., Shah, D., & Barzilay, R. (2020). The Limitations of Stylometry for Detecting Machine-Generated Fake News. Computational Linguistics, 46, 1–18. doi:10.1162/COLI_a_00380. Retrieved October 25, 2020, from https://arxiv.org/pdf/1908.09805.pdf

Mok, K. (2019). MIT's New AI Tackles Loopholes in 'Fake News' Detection Tools. The New Stack. Retrieved October 25, 2020, from https://thenewstack.io/mits-new-ai-tackles-loopholes-in-fake-news-detection-tools/
