Fabricated news stories that contain false information but are presented as factually accurate (commonly known as ‘fake news’) have generated substantial interest and media attention following the 2016 U.S. presidential election. While the full details of what transpired during the election are still not known, it appears that multiple groups used social media to spread false information packaged in fabricated news articles that were presented as truthful. Some have argued that this campaign had a material impact on the election. Moreover, the 2016 U.S. presidential election is far from the only campaign in which fake news apparently played a role. This paper presents work on a counter-fake-news research effort. In the long term, this project is focused on building an indications and warnings system for potentially deceptive false content.
As part of this project, a dataset of manually classified legitimate and deceptive news articles was curated. The key criteria for distinguishing legitimate from deceptive articles, which emerged from the manual classification effort, are discussed. These criteria can be embodied in a natural language processing system to perform illegitimate content detection. They include the document’s source and origin, title, political perspective, and several key content characteristics. This paper presents and evaluates the efficacy of each of these characteristics and their suitability for legitimate versus illegitimate classification. The paper concludes by discussing the use of these characteristics as input to a customized naïve Bayesian probability classifier, the results obtained with this classifier, and future work on its development.
Fabricated information is easily distributed across social media platforms and the internet. This allows incorrect and embellished information to misinform and manipulate the public in service of an attacker's goals. Falsified information, also commonly known as "fake news", has been around for centuries. Today, it presents a unique challenge because a news item's origin is difficult to trace when it is spread electronically. Fake news can affect voting patterns, political careers, businesses’ new product launches, and countless other information consumption processes. This paper proposes a method that uses machine learning to identify fake news stories. The conditional probability that a story is fake is calculated, given the presence of feature predictors inside a news story. A concise summary of the qualitative methods used to study fake news stories is presented. This is followed by a discussion of computational social science and machine learning methods that can be used to train and tune a classifier to detect fake news. Some of the main linguistic trends associated with fake news on social media platforms are identified. A larger integrated system that can be used to identify and mitigate the impact of falsified content is also proposed.
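To make the conditional-probability idea concrete, the following is a minimal sketch of a Bernoulli-style naïve Bayes calculation over binary feature predictors. It is an illustration under stated assumptions, not the classifier evaluated in this work: the feature names (such as `sensational_title`), the four toy stories, the smoothing constant `alpha`, and the function names `train` and `posterior` are all hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def train(stories, labels, alpha=1.0):
    """Estimate class priors and P(feature present | class).

    stories: list of sets of feature names present in each story.
    labels:  list of class labels, one per story.
    alpha:   Laplace smoothing constant (an assumed default).
    """
    classes = sorted(set(labels))
    vocab = sorted({f for feats in stories for f in feats})
    class_counts = Counter(labels)
    feat_counts = {c: Counter() for c in classes}
    for feats, label in zip(stories, labels):
        feat_counts[label].update(set(feats))
    priors = {c: class_counts[c] / len(labels) for c in classes}
    # Smoothed estimate; the 2 * alpha accounts for present/absent outcomes.
    likelihoods = {
        c: {f: (feat_counts[c][f] + alpha) / (class_counts[c] + 2 * alpha)
            for f in vocab}
        for c in classes
    }
    return priors, likelihoods


def posterior(features, priors, likelihoods):
    """Return P(class | features), assuming features are independent given the class."""
    present = set(features)
    log_scores = {}
    for c, prior in priors.items():
        log_p = math.log(prior)
        for f, p in likelihoods[c].items():
            # Absent features contribute (1 - p), as in Bernoulli naive Bayes.
            log_p += math.log(p if f in present else 1.0 - p)
        log_scores[c] = log_p
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(scores.values())
    return {c: v / total for c, v in scores.items()}


# Hypothetical toy data: each story is the set of predictors found in it.
toy_stories = [
    {"sensational_title", "unknown_source"},
    {"sensational_title", "no_author"},
    {"named_source"},
    {"named_source", "sensational_title"},
]
toy_labels = ["fake", "fake", "real", "real"]

priors, likelihoods = train(toy_stories, toy_labels)
result = posterior({"sensational_title", "unknown_source"}, priors, likelihoods)
print(result)
```

In this sketch a story is reduced to the set of predictors it contains, and the posterior for the class `fake` is the conditional probability referred to above. A practical system would need far more training data and a principled choice of predictors.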