NEWS VERIFICATION PROJECT: Datasets for Share

We release several datasets that were used in the R&D of the automated detectors.

Satirical Fake and Legitimate News Dataset (2016) will be extended and released shortly (in 2018).

Fake and Legitimate Political News (2016-2017) will be released shortly.

Associated work: Asubiaro, Toluwase and Victoria L. Rubin (2018) "Comparing Features of Fabricated and Legitimate Political News in Digital Environments (2016-2017)" will be presented in Vancouver (see this short paper full text PDF or view the ASIS&T2018 Proceedings)

Abstract
With the problem of ‘fake news’ in the digital media, there are efforts at creating awareness, automating ‘fake news’ detection, and promoting news literacy. This research is descriptive as it pulls evidence from the content of online fabricated news for the features that distinguish fabrications from the legitimate political news around the time of the U.S. Presidential Elections (276 articles in total, from November 2016 - June 2017). Certain stylistic and psycho-linguistic features of fabrications may be apparent to the news readers: fewer words and paragraphs but longer paragraphs, more slang, swear words and affective words in the stories. Such features could be used for educational information literacy campaigns for spotting so-called ‘fake news’. Other informative features may require specialized analytical tools (or further training) to notice the presence of more words, punctuation marks, demonstratives and emotiveness in fabrications but fewer verifiable facts (or named entities) in their headlines.
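
As an illustration only, the surface-level features named in the abstract (word count, paragraph count, paragraph length, punctuation marks) could be counted roughly as follows. This is a minimal sketch and not the authors' pipeline: the function name `surface_features`, the blank-line paragraph rule, the regular-expression word rule, and the sample text are all assumptions made for this example.

```python
import re
import string


def surface_features(text: str) -> dict:
    """Count a few surface-level stylistic features of an article body.

    Hypothetical helper for illustration; the tokenization rules below
    are assumptions, not the method used in the associated paper.
    """
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Assumption: a word is a run of ASCII letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Count every ASCII punctuation character, including apostrophes.
    n_punctuation = sum(1 for ch in text if ch in string.punctuation)
    return {
        "n_words": len(words),
        "n_paragraphs": len(paragraphs),
        "words_per_paragraph": len(words) / len(paragraphs) if paragraphs else 0.0,
        "n_punctuation": n_punctuation,
    }


# Made-up two-paragraph sample, not taken from the dataset.
sample = "Breaking!!! You won't believe this.\n\nOfficials declined to comment."
features = surface_features(sample)
print(features)
```

Features such as slang, swear words, affective words, or named entities would need lexicons or language-processing tools that this sketch does not attempt to cover.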

Native Ads and Editorials (2019) will be released shortly.

Associated work by Sarah Cornwell and Victoria Rubin (2019) "What Am I Reading?: Article-style Native Advertisements in Canadian Newspapers" will be presented at HICSS2019 in January.

Abstract
Native ads are ubiquitous in the North American digital news context. Their form, content and presentational style are practically indistinguishable from regular news editorials, and thus are often mistaken for informative content by newsreaders. This advertising practice is deceptive, in that it exploits loopholes in human digital literacy. Despite this, it is flourishing as a lucrative digital news advertising format.
This paper documents and compares the 2018 Canadian news editorial writing and advertising practices in an effort to highlight their similarities and differences for potential automatic detection and categorization. We collected 10 native ads and 10 editorial pieces from each of 4 Canadian newspapers. The 80 analyzed articles consisted of 40 native ads content-matched to editorials in the same newspaper. The individually-matched pairs and overall practices in the 2 groups were content-analyzed and compared. Native ads did not differ much from editorial articles in content but were likely to be surrounded by different types of advertising. In addition, advertisement labelling practices were inconsistent across national papers. We call for increased efforts in regulation and automatic detection of covert advertising by a more nuanced categorization and their more explicit labelling in the digital news.

The full manuscript is being finalized for the camera-ready version of the HICSS2019 Proceedings.

Clickbait Dataset recombines 2 well-known datasets and will be released shortly with its associated manuscript (under preparation).