Date of Award
4-2021
Document Type
Thesis
Degree Name
Bachelor of Science
Department
Computer Science
First Advisor
Dr. Joonsuk Park
Abstract
Automated fact checking is a Natural Language Processing task concerned with verifying claims against evidence. It is becoming increasingly important as large amounts of human-generated information accumulate online. In the recent past, disinformation has spread widely via the internet and has repeatedly disrupted society. Automated fact checking would help mitigate this spread by allowing large volumes of content to be evaluated automatically. In this work, we define and tackle multiple subtasks of fact checking using labeled data from WikiFactCheck-English (Sathe et al., 2020), a dataset of 124k triples, each consisting of a claim, its context, and an evidence document, extracted from English Wikipedia articles and citations, as well as 34k manually written claims that are refuted by the evidence documents. We provide support vector machine and logistic regression baselines, and we pursue state-of-the-art results with large pre-trained transformer-based transfer learning approaches (specifically, BERT), which raise accuracy from a baseline of 68% to about 78%. Furthermore, we adapt a novel semi-supervised attention-based multiple-instance learning approach to learn item-level fact verification from document-level labels, opening future possibilities for weakly supervised learning of fact-checking models. We also demonstrate that transfer learning from Natural Language Inference, a sentence-level inference task, yields the best overall transfer performance in a low-resource, data-constrained setting, but no overall advantage given sufficient training data. Finally, we demonstrate that claims often require and benefit from more than one supporting sentence, and that BERT can learn to attend to multiple evidence sentences to make the correct fact-checking inference.
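To illustrate the idea of learning item-level signals from document-level labels, the following is a minimal sketch of one common formulation of attention-based multiple-instance pooling. It is not taken from the thesis: the function names (`softmax`, `attention_mil_pool`), the parameters `V` and `w`, and the toy vectors are all assumptions made for illustration. Each evidence sentence is treated as an instance, the document as a bag, and the attention weights indicate how much each sentence contributes to the bag-level representation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(instances, V, w):
    """Pool instance vectors into one bag vector using attention.

    instances: list of d-dimensional sentence vectors (hypothetical embeddings)
    V:         hidden projection, a list of rows, each of length d (assumed parameter)
    w:         scoring vector with one entry per row of V (assumed parameter)
    Returns (bag_vector, attention_weights).
    """
    scores = []
    for h in instances:
        # score_k = w^T tanh(V h_k)
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ti for wi, ti in zip(w, hidden)))
    weights = softmax(scores)
    dim = len(instances[0])
    # The bag vector is the attention-weighted sum of the instance vectors.
    bag = [sum(a * h[j] for a, h in zip(weights, instances)) for j in range(dim)]
    return bag, weights


# Toy usage: three "sentences" with 2-dimensional embeddings.
sentences = [[2.0, 0.0], [0.0, 0.0], [0.5, 1.0]]
bag, weights = attention_mil_pool(sentences, V=[[1.0, 0.0]], w=[1.0])
print(weights)  # one weight per sentence; the weights sum to 1
```

In a trained model, `V` and `w` would be learned jointly with a classifier on the bag vector using only document-level labels, and the resulting weights could then be inspected to see which sentences the model relies on.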
Recommended Citation
Sathe, Aalok, "Fact-Checking of Claims from the English Wikipedia Using Evidence in the Wild" (2021). Honors Theses. 1550.
https://scholarship.richmond.edu/honors-theses/1550