Date of Award
Bachelor of Science
Dr. Joonsuk Park
Automated fact checking is a task in the domain of Natural Language Processing that deals with the verification of claims using evidence. Fact checking is becoming increasingly important as large amounts of human-generated information accumulate online. In the recent past, our society has witnessed the large-scale spread of disinformation via the internet, which has repeatedly led to noticeable disruptions in the fabric of society. Automated fact checking would help mitigate the spread of disinformation by allowing large volumes of content to be evaluated automatically. In this work, we define and tackle multiple subtasks of fact checking using labeled data from WikiFactCheck-English (Sathe et al., 2020), a dataset of 124k triples, each consisting of a claim, a context, and an evidence document extracted from English Wikipedia articles and citations, as well as 34k manually written claims that are refuted by the evidence documents. We provide support vector machine and logistic regression baselines, and pursue state-of-the-art results using large pre-trained transformer-based transfer learning approaches (specifically, BERT), which raise accuracy from a baseline of 68% to about 78%. Furthermore, we adapt a novel semi-supervised, attention-based multiple-instance learning approach to learn item-level fact verification from document-level labeled data, opening future possibilities for weakly supervised learning of fact-checking models. We also demonstrate that transfer learning from Natural Language Inference, a sentence-level inference task, leads to the best overall transfer performance in a low-resource, data-constrained setting, but offers no overall advantage given sufficient training data. Finally, we demonstrate that claims often require and benefit from more than one sentence of supporting evidence, and that BERT can learn to attend to multiple evidence sentences to make the correct fact-checking inference.
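To make the idea of attention-based multiple-instance learning more concrete, the following is a minimal sketch of attention pooling over the sentences of one evidence document. It is not the model from this thesis: the abstract does not specify the architecture, so the scoring form (a tanh layer followed by a softmax), the dimensions, the random parameters, and all names (`attention_mil_pool`, `document_probability`, `V`, `w`, `u`) are assumptions chosen for illustration. The point it shows is that a single document-level prediction can be built from per-sentence attention weights, and those weights can then be read as item-level evidence scores.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(sentence_vecs, V, w):
    """Return (attention weights, pooled document vector).

    Each sentence vector h gets the score w . tanh(V h); the scores are
    normalized with a softmax and used to average the sentence vectors.
    """
    scores = []
    for h in sentence_vecs:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wk * hk for wk, hk in zip(w, hidden)))
    weights = softmax(scores)
    dim = len(sentence_vecs[0])
    pooled = [sum(a * h[j] for a, h in zip(weights, sentence_vecs))
              for j in range(dim)]
    return weights, pooled


def document_probability(pooled, u, b):
    """Logistic output on the pooled vector: the document-level prediction."""
    logit = sum(uj * zj for uj, zj in zip(u, pooled)) + b
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical toy input: 5 sentence embeddings of dimension 4.
# In practice these would come from an encoder such as BERT.
rng = random.Random(0)
dim, hidden_dim, n_sentences = 4, 3, 5
sentences = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_sentences)]

# Randomly initialized parameters stand in for learned ones (assumption).
V = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(hidden_dim)]
w = [rng.gauss(0, 1) for _ in range(hidden_dim)]
u = [rng.gauss(0, 1) for _ in range(dim)]

weights, pooled = attention_mil_pool(sentences, V, w)
prob = document_probability(pooled, u, 0.0)
```

In a trained model of this kind, only `prob` would be supervised with the document-level label, while `weights` would indicate which sentences the model relied on for its decision.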
Sathe, Aalok, "Fact-Checking of Claims from the English Wikipedia Using Evidence in the Wild" (2021). Honors Theses. 1550.