Date of Award

4-2021

Document Type

Thesis

Degree Name

Bachelor of Science

Department

Computer Science

First Advisor

Dr. Joonsuk Park

Abstract

Automated fact checking is a Natural Language Processing task that deals with verifying claims against evidence. Fact checking is becoming increasingly important as large amounts of human-generated information accumulate online. In recent years, society has witnessed the large-scale spread of disinformation via the internet, which has repeatedly led to noticeable social disruption. Automated fact checking would help mitigate the spread of disinformation by allowing large volumes of content to be evaluated automatically. In this work, we define and tackle multiple subtasks of fact checking using labeled data from WikiFactCheck-English (Sathe et al., 2020), a dataset of 124k triples, each consisting of a claim, its context, and an evidence document extracted from English Wikipedia articles and citations, as well as 34k manually written claims that are refuted by the evidence documents. We provide support vector machine and logistic regression baselines, and we pursue state-of-the-art results using large pre-trained transformer-based transfer learning approaches (specifically, BERT), which raise accuracy from a baseline of 68% to about 78%. Furthermore, we adapt a novel semi-supervised, attention-based multiple-instance learning approach to learn item-level fact verification from document-level labeled data, opening up future possibilities for weakly supervised learning of fact-checking models. We also demonstrate that transfer learning from Natural Language Inference, a sentence-level inference task, yields the best overall transfer performance in a low-resource, data-constrained setting, but offers no overall advantage given sufficient training data. Finally, we demonstrate that claims often require and benefit from more than one supporting sentence, and that BERT can learn to attend to multiple evidence sentences to make the correct fact-checking inference.
