Bachelor of Science
Dr. Douglas Szajda
Dr. Jon Park
The field of voice processing has seen great advancements thanks in part to the rise of deep learning. However, applying these deep learning techniques to an audio input space leads to an interesting result not commonly found in other input domains. Namely, common techniques for generating auditory adversarial samples using gradient-based optimization have been observed to have extremely low transferability, even among models with the same structure. This implies an inherent difference in the latent representations of audio samples that may be worth investigating in the pursuit of a more resilient and interpretable voice processing framework. Our core contribution is an investigation of the decision-making processes of modern voice processing implementations. Specifically, we are interested in explaining the impact of audio input features on the alphabetic character outputs of a modern speech-to-text system such as DeepSpeech2. We investigate this with the aid of the Local Interpretable Model-agnostic Explanations (LIME) technique, applied to an appropriate and contextually aware representation of the problem space. For every alphabetic character, we select audio samples centered on that character and use them as inputs to the voice processing system. The model's predictions on these inputs are explained via LIME, and the resulting letter-use clusters for all characters are aggregated for analysis. With an understanding of the reasoning behind the classification of characters, we will be able to better understand why attacks succeed or fail, develop new attacks, and better defend voice processing systems against adversarial attacks in general.
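To make the LIME step concrete, the following is a minimal sketch of a LIME-style explanation over an audio clip, not the thesis's implementation or any DeepSpeech2 API. It assumes the clip is split into equal-length time segments that serve as interpretable features, that a "removed" segment is simply silenced (set to zero), and that a hypothetical `char_prob` function returns the model's probability for the target character on a given waveform. The function name `lime_segment_weights`, the proximity measure (fraction of segments removed), and the kernel width are illustrative choices; a plain weighted least-squares fit stands in for the surrogate model.

```python
import numpy as np


def lime_segment_weights(audio, char_prob, n_segments=10, n_samples=500,
                         kernel_width=0.25, seed=0):
    """LIME-style sketch: one importance weight per time segment of `audio`.

    `char_prob` is a hypothetical callable mapping a waveform to the model's
    probability for one target character (an assumption, not a real API).
    """
    rng = np.random.default_rng(seed)
    # Equal-length segment boundaries (assumed interpretable representation).
    bounds = np.linspace(0, len(audio), n_segments + 1).astype(int)

    # Random binary masks: 1 keeps a segment, 0 silences it.
    masks = rng.integers(0, 2, size=(n_samples, n_segments))
    masks[0] = 1  # always include the unperturbed clip

    # Query the model on each perturbed clip.
    preds = np.empty(n_samples)
    for i, mask in enumerate(masks):
        perturbed = audio.copy()
        for s in range(n_segments):
            if mask[s] == 0:
                perturbed[bounds[s]:bounds[s + 1]] = 0.0
        preds[i] = char_prob(perturbed)

    # Proximity to the original clip: exponential kernel on the
    # fraction of segments removed (illustrative distance choice).
    distance = 1.0 - masks.mean(axis=1)
    sample_weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Weighted least-squares linear surrogate with an intercept column.
    X = np.hstack([np.ones((n_samples, 1)), masks])
    sw = np.sqrt(sample_weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], preds * sw, rcond=None)
    return coef[1:]  # drop the intercept; one weight per segment


if __name__ == "__main__":
    # Toy stand-in for a model: the "character" is detected only when
    # segment 3 (samples 4800..6400 of 16000) still carries energy.
    clip = np.random.default_rng(1).normal(size=16000)
    toy = lambda x: float(np.mean(x[4800:6400] ** 2) > 0.5)
    weights = lime_segment_weights(clip, toy)
    print(int(np.argmax(weights)))
```

On the toy model the surrogate attributes the prediction entirely to segment 3, which is the behavior one would hope to see when the real model's output for a character depends on the audio centered on that character. With a real speech-to-text model, the segment length and the choice of silencing versus other replacement signals would likely affect the explanations.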
Kudlay, Vadim, "Understanding Model Reasoning in Automated Speech Systems: Implementing a Prototype Explanation System Using the LIME Method" (2021). Honors Theses. 1571.