Many current Natural Language Processing (NLP) models are large neural networks that are pre-trained on huge amounts of unlabeled data. How these models store, combine, and use information from this self-supervised training is still largely obscure. I develop techniques that probe how linguistic information is structured within such models and what the limitations of current models are.
Another research interest of mine is self-rationalizing models that generate free-text explanations along with their predictions. While textual explanations are flexible and easy to understand, they come with challenges such as a speculative relation to the prediction and the inheritance of possibly undesirable properties of human explanations, including the ability to convincingly justify wrong predictions. I work on the evaluation and control of such explanations, and on the relation between explanation design and utility.