Language technology as infrastructure
I participate in the national consortium Swe-Clarin, whose goal is to develop language technology methods and tools for use in other research areas. My focus is on tools and resources for text analysis. I have contributed to a new Swedish resource for named entity recognition and to various resources for evaluating machine translation.
An important aspect of this project is working with researchers from other disciplines who use text data in their research. This helps me discover the potential and the limitations of the tools we use, and it may give the researcher a new perspective on her data. Automatic methods and tools can be applied to far larger datasets than an individual researcher could read in a lifetime, and they can also reveal structures and relations in texts that are not immediately visible.
I also contribute to the international initiative Universal Dependencies, which develops treebanks for a growing number of the world's languages. A treebank is a dataset of sentences with rich syntactic annotation. It enables typological and comparative studies of the included languages as well as the development of tools for automatic processing. The resource I develop is LinES, a parallel treebank of original English sentences and their Swedish translations, which can be used for translation studies.
I supervise or examine a few theses each year at the Bachelor's and Master's levels.