Sociologists have been discussing the need for more formal ways to extract meaning from text. The semi-supervised seeded topic model allows sociological knowledge to be infused into the computational learning of meaning structures. Seed words help crystallize topics around known concepts, while the model retains topic models' ability to identify associations in text based on word co-occurrences. The method estimates a concept's shared interpretation (or framing) through its associations with other, frequently co-occurring topics. In a case study, we extract longitudinal measures of shared interpretations of immigration from a corpus of millions of Swedish newspaper articles published between 1945 and 2019. We infer turning points that partition the discourse into meaningful eras and locate Sweden's era of multicultural ideals, which may have shaped its tolerant reputation abroad.
For researchers interested in running seeded topic models on very large text corpora, we developed an R package, available on GitHub: