Researchers from sociology and statistics have implemented a scalable seeded topic model that extracts interpretable meaning structures from what is perhaps the largest text corpus ever analyzed in the social sciences. The authors use the method to measure shared understandings of immigration in Swedish news media from 1945 to 2019. The semi-supervised text model is freely available for anyone to use.

Illustration from research data

The evolution of media frames of immigration.

Sociologists are discussing the need for more formal ways to extract meaning from text. The semi-supervised seeded topic model allows sociological knowledge to be infused into the computational learning of meaning structures. Seed words help crystallize topics around known concepts, while the model retains the ability of topic models to identify associations in text based on word co-occurrences. The method estimates a concept’s shared interpretation (or framing) via its associations with other frequently co-occurring topics. In a case study, we extract longitudinal measures of shared interpretations of immigration from a corpus of millions of Swedish newspaper articles published between 1945 and 2019. We infer turning points that partition discourse into meaningful eras and locate Sweden’s era of multicultural ideals, which may have shaped its tolerant reputation abroad.
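To make the two ideas in this summary concrete, here is a minimal Python sketch. It is not the authors' R package or their exact model. It assumes one common way of seeding, in which seed words receive extra prior weight in their designated topic, and it assumes a simple framing measure: among documents that contain the focal topic, the share in which another topic also appears. The function names, the `base`, `boost`, and `threshold` values, and the toy documents are all invented for illustration.

```python
from collections import defaultdict


def seeded_prior(vocab, seeds, n_topics, base=0.01, boost=1.0):
    """Build a topic-by-word prior; seed words get extra weight in their topic.

    `seeds` maps a topic index to a list of seed words (hypothetical input).
    """
    index = {word: i for i, word in enumerate(vocab)}
    prior = [[base] * len(vocab) for _ in range(n_topics)]
    for topic, words in seeds.items():
        for word in words:
            if word in index:
                prior[topic][index[word]] += boost
    return prior


def framing_by_year(docs, focal, threshold=0.1):
    """Per year, the share of focal-topic documents in which each other topic appears.

    `docs` is an iterable of (year, {topic_name: proportion}) pairs, i.e.
    per-document topic proportions as a fitted topic model would produce.
    A topic counts as present when its proportion is at least `threshold`.
    """
    focal_counts = defaultdict(int)
    co_counts = defaultdict(lambda: defaultdict(int))
    for year, theta in docs:
        if theta.get(focal, 0.0) < threshold:
            continue  # document does not discuss the focal concept
        focal_counts[year] += 1
        for topic, share in theta.items():
            if topic != focal and share >= threshold:
                co_counts[year][topic] += 1
    return {
        year: {t: c / focal_counts[year] for t, c in co_counts[year].items()}
        for year in focal_counts
    }


# Toy data, invented for illustration only.
vocab = ["immigrant", "refugee", "crime", "culture"]
prior = seeded_prior(vocab, {0: ["immigrant", "refugee"]}, n_topics=2)

docs = [
    (1975, {"immigration": 0.5, "culture": 0.4}),
    (1975, {"immigration": 0.6, "labor": 0.3}),
    (1975, {"sports": 0.9}),
    (2015, {"immigration": 0.7, "crime": 0.2}),
]
frames = framing_by_year(docs, focal="immigration")
```

Tracking how these shares shift from year to year gives a longitudinal picture of framing of the kind the case study describes, although the published method estimates the associations within the model rather than by thresholding.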

For researchers interested in running seeded topic models on very large text corpora, we developed an R package, available on GitHub:

GitHub

Read or download the article

Hurtado Bodell, M., Magnusson, M., & Keuschnigg, M. (2024). Seeded Topic Models in Digital Archives: Analyzing Interpretations of Immigration in Swedish Newspapers, 1945–2019. Sociological Methods and Research.

SageJournals

More about computational text analysis
