Text Mining, 6 credits (732A92)

Course description

A large part of the so-called big data explosion comes in the form of text. Examples of textual data include short social media messages, movie recommendations, blog posts, medical journals and official documents in electronic form. The course gives an introduction to quantitative methods for analysing text, with a focus on prediction and decision making from textual data. Students will learn the major steps in analysing text: retrieving the text from the original source, pre-processing the text using linguistic rules and structure, and statistical modelling for inference and prediction.

Main field of study

Statistics

Level

Second cycle

Course type

Single subject and programme course

Examiner

Marco Kuhlmann

Course coordinator

Marco Kuhlmann

Director of studies or equivalent

Ann-Charlotte Hallberg

Available for exchange students

Yes
Course offered for | Semester | Weeks | Language | Campus | VOF
Single subject course (Half-time, Day-time) | Autumn 2018 | v201844-201903 | English | Linköping |
F7MSG Statistics and Data Mining, Master's Programme | 3 (Autumn 2018) | v201845-201851 | English | | v

Advancement level

A1X

Course offered for

  • Master's Programme in Statistics and Data Mining

Entry requirements

A bachelor’s degree in one of the following subjects: statistics, mathematics, applied mathematics, computer science, engineering, or equivalent. Completed courses in calculus, linear algebra, statistics and programming are required.
Documented knowledge of English equivalent to Engelska B/Engelska 6.

Intended learning outcomes

After completion of the course, the student should be able to, at an advanced level:
- use basic methods for information extraction and retrieval of textual data,
- apply text processing techniques to prepare documents for statistical modelling,
- apply relevant statistical models for analysing textual data and correctly interpret the results,
- use statistical models for prediction of textual information,
- evaluate the performance of statistical models for textual data.

Course content

The course presents how textual data can be retrieved, linguistically pre-processed and subsequently analysed quantitatively using formal statistical methods and models. The course brings together expertise from the areas of database methodology, computational linguistics and statistics.
The following topics are covered:
Introduction and overview of quantitative text analysis and its applications; Information extraction; Web crawling; Information retrieval; Tf-idf; Vector space models; Text preprocessing; Bag of words; N-grams; Sparsity and smoothing for text; Document classification; Sentiment analysis; Model evaluation; Topic models.

Teaching and working methods

The teaching comprises lectures, lab exercises and a text mining project. The lectures are devoted to presentations of concepts and methods. The computer lab exercises are devoted to the practical application of text mining tools. In the project work, the student gets hands-on experience in solving a text mining problem. Homework and independent study are a necessary complement to the course.
Language of instruction: English.

Examination

Written report on the text mining project. Written reports on lab assignments. Detailed information about the examination can be found in the course’s study guide.

Students who twice fail an exam covering either the entire course or part of the course are entitled to have a new examiner appointed for the re-examination.

Students who have passed an examination may not retake it in order to improve their grades.

Grades

ECTS, EC

Other information

The planning and implementation of a course must take the wording of the syllabus as its starting point. The course evaluation included in each course must therefore address the question of how well the course agrees with the syllabus.

The course is carried out in such a way that both men's and women's experience and knowledge are made visible and developed.

Department

Institutionen för datavetenskap

Page responsible: Info Centre, infocenter@liu.se