

The plumbing of corpus linguistics: A guided tour of the corpus-processing pipeline (Methodenworkshop)

Date: Wednesday, 23rd January 2019
Location: Alter Senatssaal, Wilhelmstraße 26

Organizer: David Lukes
Contact: Christina Meuser, Dennis Dressel
Institution: HPSL
Language: English
Host location: Freiburg
Dates: 23-25 January 2019

While it’s not necessary to know how corpus software works in order to use it, having a high-level idea of the entire process, from raw data to what happens when you type a query into a search interface, can help you become a power user. Providing you with such a general idea is the goal of this workshop. We’ll cover the following topics:

  • technical background: how text is represented inside a computer (file formats, plain text, character sets and encodings)
  • adding annotation: metadata (author, year of publication…), morphological tagging
  • corpus query systems: what’s their purpose (why not directly search the plain text files?), how they work behind the scenes, standard formats
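To make the three topics above a little more concrete, here is a toy Python sketch of the pipeline they describe: text stored as bytes in some encoding, tokens annotated with positional attributes and document metadata, and a small inverted index that answers a query without rescanning the whole text. It is an illustrative assumption, not how the corpus query systems discussed in the workshop are implemented internally. The tab-separated, one-token-per-line layout is only loosely modelled on the "vertical" input format such systems commonly use, and the two documents, their STTS-style tags, lemmas and metadata values are invented for the example.

```python
from collections import defaultdict

# --- 1. Encodings: text is stored as bytes --------------------------------
text = "Straße"
utf8 = text.encode("utf-8")        # 'ß' becomes two bytes (0xC3 0x9F)
latin1 = text.encode("latin-1")    # 'ß' becomes one byte (0xDF)
mojibake = utf8.decode("latin-1")  # wrong decoder: 'ß' turns into 'Ã' + a control char

# --- 2. Annotation: a toy vertical-style corpus (all values invented) -----
# One token per line with tab-separated columns word / POS tag / lemma;
# <doc ...> lines carry document metadata.
VERTICAL = """<doc author="A" year="2018">
Die\tART\tdie
Straße\tNN\tStraße
ist\tVAFIN\tsein
lang\tADJD\tlang
</doc>
<doc author="B" year="2019">
Der\tART\tdie
Weg\tNN\tWeg
war\tVAFIN\tsein
kurz\tADJD\tkurz
</doc>"""

ATTRS = ("word", "pos", "lemma")


def parse_vertical(src):
    """Return a list of token dicts, each carrying its document's metadata."""
    tokens, meta = [], {}
    for line in src.splitlines():
        if line.startswith("</doc"):
            meta = {}
        elif line.startswith("<doc"):
            # naive parsing of key="value" pairs between '<doc ' and '>'
            pairs = (part.split("=", 1) for part in line[5:-1].split())
            meta = {key: value.strip('"') for key, value in pairs}
        elif line.strip():
            tokens.append({**dict(zip(ATTRS, line.split("\t"))), **meta})
    return tokens


# --- 3. Query system: an inverted index over corpus positions -------------
def build_index(tokens):
    """Map (attribute, value) -> sorted list of corpus positions."""
    index = defaultdict(list)
    for i, tok in enumerate(tokens):
        for attr in ATTRS:
            index[(attr, tok[attr])].append(i)
    return index


def query_sequence(index, first, second):
    """Positions p where token p matches `first` and token p+1 matches `second`.

    Only the two relevant position lists are read; this toy version
    ignores document boundaries, which a real system would respect.
    """
    followers = set(index.get(second, []))
    return [p for p in index.get(first, []) if p + 1 in followers]


tokens = parse_vertical(VERTICAL)
index = build_index(tokens)
hits = query_sequence(index, ("pos", "NN"), ("lemma", "sein"))
for p in hits:
    print(tokens[p]["word"], tokens[p + 1]["word"], tokens[p]["year"])
```

Running it prints the two "noun followed by a form of *sein*" sequences together with the year of the document they occur in. The design point is that the query only touches the position lists for `("pos", "NN")` and `("lemma", "sein")` instead of reading every token, which is one basic reason query systems build indexes rather than searching the plain text files directly; real systems use compressed on-disk indexes and much richer query languages.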

The concepts will be illustrated with practical examples using the corpus query systems Corpus Workbench, (No)Sketch Engine and ANNIS, as well as other related tools. By the end of the workshop, you should have a better intuition for what can and cannot be achieved with corpora, and you should be better equipped to deal with the technical pitfalls of conducting corpus research.