Old Saxon corpus released

Posted on January 10, 2016 by

Fans of Old Saxon were knocking back an extra glass of bubbly on New Year’s Eve at the news that the HeliPaD (Heliand Parsed Database), a parsed corpus of the language, had been released.

Old Saxon (also known as Old Low German) is a West Germanic language spoken in what is now northern Germany before 1100 AD. The corpus is based on the C manuscript of the Heliand as presented in the Sievers (1878) edition, and contains just over 46,000 words. The tagging and parsing follow the guidelines of the Penn Corpora of Historical English.

You can access the corpus and its documentation, which are released under a CC BY 4.0 license, here. The corpus was constructed mostly by LEL’s George Walkden, with help from Sheila Watts (Cambridge).

Featured image: the HeliPaD logo.