Richard Zimmermann at LEL seminar

Posted on November 4, 2019 by



This week’s seminar talk (Tuesday, 5th November 2019) is by our new Swiss import, Richard Zimmermann. He will tell us about “Cool things to do with corpora: Experiments on word embedding and lexical dispersion”.

The talk will be in SamAlex A115 at 4.30pm. Below is a short teaser:

Recently developed corpus methods are inspiring incredibly diverse research in linguistics and related disciplines, ranging from automatically plotting travel reviews on maps to developing question-answering systems for medical diagnosis. In this talk, I will present two studies that use such new corpus techniques, in the hope that they can yield interesting insights.

My first case study uses the word embedding technique Word2Vec. Word embedding maps words onto high-dimensional vectors that represent their lexical semantics, based on the words they co-occur with in large amounts of running text. Word2Vec has now matured to the point where linguists can simply use it off the shelf for their research. I embed the vocabulary of the EEBO and ECCO corpora and use the result to study semantic change. Specifically, I am interested in the semantic chain shift FOOD > MEAT > FLESH. I trace the time course of the development and attempt to determine whether the change proceeded as a push chain or as a pull chain.
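To make the idea of "meaning from co-occurring words" concrete, here is a minimal sketch. It is not Word2Vec: Word2Vec learns dense vectors with a shallow neural network, whereas this sketch swaps in raw co-occurrence counts within a fixed window and compares words by cosine similarity. The function names, the window size, and the toy sentences are all hypothetical illustrations, not data from EEBO or ECCO.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it.
    (Count-based stand-in for an embedding; hypothetical helper.)"""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[k] for k, count in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy corpus (invented): "meat" and "flesh" share contexts, "ship" does not.
sentences = [
    "they ate meat and bread".split(),
    "they ate flesh and bread".split(),
    "the ship sailed to port".split(),
]
vecs = cooccurrence_vectors(sentences)
print(cosine(vecs["meat"], vecs["flesh"]))  # identical contexts
print(cosine(vecs["meat"], vecs["ship"]))   # no shared contexts
```

Words that appear in the same contexts end up with similar vectors; tracking such similarities across time slices of a corpus is one way a shift like MEAT > FLESH could show up in the numbers.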

My second case study is work in progress on lexical dispersion, the degree to which words are distributed evenly or in clumps across a corpus. Improving our understanding of dispersion is important because good dispersion measures are needed to control for frequency effects in psycholinguistic experiments, to distinguish specialized from general vocabulary in second language teaching, and to improve the representativeness of corpora. I review one existing measure, DP, as an example, and then present my own idea for a new measure based on word growth curves. My goal is to develop a dispersion measure that is independent of corpus size, and I will explore this objective with a number of short experiments.
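For readers unfamiliar with DP, Stefan Th. Gries's "deviation of proportions": the corpus is split into parts, and for each part one compares its share of the corpus size (s_i) with its share of the word's occurrences (v_i). DP is half the sum of the absolute differences, so 0 means perfectly even dispersion and values near 1 mean strong clumping. The sketch below assumes the part sizes and per-part word frequencies are already counted; the function name and the example numbers are hypothetical. It does not implement the growth-curve measure from the talk.

```python
def dp(part_sizes, word_freqs):
    """Deviation of proportions: 0.5 * sum(|v_i - s_i|) over corpus parts.
    part_sizes[i] = tokens in part i; word_freqs[i] = occurrences of the word in part i."""
    if len(part_sizes) != len(word_freqs):
        raise ValueError("need one word frequency per corpus part")
    total_size = sum(part_sizes)
    total_freq = sum(word_freqs)
    if total_size == 0 or total_freq == 0:
        raise ValueError("corpus size and word frequency must be non-zero")
    return 0.5 * sum(
        abs(f / total_freq - s / total_size)  # |v_i - s_i|
        for s, f in zip(part_sizes, word_freqs)
    )

# Four equally sized parts (invented numbers).
sizes = [1000, 1000, 1000, 1000]
print(dp(sizes, [5, 5, 5, 5]))   # evenly spread word
print(dp(sizes, [20, 0, 0, 0]))  # word confined to one part
```

With equal parts, the most clumped case reaches 1 - min(s_i) rather than 1, which is why a normalized variant divides DP by that value.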