The following table lists some useful corpus resources for the history of English. Most are available either on a fileserver at Manchester or on the web, in some cases with their own search engine; see the notes below the table. Resources not actually designed for linguistic work but of interest nevertheless – newspapers, literary texts, manuscripts – are listed at the foot of the page. See also the PDE page for the ‘Brown family’ of corpora, which now have a historical dimension extending backwards into the 20th century.
Name | Period | Variety | Words | Tag/Parse | More information |
---|---|---|---|---|---|
Helsinki Corpus of Old English | pre-1150 | – | 413k | N/N | On server. Part of ICAME collection |
Helsinki Corpus of Middle English | 1150-1500 | – | 609k | N/N | On server. Part of ICAME collection |
Helsinki Corpus of Early Modern English | 1500-1710 | British | 551k | N/N | On server. Part of ICAME collection |
Helsinki Corpus of Older Scots | 1450-1700 | Scots | 834k | N/N | On server. Part of ICAME collection |
Corpus of Early English Correspondence sampler | 1418-1680 | – | 450k | N/N | On server. Part of ICAME collection |
Zurich English Newspaper Corpus | 1661-1791 | news | 1.6m | N/N | On server. Part of ICAME collection |
Lampeter Corpus of Early Modern English Tracts | 1641-1732 | prose | 1.1m | N/N | On server. Part of ICAME collection |
Toronto Corpus of Old English | pre-1150 | – | 3m | N/N | Web (protected). Details here |
York-Toronto-Helsinki Parsed Corpus of Old English Prose | pre-1150 | prose | 1.5m | Y/Y | On server. Part of Penn Historical Corpora series |
York-Helsinki Parsed Corpus of Old English Poetry | pre-1150 | poetry | 71k | Y/Y | On server. Part of Penn Historical Corpora series |
Penn-Helsinki Parsed Corpus of Middle English 2 | 1150-1500 | – | 1.2m | Y/Y | On server. Part of Penn Historical Corpora series |
Penn-Helsinki Parsed Corpus of Early Modern English | 1500-1710 | – | 1.74m | Y/Y | On server. Part of Penn Historical Corpora series |
Penn-Helsinki Parsed Corpus of Modern British English | 1700-1914 | British | 949k | Y/Y | On server. Part of Penn Historical Corpora series |
Parsed Corpus of Early English Correspondence | 1410-1681 | letters | 2.16m | Y/Y | On server. Part of Penn Historical Corpora series |
Glossarial Database of Middle English | ME | Chaucer | ? | Y/N | Web (open). Access here |
Middle English Grammar Corpus | 1350-1500 | – | 450k | N/N | On server. Details here |
Corpus of Middle English Prose and Verse | ME | – | ? | N/N | Web (open). Access here |
Middle English Medical Texts | 1375-1500 | medical | 495k | N/N | CD in library. Details here |
Innsbruck Corpus of Middle English Prose (sampler) | 1100-1500 | prose | 4m | N/N | Part of ICAMET |
Letter Corpus of ICAMET (sampler) | 1386-1698 | letters | ? | N/N | Part of ICAMET |
Corpus of Irish English | 1100s on | Irish English | ? | N/N | On server. Details here |
ARCHER: A Representative Corpus of Historical English Registers | 1600-1999 | British, American | 3.3m | soon | More on ARCHER |
Image to Text | 1764-1815 | letters | 122k | N/N | Details here |
Corpus of late 18C Prose | 1761-1790 | NW British, letters | 300k | N/N | Details here |
Corpus of late Modern English Prose | 1861-1919 | British, letters | 100k | N/N | Details here |
CLMET 3.0: The Corpus of Late Modern English Texts | 1710-1920 | British | 34.4m | Y/N | On server. Details here |
COHA: The Corpus of Historical American English | 1810-2009 | American | 406m | Y/N | Web (need to register). Access here |
Old Bailey Corpus | 1674-1834 | British, trials | 134m | N/N | Web (protected). Details here |
19th Century U.S. Newspapers | 1800s | American, news | ? | N/N | Web (protected). Details/access here |
The Google News Archive search | ? | news | ? | N/N | Web (open). Details/access here |
Time Magazine | 1923-2006 | American, magazine | 100m | Y/N | Web (open). Access here |
The Salamanca Corpus (English dialects in literature) | 1500-1950 | British, literary | 12m | N/N | Web (open). Still growing. Access here |
Note
Most historical texts held at Manchester are stored together on one fileserver, accessible on campus or via the VPN.
- In Windows you can navigate using Windows Explorer or My Computer or within MonoConc Pro to
\\nask.man.ac.uk\share$\fs_shared_01\Hum1\ALC\LEL_corpora.
- On a Mac, in the Finder, from the Go menu, choose Connect to Server. Enter the server address
smb://vdm02-g1.ds.man.ac.uk/fs_shared_01$/HUM1/ALC/LEL_corpora and then click Connect.
- Authenticate with your university username and password.
- Subfolders containing the corpora are subdivided in a rough-and-ready way by period: OE, ME, eModE, lModE, PDE.
- Manuals (many of them from the 1999 ICAME CD) are stored in the folder corpora-manuals under English_corpora. Click on the link to see the ICAME manuals.
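For scripted work it can help to check which period subfolders are present, and how many files each holds, before loading anything into a concordancer. The following is a minimal Python sketch rather than an official tool: the function name `count_files_by_period` is hypothetical, the root folder must be supplied by the user (for example the mounted share or one of its subfolders), and the exact folder layout on the server is an assumption.

```python
from pathlib import Path

# Period subfolder names as listed in the note above.
PERIODS = ["OE", "ME", "eModE", "lModE", "PDE"]


def count_files_by_period(root):
    """Return {period: number of files} for each period subfolder found under root.

    Missing subfolders are skipped rather than treated as errors, because the
    layout on the server is only assumed here.
    """
    counts = {}
    for period in PERIODS:
        folder = Path(root) / period
        if folder.is_dir():
            # rglob("*") walks nested corpus folders; only regular files are counted.
            counts[period] = sum(1 for p in folder.rglob("*") if p.is_file())
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_corpora.py <path to the mounted corpora folder>
    if len(sys.argv) > 1:
        for period, n in count_files_by_period(sys.argv[1]).items():
            print(f"{period}: {n} files")
```

The root is passed in as an argument so that the same script works whether the share is reached by the Windows path or mounted on a Mac.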
Notes on specific corpora
- ICAME Corpus Collection on CD-ROM
Mostly copied to fileserver April 2003. This contains the following historical collections, samples of which can also be accessed on-line, together with various concordance programs. (See also PDE corpora.)
  - Helsinki Corpus (OE, ME and eModE)
  The manual is available online.
  - Helsinki Corpus of Older Scots
  - Corpus of Early English Correspondence sampler (1418-1680)
  - Zurich English Newspapers (1671-1791)
  - The Lampeter Corpus of Early Modern English Tracts (1641-1732)
- Complete Toronto Corpus of Old English (OE)
A convenient local version, from ca 1994, is available through MonoConc Pro or other concordancers in the OE folder of English_corpora; it is not quite complete or up-to-date, however. We are subscribed to the current online version, available on campus or using the VPN, from Library databases under ‘Dictionary of Old English Corpus’. For a handy, hypertext index of texts and editions, click here.
- The Penn-York-Helsinki series of parsed historical corpora
These corpora all use the CorpusSearch 2 software, a command-line search engine with a very specific syntax. Once mastered, however, it enables extremely well-focused grammatical searches to be made. The program can now also be accessed via a web-based interface.
  - The York-Toronto-Helsinki Parsed Corpus of Old English Prose
  100,000 words of OE prose texts selected from the excerpts in the Helsinki Corpus. Acquired December 2003.
  - The York-Helsinki Parsed Corpus of Old English Poetry
  71,000 words of OE poetic texts selected from the excerpts in the Helsinki Corpus. Acquired August 2009.
  - PPCME2: Penn-Helsinki Parsed Corpus of Middle English (2nd edition)
  1.3 million words. Licence fee paid June 2002. Installed Feb 2003. NB: the version we have is the 2nd edition, release 3.
  - PPCEME: Penn-Helsinki Parsed Corpus of Early Modern English
  1.8 million words. Licence fee paid December 2005, acquired March 2011. NB: the version we have is release 2, and on the web interface we currently only have the Penn 2 supplement files, not the whole corpus.
  - PPCMBE: Penn-Helsinki Parsed Corpus of Modern British English
  950,000 words, 1700-1914. Acquired March 2011. NB: the version we have is the much smaller 1st edition.
  - PCEEC: The Parsed Corpus of Early English Correspondence
  2.2 million words (c. 1410-1681). Acquired August 2009.
- ICAMET: Innsbruck Computer Archive of Machine-Readable English Texts
Full details here. We have the sampler version of the Prose Corpus 1100-1500, 108 works (in 131 files, about 4 million words) and the Letter Corpus 1386-1688 (containing 254 complete letters from different sources, arranged diachronically). DD is authorised to supply the texts and manual (in pdf form) to users from the Department on receipt of a signed copy of the ‘Declaration of fair academic use’ for sending to Professor Manfred Markus, who has kindly provided the CDs.
- ARCHER: A Representative Corpus of Historical English Registers (1600-1999), version 3.2
Full details here. 3.3 million words of British and American English in a number of genres. Available for personal use only on signing of user agreement. For copyright reasons, the corpus can only be used at Manchester, Salford or Lancaster in the UK or at specific universities in the USA, Germany, Sweden, Spain and Finland.
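To give a feel for what a ‘well-focused grammatical search’ over the parsed corpora in the Penn-York-Helsinki series involves, here is a small Python sketch. It is not CorpusSearch and does not use its query syntax; it only illustrates the idea of searching labelled bracketings for one node that immediately dominates another. The function names `parse_tree` and `immediately_dominates` are hypothetical, and the sample sentence is a made-up toy whose labels merely imitate the Penn-style annotation.

```python
def parse_tree(text):
    """Parse one labelled bracketing into nested lists.

    '(NP-SBJ (PRO he))' becomes ['NP-SBJ', ['PRO', 'he']]; words are plain strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # a label or a word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # consume the closing bracket
        return node

    return read()


def immediately_dominates(tree, parent_label, child_label):
    """Yield each node labelled parent_label with a direct child labelled child_label."""
    if isinstance(tree, str):
        return  # a bare word has no children
    if tree and isinstance(tree[0], str):
        label, children = tree[0], tree[1:]
    else:
        # Unlabelled wrapper node: every element is a child.
        label, children = None, tree
    if label == parent_label and any(
        isinstance(c, list) and c and c[0] == child_label for c in children
    ):
        yield tree
    for child in children:
        yield from immediately_dominates(child, parent_label, child_label)


# Toy example (invented, not taken from any corpus): find pronoun subjects.
SAMPLE = "( (IP-MAT (NP-SBJ (PRO he)) (VBD seyde) (NP-OB1 (D the) (N sothe))) (ID TOY,1.1))"
hits = list(immediately_dominates(parse_tree(SAMPLE), "NP-SBJ", "PRO"))
```

For real work the corpora's own CorpusSearch 2 software remains the appropriate tool; the sketch is only meant to show why a search over syntactic structure can be more precise than a search over word strings.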
Literary texts and collections
We have at least the following in-house (stored in the appropriate folder on the server) and can get others for you from the Oxford Text Archive and elsewhere:
- Letters of Jane Austen
- Austen novels
- Milton texts
- Paston Letters of the 15th Century
- Lollard Sermons
- Chaucer’s Boece and Treatise on the Astrolabe
- Layamon’s Brut
- etc.
There are hundreds of texts – of varying degrees of reliability and legitimacy – available over the internet. Try one of these sites (EEBO is British-based, Literature Online Anglo-American, the rest American-based):
- Middle English Compendium
Access to Corpus of ME Prose and Verse, ME Dictionary and bibliography.
- Early English Books Online (EEBO)
Available via Library Databases, containing ‘most of the books printed in the English Language between 1453 and 1700 in full-text’.
- Eighteenth Century Collections Online (ECCO)
‘Every significant English-language and foreign-language title printed in the United Kingdom and beyond in the period 1700-1800, and with multiple full-text search options across all 33 million pages’, available via Library Databases.
- JISC Historic Books allows semantic search of EEBO, ECCO and 19th-century British Library books from a single interface.
- Literature Online (LION)
Available via Library Databases, containing such Chadwyck-Healey databases as the Bible in English, Early English Prose Fiction, English Drama, English Poetry (searchable by word or phrase).
- Bibliomania
- Great Books Online (searchable by word or phrase)
- the Gutenberg Project
- Literary Resources on the Net (Jack Lynch)
- Voice of the Shuttle
Other resources
- In the bigynnyng: The Manchester Middle English Digital Library
‘The University of Manchester Library’s Middle English manuscripts are of paramount importance to key subject areas, including literature, history, theology, linguistics and art history.’ There are over forty Middle English manuscripts digitised at very high resolution. Undergraduate and postgraduate students have used them as the basis of dissertation projects under the supervision of Nuria Yáñez-Bouza and David Denison.
- Parker Library on the web
‘Corpus Christi College and the Stanford University Libraries welcome you to Parker on the Web – an interactive, web-based workspace designed to support use and study of the manuscripts in the historic Parker Library at Corpus Christi College, Cambridge.’ The extraordinarily rich medieval manuscript collection at CCCC.
This page last updated 10th January 2017.