Historical English

The following table lists some useful corpus resources for the history of English. Most are available either on a fileserver at Manchester or on the web, in some cases with their own search engine; see the notes below the table. Resources not actually designed for linguistic work but of interest nevertheless – newspapers, literary texts, manuscripts – are listed at the foot of the page. See also the PDE page for the ‘Brown family’ of corpora, which now have a historical dimension extending backwards into the 20th century. 

| Name | Period | Variety | Words | Tag/Parse | More information |
|---|---|---|---|---|---|
| Helsinki Corpus of Old English | pre-1150 | | 413k | N/N | On server. Part of ICAME collection |
| Helsinki Corpus of Middle English | 1150-1500 | | 609k | N/N | On server. Part of ICAME collection |
| Helsinki Corpus of Early Modern English | 1500-1710 | British | 551k | N/N | On server. Part of ICAME collection |
| Helsinki Corpus of Older Scots | 1450-1700 | Scots | 834k | N/N | On server. Part of ICAME collection |
| Corpus of Early English Correspondence sampler | 1418-1680 | | 450k | N/N | On server. Part of ICAME collection |
| Zurich English Newspaper Corpus | 1661-1791 | news | 1.6m | N/N | On server. Part of ICAME collection |
| Lampeter Corpus of Early Modern English Tracts | 1641-1732 | prose | 1.1m | N/N | On server. Part of ICAME collection |
| Toronto Corpus of Old English | pre-1150 | | 3m | N/N | Web (protected). Details here |
| York-Toronto-Helsinki Parsed Corpus of Old English Prose | pre-1150 | prose | 1.5m | Y/Y | On server. Part of Penn Historical Corpora series |
| York-Helsinki Parsed Corpus of Old English Poetry | pre-1150 | poetry | 71k | Y/Y | On server. Part of Penn Historical Corpora series |
| Penn-Helsinki Parsed Corpus of Middle English 2 | 1150-1500 | | 1.2m | Y/Y | On server. Part of Penn Historical Corpora series |
| Penn-Helsinki Parsed Corpus of Early Modern English | 1500-1710 | | 1.74m | Y/Y | On server. Part of Penn Historical Corpora series |
| Penn-Helsinki Parsed Corpus of Modern British English | 1700-1914 | British | 949k | Y/Y | On server. Part of Penn Historical Corpora series |
| Parsed Corpus of Early English Correspondence | 1410-1681 | letters | 2.16m | Y/Y | On server. Part of Penn Historical Corpora series |
| Glossarial Database of Middle English | ME | Chaucer | ? | Y/N | Web (open). Access here |
| Middle English Grammar Corpus | 1350-1500 | | 450k | N/N | On server. Details here |
| Corpus of Middle English Prose and Verse | ME | | ? | N/N | Web (open). Access here |
| Middle English Medical Texts | 1375-1500 | medical | 495k | N/N | CD in library. Details here |
| Innsbruck Corpus of Middle English Prose (sampler) | 1100-1500 | prose | 4m | N/N | Part of ICAMET |
| Letter Corpus of ICAMET (sampler) | 1386-1698 | letters | ? | N/N | Part of ICAMET |
| Corpus of Irish English | 1100s on | Irish English | ? | N/N | On server. Details here |
| ARCHER: A Representative Corpus of Historical English Registers | 1600-1999 | British, American | 3.3m | soon | More on ARCHER |
| Image to Text | 1764-1815 | letters | 122k | N/N | Details here |
| Corpus of late 18C Prose | 1761-1790 | NW British, letters | 300k | N/N | Details here |
| Corpus of late Modern English Prose | 1861-1919 | British, letters | 100k | N/N | Details here |
| CLMET 3.0: The Corpus of Late Modern English Texts | 1710-1920 | British | 34.4m | Y/N | On server. Details here |
| COHA: The Corpus of Historical American English | 1810-2009 | American | 406m | Y/N | Web (need to register). Access here |
| Old Bailey Corpus | 1674-1834 | British, trials | 134m | N/N | Web (protected). Details here |
| 19th Century U.S. Newspapers | 1800s | American, news | ? | N/N | Web (protected). Details/access here |
| The Google News Archive search | ? | news | ? | N/N | Web (open). Details/access here |
| Time Magazine | 1923-2006 | American, magazine | 100m | Y/N | Web (open). Access here |
| The Salamanca Corpus (English dialects in literature) | 1500-1950 | British, literary | 12m | N/N | Web (open). Still growing. Access here |

Note

Most historical texts held at Manchester are stored together on one fileserver, accessible on campus or via the VPN.

  • In Windows you can navigate using Windows Explorer or My Computer or within MonoConc Pro to
    \\nask.man.ac.uk\share$\fs_shared_01\Hum1\ALC\LEL_corpora.
  • On a Mac, in the Finder, from the Go menu, choose Connect to Server. Enter the server address
    smb://vdm02-g1.ds.man.ac.uk/fs_shared_01$/HUM1/ALC/LEL_corpora and then click Connect.
  • Authenticate with your university username and password.
  • Subfolders containing the corpora are subdivided in a rough-and-ready way by period: OE, ME, eModE, lModE, PDE.
  • Manuals (many of them from the 1999 ICAME CD) are stored in the folder corpora-manuals under English_corpora. Click on the link to see the ICAME manuals.

Notes on specific corpora

  • ICAME Corpus Collection on CD-ROM
    Mostly copied to the fileserver in April 2003. The CD contains the following historical collections, samples of which can also be accessed online, together with various concordance programs. (See also PDE corpora.)
    • Helsinki Corpus (OE, ME and eModE)
      The manual is available online.
    • Helsinki Corpus of Older Scots
    • Corpus of Early English Correspondence sampler (1418-1680)
    • Zurich English Newspapers (1661-1791)
    • The Lampeter Corpus of Early Modern English Tracts (1641-1732)
  • Complete Toronto Corpus of Old English (OE)
    A convenient local version, dating from c. 1994, is available through MonoConc Pro or other concordancers in the OE folder of English_corpora; however, it is not quite complete or up to date. We subscribe to the current online version, available on campus or via the VPN from Library databases under ‘Dictionary of Old English Corpus’. For a handy hypertext index of texts and editions, click here.
  • The Penn-York-Helsinki series of parsed historical corpora
    These corpora all use the CorpusSearch 2 software, a command-line search engine with a very specific syntax. Once mastered, however, it enables extremely well-focused grammatical searches to be made. The program can now also be accessed via a web-based interface.
  • ICAMET: Innsbruck Computer Archive of Machine-Readable English Texts
    Full details here. We have the sampler version of the Prose Corpus 1100-1500, comprising 108 works (in 131 files, about 4 million words), and of the Letter Corpus 1386-1688 (254 complete letters from different sources, arranged diachronically). DD is authorised to supply the texts and manual (as a PDF) to users from the Department on receipt of a signed copy of the ‘Declaration of fair academic use’, which is forwarded to Professor Manfred Markus, who has kindly provided the CDs.
  • ARCHER: A Representative Corpus of Historical English Registers (1600-1999), version 3.2
    Full details here. 3.3 million words of British and American English in a range of genres. Available for personal use only, on signing a user agreement. For copyright reasons, the corpus can be used in the UK only at Manchester, Salford or Lancaster, and elsewhere only at specific universities in the USA, Germany, Sweden, Spain and Finland.

Literary texts and collections

We have at least the following in-house (stored in the appropriate folder on the server) and can get others for you from the Oxford Text Archive and elsewhere:

  • Letters of Jane Austen
  • Austen novels
  • Milton texts
  • Paston Letters of the 15th Century
  • Lollard Sermons
  • Chaucer’s Boece and Treatise on the Astrolabe
  • Layamon’s Brut
  • etc.

There are hundreds of texts – of varying degrees of reliability and legitimacy – available over the internet. Try one of these sites (EEBO is British-based, Literature Online Anglo-American, the rest American-based):

Other resources

  • In the bigynnyng: The Manchester Middle English Digital Library
    ‘The University of Manchester Library’s Middle English manuscripts are of paramount importance to key subject areas, including literature, history, theology, linguistics and art history.’ Over forty Middle English manuscripts have been digitised at very high resolution. Undergraduate and postgraduate students have used them as the basis of dissertation projects supervised by Nuria Yáñez-Bouza and David Denison.
  • Parker Library on the web
    ‘Corpus Christi College and the Stanford University Libraries welcome you to Parker on the Web – an interactive, web-based workspace designed to support use and study of the manuscripts in the historic Parker Library at Corpus Christi College, Cambridge.’ The Parker Library holds an extraordinarily rich collection of medieval manuscripts.

 This page last updated 10th January 2017.
