Developing Teachers.com
A web site for the developing language teacher

What have corpora ever done for us?
by Hugh Dellar
- 1

The use of computers to store and help analyse language has revolutionised many aspects of language teaching, and corpus linguists have become an ever-increasing presence at IATEFL and other similar conferences. Obviously, much good has come from this. We have had a whole new generation of much-improved dictionaries, all of which contain better information about usage, collocation and frequency; superb new reference books such as the Longman Grammar of Spoken and Written English have been made possible; and, perhaps inadvertently, corpus linguistics has helped to launch the Lexical Approach and thus to move language back into the centre of language teaching. Nevertheless, it seems to me that, despite all these advances, corpus linguistics has also had several negative side-effects on the way teachers perceive their roles, and that corpora have actually enslaved us in ways which are not entirely healthy. I would like to move on to consider the ways in which I feel this has occurred.

a. The fallacy of frequency

Corpus linguists repeatedly promote their products with often highly detailed reference to frequency counts, and the idea that frequency is central has become a common one. However, should a Pre-Intermediate learner wish to be passed the salt over dinner, simply knowing the infrequent item 'salt' will facilitate this in a way that knowing the far more frequent 'could', 'you', 'pass', 'the' and 'please' would not. Generally, it's not the most common words which carry core meanings; rather, it's the far rarer items that do. Simply knowing the 800 most common words in the language leaves you able only to say a lot about not very much. In the same way, failure to learn words which may well be low-frequency generally, but which are possibly much higher-frequency within specific types of conversation, condemns you to not being able to say very much about a lot! Frequency tells us nothing more than what is frequent. It cannot tell us what's useful, what's necessary or even what's teachable.

There are deeper problems here to do with the way in which frequency is actually calculated. Corpora remain word-obsessed and the process of lemmatisation compounds this. Hence, an idiom like 'You're a dark horse' is entered not as an instance of the two-word idiom 'dark horse', but rather as one example of 'dark' and another of 'horse', thus defaulting on two fronts. Similarly, plural nouns are currently counted as further examples of singular ones, which is a rather major oversight. Is, for instance, the singular of 'Many Happy Returns' 'A Happy Return'? 'Meetings' is not simply the plural of 'meeting'; it collocates with different words. Finally, knowing that, say, 'get' is a very common word does little to help teachers know whether 'Get on with it' is more frequent than 'Let's get down to business'. Sadly, until corpora start sorting by chunk, they will remain of limited relevance.
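The difference between counting by word and counting by chunk can be made concrete with a small sketch. This is only an illustration under stated assumptions: the five-sentence "corpus" is invented, the function names (`tokenize`, `word_counts`, `chunk_counts`) are hypothetical, and no real corpus tool is claimed to work this way.

```python
from collections import Counter

# Hypothetical toy corpus; the sentences are invented for illustration only.
corpus = [
    "you're a dark horse",
    "he's a bit of a dark horse",
    "let's get down to business",
    "come on, get on with it",
    "just get on with it",
]

def tokenize(sentence):
    # Lowercase and strip simple punctuation; real corpus tools do far more.
    return [w.strip(",.!?") for w in sentence.lower().split()]

def word_counts(sentences):
    # Word-level counting: every idiom is broken into its separate words.
    counts = Counter()
    for s in sentences:
        counts.update(tokenize(s))
    return counts

def chunk_counts(sentences, n):
    # Chunk-level counting: every run of n consecutive words is one item.
    counts = Counter()
    for s in sentences:
        words = tokenize(s)
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

words = word_counts(corpus)
two_word_chunks = chunk_counts(corpus, 2)
four_word_chunks = chunk_counts(corpus, 4)

# The word list only says that 'get' is common in this sample.
print(words["get"])
# The chunk lists can compare the two expressions directly.
print(four_word_chunks["get on with it"], four_word_chunks["get down to business"])
print(two_word_chunks["dark horse"])
```

On this toy sample, the word count for 'get' says nothing about which expression it appears in, while the four-word counts separate 'get on with it' from 'get down to business'. Whether such counts would be any more useful for teaching is, of course, a separate question.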

b. The fallibility of human endeavour

That corpora need to be approached cautiously and with one's intuition fully tuned is made apparent by a cursory glance at the word 'thaw' on several published CDs. Should one access the word, wishing to know whether snow melts or thaws, one would be surprised to learn that a far more frequent collocate of the word, and thus, if we follow the logic of corpus linguists, a more useful one for our students, is actually 'John', as in John Thaw, the late, great British actor.

Similarly, I once saw a Jane Willis talk wherein she suggested that one of the most common three-word lexical items in the English language was 'Princess of Wales'. It was only when pushed during questioning that she actually admitted that the corpus she had taken this data from was based almost exclusively on a couple of radio phone-in programmes. In the same way, the actual construction of corpus-based materials, dictionaries and the like, also inevitably involves a degree of hammering out by researchers, often by means of a vote or a fudge. Corpora are by necessity human constructs based on limited samples of data, are easily skewed by input and thus are best viewed sceptically.
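How easily input skews a count can be sketched in a few lines. Again, this is a hedged illustration only: both sub-corpora are invented, the source labels `radio_phone_in` and `dinner_talk` are hypothetical, and nothing here reproduces the data from the talk mentioned above.

```python
from collections import Counter

# Hypothetical sub-corpora, invented for illustration; not real corpus data.
sources = {
    "radio_phone_in": [
        "the princess of wales was on the news again",
        "i think the princess of wales is wonderful",
        "what did the princess of wales say",
    ],
    "dinner_talk": [
        "could you pass the salt please",
        "what did you do last night",
        "so what do you do for a living then",
    ],
}

def trigram_counts(sentences):
    # Count every run of three consecutive words in the given sentences.
    counts = Counter()
    for s in sentences:
        words = s.lower().split()
        for i in range(len(words) - 2):
            counts[" ".join(words[i:i + 3])] += 1
    return counts

def count_by_source(phrase):
    # Show how often a three-word phrase occurs in each source separately.
    return {name: trigram_counts(sents)[phrase] for name, sents in sources.items()}

by_source = count_by_source("princess of wales")
print(by_source)
```

In this sample every occurrence of 'princess of wales' comes from one source, so a corpus built mostly from that source would rank the phrase highly, while one built from the other would not contain it at all.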

c. The limitations of what corpora can offer

While spoken language, conversation, may well form the basis - even the majority - of many corpora, what corpora can't show us is what typical conversations look like. It's not possible, for instance, to access ten typical conversations had by people talking about what they did last night or to look at the 20 most common ways of answering the question "So what do you do for a living, then?". As such, if we want to present our students with models of the kinds of conversations they themselves might actually want to have, we are forced to fall back on our (actually ample) experience of such conversations in order to script them. However, I would argue that it is precisely because we have got such broad experience of such conversations that we do tend to know how they work and sound and look.

For teaching purposes, we need to be able to script conversations that aren't so culturally and spatially bound as to exclude students; we need to ensure the conversations students are exposed to still somehow facilitate intra-class bonding. Input needs to be prototypical and to include items which are easy for us to systematise and for learners to appropriate and assimilate. Corpora cannot do this for us.



Copyright 2000-2014© Developing Teachers.com