What have corpora ever done for us? by Hugh Dellar

The use of computers to store and help analyse language has obviously revolutionised many aspects of language teaching, and corpora linguists have become an ever-increasing presence at IATEFL and other similar conferences. Obviously, much good has come from this. We have had a whole new generation of much-improved dictionaries, all of which contain better information about usage, collocation and frequency; superb new reference books such as the Longman Grammar of Spoken and Written English have been made possible, and, perhaps inadvertently, corpora linguistics has helped to launch the Lexical Approach and thus to help move language back into the centre of language teaching. Nevertheless, it seems to me that despite all these advances, corpora linguistics has also had several negative side-effects on the way teachers perceive their roles, and that corpora have actually enslaved us in ways which are not entirely healthy. I would like to move on to consider the ways in which I feel this has occurred.

a. The fallacy of frequency

Corpora linguists repeatedly promote their products with often highly-detailed reference to frequency counts, and the idea that frequency is central has become a common one. However, should a Pre-Intermediate learner wish to be passed the salt over dinner, simply knowing the infrequent item 'Salt' will facilitate this in a way that knowing the far more frequent 'Could', 'you', 'pass', 'the' and 'please' would not. Generally, it's not the most common words which carry core meanings; rather, it's the far rarer items that do. Simply knowing the 800 most common words in the language makes you only able to say a lot about not very much. In the same way, failure to learn words which may well be low-frequency generally, but which are possibly much higher-frequency within specific types of conversation, condemns you to not being able to say very much about a lot!! Frequency tells us nothing more than what is frequent. It cannot tell us what's useful, what's necessary or even what's teachable.

There are deeper problems here to do with the way in which frequency is actually calculated. Corpora remain word-obsessed and the process of lemmatisation compounds this. Hence, an idiom like 'You're a dark horse' is entered not as a two-word idiom, but rather as one example of 'dark' and another of 'horse', thus defaulting on two fronts. Similarly, plural nouns are currently counted as further examples of singular ones, which is a rather major oversight. Is, for instance, the singular of 'Many Happy Returns' 'A Happy Return'? 'Meetings' is not simply the plural of 'meeting', and it collocates with different words. Finally, knowing that, say, 'get' is a very common word does little to help teachers know whether 'get on with it' is more frequent than 'Let's get down to business'. Sadly, until corpora start sorting by chunk they will remain of limited relevance.
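The difference between counting by word and counting by chunk can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample text is invented for the purpose, the helper names (tokenize, ngram_counts) are hypothetical, and no published corpus tool is being described here.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Toy text invented for illustration; it is not real corpus data.
sample = (
    "You're a dark horse. It was a dark night. "
    "He is a dark horse, that one. The horse ate."
)

tokens = tokenize(sample)
word_counts = ngram_counts(tokens, 1)   # word-by-word view
pair_counts = ngram_counts(tokens, 2)   # two-word chunk view

# The word view reports 'dark' and 'horse' separately;
# only the chunk view shows how often they occur together.
print(word_counts[("dark",)], word_counts[("horse",)])
print(pair_counts[("dark", "horse")])
```

In this toy text the single-word tallies for 'dark' and 'horse' say nothing about the idiom; it only becomes visible once pairs of adjacent words are counted as units.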

b. The fallibility of human endeavour

That corpora need to be approached cautiously and with one's intuition fully tuned is made apparent by a cursory glance at the word 'thaw' on several published CDs. Should one access the word, wishing to know whether snow melts or thaws, one would be surprised to learn that a far more frequent example of the word, and thus - if we follow the logic of corpora linguists - a more useful collocate for our students is actually John, as in John Thaw, the late, great British actor.

Similarly, I once saw a Jane Willis talk wherein she suggested that one of the most common three-word lexical items in the English language was 'Princess of Wales'. It was only when pushed during questioning that she actually admitted that the corpus she had taken this data from was based almost exclusively on a couple of radio phone-in programmes. In the same way, the actual construction of corpora-based materials - dictionaries and the like - also inevitably involves a degree of hammering out by researchers, often by means of a vote or a fudge. Corpora are by necessity human constructs based on limited samples of data, are easily skewed by input and thus are best viewed sceptically.
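One way to see how easily a small sample can be skewed is to compare a raw count with a count of how many different sources a phrase actually turns up in. The sketch below assumes a tiny made-up collection: the source names and texts are hypothetical and stand in for nothing real.

```python
from collections import Counter


def trigrams(text):
    """Return every run of three consecutive words in the text."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


# Invented toy "corpus": source names and texts are hypothetical.
sources = {
    "phone_in_1": "the princess of wales said the princess of wales would visit",
    "phone_in_2": "callers asked about the princess of wales all morning",
    "novel": "she opened the door and looked at the sea",
    "newspaper": "the council voted to repair the bridge next year",
}

raw = Counter()      # total occurrences across all sources
spread = Counter()   # number of distinct sources containing the trigram
for text in sources.values():
    grams = trigrams(text)
    raw.update(grams)
    spread.update(set(grams))

target = ("princess", "of", "wales")
print(raw[target], "occurrences in", spread[target], "of", len(sources), "sources")
```

A phrase that scores highly on the raw count but appears in only a few of the sources is a sign that the sample, rather than the language, is doing the talking.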

c. The limitations of what corpora can offer

While spoken language, conversation, may well form the basis - even the majority - of many corpora, what corpora can't show us is what typical conversations look like. It's not possible, for instance, to access ten typical conversations had by people talking about what they did last night or to look at the 20 most common ways of answering the question "So what do you do for a living, then?". As such, if we want to present our students with models of the kinds of conversations they themselves might actually want to have, we are forced to fall back on our (actually ample) experience of such conversations in order to script them. However, I would argue that it is precisely because we have got such broad experience of such conversations that we do tend to know how they work and sound and look.

For teaching purposes, we need to be able to script conversations that aren't so culturally and spatially bound as to exclude students; we need to ensure the conversations students are exposed to still somehow facilitate intra-class bonding. Input needs to be prototypical and to include items which are easy for us to systematise and for learners to appropriate and assimilate. Corpora cannot do this for us.

d. Corpora and the non-native speaker teacher

It is often claimed - mainly by those who are employed to make, package and sell corpora - that corpora are an invaluable aid for the non-native speaker teacher. I would personally argue that the opposite is far too often true and that as they stand, corpora massively favour native speakers.

One understandable reaction many teachers, both native and non-native, have to the notion that they should teach more spoken English is the "But I'd never say this or that bit of language" response when faced with a spoken text. Ironically, written texts never elicit a similar "But I'd never write that myself" response, and there are several reasons for this, I feel. There is possibly an assumption that writing is a more creative realm where anything goes; there's also the fact that the grammar and the lexis of the written language have already been codified and disseminated and are thus more familiar to teachers; thirdly, I think, there's the fact that we pin our identities on our speech - our idiolect, our regional, class-based, age-oriented, in-group, gender-based grasp of lexis and grammar - far more profoundly than we do on what we write. We are so aware of differences in the way we speak that we usually fail to notice the massive similarities. A good example of this is the fact that every EFL book which focuses on the UK / US divide fails to note that the vast majority of the language used in both countries is remarkably similar, and instead frets over the present perfect, sidewalks versus pavements and the correct pronunciation of aluminium. Yet for every "It came out of the blue" / "It came out of left field" divergence, there must surely be ten other idioms we all have in common.

Given this, I personally feel it doesn't take much to persuade non-native speaker teachers to stick to the already familiar, tried-and-tested formula of written texts and comprehension questions and structural grammar. By spending so much time pointing out relatively obscure quirks and neologisms, such as the fact that 'like' is being increasingly used to report speech (as in "He was like 'Hi' so I was like 'Bye'"), corpora linguists are inadvertently making spoken English more of a foreign language for non-native speaker teachers than is perhaps wise for people who claim to believe - as I do - that spoken English should become much more a part of General English than is currently the case. Too relentless a focus on the new, the odd, the interesting, the different obscures the wealth of English that unites us all.

I also feel that it is not only many non-native speaker teachers who would never use 'like' in this way, but many native speakers too. The vast majority of language teachers do NOT need corpora to tell us that this is a relatively unuseful piece of lexis, so long as it remains relatively unused. Indeed, my own rule of thumb would be that if YOU don't say it, don't TEACH it. English as a foreign language is NOT English as the corpora know it. If you believe, as I do, that the kind of model conversations coursebooks provide for teaching purposes should be better modelled on the information provided by corpora than is currently the case, then I find it hard to see how you couldn't also support the idea that corpora specialists should concentrate more on insights which will be of direct use to coursebook writers and teachers alike. Indeed, given the problematic status of spoken language within the classroom at present, I'd go so far as to assert that doing anything less serves to sabotage attempts to spread a methodology based on spoken language (and here, of course, I'm compelled to acknowledge my own interest in this area as a coursebook writer).

I find it particularly interesting to note that the constructors of corpora - or at least their backers - seem as yet very reluctant to work on a corpus of English as used by non-native speakers. Obviously, this would be in essence the same corpus, but with much left out. This is precisely the point: that which is left out by competent non-native speakers has no real place in most - and especially most pre-Advanced - teaching materials.

e. Animal Farm (or Beware of the oppressive tendencies of those who come claiming to liberate us!!)

It would be churlish to deny that corpora have provided us with some useful insights into such features of language as the fact that 'would' is three times more common when talking about past habits than 'used to' is, but at the same time it must also be added that the way in which corpora have been presented has all too often intimidated us into pretending that we didn't already know much - if not most - of what they confirm. For example, Mike McCarthy, at IATEFL Brighton 2001, spent half an hour blinding us with the statistics that showed - entirely unsurprisingly - that 'take the mickey' is far more common than 'mickey-taker' or 'mickey-taking'. Surely any fluent speaker of the language could have guessed this (dubiously relevant) fact themselves, based on their own intuitions about the language.

The relentless emphasis on the finality of corporal truth not only denies the reality of the classroom practitioner who has to get in there each and every day and try to give their students information about the language being studied, but also refuses to acknowledge the fact that we all have heard and read millions and millions more words than any corpus will ever hold and thus have good hunches about words as a result. Sure, hunches about language can be wrong, but more often than not, they aren't. I personally really resent the notion that corpora are useful not only for showing us the errors of our ways, but also for confirming when we're right. The implication is that we are not right UNTIL we've checked! This way lies madness - and the deskilling of us all!!

f. Conclusions

Obviously, it is important that teachers do keep themselves up-to-date with corpora findings and adapt their understanding of the way language works accordingly. Here I totally agree with Ron Carter that one thing corpora have helped us become more aware of is the fact that grammar is much broader than sentence-based / tense-based grammar would seem to suggest. Words have their own micro-grammar and so lexis needs to continuously be grammaticalised in typical ways. Nevertheless, it is also vital that teachers are encouraged to believe that they can tap into and trust their own inner corpora.

If Carter and McCarthy can proclaim that the more students are encouraged and trained to notice, the more they actually will notice, then the same must surely be true for us as teachers. Indeed, the true sign of corpora-work well done is its own eventual redundancy. This really brings me to my final point - one of the great ironies of corpora is that they have actually unwittingly made teachers more intuitive, not less. What corpora have done is to place language back at the centre of classrooms and, as such, we all now have to think much more about how we actually use language.

To a degree, corpora and teachers exist in a parent-child relationship, and many teachers are now ready to leave home. Thanks, Mum and Dad - you've done a great job. We may be back to visit every now and then, but we've basically already got the message!!

However, lest we forget, corpora are bank-rolled by major publishing houses and have endless spin-off publications derived from them in an effort to recoup much of this investment. As such, maybe I'm expecting too much by asking those in receipt of the publisher's pound to loose the reins on much of their power and place it back where it rightly belongs - back in the hands of the humble classroom practitioners!!!


Hugh Dellar teaches EFL to a wide variety of international students at the University of Westminster, London, where he is also a teacher trainer. He is also the co-author of the Upper-Intermediate General English coursebook, INNOVATIONS, as well as the forthcoming Intermediate-level follow-up, both published by Thomson Learning. His main research interests revolve around the implications of new research into the nature of language for teaching, teacher training and materials development. He previously taught in Indonesia and has given papers, workshops and teacher training courses all over the world.

