What have corpora ever done for us?
by Hugh Dellar
The use of computers to store and help analyse language has obviously
revolutionised many aspects of language teaching, and corpus
linguists have become an ever-increasing presence at IATEFL
and other similar conferences. Clearly, much good has come
from this. We have had a whole new generation of much-improved
dictionaries, all of which contain better information about
usage, collocation and frequency; superb new reference books
such as the Longman Grammar of Spoken and Written English
have been made possible, and, perhaps inadvertently, corpus
linguistics has helped to launch the Lexical Approach and
thus to move language back into the centre of language
teaching. Nevertheless, it seems to me that despite all these
advances, corpus linguistics has also had several negative
side-effects on the way teachers perceive their roles, and
that it has actually enslaved us in ways which are not
entirely healthy. I would like to move on to consider the
ways in which I feel this has occurred.
a. The fallacy of frequency
Corpus linguists repeatedly promote their products with often
highly detailed reference to frequency counts, and the idea
that frequency is central has become a common one. However,
should a Pre-Intermediate learner wish to be passed the salt
over dinner, simply knowing the infrequent item 'Salt' will
facilitate this in a way that knowing the far more frequent
'Could', 'you', 'pass', 'the' and 'please' would not. Generally,
it's not the most common words which carry core meanings;
rather, it's the far rarer items that do. Simply knowing the
800 most common words in the language makes you only able
to say a lot about not very much. In the same way, failure
to learn words which may well be low-frequency generally, but
which are possibly much higher-frequency within specific types
of conversations, condemns you to not being able to say very
much about a lot! Frequency tells us nothing more than what
is frequent. It cannot tell us what's useful, what's necessary
or even what's teachable.
There are deeper problems here to do with the way in which frequency
is actually calculated. Corpora remain word-obsessed and
the process of lemmatisation compounds this. Hence, an idiom
like 'You're a dark horse' is entered not as a two-word idiom,
but rather as one example of 'dark' and another of 'horse',
thus failing on two fronts. Similarly, plural nouns are
currently counted as other examples of singular ones, which
is a rather major oversight. Is, for instance, the singular
of 'Many Happy Returns' 'A Happy Return'? 'Meetings' is not
simply the plural of 'meeting', and it collocates with different
words. Finally, knowing that, say, 'get' is a very common
word does little to help teachers know whether 'get on with
it' is more frequent than 'Let's get down to business'. Sadly,
until corpora start sorting by chunk they will remain of limited
relevance.
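The difference between counting by word and counting by chunk can be made concrete with a small sketch. What follows is only an illustration: the sample sentences are invented, `naive_lemma` is a deliberately crude stand-in for real lemmatisation, and the function names are hypothetical rather than taken from any actual corpus tool.

```python
from collections import Counter

# Toy sample invented for illustration; not drawn from any real corpus.
SAMPLE = [
    "you're a dark horse",
    "the horse bolted",
    "it was a dark night",
    "she hates meetings",
    "the meeting ran late",
    "let's get down to business",
    "get on with it",
    "get on with it",
]


def naive_lemma(word):
    """Crude stand-in for lemmatisation: strip a final 's' from longer words."""
    if (len(word) > 3 and word.endswith("s")
            and not word.endswith("ss") and "'" not in word):
        return word[:-1]
    return word


def word_counts(sentences, lemmatise=False):
    """Count single words, optionally folding plurals into singulars."""
    counts = Counter()
    for sentence in sentences:
        for word in sentence.split():
            counts[naive_lemma(word) if lemmatise else word] += 1
    return counts


def chunk_counts(sentences, n):
    """Count every run of n consecutive words within each sentence."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


words = word_counts(SAMPLE)
lemmas = word_counts(SAMPLE, lemmatise=True)
print(words["dark"], words["horse"], words["get"])
print(words["meeting"], words["meetings"], lemmas["meeting"])
print(chunk_counts(SAMPLE, 2)["dark horse"])
print(chunk_counts(SAMPLE, 4)["get on with it"])
print(chunk_counts(SAMPLE, 5)["let's get down to business"])
```

In this toy sample the word counts for 'dark', 'horse' and 'get' say nothing about which expressions those words appear in, and folding 'meetings' into 'meeting' hides the fact that the two forms occurred in different company. Only the chunk counts distinguish 'get on with it' from 'let's get down to business'.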
b. The fallibility of human endeavour
That corpora need to be approached cautiously and with one's intuition
fully tuned is made apparent by a cursory glance at the word
'thaw' on several published CDs. Should one access the word,
wishing to know whether snow melts or thaws, one would be
surprised to learn that a far more frequent example of the
word, and thus - if we follow the logic of corpus linguists
- a more useful collocate for our students is actually John,
as in John Thaw, the late, great British actor.
Similarly, I once saw a Jane Willis talk wherein she suggested that one
of the most common three-word lexical items in the English
language was 'Princess of Wales'. It was only when pushed
during questioning that she actually admitted that the corpus
she had taken this data from was based almost exclusively
on a couple of radio phone-in programmes. In the same way,
the actual construction of corpus-based materials - dictionaries
and the like - also inevitably involves a degree of hammering
out by researchers, often by means of a vote or a fudge. Corpora
are by necessity human constructs based on limited samples
of data, are easily skewed by input and thus are best viewed
sceptically.
c. The limitations of what corpora can offer
While spoken language, conversation, may well form the basis - even
the majority - of many corpora, what corpora can't show us
is what typical conversations look like. It's not possible,
for instance, to access ten typical conversations had by people
talking about what they did last night or to look at the 20
most common ways of answering the question "So what do
you do for a living, then?". As such, if we want to present
our students with models of the kinds of conversations they
themselves might actually want to have, we are forced to fall
back on our (actually ample) experience of such conversations
in order to script them. However, I would argue that it is
precisely because we have got such broad experience of such
conversations that we do tend to know how they work and sound
and look.
For teaching purposes, we need to be able to script conversations
that aren't so culturally and spatially bound as to exclude
students; we need to ensure the conversations students are
exposed to still somehow facilitate intra-class bonding. Input
needs to be prototypical and to include items which are easy
for us to systematise and for learners to appropriate and
assimilate. Corpora cannot do this for us.