What Is The Oxford English Corpus?

I should probably preface this by saying that you should really, really like words before reading the rest of today’s post.

But if you do, the Oxford English Corpus is a treasure trove of useful information about the life of words as we are using them right now. The goal: “As the corpus continues to develop, it will be possible to trace language change over time: words becoming more or less common, features spreading from one region to another, and the emergence of new meanings.”

And to step back but for a moment:

What is a corpus?

“A corpus is a collection of texts of written (or spoken) language presented in electronic form. It provides the evidence of how language is used in real situations, from which lexicographers can write accurate and meaningful dictionary entries.”

The Oxford English Corpus is perhaps the largest project of its kind: an effort to collect, research, and understand how 21st-century English evolves as it’s being used. Content for the Oxford English Corpus is collected online, ranging from “literary novels and specialist journals to everyday newspapers and magazines and from Hansard to the language of blogs, emails, and Internet message boards”, and gives researchers an impressive 2 billion words to draw upon.

The OEC is organized into 20 main subject areas. I was a little surprised to see that Fiction constitutes only 0.2% of the content. And that’s not even getting into what percentage of the 24.4% that is News content is probably fiction itself. But we don’t have to get into that now.

I highly recommend checking out the “Using the Corpus” page.

“Words don’t exist in isolation. They have strong attractions for other words, and form patterns and associations that are often regular and predictable, though not usually rigid or permanent.”
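If you're curious what "attraction" between words looks like in practice, here is a minimal sketch of the idea: count which words turn up right next to a target word. To be clear, this is my own toy illustration and not the OEC's actual tooling. The `collocates` function and the sample sentences are made up for the example.

```python
import re
from collections import Counter

def collocates(texts, target, window=1):
    """Count the words that appear within `window` words of `target`."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        for i, word in enumerate(words):
            if word != target:
                continue
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[words[j]] += 1
    return counts

# Hypothetical sample sentences, not drawn from the OEC.
sample = [
    "She drank strong tea.",
    "Strong tea keeps me awake.",
    "He likes weak tea.",
]

print(collocates(sample, "tea").most_common(1))  # [('strong', 2)]
```

Run over billions of words instead of three sentences, the same kind of tally is what lets a pattern like "strong tea" stand out as regular without being a fixed rule.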

There’s good, and more importantly, amusing insight to be gained from how the corpus can be used to understand how word formation happens in actual usage. Along with assorted, amusing word discoveries the OEC dutifully reports (e.g., inner slut, bitchfest, suckfest), check out:

“What’s worth getting excited about (or not)?

The most common uses of -tastic are: craptastic, poptastic, funktastic, fabtastic, pimptastic, creeptastic, blingtastic, ego-tastic, retrotastic, geektastic, and blogtastic.

Where don’t you want to find yourself?

The most common uses of -ville are: dumpsville, dullsville, squaresville, hicksville, smallville, stupidville, and shitsville.”

That’s just excellent. Also, I noticed some interesting tendencies with something I’ve always wondered about: how two-word phrases eventually become single words. The little chart below contrasts British and American tendencies in two-word vs. one-word usage.


I run the ThinkLab at the University of Cambridge, and research digital habits, productivity, and wellbeing.
