


Studies that estimate and rank the most common words in English examine texts written in English. Perhaps the most comprehensive such analysis is one that was conducted against the Oxford English Corpus (OEC), a large corpus of texts written in English. In total, the texts in the Oxford English Corpus contain more than 2 billion words. The OEC includes a wide variety of writing samples, such as literary works, novels, academic journals, newspapers, magazines, Hansard's Parliamentary Debates, blogs, chat logs, and emails.

Another English corpus that has been used to study word frequency is the Brown Corpus, which was compiled by researchers at Brown University in the 1960s. The researchers published their analysis of the Brown Corpus in 1967. Their findings were similar, but not identical, to the findings of the OEC analysis.

According to The Reading Teacher's Book of Lists, the first 25 words in the OEC make up about one-third of all printed material in English, and the first 100 words make up about half of all written English. According to a study cited by Robert McCrum in The Story of English, all of the first hundred of the most common words in English are of Anglo-Saxon origin, except for "people", ultimately from Latin "populus", and "because", in part from Latin "causa".

Some lists of common words distinguish between word forms, while others rank all forms of a word as a single lexeme (the form of the word as it would appear in a dictionary). For example, the lexeme be (as in to be) comprises all its conjugations (is, was, am, are, were, etc.) and contractions of those conjugations. The top 100 lemmas listed below account for 50% of all the words in the Oxford English Corpus.

A list of the 100 words that occur most frequently in written English is given below, based on an analysis of the Oxford English Corpus. A part of speech is provided for most of the words, but part-of-speech categories vary between analyses, and not all possibilities are listed. For example, "I" may be a pronoun or a Roman numeral; "to" may be a preposition or an infinitive marker; "time" may be a noun or a verb. Also, a single spelling can represent more than one root word. For example, "singer" may be a form of either "sing" or "singe". Different corpora may treat such differences differently.

The number of distinct senses listed in Wiktionary is shown in the polysemy column. For example, "out" can refer to an escape, a removal from play in baseball, or any of 36 other concepts. On average, each word in the list has 15.38 senses. The sense count does not include the use of terms in phrasal verbs such as "put out" (as in "inconvenienced") or in other multiword expressions such as the interjection "get out!", where the word "out" does not have an individual meaning. As an example, "out" occurs in at least 560 phrasal verbs and appears in nearly 1700 multiword expressions.

The table also includes frequencies from other corpora. Note that, as well as usage differences, lemmatisation may differ from corpus to corpus; for example, one corpus may split the prepositional use of "to" from its use as a particle while another does not.
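As a rough illustration of the difference between ranking word forms and ranking lexemes, and of what it means for the top N items to "account for" a share of a corpus, the Python sketch below counts a toy sentence both ways. The lemma table `LEMMAS` and the helper functions `tokens`, `form_counts`, `lemma_counts` and `top_n_share` are hypothetical and cover only a handful of forms. Real corpus studies such as the OEC analysis rely on full lemmatisers and vastly larger texts, so the sketch says nothing about the actual published figures.

```python
import re
from collections import Counter

# Hypothetical toy lemma table, for illustration only: it maps a few
# inflected forms of "be" and "have" to their dictionary form.
# Any form not listed here is treated as its own lemma.
LEMMAS = {
    "is": "be", "was": "be", "am": "be", "are": "be", "were": "be", "be": "be",
    "has": "have", "had": "have", "have": "have",
}


def tokens(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def form_counts(words: list[str]) -> Counter:
    """Rank word forms: 'is', 'was' and 'were' are counted separately."""
    return Counter(words)


def lemma_counts(words: list[str]) -> Counter:
    """Rank lexemes: each form is credited to its lemma via LEMMAS."""
    return Counter(LEMMAS.get(w, w) for w in words)


def top_n_share(counts: Counter, n: int) -> float:
    """Fraction of all tokens accounted for by the n most frequent items."""
    total = sum(counts.values())
    covered = sum(c for _, c in counts.most_common(n))
    return covered / total if total else 0.0


sample = "The cat is here. The dogs were there, and the bird was too. It is what it is."
words = tokens(sample)
print(form_counts(words).most_common(3))   # [('the', 3), ('is', 3), ('it', 2)]
print(lemma_counts(words).most_common(3))  # [('be', 5), ('the', 3), ('it', 2)]
print(round(top_n_share(lemma_counts(words), 2), 2))  # 0.44
```

Counting by lemma merges "is", "was" and "were" into "be", which moves it to the top of the toy ranking and raises the share of text covered by the top two items. The same kind of merging is one reason why lemma-based and form-based lists of common words differ, and why corpora that lemmatise differently can rank the same word differently.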
