The table shows that the two American English corpora (Brown and Frown) have the same number of samples for each of the 15 text categories, while the British English corpora share the same proportions. The two groups differ in the numbers of samples for categories E, F, and G. The Kolhapur corpus differs in important ways from the others in both its sampling period and its proportions of text categories.
For example, Brown and LOB (the Lancaster-Oslo-Bergen corpus of British English, see Johansson/Leech/Goodluck 1978) can be used to compare American and British English as used in the early 1960s.
The updated versions of the two corpora, Frown (see Hundt/Sand/Skandera 1999) and FLOB (see Hundt/Sand/Siemund 1998), can be used to compare the two major varieties of English as used in the early 1990s.
Other corpora of a similar sampling period, such as ACE (the Australian Corpus of English, also known as the Macquarie corpus), WWC (the Wellington Corpus of Written New Zealand English) and Kolhapur (the Kolhapur Corpus of Indian English), together with FLOB and Frown, allow for comparison of different national varieties of English. Brown and Frown on the one hand, and the Pre-LOB, LOB and FLOB corpora on the other hand, provide a reliable basis for tracking recent language change over 30-year periods.
The LCMC corpus (the Lancaster Corpus of Mandarin Chinese, see McEnery/Xiao/Mo 2003), when used in combination with the FLOB and Frown corpora, provides a valuable resource for contrastive studies between Chinese and the two major varieties of English.
In comparing these corpora synchronically, caution must be exercised to ensure that the sampling periods are similar.
For example, comparing the Brown corpus with FLOB would involve not only language varieties but also language change.
Also, as the Brown model may have been modified slightly in some of these corpora, account must be taken of such variation when comparing these corpora across text categories, by normalizing the raw frequencies to a common basis.
Table 11 compares the text categories and the number of samples for each category in these corpora.
The first modern corpus of English, the Brown University Standard Corpus of Present-day American English (i.e. the Brown corpus, see Kučera/Francis 1967), was built in the early 1960s for written American English.
The population from which samples for this pioneering corpus were drawn was written English text published in the United States in 1961, while its sampling frame was a list of the collection of books and periodicals in the Brown University Library and the Providence Athenaeum.
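The normalization of raw frequencies to a common basis, mentioned earlier in this section, can be illustrated with a minimal sketch. The function name, the per-million-words basis and all counts and category sizes below are hypothetical values chosen for illustration; they are not drawn from any of the corpora discussed here.

```python
def normalize(raw_count, category_size, basis=1_000_000):
    """Scale a raw frequency to occurrences per `basis` tokens.

    raw_count     -- observed occurrences in the text category
    category_size -- total tokens in that category of the corpus
    basis         -- common basis (assumed here: one million tokens)
    """
    if category_size <= 0:
        raise ValueError("category_size must be positive")
    return raw_count * basis / category_size


# Hypothetical figures: the same text category has a different size
# in two corpora because the sampling proportions differ.
observations = {
    "corpus_A_category_G": (150, 150_000),  # (raw count, tokens)
    "corpus_B_category_G": (120, 100_000),
}

normalized = {
    name: normalize(count, size)
    for name, (count, size) in observations.items()
}

for name, freq in normalized.items():
    print(f"{name}: {freq:.1f} per million tokens")
```

In this made-up case the raw count is higher in corpus A, but the normalized frequency is higher in corpus B, which is why raw counts from categories of unequal size should not be compared directly.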