Details of the study

An overview of project aims and methods

Growth in Grammar is a three-year project investigating the development of written language in school children in England. The study aims to provide a more thorough understanding of linguistic development in writing and to inform national and international curriculum policies on the teaching of English. It will also generate an updatable corpus of linguistically-annotated, educationally-authentic student writing, which will be made accessible to researchers and teachers.

We will address the following research questions:

  1. What combinations of linguistic features distinguish school students’ writing across:
    • different year groups?
    • different levels of writing attainment?
    • different text genres students are expected to write?
  2. How do these combinations compare with those found in adult writing?

To answer these questions, we will establish a new electronic collection, or corpus, of children’s writing, sampled from schools across England. To capture the range of variation found in children’s writing, the contents of the corpus will be balanced across year groups, levels of attainment, academic disciplines, text genres, geographic locations, student gender and student socio-economic status.

We will then use a combination of existing and custom-built software to identify and count linguistic features of potential interest (as identified by a systematic review of the research literature) in these texts. These counts will form the basis of a multi-dimensional analysis. Multi-dimensional analysis is a methodology, devised by Doug Biber (Biber 1988; Biber 2014; Conrad and Biber 2001), which aims to comprehensively but efficiently capture the ways in which texts of different types differ in their overall use of language. See here for more details about multi-dimensional analysis.
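As a rough illustration of the counting step, the following Python sketch tallies two features in a single text and normalises the counts per 1,000 words, so that texts of different lengths can be compared. The word lists, the function name `feature_rates` and the per-1,000-word norm are assumptions made for illustration only. They are not the project's feature set or software, which would rely on proper linguistic annotation rather than simple word matching.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny feature definitions (illustration only).
# Real feature identification needs part-of-speech tagging: "her", for
# example, is ambiguous between a pronoun and a determiner.
FEATURES = {
    "third_person_pronoun": {"he", "she", "they", "him", "her", "them", "his", "their"},
    "first_person_pronoun": {"i", "me", "my", "we", "us", "our"},
}


def feature_rates(text, per=1000):
    """Return each feature's frequency per `per` words of running text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in FEATURES.items():
            if token in words:
                counts[name] += 1
    total = len(tokens)
    if total == 0:
        # An empty text has no meaningful rate; report zero for every feature.
        return {name: 0.0 for name in FEATURES}
    return {name: counts[name] * per / total for name in FEATURES}


sample = "She gave him the book and they went home with their dog."
rates = feature_rates(sample)
```

Normalising by text length matters here because children's texts vary widely in length, and raw counts would otherwise mostly reflect how much a child wrote.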

Our multi-dimensional analysis of the corpus will aim to:

  1. Identify a set of dimensions that capture how students’ writing varies in its use of language. Each dimension will consist of linguistic features that our analysis shows either frequently occur together in texts or frequently ‘avoid’ each other. For example, previous research (Deane and Quinlan 2010) has suggested that texts with a high frequency of past tense verbs also tend to have large numbers of past perfect verbs and third person pronouns, but low frequencies of present tense verbs. This has been interpreted as a dimension indicative of narrative style: texts in which the former set of features is prevalent have a highly narrative style; texts in which present tense verbs are prevalent do not. The first stage of our analysis will identify the dimensions that best capture the range of linguistic variation in children’s writing.
  2. Use these dimensions to understand the differences between texts produced by children at different ages, at different levels of attainment, and in different genres. Specifically, we will establish where texts from across these variables tend to fall on each of the identified dimensions. This will enable us to understand how overall patterns of language use change across the three key variables.
  3. Compare the dimensions of children’s writing with those previously established for adult writing.
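
The scoring step behind points 1 and 2 can be sketched in a few lines of Python. In multi-dimensional analysis, a text's score on a dimension is conventionally calculated by standardising each feature's rate across the corpus, then adding the standardised values for features on the positive side of the dimension and subtracting those on the negative side. The texts, rates, group labels and feature names below are invented for illustration. The narrative dimension is hard-coded from the example in point 1 rather than derived by factor analysis, as it would be in the project itself.

```python
from statistics import mean, pstdev

# Invented rates per 1,000 words for four hypothetical texts in two groups.
texts = [
    {"group": "A", "past_tense": 60.0, "third_person": 50.0, "present_tense": 20.0},
    {"group": "A", "past_tense": 50.0, "third_person": 40.0, "present_tense": 30.0},
    {"group": "B", "past_tense": 20.0, "third_person": 20.0, "present_tense": 70.0},
    {"group": "B", "past_tense": 30.0, "third_person": 30.0, "present_tense": 60.0},
]

# Assumed make-up of a narrative dimension, following the example in point 1.
POSITIVE = ["past_tense", "third_person"]
NEGATIVE = ["present_tense"]


def z_scores(values):
    """Standardise values to mean 0 (population SD used here for simplicity)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd if sd else 0.0 for v in values]


def dimension_scores(rows, positive, negative):
    """Sum z-scores of positive features and subtract those of negative ones."""
    standardised = {f: z_scores([r[f] for r in rows]) for f in positive + negative}
    return [
        sum(standardised[f][i] for f in positive)
        - sum(standardised[f][i] for f in negative)
        for i in range(len(rows))
    ]


def group_means(rows, scores):
    """Average the dimension scores of the texts in each group."""
    grouped = {}
    for row, score in zip(rows, scores):
        grouped.setdefault(row["group"], []).append(score)
    return {group: mean(vals) for group, vals in grouped.items()}


scores = dimension_scores(texts, POSITIVE, NEGATIVE)
by_group = group_means(texts, scores)
```

With these invented figures, group A has a positive mean score on the dimension and group B a negative one. Comparing group means in this way is how texts from different year groups, attainment levels or genres can be placed on each dimension.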

On completion of the project, we will make our corpus available for other researchers and teachers to use. We will also create an online interface that will enable users to explore the corpus for themselves.


Biber, D. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.

Biber, D. 2014. 'Multi-Dimensional Analysis: A personal history'. In T. Berber Sardinha and M. Veirano Pinto (eds). Multi-Dimensional Analysis, 25 Years On: A tribute to Douglas Biber. Amsterdam: John Benjamins.

Conrad, S. and D. Biber (eds). 2001. Variation in English: Multi-Dimensional Studies. London: Longman.

Deane, P. and T. Quinlan. 2010. 'What automated analyses of corpora can tell us about students' writing skills'. Journal of Writing Research, 2/2: 151-77.