Investigating the diffusion of morphosyntactic innovations using social media

About the project

We are interested in how changes in the grammar of a language ('innovations') spread from a small number of speakers to a larger section of the population ('diffusion'). We use Twitter to gather large quantities of localisable data from many places across wide areas, which allows us to investigate language variation and change in fine geographical detail.

From Twitter, we are collecting multiple datasets (corpora) of tweets: in English and Welsh in Britain; in Norwegian, Swedish, Danish, Icelandic, and Faroese across the Nordic countries; and in Turkish in Turkey. This selection of languages allows us to compare how very different demographic and geographic scenarios affect patterns of diffusion: for example, Welsh as a minority language versus English and the Scandinavian languages as majority languages, or the low population density and varied topography of Norway versus the high population density of large parts of England.

We are in the process of identifying language changes that are currently diffusing in these populations and investigating their distribution in the corpora. So far, these have included the spread of a new second-person pronoun, chdi ('you'), in Welsh; the deletion of the present-tense auxiliary form of 'be' in Welsh (ni’n gweld instead of dan/dyn/ryn/yn ni’n gweld for 'we see'); the alternation between different forms of the English dative construction (give it me/give me it/give it to me); and the deletion of the preposition 'to' after English directional verbs (go pub/go cinema).

Our results will be showcased in web apps that predict users' geographical origins from their answers to questions about their language use, and will be made publicly available on an online atlas-style website. Both are currently under development; more information will follow soon.