Getting started – selecting data, wrangling data – early (basic) moves

This post and the next few switch focus from the ML subject scores database to a database built from normative data about word attributes, which comes in several parts (downloadable at the links). Remember that these data describe the stimuli presented in a lexical decision test of visual word recognition, and therefore include information about both words and a matched set of nonwords.

We have .csv files holding:

1. data on both the words and the nonwords, covering variables common to both, like word length;

2. data on word and nonword item coding, corresponding to the item information in the experiment program, which will allow us to link these norms data to the data collected during the lexical decision task;

3. data on just the words, e.g. age-of-acquisition, which obviously do not obtain for the nonwords.

These databases will be put together, manipulated and otherwise wrangled so that we understand them well and have them in an appropriate format for analysing the recorded word recognition behaviour.
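As a rough preview of what "putting together" can look like in R, here is a minimal sketch using small made-up data frames in place of the three .csv files. The column names (item_name, item_type, item_number, AoA), the items and all the values are invented for illustration and are not taken from the downloadable files; with the real files, each data frame would come from a call to read.csv() instead.

```r
# Toy stand-ins for the three .csv files. Every name and value below is
# hypothetical, made up only to show the mechanics of linking the files.

# 1. variables common to words and nonwords
item_norms <- data.frame(
  item_name = c("cat", "dog", "blick", "frop"),
  item_type = c("word", "word", "nonword", "nonword"),
  length    = c(3, 3, 5, 4),
  stringsAsFactors = FALSE
)

# 2. item coding, used to link norms to the lexical decision data
item_coding <- data.frame(
  item_name   = c("cat", "dog", "blick", "frop"),
  item_number = c(101, 102, 201, 202),
  stringsAsFactors = FALSE
)

# 3. word-only norms, e.g. age-of-acquisition (AoA)
word_norms <- data.frame(
  item_name = c("cat", "dog"),
  AoA       = c(3.2, 2.9),
  stringsAsFactors = FALSE
)

# Link the shared norms to the item coding: every item appears in both
items <- merge(item_norms, item_coding, by = "item_name")

# Add the word-only norms; all.x = TRUE keeps the nonwords,
# which get NA for AoA because no such rating exists for them
items_all <- merge(items, word_norms, by = "item_name", all.x = TRUE)

# Select just the words
words_only <- subset(items_all, item_type == "word")

print(items_all)
```

The all.x = TRUE argument is the important choice here: without it, merge() would silently drop the nonwords, because they have no row in the word-only data.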

The normative data (e.g. frequency values, rather than item coding) were extracted from the English Lexicon Project (ELP, Balota et al., 2007) or from collections of ratings data reported and made available by Cortese and colleagues (Cortese & Fugett, 2004; Cortese & Khanna, 2008) or by Kuperman and colleagues (Kuperman et al., in press). The repositories for the data can be found at the ELP website or at the locations specified in the cited papers. The normative data are available in the downloadable files here only to illustrate how to interact with such data in R.


Balota, D.A., Yap, M.J., Cortese, M.J., Hutchison, K.A., Kessler, B., Loftis, B., Neely, J.H., Nelson, D.L., Simpson, G.B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39, 445-459.

Cortese, M.J., & Fugett, A. (2004). Imageability Ratings for 3,000 Monosyllabic Words. Behavior Research Methods, Instruments, & Computers, 36, 384-387.

Cortese, M.J., & Khanna, M.M. (2008). Age of Acquisition Ratings for 3,000 Monosyllabic Words. Behavior Research Methods, 40, 791-794.

Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (in press). Age-of-acquisition ratings for 30 thousand English words. Behavior Research Methods.
