The easiest step is to get the data we need from the English Lexicon Project (ELP). In particular, we want information on frequency (there are several measures; we will be using the log contextual diversity (CD) measure from the SUBTLEX database; see papers by Brysbaert, Adelman etc.) and on orthographic similarity (we will be using OLD20; Yarkoni et al., 2008).
We need to open Internet Explorer (the ELP website does not seem to work well with Chrome) and go to:
Generate Lists of Items with Specific Lexical Characteristics
and then select for:
NPhon (Number of Phonemes)
NMorph (Number of Morphemes)
Then click on the execute query button (the one at the bottom, for restricted vocabulary) and paste in a list of the words from the lexical decision stimulus set. Any listing will do; one might be from:
item norms 100812.csv
We can then copy and paste the results returned by this query into a new sheet (word norms) in a workbook where we gather together all the information sources for our analysis.
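The word list that gets pasted into the ELP query can also be produced with a few lines of code rather than by hand. The following is only a minimal sketch: the column name `item_name` and the sample rows are assumptions made for illustration, not the actual layout of the stimulus file.

```python
import csv
import io


def unique_words(csv_text, column="item_name"):
    """Return the distinct, lower-cased words in one column, in first-seen order.

    The column name is a hypothetical default; change it to match the file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {}
    for row in reader:
        word = (row.get(column) or "").strip().lower()
        if word:
            seen.setdefault(word, None)  # dicts keep insertion order
    return list(seen)


# Hypothetical miniature stand-in for the stimulus file.
sample = "item_name\nhouse\nriver\nHouse\ntable\n"
word_list = unique_words(sample)

# One word per line, ready to paste into the query box.
print("\n".join(word_list))
```

To run this on a real file, read its contents first (for example with `open(path, newline="").read()`) and pass the text to `unique_words`.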
Getting data from databases without online interfaces
Using the ELP is relatively straightforward; getting the AoA and imageability ratings is a little harder. We are going to use the Cortese imageability and AoA norms, which I have copied into a new folder:
Dropbox\resources R\2013 R class\item norms data
We can get the values we need by looking up each word and copying and pasting its values by hand. Or we can let Excel do the work, following the instructions in the PDF how-to guide on VLOOKUP.
Having followed the instructions in the guide for both the IMG and the AoA databases, a quick spot check shows that the VLOOKUP function seems to deliver norm values from the source databases accurately, but also that there are a few missing values.
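For readers who prefer a script to a spreadsheet, the same kind of lookup can be sketched in code. This is an illustration of the idea (an exact-match lookup of each stimulus word in a norms table, as VLOOKUP does when its last argument is FALSE), not the procedure described in the guide. The words and rating values are invented for the example.

```python
def lookup_norms(words, norms, missing=None):
    """Exact-match lookup of each word in a norms table.

    Words absent from the table get `missing`, much as VLOOKUP
    returns #N/A when it finds no exact match.
    """
    return {word: norms.get(word, missing) for word in words}


def missing_words(looked_up):
    """List the words for which no norm value was found."""
    return [word for word, value in looked_up.items() if value is None]


# Invented example data: not real imageability ratings.
stimuli = ["house", "river", "table"]
img_norms = {"house": 6.5, "table": 6.1}

img = lookup_norms(stimuli, img_norms)
gaps = missing_words(img)

print(img)   # norm value (or None) for every stimulus word
print(gaps)  # the words that still need a value from another source
```

Listing the gaps explicitly serves the same purpose as the spot check: it shows which words the source database does not cover.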
We can use the Brookes online IMG ratings that I collected for these (and other) words to complete the IMG database.
We can use the Kuperman AoA norms to get an alternate (complete) set of AoA values for words in the stimulus set. We use the 51k-word norms database.
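Completing one set of norms from a second source, as with the IMG ratings, amounts to keeping the first source's value where it exists and falling back to the second otherwise. A minimal sketch follows, with invented values and hypothetical names. It also records where each value came from, which is worth doing because two sources may not use the same rating scale.

```python
def fill_from_fallback(primary, fallback):
    """Return (values, sources) for every word in `primary`.

    A word keeps its primary value when it has one; otherwise the
    fallback value is used if available, and None if neither exists.
    """
    values, sources = {}, {}
    for word, value in primary.items():
        if value is not None:
            values[word], sources[word] = value, "primary"
        elif word in fallback:
            values[word], sources[word] = fallback[word], "fallback"
        else:
            values[word], sources[word] = None, "missing"
    return values, sources


# Invented example data: not real ratings.
first_source = {"house": 6.5, "river": None, "table": 6.1, "glint": None}
second_source = {"river": 6.3, "house": 6.4}

completed, origin = fill_from_fallback(first_source, second_source)
print(completed)
print(origin)
```

Keeping the `origin` record alongside the values makes it possible to check later whether the words filled from the second source behave differently in the analysis.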