Getting normative data for words

The easiest step is to get the data we need from the English Lexicon Project (ELP). In particular, we want information on frequency and on orthographic similarity. There are several frequency measures; we will be using the log contextual diversity (CD) measure from the SUBTLEX database (see papers by Brysbaert, Adelman, and others). For orthographic similarity we will be using OLD20 (Yarkoni et al., 2008).

We need to open Internet Explorer (the ELP website does not seem to work well with Chrome) and go to:

http://elexicon.wustl.edu/WordStart.asp

click on:

Generate Lists of Items with Specific Lexical Characteristics

and then select the following fields:

SUBTLWF

LgSUBTLWF

SUBTLCD

LgSUBTLCD

OLD

OLDF

PLD

PLDF

NPhon (Number of Phonemes)

NMorph (Number of Morphemes)

Then click on the execute query button (the one at the bottom, for restricted vocabulary) and paste in a list of the words from the lexical decision stimulus set. Any listing will do; one might come from:

item norms 100812.csv
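If the stimulus file lists each word more than once, it can help to reduce it to one word per line before pasting. Here is a minimal sketch of that step, assuming the file has a header row and a column holding the words. The column name `item_name` and the sample rows are made up for illustration, and lower-casing the words is also an assumption about what the query expects.

```python
import csv
import io


def unique_words(csv_text, column="item_name"):
    """Return the distinct, lower-cased words in `column`, in first-seen order."""
    seen = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        word = row[column].strip().lower()
        if word:
            # dicts keep insertion order, so this de-duplicates without sorting
            seen.setdefault(word, None)
    return list(seen)


# Stand-in for the contents of the stimulus file (hypothetical rows).
sample = "item_name,item_type\nhouse,word\nHouse,word\n gate ,word\n"

# One word per line, ready to paste into the query box.
word_list = "\n".join(unique_words(sample))
print(word_list)
```

To use it on a real file, read the file's text with `open(path, newline="").read()` and pass that in place of `sample`.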

We can then copy and paste the results returned by this query into a new sheet (word norms), created in a workbook where we gather together all the information sources for our analysis.

Getting data from databases without online interfaces

Using the ELP is relatively straightforward; getting the age of acquisition (AoA) and imageability (IMG) ratings is a little harder. We are going to use the Cortese imageability and AoA norms, which I have copied into a new folder:

Dropbox\resources R\2013 R class\item norms data

We can get the values we need by looking up each word and copying and pasting its values by hand. Or we can let Excel do the work, following the instructions given here:

http://crr.ugent.be/archives/833

See the PDF how-to guide on VLOOKUP.
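For anyone who would rather script the lookup than do it in a spreadsheet, the same exact-match idea can be sketched in a few lines of Python. This is only an illustration, not the method in the guide: the column names `word` and `rating` and the sample values are invented, and a word with no match comes back as `None` where the spreadsheet would show #N/A.

```python
import csv
import io


def load_norms(csv_text, key="word", value="rating"):
    """Build a word -> rating table from CSV text with a header row."""
    norms = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        norms[row[key].strip().lower()] = float(row[value])
    return norms


def lookup(words, norms):
    """Exact-match lookup for each word; words without a match get None."""
    return [(w, norms.get(w.strip().lower())) for w in words]


# Hypothetical stand-ins for a norms database and a stimulus list.
img_csv = "word,rating\nhouse,6.5\ngate,6.1\n"
stimuli = ["house", "gate", "truth"]

matched = lookup(stimuli, load_norms(img_csv))
missing = [w for w, v in matched if v is None]

print(matched)
print("missing:", missing)
```

Listing the unmatched words separately makes the spot check for missing values quick to repeat.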

Having followed the instructions in the guide for both the IMG and the AoA databases, we can run a quick spot check. It shows both that the VLOOKUP function seems to deliver norm values from the source databases accurately and that there are a few missing values.

We can use the Brookes online IMG ratings that I collected for these (and other) words to complete the IMG database.
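Completing one set of ratings from a second set can be sketched as follows. This is a rough illustration under stated assumptions: both sources are taken to be word-to-rating tables already loaded into dictionaries, the names `primary` and `fallback` and all the values are made up, and the two sources are assumed to be on a comparable scale, which should be checked before mixing them.

```python
def fill_missing(primary, fallback):
    """Return word -> (value, source), preferring the primary ratings.

    A value of None in `primary` marks a word the first database lacked.
    """
    merged = {}
    for word, value in primary.items():
        if value is not None:
            merged[word] = (value, "primary")
        elif fallback.get(word) is not None:
            merged[word] = (fallback[word], "fallback")
        else:
            merged[word] = (None, "missing")
    return merged


# Hypothetical ratings: None marks a gap left by the first lookup.
primary = {"house": 6.5, "gate": 6.1, "truth": None, "zeal": None}
fallback = {"truth": 2.4, "house": 6.3}

completed = fill_missing(primary, fallback)
for word, (value, source) in completed.items():
    print(word, value, source)
```

Keeping the source label next to each value means we can later check whether the filled-in ratings behave differently from the original ones.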

We can use the Kuperman AoA norms to get an alternative (complete) set of AoA values for the words in the stimulus set:

http://crr.ugent.be/archives/806

We use the database of norms for 51k words.
