Modelling – some conceptual foundations

We have discussed the relationships between pairs of variables; we will now move on to analyzing our data using linear regression.

Slides on regression can be downloaded here.

You will see in those slides that I rely very heavily on Cohen, Cohen, West & Aiken (2003). Jacob Cohen played an instrumental role in popularizing multiple regression in the psychological sciences.

Up to this point, we have been examining the way in which two variables vary together. We have plotted and considered scatterplots (e.g. here), which are a method for examining the relationship between two variables. I have, more or less, been implying that the way one variable varied (e.g. subject reading ability) depended upon the way other variables varied (e.g. age or reading experience). In fact, I have been implying that variation in age or reading experience caused variation in reading skill: almost saying that people who read more are better readers because they read more.

It is worth tightening up my language on two points:

1. I do not know if e.g. people who read more are more skilled because they read more. In fact, it could well be the other way around (or a reciprocal relationship, e.g. Stanovich, 1986).

Causality is off the table altogether, in what we are doing at this stage.

If you consider the examination of relationships among item attributes and orthographic neighbourhood size (here), then it does not make much sense to think that a word being long somehow caused it to have few neighbours.

N.B. It might be worth considering, though, whether words that are short, frequent, easy to say, and the names of concrete objects are the kinds of words we say to children, and so the words they learn early in life; or whether children’s learning needs condition how we speak to them in a way that makes learning easy. It gets a bit confusing.

2. More pertinently, for right now, I moved from looking at the relationship between pairs of variables in terms of scatterplots (with lines of best fit in them, i.e. regression predictions) straight into talking about correlations, here, without acknowledging that:

— if I compute a correlation coefficient, say, Pearson’s r = −0.61 for the correlation between word length and word neighbourhood size, I am not saying anything about which variable depends on which;

— in Cohen et al.’s (2003) words, I am treating both variables as if they are of equal status;

— however, in discussing the scatterplots, and in getting into regression analysis, I am pursuing an interest in whether one variable depends upon another (or others).

Correlation is symmetric.

Regression is asymmetric.

We are interested in whether the outcome or dependent variable is affected or explained or predicted by variation in the predictor or independent variables.
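To make the symmetric/asymmetric contrast concrete, here is a minimal sketch in plain Python. The numbers are made up for illustration (they are not the word data discussed in these posts), and the helper names pearson_r and ols_slope are my own. The point is only that swapping the two variables leaves the correlation unchanged, while the regression slope changes depending on which variable is treated as the outcome.

```python
from math import sqrt

# Made-up illustrative values, NOT data from the post:
# word length in letters and a hypothetical neighbourhood size per word.
length = [3, 4, 4, 5, 6, 7, 8, 9]
neighbours = [12, 9, 10, 6, 4, 3, 1, 0]


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation: both variables are treated with equal status."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_slope(predictor, outcome):
    """Least-squares slope for predicting `outcome` from `predictor`."""
    mp, mo = mean(predictor), mean(outcome)
    spo = sum((p - mp) * (o - mo) for p, o in zip(predictor, outcome))
    spp = sum((p - mp) ** 2 for p in predictor)
    return spo / spp


# Correlation: the order of the arguments does not matter.
r_xy = pearson_r(length, neighbours)
r_yx = pearson_r(neighbours, length)

# Regression: the slope depends on which variable is the outcome.
slope_neighbours_on_length = ols_slope(length, neighbours)
slope_length_on_neighbours = ols_slope(neighbours, length)

print("r(length, neighbours):", r_xy)
print("r(neighbours, length):", r_yx)
print("slope, neighbours predicted from length:", slope_neighbours_on_length)
print("slope, length predicted from neighbours:", slope_length_on_neighbours)
```

The two slopes are different numbers, but they are tied to the correlation: their product equals r squared. So the regression carries the same information about the strength of the association as the correlation, plus a commitment about which variable is the outcome.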


Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
