A critical reason for learning to use R is the superior capacity it affords for visualizing data. If you learn to plot data with R, you are learning to plot data using some of the best tools now available.
There are four graphics systems in R.
We will mostly use one, but it is worth noting the existence of the others:
— base graphics system, installed with R, written by Ross Ihaka
— grid graphics system, written by Paul Murrell, on which both lattice and ggplot2 are built
— lattice graphics, written by Deepayan Sarkar
— ggplot2, written by Hadley Wickham (2009, recently revised)
To access ggplot2 functions, you will need to install and then load the package.
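As a minimal sketch of the install-then-load step, the code below installs ggplot2 only if it is missing and then loads it. The CRAN mirror URL is an assumption chosen for illustration; any mirror you normally use should work, and installation needs an internet connection.

```r
# Install ggplot2 only if it is not already available.
# The repos value is an assumed mirror; substitute your preferred one.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Load the package so its functions can be called without the ggplot2:: prefix.
library(ggplot2)
```

Installation is needed once per R installation; library(ggplot2) is needed once per session.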
What will we be doing with our plots?
We will need to plot data:
— to check our data: to look for missing values and errors, and to evaluate the need to transform variables
— to understand our data better through exploratory data analysis (EDA): to detect outliers, trends and patterns that warrant further consideration
— to present model predictions and examine model appropriateness
— to report results
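To make the first two uses concrete, here is a rough sketch of a data check. It assumes the airquality dataset that ships with R (it is not a dataset used elsewhere in these notes), and the choice of 20 bins is arbitrary. The idea is to count missing values, then look at the distribution of one variable on raw and log scales to judge whether a transformation might help.

```r
library(ggplot2)

# Count missing values in each column before plotting anything.
missing_counts <- colSums(is.na(airquality))
print(missing_counts)

# Histogram of the raw variable; a strongly skewed shape may suggest
# that a transformation is worth considering.
# na.rm = TRUE drops the missing Ozone values without a warning.
p_raw <- ggplot(airquality, aes(x = Ozone)) +
  geom_histogram(bins = 20, na.rm = TRUE)

# The same histogram with the x axis on a log10 scale, for comparison.
p_log <- p_raw + scale_x_log10()

print(p_raw)
print(p_log)
```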
Mastering the grammar of ggplot2
The theoretical basis of ggplot2 is the layered grammar of graphics (Wickham, 2009), based on Wilkinson’s (2005) grammar of graphics. A plot can be understood as a combination of:
— a dataset
— mappings from variables to aesthetics – the graphic properties like point position, size, shape and colour
— one or more layers, each composed of a geometric object, a statistical transformation, and a position adjustment – geometric objects are things like points, lines and bars; a statistical transformation is, for example, the smoothing that turns the raw data into a line (smoother) showing the predicted values of y given x
— a scale for each aesthetic mapping – a function that converts data values (e.g. car weight or engine size) to pixel position, colour specification etc.
— a coordinate system – we might need to choose between Cartesian, polar, spherical etc. coordinates
— a faceting specification – describing which variables should be used to split up the data, e.g. into small multiples showing subsets of the data
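The components listed above can be sketched in a single plot. This example assumes the mtcars dataset that ships with R, where wt is car weight and mpg is fuel economy; the linear smoother, the colour palette and the faceting variable are choices made for illustration only.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                       # the dataset
            mapping = aes(x = wt, y = mpg)) +    # mappings from variables to aesthetics
  # Layer 1: points, with number of cylinders mapped to colour.
  geom_point(aes(colour = factor(cyl))) +
  # Layer 2: a line from a statistical transformation (here a linear fit).
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Scales: how data values are converted to colours and axis positions.
  scale_colour_brewer(name = "Cylinders", palette = "Dark2") +
  scale_x_continuous(name = "Weight (1000 lb)") +
  # Coordinate system: Cartesian is the default, stated here for clarity.
  coord_cartesian() +
  # Faceting: one panel per transmission type (am), i.e. small multiples.
  facet_wrap(~ am, labeller = label_both)

print(p)
```

Each `+` adds one component, so a plot can be built up and inspected one piece at a time.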
N.B. ggplot2 takes data frames as input; the specification of aesthetics, scales, statistical transformations and geometric objects results in the production of plots.
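As a small illustration of the data frame requirement, the sketch below starts from two separate vectors and combines them before plotting. The variable names and values are hypothetical, invented for this example.

```r
library(ggplot2)

# Two hypothetical vectors of measurements (illustrative values only).
weight  <- c(2.6, 2.9, 3.2, 3.4, 3.6)
economy <- c(21.0, 21.0, 22.8, 21.4, 18.7)

# ggplot2 expects a data frame, so combine the vectors first.
cars_df <- data.frame(weight = weight, economy = economy)

# The aesthetics then refer to columns of the data frame by name.
p <- ggplot(cars_df, aes(x = weight, y = economy)) +
  geom_point()

print(p)
```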
What have we learnt?
— R has four different graphics systems
— ggplot2, the one we will mostly use, is based on a grammar of graphics
— plots are seen as the product of combining data with aesthetic mappings, via scales, and layering geometric objects