Module 9 : Visualization
This week we will be exploring some of the different visualization options available in R. We will mainly be comparing base R's built-in plot() function, the lattice package, and the ggplot2 package, and discussing the different syntax and abilities of each method.
Here we create an empty graph with ggplot() and add the points with geom_point(). This is very different from the other plot functions we've used so far, but as we will see later, it gives us a lot of flexibility in how we present the data.
The modular nature of ggplot2 also makes it easy to adjust and experiment with graphics. Since each aspect of the graph can be manipulated individually, the code shows clearly what it is doing and allows for more fine-tuning.
Our assignment was to find a dataset from the following link:
https://vincentarelbundock.github.io/Rdatasets/datasets.html
and, using the three visualization methods listed above, create some visualizations of the data.
Let us begin!
The dataset I decided to use was the Wong dataset.
https://vincentarelbundock.github.io/Rdatasets/doc/carData/Wong.html
This dataset looks at the IQ of patients recovering from a coma.
I downloaded the CSV file into R's working directory (in my case, the same directory where R is installed) and ran the following line of code to read the data into R:
> wong<- read.table("Wong.CSV",header = TRUE,sep = ",")
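As a quick check that an import like this worked, something along the following lines can help. This is only a sketch: it writes a small stand-in file (the demo_path name and its three rows are made up for illustration and are not the real Wong data) and reads it back, so it runs without the download.

```r
# Hypothetical stand-in for Wong.csv: three made-up rows, not the real data
demo_path <- tempfile(fileext = ".csv")
writeLines(c("id,piq,viq,sex",
             "1,87,89,Male",
             "2,95,77,Female",
             "3,102,98,Male"), demo_path)

# read.csv() is read.table() with header = TRUE and sep = "," as defaults
demo <- read.csv(demo_path)

getwd()      # the folder R searches when given a bare file name
str(demo)    # column names and types
head(demo)   # first few rows
```

If str() shows the expected columns with numeric types for piq and viq, the file was parsed as intended.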
Now that we have our data set up and running in R we can begin with the visualizations.
Basic R graphics:
R has built-in functions for us to do some visualization, the main one being the plot() function.
For this assignment, I will be visualizing the relationship between the performance IQ (piq) of patients and their verbal IQ (viq).
>plot(wong$piq,wong$viq,
main = "Relationship between performance IQ and verbal IQ of coma patients",
xlab = "Performance IQ",
ylab = "Verbal IQ")
The first two arguments tell plot() which variables of the data set to put on the x and y axes. main sets a title for the graph, and xlab and ylab label the corresponding axes. Here is the result:
While the graph we plotted is nice, it doesn't show the overall trend in our data, so next we will add a regression line. To do this, we first fit a linear regression to the data and then add the fitted line to our existing graph.
>lmiq <- lm(viq ~ piq, data = wong)
>abline(lmiq,col=2,lwd=3)
I made the line red with the col = 2 argument in abline() so it stands out; by default, base R graphics draw in black. Another odd thing is that, when working with plot(), commands like abline() permanently edit the graph we are viewing. We can have more than one plot open, but we must make sure R knows which one we want edited, which makes it difficult to make changes and play around. Overall, the basic plot function seems to be a bare-bones visual tool that is good for just "getting the job done".
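To illustrate the point about telling R which plot to edit, here is a minimal sketch using base R's device functions. The data frame is made-up stand-in data (not the Wong dataset), the dev_scatter and dev_hist names are my own, and pdf(NULL) is used only so the example runs without opening windows.

```r
# Made-up stand-in data, not the Wong dataset
set.seed(1)
demo <- data.frame(piq = rnorm(50, mean = 90, sd = 15))
demo$viq <- 40 + 0.6 * demo$piq + rnorm(50, sd = 8)

# Open two devices; pdf(NULL) draws nowhere, so nothing is written to disk
pdf(NULL); dev_scatter <- dev.cur()
plot(demo$piq, demo$viq)

pdf(NULL); dev_hist <- dev.cur()
hist(demo$piq)

# abline() draws on whichever device is current, so switch back first
dev.set(dev_scatter)
abline(lm(viq ~ piq, data = demo), col = 2, lwd = 3)
active <- dev.cur()

# Close both devices again
dev.off(dev_hist)
dev.off(dev_scatter)
```

In an interactive session the same pattern should work with dev.new() in place of pdf(NULL).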
Lattice Package:
Next we will be working with the Lattice package.
> library(lattice)
>xyplot(viq~piq, wong)
And just like that we have a graph plotted for us. What is interesting about Lattice is that each graph is an object in itself, and we can modify or create plots the way we want by changing the arguments in the function call. For instance, if I want to highlight the different sexes in the data and draw a regression line for each, I can do so with the following code:
>xyplot(viq~piq, wong,
grid = TRUE,
groups = sex, auto.key = TRUE,
type = c("p","r"), lwd= 3)
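With groups set, type "r" draws one least-squares line per group. As a rough sketch of what those lines represent numerically, the same fits can be computed by hand with lm() on each subset. The data below is made up (only the column names match Wong), and the fits and per_group names are my own.

```r
# Made-up stand-in data with the same column names as Wong
set.seed(2)
demo <- data.frame(
  piq = rnorm(60, mean = 90, sd = 15),
  sex = rep(c("Female", "Male"), each = 30)
)
demo$viq <- 45 + 0.5 * demo$piq + rnorm(60, sd = 8)

# Fit one least-squares line per sex and collect intercept and slope
fits <- lapply(split(demo, demo$sex),
               function(d) coef(lm(viq ~ piq, data = d)))
per_group <- do.call(rbind, fits)
per_group
```

Each row of per_group holds the intercept and slope of the line for one sex.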
We can even split the males and females into separate panels of one graph very easily:
>xyplot(viq~piq|sex, wong,
grid = TRUE,
groups = sex, auto.key = TRUE,
type = c("p","smooth"), lwd= 3)
We can see that the Lattice package can do a lot more than base R. We have more control, and since each plot is a stand-alone object, we can edit and play around with the visualization much more freely than with the generic R plot.
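A small sketch of what "each plot is an object" can look like in practice: the result of xyplot() is stored, then changed with update() without rewriting the whole call. The data here is made up, and the base_plot and titled names are mine.

```r
library(lattice)

# Made-up stand-in data, not the Wong dataset
set.seed(3)
demo <- data.frame(piq = rnorm(40, mean = 90, sd = 15),
                   sex = rep(c("Female", "Male"), each = 20))
demo$viq <- 45 + 0.5 * demo$piq + rnorm(40, sd = 8)

# xyplot() returns a "trellis" object; nothing is drawn until it is printed
base_plot <- xyplot(viq ~ piq, demo)
class(base_plot)

# update() returns a modified copy and leaves base_plot as it was
titled <- update(base_plot,
                 main = "Verbal vs. performance IQ",
                 type = c("p", "r"))

pdf(NULL)       # throwaway device so the sketch runs without a display
print(titled)   # printing is what actually draws the plot
dev.off()
```

Because base_plot is unchanged, several variations can be tried from the same starting point.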
To really highlight how much more we can do with Lattice, we can even produce smoothed scatter plots with just a simple argument change:
>xyplot(viq~piq|sex, wong, #create a smooth scatter
grid = TRUE,
panel = panel.smoothScatter )
ggplot2 Package:
ggplot2 stands apart from the other two by having the most divergent syntax.
With ggplot2, each aspect of the graph is its own stand-alone function, and we can fine-tune specific characteristics with the arguments of each function.
>library(ggplot2)
>ourplot <- ggplot(wong,aes(piq,viq)) + geom_point()
To create a regression line we can use the following line of code:
>ourplot + geom_smooth(method = 'lm')
We can then separate them by sex by simply adding another geom_point() with a color mapping:
>ourplot + geom_point(aes(col=sex)) + geom_smooth( method = "lm",se = FALSE, aes(group = sex, color = sex))
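Note that ourplot already contains a geom_point() layer, so this call adds a second layer of colored points on top of the original black ones rather than replacing them. Because the new points cover the old ones exactly, the plot looks as if only the colored points were drawn, which makes this kind of customization very easy.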
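One way to see how ggplot2 treats a second geom_point() is to count the layers stored in the plot object. The sketch below uses made-up data, and the colored and clean names are my own; it assumes the layers element of a ggplot object is accessible with $, which holds in the ggplot2 versions I know of.

```r
library(ggplot2)

# Made-up stand-in data, not the Wong dataset
set.seed(4)
demo <- data.frame(piq = rnorm(40, mean = 90, sd = 15),
                   sex = rep(c("Female", "Male"), each = 20))
demo$viq <- 45 + 0.5 * demo$piq + rnorm(40, sd = 8)

ourplot <- ggplot(demo, aes(piq, viq)) + geom_point()
colored <- ourplot + geom_point(aes(col = sex))

length(ourplot$layers)   # layers in the original plot
length(colored$layers)   # layers after adding a second geom_point()

# To draw the colored points only, start from the bare ggplot() call
clean <- ggplot(demo, aes(piq, viq)) + geom_point(aes(col = sex))
length(clean$layers)
```

Keeping the base object free of geoms, and adding them only when plotting, avoids drawing the same points twice.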
One thing to note is the polished output of ggplot2: the graphics look very clear and well defined. Take a look at the following example:
>ourplot + geom_point(aes(col=sex)) + geom_smooth( method = "lm",se = FALSE, aes(group = sex, color = sex)) + facet_grid(~sex)
Conclusion:
While we mostly played around with different scatter plots and only scratched the surface of what each method can do, we can see that each method has its pros and cons. Personally, I found Lattice the easiest to learn right away, perhaps because its syntax is closer to how base R works. But with ggplot2 we see how much more control we have once we learn the mechanics of the package.
I can also imagine ggplot2's modular syntax making it easy to change how the data is presented. For instance, if we want to split an existing plot into one panel per sex, with Lattice we must write another xyplot() call with the conditioning variable in the formula, but with ggplot2 we can simply add facet_grid() to our existing plot object, which from a practical standpoint is much friendlier.
Overall I learned a lot from this module, and I'm excited to dig deeper into R's other visualization capabilities.
-Anthony