Over the holidays I invested some time in learning R and RStudio, something I have wanted to do for a while. The opportunity presented itself when I needed to help a family member run chi-square tests, which are more involved in Excel than I would have guessed (Excel requires you to manually calculate the expected values table). Learning R has been fun, but there are occasional head-scratching frustrations. As I go through the learning process I’ll be posting here, mainly as a reference so I can remember how I solved various problems, but hopefully it will also help others who have hit the same stumbling blocks.
One issue I ran into was with the barplot function. Take for example this code, which stores 100 observations (we’ll say it’s the number of times a person visits a particular store in a month), produces a table from these data, and then creates a barplot from the table.
visits <- c(2, 2, 3, 7, 10, 3, 1, 0, 8, 7, 11, 14, 1, 3, 0, 8, 9, 4, 3, 3, 20, 9, 2, 5, 12, 3, 6, 1, 1, 2, 4, 2, 2, 3, 9, 13, 7, 11, 15, 15, 19, 7, 8, 7, 6, 0, 6, 0, 1, 4, 2, 9, 0, 6, 12, 7, 6, 14, 5, 0, 4, 0, 0, 8, 0, 4, 4, 1, 3, 5, 6, 15, 1, 6, 13, 2, 1, 3, 5, 3, 19, 12, 3, 0, 7, 0, 2, 4, 2, 2, 2, 5, 1, 4, 3, 0, 6, 11, 0, 3)
visits.table <- table(visits)
barplot(visits.table)
This is the resulting chart.
At first I thought I had a nice looking bar chart, only to notice that, while the data had a maximum frequency of 13, the y-axis cut short at 12. Also, once the x-axis labels reach double digits they become too wide to fit, leading R to skip every other one. (A low tech way to fix this is to widen the plot window.)
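To confirm the maximum frequency without reading it off the chart, the table can be queried directly. This is a minimal sketch using the same data; the names max.count and modes are just illustrative.

```r
# The same 100 observations as in the post.
visits <- c(2, 2, 3, 7, 10, 3, 1, 0, 8, 7, 11, 14, 1, 3, 0, 8, 9, 4, 3, 3,
            20, 9, 2, 5, 12, 3, 6, 1, 1, 2, 4, 2, 2, 3, 9, 13, 7, 11, 15, 15,
            19, 7, 8, 7, 6, 0, 6, 0, 1, 4, 2, 9, 0, 6, 12, 7, 6, 14, 5, 0,
            4, 0, 0, 8, 0, 4, 4, 1, 3, 5, 6, 15, 1, 6, 13, 2, 1, 3, 5, 3,
            19, 12, 3, 0, 7, 0, 2, 4, 2, 2, 2, 5, 1, 4, 3, 0, 6, 11, 0, 3)
visits.table <- table(visits)

# Height of the tallest bar (illustrative variable name).
max.count <- max(visits.table)

# Which visit counts reach that height (table names are character strings).
modes <- names(visits.table)[as.vector(visits.table) == max.count]

print(max.count)
print(modes)
```

Printing visits.table on its own shows every count at once, which is handy for checking the chart against the numbers.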
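The y-axis problem can be solved by setting the axis limits manually with ylim. I tried setting the upper bound to 13, my largest count of observations, but had no luck: the axis still stopped at 12, apparently because R only draws tick marks at round intervals (every 2 here) and 13 falls between ticks. Only after setting the maximum to 14 did the axis extend past the tallest bars.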
ylim=c(0,14)
And the x-axis label font can be made smaller using:
cex.names =.8
For a final command:
barplot(visits.table, cex.names =.8, ylim=c(0,14))
Which produces this chart:
Better. I wish that, without any tinkering, R would produce a y-axis that extends at least as far as the tallest bar.
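One possible way to avoid hard-coding the limit is to let R's pretty() function pick it. pretty() returns round tick positions that cover a range, so its largest value sits at or above the tallest bar. This is only a sketch, not necessarily the recommended approach, and y.top is an illustrative name.

```r
# The same 100 observations as in the post.
visits <- c(2, 2, 3, 7, 10, 3, 1, 0, 8, 7, 11, 14, 1, 3, 0, 8, 9, 4, 3, 3,
            20, 9, 2, 5, 12, 3, 6, 1, 1, 2, 4, 2, 2, 3, 9, 13, 7, 11, 15, 15,
            19, 7, 8, 7, 6, 0, 6, 0, 1, 4, 2, 9, 0, 6, 12, 7, 6, 14, 5, 0,
            4, 0, 0, 8, 0, 4, 4, 1, 3, 5, 6, 15, 1, 6, 13, 2, 1, 3, 5, 3,
            19, 12, 3, 0, 7, 0, 2, 4, 2, 2, 2, 5, 1, 4, 3, 0, 6, 11, 0, 3)
visits.table <- table(visits)

# Round tick positions covering 0..(largest count); take the last one
# as the upper axis limit so the axis reaches past the tallest bar.
y.top <- max(pretty(c(0, max(visits.table))))

barplot(visits.table, cex.names = 0.8, ylim = c(0, y.top))
```

For these data the computed limit comes out the same as the 14 I found by trial and error, and it would adjust on its own if the data changed.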
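My next challenge was to show the values with no observations as well (e.g. 16, 17, 18) rather than only the values that appear in the data. That can be fixed with “factor”, which lets you list every level you want counted. The levels start at 0 here because the data include months with zero visits; starting at 1 would silently drop those 13 observations. Example code that creates the table and produces the barplot in one shot, rather than producing the table first:
barplot(table(factor(visits, levels=0:20)), cex.names=.8, ylim=c(0,14))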
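It is worth checking what factor() does to the counts before plotting. The sketch below assumes the levels should run from 0 to 20, since the data contain zero-visit observations; full.table and dropped are illustrative names.

```r
# The same 100 observations as in the post.
visits <- c(2, 2, 3, 7, 10, 3, 1, 0, 8, 7, 11, 14, 1, 3, 0, 8, 9, 4, 3, 3,
            20, 9, 2, 5, 12, 3, 6, 1, 1, 2, 4, 2, 2, 3, 9, 13, 7, 11, 15, 15,
            19, 7, 8, 7, 6, 0, 6, 0, 1, 4, 2, 9, 0, 6, 12, 7, 6, 14, 5, 0,
            4, 0, 0, 8, 0, 4, 4, 1, 3, 5, 6, 15, 1, 6, 13, 2, 1, 3, 5, 3,
            19, 12, 3, 0, 7, 0, 2, 4, 2, 2, 2, 5, 1, 4, 3, 0, 6, 11, 0, 3)

# Listing every level explicitly makes table() report a count (possibly 0)
# for each one, including values that never occur in the data.
full.table <- table(factor(visits, levels = 0:20))
print(full.table[c("16", "17", "18")])

# For comparison: values outside the listed levels become NA, and table()
# leaves NAs out by default. With levels 1:20 the zeros are lost.
dropped <- sum(is.na(factor(visits, levels = 1:20)))
print(dropped)

barplot(full.table, cex.names = 0.8, ylim = c(0, 14))
```

Comparing sum(full.table) against length(visits) is a quick way to spot levels that do not cover the whole data set.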
Side note, if you are on Mac OS X and having trouble getting an SVG file to print, try this link.