I have mentioned this before: I very strongly believe that the best way to learn something is to teach it, and that is the whole purpose of creating these posts.
Presently I am learning about tables in R. Frankly, I don't even know what frequency tables and contingency tables are. I will simply Google them and then write down the understanding I develop after researching these topics. So I absolutely don't mind if you spot something unusual in the content and point it out to me. In fact, I will appreciate it.
So, what is a frequency table?
A frequency table is a table showing how many times each data value, or combination of data values, occurs during the period in which the data was recorded.
And what are contingency tables?
This one I seriously don't know. Let's Google :P
OK, contingency tables have one variable along the rows and another along the columns, and each cell describes the corresponding row-column combination. Usually the cell holds the frequency (count) of observations with that combination, sometimes expressed as a proportion or percentage instead.
Ok. Knowledge score +25 points. Learning further.
How to create tables in R?
The first way I came across was the table() function. If a single object is given as input, it creates a frequency table; if multiple objects are given as input, it creates a contingency table.
Let's take a simple example: create a frequency table of the different cylinder counts in the mtcars dataset.
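To see the one-input versus multiple-input behaviour in isolation, here is a minimal sketch. The vectors `colour` and `size` are hypothetical example data made up for illustration, not part of mtcars.

```r
# Hypothetical example data: five observations of two categorical variables
colour <- c("red", "blue", "red", "red", "blue")
size   <- c("small", "small", "large", "small", "large")

# One input: a one-way frequency table (how often each colour occurs)
one_way <- table(colour)
print(one_way)

# Two inputs: a two-way contingency table (how often each colour/size pair occurs)
two_way <- table(colour, size)
print(two_way)

# The number of dimensions matches the number of inputs
length(dim(one_way))  # 1
length(dim(two_way))  # 2
```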
>attach(mtcars)
>table(cyl)
This tells us that there are 11 cars with 4 cylinders, 7 cars with 6 cylinders and 14 cars with 8 cylinders.
Using the prop.table() function, we can display the frequencies as proportions -
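The console output isn't reproduced here, so this sketch recomputes the same counts. It uses with() instead of attach() so the snippet is self-contained; that substitution is my choice for illustration, not a requirement, and `cyl_counts` is just an illustrative name.

```r
# mtcars ships with base R (datasets package), so no extra packages are needed
cyl_counts <- with(mtcars, table(cyl))
print(cyl_counts)

# Individual counts can be pulled out by level name
cyl_counts[["4"]]  # 11 cars with 4 cylinders
sum(cyl_counts)    # 32 cars in total
```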
>data<-table(cyl)
>prop.table(data)
The result above indicates that 34% of the cars in the dataset have 4 cylinders, 22% have 6 and 44% have 8.
I hope this explains how to create a simple one-way frequency table. Now, let's look at how to create a two-way table using the table() function.
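To double-check the 34/22/44 figures, here is a short sketch that turns the proportions into rounded percentages. Note that prop.table() is given the table of counts, not the raw cyl column; again, with() in place of attach() is my own substitution.

```r
cyl_counts <- with(mtcars, table(cyl))

# Divide each count by the total to get proportions (they sum to 1)
cyl_props <- prop.table(cyl_counts)

# Express as whole-number percentages for easier reading
round(100 * cyl_props)

# Note: prop.table(mtcars$cyl) would instead divide each individual
# cylinder value by sum(mtcars$cyl), which is not a frequency proportion
```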
Syntax is
table(row data, column data)
Let's create a table of the "vs" and "cyl" columns of the mtcars dataset, with "vs" along the rows and "cyl" along the columns:
>table(vs,cyl)
The output is a contingency table giving the frequency of each combination of vs and cyl in the dataset.
There is another function for creating tables which is good to know: xtabs(). Its syntax is as follows -
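Here is a self-contained sketch of the same two-way table, with the counts it produces written out in the comments (vs = 0 is a V-shaped engine, vs = 1 a straight engine). `vs_cyl` is an illustrative name, and with() replaces attach().

```r
vs_cyl <- with(mtcars, table(vs, cyl))
print(vs_cyl)

# Counts per cell (rows = vs, columns = cyl 4, 6, 8):
#   vs = 0:  1   3  14
#   vs = 1: 10   4   0
vs_cyl["1", "4"]  # 10 straight-engine cars with 4 cylinders
```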
xtabs(~A+B+C...,data=dataSet)
where everything after ~ lists the variables we want to tabulate. The first variable always goes along the rows and the second along the columns; from the third one onwards, each additional variable increases the number of tables. For example, if the third variable has 2 categories, we get two tables: first the contingency table of A and B for the first category of C, and then the contingency table of A and B for the second category of C.
The data argument simply names the dataset that contains these variables.
Example-
>xtabs(~vs+cyl,data=mtcars)
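As you can see, an advantage of xtabs() is that it labels the dimensions with the variable names, and it reads them straight from the data frame given in data, so no attach() is needed.
Sometimes we also want to see the row and column totals in the table. These can be added with the addmargins() function.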
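The three-variable case from the syntax explanation above can be sketched like this. gear takes three values in mtcars (3, 4 and 5), so we get three vs-by-cyl tables; the object names are illustrative.

```r
# Two variables: rows = vs, columns = cyl
xt2 <- xtabs(~ vs + cyl, data = mtcars)

# Three variables: one vs-by-cyl table for each value of gear
xt3 <- xtabs(~ vs + cyl + gear, data = mtcars)
print(xt3)

# Dimension sizes: 2 values of vs, 3 of cyl, 3 of gear
dim(xt3)
names(dimnames(xt3))  # "vs" "cyl" "gear"
```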
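A minimal sketch of addmargins() on the vs-by-cyl table; it appends a "Sum" row and a "Sum" column. The names `vs_cyl` and `with_totals` are illustrative.

```r
vs_cyl <- xtabs(~ vs + cyl, data = mtcars)

# Add a "Sum" row (column totals) and a "Sum" column (row totals)
with_totals <- addmargins(vs_cyl)
print(with_totals)

# Margins can also be added along one dimension only:
# margin = 1 sums over the rows, adding a single "Sum" row of column totals
addmargins(vs_cyl, margin = 1)
```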
But the function that makes the neatest tables is CrossTable() from the gmodels package. Have a look -
Pretty cool right?
I briefly discussed multidimensional tables above. Let's look at them as well -
1. Using the table function
>table(vs,cyl,gear)
As you can see, the number of tables equals the number of distinct values (levels) of the gear variable.
We can't go beyond two dimensions with CrossTable(), though.
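A quick sketch confirming the slice count, and showing how to pull out a single slice by its gear value; the names are illustrative and with() replaces attach().

```r
vs_cyl_gear <- with(mtcars, table(vs, cyl, gear))

# One vs-by-cyl slice per distinct value of gear
n_slices <- dim(vs_cyl_gear)[3]
n_gear_levels <- length(unique(mtcars$gear))
n_slices == n_gear_levels  # TRUE (3 slices for 3, 4 and 5 gears)

# A single slice can be pulled out by its gear value
vs_cyl_gear[, , "4"]
```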
Multidimensional tables printed this way look good, but not that good. There is an amazing function that makes them look much better: ftable() ("flat table"). Let's see what it does -
Yep, it combined the separate tables into one. This becomes really helpful as we add more dimensions. Like -
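Since the output isn't shown here, this is a sketch of the call on the three-way table; `flat` is just an illustrative name.

```r
vs_cyl_gear <- with(mtcars, table(vs, cyl, gear))

# Flatten the 3-D table: vs and cyl go along the rows, gear along the columns
flat <- ftable(vs_cyl_gear)
print(flat)

# 2 vs values x 3 cyl values = 6 rows; 3 gear values = 3 columns
dim(flat)
```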
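As an illustration of adding another dimension, here is a sketch with am (transmission: 0 = automatic, 1 = manual) as a fourth variable. Picking am is my own choice for the example.

```r
# Four variables: vs, cyl and gear along the rows, am along the columns
flat4 <- ftable(with(mtcars, table(vs, cyl, gear, am)))
print(flat4)

# 2 x 3 x 3 = 18 row combinations, 2 columns for am
dim(flat4)
```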
Also, keep in mind that by default ftable() puts the last input along the columns and all the earlier inputs along the rows, starting with the first one.
Notice that cyl, which was along the columns in the earlier tables, is now along the rows after ftable().
With that I will conclude this post on tables. Writing it definitely improved my understanding of tables. Hope it helped you as well.
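If you'd rather choose the layout yourself, ftable() also accepts row.vars and col.vars arguments (variable numbers or names). Here is a sketch that keeps cyl and gear on the columns; the object names are illustrative.

```r
vs_cyl_gear <- with(mtcars, table(vs, cyl, gear))

# Default layout: last variable (gear) on the columns, the rest on the rows
default_layout <- ftable(vs_cyl_gear)

# Custom layout: cyl and gear on the columns, so only vs remains on the rows
custom_layout <- ftable(vs_cyl_gear, col.vars = c("cyl", "gear"))
print(custom_layout)

dim(custom_layout)  # 2 rows (vs) by 9 columns (cyl x gear)
```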
Till then.
Stay Awesome


In the cross table you created using gmodels, can you please explain what each of those numbers with decimal places means? I used gmodels to create a contingency table and the only argument I set to TRUE was prop.c (i.e., everything else was set to FALSE). I still got an extra number displayed along with the column percentage and the actual n value for the cell... and I can't for the life of me figure out what it is. Thanks!
Hey,
Thanks for asking that! Now we both know the answer :) If you look at the top of the CrossTable() output, there is a small box listing N, Chi-square contribution, N / Row Total, N / Col Total and N / Table Total. The decimal values in each cell are exactly those, in that order. If you do the math, the N / Row Total values add up to 1 along each row, the N / Col Total values add up to 1 down each column, and the N / Table Total values add up to 1 over the entire table. The chi-square contribution is controlled by its own argument, prop.chisq, which is TRUE by default, so if you didn't set it to FALSE, that is most likely your extra number. Hope that helps. Cheers!
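For anyone who wants to check these numbers without gmodels, here is a base-R sketch that recomputes the four decimal quantities for the vs-by-cyl table. It assumes CrossTable's chi-square contribution is the usual (observed - expected)^2 / expected per cell; the object names are illustrative.

```r
observed <- xtabs(~ vs + cyl, data = mtcars)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# 1. Each cell's contribution to the chi-square statistic
chisq_contrib <- (observed - expected)^2 / expected

# 2. N / Row Total: each row sums to 1
row_props <- prop.table(observed, margin = 1)

# 3. N / Col Total: each column sums to 1
col_props <- prop.table(observed, margin = 2)

# 4. N / Table Total: the whole table sums to 1
table_props <- prop.table(observed)

round(chisq_contrib, 3)
rowSums(row_props)
colSums(col_props)
sum(table_props)
```

Adding up all the cell contributions gives the Pearson chi-square statistic that chisq.test() reports for the same table.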