Tuesday, 15 December 2015

Hack the t-tests

Alright, first thing that makes me curious about Student's t-test is why it is called so. The history is somewhat interesting.

First, why the "Student"?

"Student" is actually the pseudonym, or pen name, of the author, Mr. William Sealy Gosset. He used a pen name because, at the time he came up with the concept, his employer did not allow him to publish under his own name. Some company with bad GTPW ratings, it seems.

Monday, 28 September 2015

How to do a Logistic Regression in R

It's a big post, so don't give up! And if by any chance you feel like closing the window, please take the pain to scroll down to the bottom and comment "this post is awesome!"...ummm...Yeah.

So, getting down to business - 
Regression is the "I TOLD YA SO!" dude of Data science. So, yup, it predicts!

A safe definition would be -

Regression is the statistical technique that tries to explain the relationship between a dependent variable and one or more independent variables. There are various kinds of it, like simple linear, multiple linear, polynomial, logistic, Poisson, etc.

Blah!
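For a quick taste of the logistic kind, here is a minimal sketch. It assumes the built-in mtcars data set, where am is 0 for automatic and 1 for manual; the choice of wt (weight) as the only predictor and the names fit, new_car and p_manual are made up for illustration, not taken from the full post.

```r
# Fit a logistic regression: probability of a manual gearbox given car weight.
# mtcars ships with R; am is coded 0 = automatic, 1 = manual.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Coefficients are on the log-odds scale.
print(summary(fit)$coefficients)

# Predicted probability of a manual gearbox for a hypothetical car with wt = 3
# (wt is measured in units of 1000 lbs in mtcars).
new_car <- data.frame(wt = 3)
p_manual <- predict(fit, newdata = new_car, type = "response")
print(p_manual)
```

The key bits are family = binomial, which is what turns glm() into a logistic regression, and type = "response", which returns a probability instead of log-odds.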

Friday, 25 September 2015

How to get class of all the variables/columns of a data frame/matrix in R

Yo,

At times we need to know the class of the variables of a data set. Certain classes are not compatible with certain functions. They don't like each other. And certain classes result in a wrong interpretation of the data. So, you get the point: you need something that tells you the class of every column.

So here is the classy approach -

1. Create a new object
2. Use the sapply function with the class function on the data frame
3. Marry step 1 and 2.

Tada -
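A minimal sketch of the three steps above. The data frame df and its column names are hypothetical, made up purely for illustration.

```r
# Hypothetical example data frame; the columns are invented for this sketch.
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

# Steps 1 to 3 in one line: apply class() to every column
# and store the result in a new object.
col_classes <- sapply(df, class)
print(col_classes)
```

One thing to keep in mind: if a column carries more than one class (a date-time column, for example), sapply() returns a list instead of a plain character vector. For a matrix, every cell shares one type, so class(m) or typeof(m) on the whole matrix is enough.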

Monday, 7 September 2015

Basics to Probability Distribution in R

Hello there,

So, you wanna know about probability distributions with R eh?

Trust me I had to do quite some research before coming to this one. Some people can be like - *smirk* research on probability distribution? I knew that in my kindergarten. Well. I did not :(

So, what is a probability distribution function f(x)? The answer is not that simple and will depend on the type of x, whether x is categorical, discrete or continuous. Let's see -
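Here is a rough sketch of the three cases using base R. The sample values and the particular distributions (binomial and standard normal) are picked only for illustration; the full post may well use others.

```r
# Categorical x: the distribution is just the share of each category.
grades <- c("A", "B", "A", "C", "A")
cat_probs <- prop.table(table(grades))
print(cat_probs)

# Discrete x: dbinom() gives P(X = k) for a binomial variable
# (number of heads in 10 fair coin tosses).
k <- 0:10
pmf <- dbinom(k, size = 10, prob = 0.5)
print(sum(pmf))               # probabilities over all outcomes add up to 1

# Continuous x: dnorm() gives a density, not a probability.
# Probabilities come from areas under the curve.
area <- integrate(dnorm, lower = -Inf, upper = Inf)$value
print(area)                   # total area under the density, numerically 1

# P(-1 < X < 1) for a standard normal, via the cumulative function pnorm().
p_within_1sd <- pnorm(1) - pnorm(-1)
print(p_within_1sd)
```

The pattern to notice: for categorical and discrete x, f(x) is itself a probability and the values sum to 1. For continuous x, f(x) is a density and it is the area that equals 1.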

Sunday, 6 September 2015

Tables in R

Yo, Wassup?

I have mentioned this before as well: I very strongly believe that the best way to learn something is to teach it, and that is the sole purpose of creating these posts.

Presently I am learning about tables in R. Frankly, I don't even know what frequency tables and contingency tables are. I will simply google about them and then type down the understanding I develop after researching on these topics. And hence I absolutely don't mind if you discern something unusual about the content and point that out to me. In fact, I will appreciate that.

Factor function in R

If you have categorical data and you want to do some analysis on it, you will need to convert the categories into integers. The factor function is used to achieve this objective.

The syntax of factor function is as follows -

factor(data, ordered = TRUE/FALSE, levels = c(....))

where,

data can be any vector or table column that holds categorical data

If the data is ordinal, then ordered needs to be set to TRUE. By default it is FALSE. With ordered = TRUE the levels are ranked in the order they are listed, so the category coded 1 is treated as the lowest, 2 as the next, and so on.

By default, the recoding of categories to integers is done in alphabetical order. That is, D gets preference over F. To change this you need to explicitly mention the order in the levels argument

if levels = c("D", "B", "A") then D = 1, B = 2, A = 3; else by default A = 1, B = 2, D = 3
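A small sketch of the default and explicit recoding described above. The grades vector is made-up sample data.

```r
# Made-up categorical data.
grades <- c("B", "D", "A", "B")

# Default: levels are sorted alphabetically, so A = 1, B = 2, D = 3.
f_default <- factor(grades)
print(as.integer(f_default))

# Explicit levels: now D = 1, B = 2, A = 3.
f_custom <- factor(grades, levels = c("D", "B", "A"))
print(as.integer(f_custom))

# Ordered factor: the levels can be compared, with D < B < A.
f_ordered <- factor(grades, levels = c("D", "B", "A"), ordered = TRUE)
print(f_ordered[1] > f_ordered[2])   # compares "B" against "D"
```

as.integer() shows the codes stored behind the labels, which is a handy way to check that the levels ended up in the order you intended.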





Wednesday, 2 September 2015

What is the difference between correlation and covariance



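Correlation is the standardized form of covariance: the covariance divided by the product of the two standard deviations. It has no units and always lies between -1 and 1, so you can compare the correlations of two data sets that have different units.

You cannot compare the covariances of two data sets that have different units, because covariance changes whenever the units of measurement change.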

Now, how to find correlation and covariance in R?
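A minimal sketch using base R's cov() and cor(). The height and weight numbers are invented for illustration.

```r
# Invented sample data.
height_cm <- c(150, 160, 170, 180, 190)
weight_kg <- c(52, 58, 69, 77, 88)

cov_cm <- cov(height_cm, weight_kg)
cor_cm <- cor(height_cm, weight_kg)

# Same heights, now in metres: covariance shrinks by a factor of 100,
# correlation stays exactly where it was.
height_m <- height_cm / 100
cov_m <- cov(height_m, weight_kg)
cor_m <- cor(height_m, weight_kg)

# Correlation is covariance divided by the product of the standard deviations.
manual_cor <- cov_cm / (sd(height_cm) * sd(weight_kg))

print(c(cov_cm = cov_cm, cov_m = cov_m))
print(c(cor_cm = cor_cm, cor_m = cor_m, manual_cor = manual_cor))
```

Both functions default to the Pearson method. Passing a whole data frame of numeric columns, as in cor(df), returns the full correlation matrix.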


Friday, 28 August 2015

Data Visualization in R



Welcome, my friend, welcome!

I have got just what you are looking for. I have got amazing code and beautiful graphs and, my oh my, all in your favorite language - R :)


Enough drama. In this blog I am going to combine my previous posts on data visualization in R into one single post. It's going to be pretty long, so keep some snacks handy :)

Plotting multiple graphs in R with fine control


Wassup,

This one is my fourth post in the R graphical series; after this I will give graphs a pause. We have seen how to plot simple graphs in R, how to plot multiple graphs together and how to plot multiple graphs with more control. Now, we are going to get even more sci-fi. We are going to see how to superimpose one graph over the other. And this time again we are gonna call our old friend, yep, you got it, the par() function.
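A rough sketch of one way to superimpose two graphs with par(). It assumes the trick in question is par(new = TRUE), which the excerpt does not confirm; the data, colors and axis labels are invented.

```r
# Invented data: two series on very different scales.
x  <- 1:10
y1 <- x^2
y2 <- 100 - 5 * x

# Widen the right margin to make room for a second axis; keep the old settings.
old_par <- par(mar = c(5, 4, 4, 4) + 0.1)

# First graph, drawn as usual.
plot(x, y1, type = "l", col = "blue", ylab = "y1")

# Tell R not to clear the device before the next high-level plot call.
par(new = TRUE)

# Second graph on top, without its own axes or labels.
plot(x, y2, type = "l", col = "red", axes = FALSE, xlab = "", ylab = "")
axis(side = 4)                    # put the second y axis on the right
mtext("y2", side = 4, line = 3)   # and label it

# Restore the original margins.
par(old_par)
```

If both series share the same scale, lines() or points() on top of the first plot is usually simpler. par(new = TRUE) earns its keep when each series needs its own axis.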


Thursday, 27 August 2015

How to plot multiple graphs together in R - Season II


Ola,

In my last post I wrote about how to plot multiple graphs together in R. It was an absolutely amazing post, the best I have ever read. You should check it out -> I am that awesome post he is exaggerating about

Now, what if we want to control how many graphs should be there in each row or column? Excited? Click read more
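As a teaser, here is a minimal sketch of controlling the grid. It assumes the mfrow setting of par() is the tool in question, which the excerpt does not confirm; the random data and titles are placeholders.

```r
# Ask for a grid of 2 rows and 3 columns, filled row by row.
# par() returns the previous settings, so they can be restored later.
old_par <- par(mfrow = c(2, 3))

# Draw six placeholder plots; each one lands in the next free cell.
for (i in 1:6) {
  plot(rnorm(20), main = paste("Plot", i))
}

# Back to one plot per page.
par(old_par)
```

Using mfcol instead of mfrow fills the grid column by column rather than row by row.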