Monday, 7 September 2015

Basics of Probability Distributions in R

Hello there,

So, you wanna know about probability distributions with R, eh?

Trust me, I had to do quite a bit of research before writing this one. Some people might be like: *smirk* research on probability distributions? I knew that in kindergarten. Well, I did not :(

So, what is a probability distribution function f(x)? The answer is not that simple and depends on the type of x: whether x is categorical, discrete or continuous. Let's see:

If x is categorical, then f(x) is called a categorical distribution
If x is discrete, then f(x) is called a probability mass function (PMF)
If x is continuous, then f(x) is called a probability density function (PDF)

When x is discrete, the probability of getting one exact value of x can be non-zero. When x is continuous, the probability of getting one exact value of x is 0. In fact, the PDF at a particular value of x is not a probability at all. It is a density: multiply it by the width of a small interval around x and you get, approximately, the probability of finding something close to that value and not exactly that value!
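To make the discrete vs continuous difference concrete, here is a small sketch. It borrows dnorm() and pnorm() from later in this post, plus dbinom(), which is base R's PMF for the binomial distribution and is my own addition here. The coin-flip setup and the window width are just assumptions picked for illustration.

```r
# Discrete case: probability of exactly 3 heads in 10 fair coin flips.
# This is a genuine probability and it is non-zero.
p_exact <- dbinom(3, size = 10, prob = 0.5)

# Continuous case: probability that a standard normal value lands in a
# narrow window centred on 1 (the window width is an arbitrary choice).
width <- 0.001
p_window <- pnorm(1 + width / 2) - pnorm(1 - width / 2)

# Density at 1 multiplied by the window width approximates that probability.
approx_window <- dnorm(1) * width

print(p_exact)
print(p_window)
print(approx_window)
```

Shrink width towards 0 and p_window shrinks with it, which is why the probability of one exact value is 0 for a continuous variable.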
Phew!

Now, most of the time when you are dealing with the probability distribution of a continuous variable, you will need to find the area under the curve between certain points, because that area is the probability of the variable falling between those points.

Since the probabilities of all possible outcomes must add up to 1, the total area under the probability distribution curve is exactly 1 and can never be greater than that.
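If you would rather check the area claim than take my word for it, base R's integrate() can do the numerical integration. This is only a rough sketch using the standard normal curve via dnorm(); the variable names and the limits -1 and 1 are my own picks.

```r
# Total area under the standard normal density, from -Inf to Inf.
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)

# Area between two points: the probability of landing between -1 and 1.
middle_area <- integrate(dnorm, lower = -1, upper = 1)

print(total_area$value)   # numerically very close to 1
print(middle_area$value)  # roughly 0.68
```

integrate() returns a list, and the number you usually want is in its value element.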

When discussing R programming, the three probability functions that you need to keep in mind are as follows:

1. dnorm()
2. pnorm()
3. qnorm()

Notice the "norm": it stands for normal distribution.

Let's start with dnorm(). The syntax is as follows:

dnorm(x, mean, sd)

If you only give x without specifying the mean and standard deviation, R assumes a normal distribution with mean 0 and sd 1 (the standard normal).

dnorm returns the value of the PDF at x. That value is a density, not the probability of x itself: it tells you how likely values close to x are compared with other regions of the distribution.
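A quick way to get a feel for dnorm() is to call it a few times. In the sketch below the specific inputs (0, 100, 15, 0.1) are assumptions chosen only for illustration. The last call is the interesting one: the result is bigger than 1, which is fine for a density but would be impossible for a probability.

```r
# Density of the standard normal (mean 0, sd 1) at x = 0.
d_standard <- dnorm(0)

# Same call with the defaults spelled out; it gives the same answer.
d_explicit <- dnorm(0, mean = 0, sd = 1)

# Density at the centre of a normal with mean 100 and sd 15.
d_wide <- dnorm(100, mean = 100, sd = 15)

# A very narrow normal (sd 0.1): the density at the centre exceeds 1.
d_narrow <- dnorm(0, mean = 0, sd = 0.1)

print(c(d_standard, d_explicit, d_wide, d_narrow))
```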

pnorm()

pnorm(x, mean, sd, lower.tail)

pnorm returns the probability of getting a value less than or equal to x. It is the integral of the PDF from minus infinity up to x and is called the cumulative distribution function (CDF).

If we put lower.tail = FALSE, it returns the probability of getting a value greater than x instead.
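Here is a small sketch of pnorm() with both tails. The cutoff 1.96 and the mean 100 / sd 15 example are my own illustrative assumptions, not anything special to R.

```r
# Probability that a standard normal value is at most 1.96 (lower tail).
p_lower <- pnorm(1.96)

# Probability that it is greater than 1.96 (upper tail).
p_upper <- pnorm(1.96, lower.tail = FALSE)

# The two tails together cover everything, so they add up to 1.
p_total <- p_lower + p_upper

# With a different mean and sd: 115 is one sd above a mean of 100,
# so this matches pnorm(1) on the standard normal.
p_scaled <- pnorm(115, mean = 100, sd = 15)

print(c(p_lower, p_upper, p_total, p_scaled))
```

Subtracting two pnorm() calls gives the area between two points, for example pnorm(1) - pnorm(-1).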

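Sweet! Now, qnorm()

Understand it like this. If you do as follows:

> qnorm(pnorm(x))

The output will be x. qnorm is the quantile function, the inverse of pnorm: you give it a probability p and it returns the value below which that fraction of the distribution lies.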
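The round trip is easy to try for yourself. In this sketch the inputs 1.5 and 0.975 and the mean 100 / sd 15 case are arbitrary choices of mine, just to show qnorm() going from a probability back to a value.

```r
# Round trip: pnorm() turns a value into a probability and qnorm()
# turns that probability back into the value.
x <- 1.5
round_trip <- qnorm(pnorm(x))

# The value below which 97.5% of the standard normal lies.
q_975 <- qnorm(0.975)

# Same idea with a non-default mean and sd: the median of a
# normal distribution with mean 100 is 100.
q_median <- qnorm(0.5, mean = 100, sd = 15)

print(c(round_trip, q_975, q_median))
```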

When talking about random variables, there is one more function I would like to discuss: rnorm()

rnorm(n, mean, sd)

This one simply returns n random numbers drawn from a normal distribution with the given mean and standard deviation.
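A minimal sketch of rnorm() in action. The seed, the sample size of 1000 and the mean 50 / sd 10 are all assumptions I picked for the example. set.seed() is only there so that you get the same draws every time you run it.

```r
# Fix the random seed so the draws are reproducible (42 is arbitrary).
set.seed(42)

# Draw 1000 values from a normal distribution with mean 50 and sd 10.
samples <- rnorm(1000, mean = 50, sd = 10)

# The sample mean and sd should land near 50 and 10, but not exactly.
sample_mean <- mean(samples)
sample_sd <- sd(samples)

print(length(samples))
print(sample_mean)
print(sample_sd)
```

Try hist(samples) afterwards to see the familiar bell shape appear.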

That's it.

I hope this was helpful.

Now, please provide your comments on the content. I seriously need help on how to improve my posts.

Till then,

Stay Awesome!


