R for Machine Learning – Series 2

Today I will explain the independent two-sample t-test and the one-sample rate (proportion) test. Which hypothesis test to use depends on the number of samples and on the results of the normality and homogeneity-of-variance checks.

In the previous article I covered the first test, the one-sample t-test. The second one is the one-sample rate test.

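  • One-Sample Rate Test (often called a one-sample proportion test): It is used to test a hypothesis about a rate, for example whether an observed rate differs from a reference value. The observations are binary (success or failure) and are summarized as a count of successes out of a number of trials. As with the one-sample t-test, the sample should be random and the observations independent.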
  • Independent Two-Sample T-test: It is used to determine whether there is a statistically significant difference between the means of two unrelated groups. Here again we state a null and an alternative hypothesis. The null hypothesis states that the two population means are equal, whereas the alternative hypothesis states that they differ. To decide, we look at the p-value: if it is below 0.05 we reject the null hypothesis, and if it is above 0.05 we fail to reject it.
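As a minimal sketch of the one-sample rate test described above, base R's prop.test() can compare an observed rate with a reference value. The counts and the reference rate of 0.50 are hypothetical numbers chosen for illustration; they do not come from this article.

```r
# Hypothetical numbers for illustration only (not from the article):
# 58 successes observed in 100 trials, tested against a reference rate of 0.50.
successes <- 58
trials <- 100

# H0: the true rate equals 0.50; H1: it differs (two-sided by default).
rate_test <- prop.test(x = successes, n = trials, p = 0.50)

print(rate_test)

# Compare this p-value with the chosen significance level, e.g. 0.05.
rate_test$p.value
```

prop.test() applies a continuity correction by default; pass correct = FALSE to switch it off.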

R code:

Let’s assume we have a data frame like the one below:
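The original data frame is not reproduced here, so the following is only a stand-in: two simulated numeric columns named A and B. The sample size, means, and standard deviations are assumptions made for this sketch.

```r
# Simulated stand-in data; sizes, means and sds are assumptions for this sketch.
set.seed(42)  # fixed seed so the simulated values are reproducible

df <- data.frame(
  A = rnorm(30, mean = 30, sd = 5),
  B = rnorm(30, mean = 34, sd = 5)
)

head(df)  # first six rows
str(df)   # column types and dimensions
```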

Naturally, everyone wants to jump straight into the analysis and get results. That is possible, but we first need to understand the data we are handling, so we should look at the mean, the standard deviation, and other summary statistics. We can use the code below (I highly recommend installing the “funModeling” package first):
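A possible version of this step, using the simulated stand-in data (an assumption, not the article's data): base R gives the mean and standard deviation directly, and profiling_num() from funModeling prints a fuller numeric profile when the package is installed.

```r
# Simulated stand-in data (an assumption for this sketch).
set.seed(42)
df <- data.frame(A = rnorm(30, mean = 30, sd = 5),
                 B = rnorm(30, mean = 34, sd = 5))

# Base R: mean and standard deviation for each column.
desc <- sapply(df, function(x) c(mean = mean(x), sd = sd(x)))
print(desc)

# Fuller numeric profile from funModeling, run only if the package is installed.
if (requireNamespace("funModeling", quietly = TRUE)) {
  print(funModeling::profiling_num(df))
}
```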

As the data frame code shows, we have two separate columns, A and B, and their values are listed as follows:

In this part I will focus on visualizing normality, but the way the values are laid out gets in the way, because A and B sit in separate columns. To avoid this problem, we need to gather A and B into a single value column, with a second column indicating the group.
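Gathering the two columns can be sketched in base R as follows. The column names group and value, and the simulated data, are assumptions made for illustration.

```r
# Simulated stand-in data (an assumption for this sketch).
set.seed(42)
df <- data.frame(A = rnorm(30, mean = 30, sd = 5),
                 B = rnorm(30, mean = 34, sd = 5))

# Long format: one row per observation, with a factor marking the group.
long <- data.frame(
  group = factor(rep(c("A", "B"), each = nrow(df))),
  value = c(df$A, df$B)
)

head(long)
table(long$group)  # number of observations per group
```

Base R's stack(df) or tidyr::pivot_longer() would produce an equivalent layout with different column names.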

Checking normality is important because it determines whether to use a parametric or a non-parametric test, so it should be done first. Here I will show how to judge from a histogram whether the data are normally distributed.
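A minimal sketch of the histogram check with base graphics, again on simulated stand-in data. The Shapiro-Wilk test at the end is an addition of this sketch, not a step from the article; it gives a formal check to read alongside the plots.

```r
# Simulated stand-in data in long format (an assumption for this sketch).
set.seed(42)
long <- data.frame(
  group = factor(rep(c("A", "B"), each = 30)),
  value = c(rnorm(30, mean = 30, sd = 5), rnorm(30, mean = 34, sd = 5))
)

# One histogram per group, side by side.
op <- par(mfrow = c(1, 2))
for (g in levels(long$group)) {
  hist(long$value[long$group == g], main = paste("Group", g), xlab = "value")
}
par(op)  # restore the previous plotting layout

# Optional formal check (not in the original article): Shapiro-Wilk per group.
# H0: the data come from a normal distribution.
shapiro_p <- tapply(long$value, long$group, function(x) shapiro.test(x)$p.value)
print(shapiro_p)
```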

The histograms suggest that both groups are approximately normally distributed.

The other important step is the homogeneity-of-variance test. Levene’s test is used to test the null hypothesis that the two population variances are equal.
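Levene's test is commonly run with leveneTest() from the car package. The sketch below assumes that package may not be installed, so it first computes the median-centred variant in base R (a one-way ANOVA on the absolute deviations from each group's median) and calls car only when it is available. The data are simulated stand-ins.

```r
# Simulated stand-in data in long format (an assumption for this sketch).
set.seed(42)
long <- data.frame(
  group = factor(rep(c("A", "B"), each = 30)),
  value = c(rnorm(30, mean = 30, sd = 5), rnorm(30, mean = 34, sd = 5))
)

# Median-centred Levene's test in base R:
# one-way ANOVA on absolute deviations from each group's median.
abs_dev <- abs(long$value - ave(long$value, long$group, FUN = median))
levene_base <- anova(lm(abs_dev ~ long$group))
levene_p <- levene_base[["Pr(>F)"]][1]
print(levene_base)

# The same test via the car package, run only if it is installed.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(value ~ group, data = long))
}
```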

The result:

The p-value is greater than 0.05, which means we fail to reject the null hypothesis, so the variances can be treated as equal. What if it were the opposite? R's t.test() function uses Welch's test by default, which does not assume equal variances, so no extra action is needed in that case. To run the classic pooled-variance t-test when the variances are equal, set var.equal = TRUE.

The last step is the hypothesis test, which decides whether there is a significant difference between the two means.
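The hypothesis test itself can be sketched with base R's t.test(), shown here in both its default Welch form and the pooled-variance form. The data are simulated stand-ins, so the numbers printed will not match the article's results.

```r
# Simulated stand-in data in long format (an assumption for this sketch).
set.seed(42)
long <- data.frame(
  group = factor(rep(c("A", "B"), each = 30)),
  value = c(rnorm(30, mean = 30, sd = 5), rnorm(30, mean = 34, sd = 5))
)

# Default call: var.equal = FALSE, so R runs Welch's t-test.
welch <- t.test(value ~ group, data = long)
print(welch)

# Pooled-variance (Student) t-test, appropriate when the variances are equal.
pooled <- t.test(value ~ group, data = long, var.equal = TRUE)
print(pooled)

# H0: the two population means are equal. Reject H0 if the p-value is below 0.05.
welch$p.value
pooled$p.value
```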

The output is titled Welch Two Sample t-test because t.test() defaults to var.equal = FALSE, which does not assume equal variances. According to the p-value, the null hypothesis is rejected, which means that the two means differ.

If the normality assumption is not met, the alternative is the Mann-Whitney U test.
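In R, the Mann-Whitney U test is available through wilcox.test(), which runs the equivalent Wilcoxon rank-sum test for two independent groups. A minimal sketch on simulated stand-in data:

```r
# Simulated stand-in data in long format (an assumption for this sketch).
set.seed(42)
long <- data.frame(
  group = factor(rep(c("A", "B"), each = 30)),
  value = c(rnorm(30, mean = 30, sd = 5), rnorm(30, mean = 34, sd = 5))
)

# Wilcoxon rank-sum test, equivalent to the Mann-Whitney U test.
# H0: the two groups come from the same distribution (no location shift).
mw <- wilcox.test(value ~ group, data = long)
print(mw)

mw$p.value  # compare with the chosen significance level, e.g. 0.05
```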

Summary

Assumptions:

  • Random samples
  • Independent observations
  • The population of each group is normally distributed.
  • The population variances are equal.

The independent two-sample t-test is an inferential statistical test, and the assumptions listed above are its main requirements.

