- WidgeCorp and Company W are merging. WidgeCorp uses statistical analysis for decision making, while Company W is the opposite, relying on experience to set business decisions and strategies.
- Company W is currently testing new sales software. Over the last three months, the software was given to 250 of the company's 500 sales personnel.
- There are four sales regions: Northeast, Southeast, Central and West.
- The company believes that each salesperson will sell the same amount of product whether or not they were given the sales software to manage their contracts.
Requirement of the case:
- The VP of Sales at WidgeCorp has requested possible null and alternative hypotheses for a non-parametric test on this data, using the chi-square distribution.
Rationale for using the Chi-Square Distribution
The Chi-Square analysis is well suited to this case. The Chi-Square test compares the frequencies actually observed in each category with the frequencies expected under a stated hypothesis, and it can be used to compare two or more populations. In its test-of-independence form it can also examine the relationship between two categorical variables at the same time. Because it works on counts rather than on means, it is suited to qualitative or categorical data (for example age group, gender, race, region, or group membership). Categorical data cannot be meaningfully summarized by an average, so tests built around means do not apply, and the Chi-Square test is used instead.
Using the Chi-Square Distribution
The Chi-Square Test uses the following information:
- Categorical data: data grouped according to a common trait or characteristic, so that the frequency observed within each category can be counted. In the case of Company W, the categories are:
- Salespeople with the new sales program
- Salespeople without the new sales program
In each category, we can subcategorize according to the regions: Northeast, Southeast, Central and West.
- Expected frequencies: the Chi-Square test requires the expected frequency in each category (each cell of the table) to be at least five. The data set provided by Company W should satisfy this, since each of the two categories contains 250 salespeople. Splitting each category into the four regions gives eight cells, and with 500 salespeople in total the expected count per cell should still comfortably exceed five unless a region is very small.
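The rule of five applies to the expected count in each cell, which for a two-way table is row total × column total / grand total. The minimal Python sketch below checks the rule for a 2 × 4 table (program status by region). The counts and the helper names `expected_counts` and `meets_rule_of_five` are hypothetical illustrations, not Company W data or an established API.

```python
# Minimal sketch: check the "expected count of at least five" rule.
# The table and helper names are hypothetical, not Company W data.

def expected_counts(observed):
    """Expected count per cell if H0 is true: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def meets_rule_of_five(observed, minimum=5.0):
    """Return True if every expected cell count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(observed) for e in row)


# Rows: with program, without program.
# Columns: Northeast, Southeast, Central, West.
# Hypothetical numbers of contracts closed in each cell.
observed = [
    [40, 35, 30, 45],
    [38, 30, 33, 40],
]

smallest = min(min(row) for row in expected_counts(observed))
print(f"smallest expected count = {smallest:.2f}")
print("rule of five satisfied:", meets_rule_of_five(observed))
```

Checking the expected counts, rather than the observed ones, matters because a cell can contain a large observed count and still have a small expected count when its row or column total is small.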
The Chi-Square Test will determine whether the distributions for those who received the sales program and those who did not are “identical”. By identical, we mean that there is no significant difference between the sales performance of those without the sales program and those given the new program; from this we can decide whether or not to fully deploy the new sales tactic. Because the test is non-parametric, it does not require either population to be normally distributed.
The Chi-Square Test comes in three forms. The goodness-of-fit test checks whether the observed frequencies of a single categorical variable match an expected (hypothesized) distribution. The test of independence checks whether two categorical variables, such as program status and region, are related. The test of homogeneity checks whether two or more populations, such as salespeople with and without the program, share the same distribution over a common set of categories.
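As an illustration of the goodness-of-fit form, the sketch below compares the total units sold by the two groups. Because both groups contain 250 salespeople, a null hypothesis of equal sales per person implies an expected 50/50 split. The sales figures are hypothetical, the helper names are illustrative, and treating each unit sold as an independent observation is a simplifying assumption. With one degree of freedom, the upper-tail probability can be computed with `math.erfc` from the standard library.

```python
import math


def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical total units sold by each group of 250 salespeople.
observed = [5300, 4900]  # [with program, without program]
total = sum(observed)
# Under H0 the two equal-sized groups sell the same amount: a 50/50 split.
expected = [total / 2, total / 2]

stat = chi_square_gof(observed, expected)
df = len(observed) - 1  # k - 1 with k = 2 categories
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value_df1(stat):.4f}")
```

The tests of independence and homogeneity use the same statistic, but compute the expected counts from the row and column totals of a contingency table.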
- Observed and expected frequencies: the observed frequencies (denoted O) are the counts actually recorded in each category. The expected frequencies (denoted E) are the counts that would be expected in each category if the null hypothesis were true. In a goodness-of-fit test they may come from historical data or a hypothesized distribution; in tests of independence and homogeneity each is calculated as (row total × column total) / grand total. The two are combined in the test statistic χ² = Σ (O − E)² / E, summed over every category or cell.
- The statistics required for conducting the Chi-Square Test include the following:
- Degrees of freedom (denoted df): a number determined by the number of categories (or the dimensions of the table). Together with the chosen significance level, the df is used to look up the critical value in a standard Chi-Square distribution table.
- Goodness-of-fit test: df = k - 1, where k is the number of categories (groups).
- Tests of independence and homogeneity: df = (r - 1)(c - 1), where r is the number of rows and c is the number of columns in the contingency table.
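The table lookup can be reproduced in code. The sketch below, using only the Python standard library, computes the degrees of freedom from the two formulas and then finds the critical value at alpha = 0.05 by bisection on the chi-square upper-tail probability, using the closed-form series that exist for whole-number degrees of freedom. The function names are hypothetical; in practice `scipy.stats.chi2.sf` and `scipy.stats.chi2.ppf` provide the same quantities.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1.

    Uses the closed-form series available for whole-number degrees of freedom.
    """
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2 * phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^((2r-1)/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    density = math.exp(-half) / math.sqrt(2 * math.pi)  # standard normal pdf at sqrt(x)
    total = math.erfc(root / math.sqrt(2))
    term = root
    for r in range(1, (df - 1) // 2 + 1):
        total += 2 * density * term
        term *= x / (2 * r + 1)
    return total


def critical_value(alpha, df, hi=1000.0, iterations=200):
    """Value c with P(X > c) = alpha, found by bisection (the tail probability falls as x grows)."""
    lo = 0.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def df_goodness_of_fit(k):
    """df = k - 1 for k categories."""
    return k - 1


def df_contingency(rows, cols):
    """df = (r - 1)(c - 1) for an r x c table."""
    return (rows - 1) * (cols - 1)


alpha = 0.05
cases = [
    ("goodness of fit, 2 groups", df_goodness_of_fit(2)),
    ("2 x 4 table (program status x region)", df_contingency(2, 4)),
]
for label, df in cases:
    print(f"{label}: df = {df}, critical value = {critical_value(alpha, df):.3f}")
```

The printed critical values should match the familiar table entries 3.841 (df = 1) and 7.815 (df = 3).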
Developing the Null Hypothesis and Alternate Hypothesis:
After determining these statistical metrics, the null hypothesis must be formulated to begin the Chi-Square Test. The null hypothesis is usually denoted Ho and is often tied to a historical value or an existing claim. The alternate hypothesis (Ha) is a statement that contradicts the null hypothesis; it is accepted when the data provide enough evidence to reject Ho.
Because conclusions are drawn from sample data, two kinds of error are possible. A Type 1 error occurs when Ho is rejected even though it is true. A Type 2 error occurs when Ho is accepted (not rejected) even though it is false. Increasing the sample size reduces the risk of error, in particular the risk of a Type 2 error at a fixed significance level, but it cannot eliminate error entirely.
The researcher decides in advance how much risk of a Type 1 error to accept; this is the level of significance, denoted by the Greek letter alpha (α). The Greek letter beta (β) denotes the probability of a Type 2 error. Ideally both would be as small as possible, but for a fixed sample size lowering alpha raises beta, so in practice alpha is chosen first. If alpha is 0.05, then when Ho is in fact true we expect to reject it wrongly about 5 times in 100 tests. This is a small risk and is conventionally accepted.
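The meaning of alpha = 0.05 can be checked by simulation. The sketch below, written with only the Python standard library, repeatedly draws 500 hypothetical observations from a population in which Ho is true (each observation is equally likely to fall into either of two categories), runs a goodness-of-fit test at alpha = 0.05 each time, and reports the share of runs in which Ho is wrongly rejected. The sample size, number of trials, seed, and function names are arbitrary assumptions for illustration.

```python
import random

# Chi-square critical value for df = 1 at alpha = 0.05 (standard table value).
CRITICAL_DF1 = 3.841


def gof_statistic(counts, expected):
    """Pearson statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))


def simulate_type1_rate(trials=2000, n=500, seed=1):
    """Fraction of trials in which a true Ho (a 50/50 split) is wrongly rejected."""
    rng = random.Random(seed)
    expected = [n / 2, n / 2]
    rejections = 0
    for _ in range(trials):
        # Draw n observations; each lands in category A with probability 0.5.
        count_a = sum(1 for _ in range(n) if rng.random() < 0.5)
        stat = gof_statistic([count_a, n - count_a], expected)
        if stat > CRITICAL_DF1:
            rejections += 1  # a Type 1 error, since Ho is true by construction
    return rejections / trials


rate = simulate_type1_rate()
print(f"share of tests that wrongly rejected Ho: {rate:.3f}")
```

The rejection rate should land close to 0.05; because counts are whole numbers, it can sit slightly above or below the nominal level.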
Formulating the Null and Alternate Hypotheses for Company W:
The steps to formulate and conduct the Chi-Square Test are:
- Establish the hypotheses
Ho:
- The sales performance of personnel with the new sales program is the same as that of personnel without the new sales program
- In frequency terms: O (sales of personnel with the new sales program) = E (sales of personnel without the new sales program)
Ha:
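- Ho is not true: the sales performance of personnel with the new sales program differs from that of personnel without it
- Determine the significance level (alpha) to be used in the test
- Determine the degrees of freedom, then use the Chi-Square table to find the critical value
- Determine the expected frequency (E) for each category
- Calculate the Chi-Square statistic and compare it with the critical value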
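The steps above can be sketched end to end in Python, assuming the data are arranged as a 2 × 4 table of hypothetical units sold by program status and region. In this form (a test of homogeneity), Ho says the two groups' sales are spread across the four regions in the same proportions, which is a narrower claim than equal total sales. The figures, the dictionary layout, and the function name `chi_square_test` are illustrative assumptions, and treating each unit sold as an independent observation is a simplification. For real data, `scipy.stats.chi2_contingency` offers a tested implementation.

```python
# Step 1: hypotheses (homogeneity form, hypothetical data).
# Ho: units sold are distributed across regions in the same proportions with and without the program.
# Ha: Ho is not true.
ALPHA = 0.05  # Step 2: significance level
# Critical values at alpha = 0.05 from a standard chi-square table, keyed by df.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Columns: Northeast, Southeast, Central, West. Hypothetical units sold, not Company W data.
observed = {
    "with program": [520, 480, 450, 610],
    "without program": [500, 470, 460, 540],
}


def chi_square_test(table):
    """Return (statistic, df, expected table) for a dict of equal-length count rows."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand = sum(row_totals)
    df = (len(rows) - 1) * (len(col_totals) - 1)  # Step 3: (r - 1)(c - 1)
    expected = [[rt * ct / grand for ct in col_totals] for rt in row_totals]  # Step 4
    stat = sum(  # Step 5: sum of (O - E)^2 / E over all cells
        (o - e) ** 2 / e
        for row_o, row_e in zip(rows, expected)
        for o, e in zip(row_o, row_e)
    )
    return stat, df, expected


stat, df, expected = chi_square_test(observed)
critical = CRITICAL_05[df]  # table lookup using df and alpha
decision = "reject Ho" if stat > critical else "fail to reject Ho"
print(f"chi-square = {stat:.3f}, df = {df}, critical = {critical}, decision: {decision}")
```

With these made-up figures the statistic falls well below the critical value, so Ho would not be rejected; a strongly uneven table would push the statistic past 7.815.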
If the calculations show that the sales of those with the new program differ significantly from the sales of those without it (the Chi-Square statistic exceeds the critical value), the null hypothesis is rejected; otherwise it is not rejected. In business terms, failing to reject the null hypothesis would suggest that the new program makes no measurable difference, while rejecting it means the performance of the two groups differs. Since the Chi-Square test detects a difference but not its direction, rejection supports deploying the program only if the difference favors the salespeople who used it.