In this paper, we discuss the application of statistics and probability theory to a real-world problem. The purpose of this essay is to demonstrate various statistical skills and to extract clear and useful information from the Springdale Shopping Survey data. We work only with variables #18 to #25. These variables reflect the importance of eight attributes in customers’ choice of a shopping area. The following variables are included in our analysis:
Easy to return or exchange goods
High quality of goods
Low prices of goods
Good variety of sizes or styles
The sales staff is helpful or friendly
Convenient shopping hours
Clean stores and surroundings
A lot of bargain sales
Each variable is measured on a scale from 1 to 7 that indicates how important the attribute is to the respondent (1 – not very important, 7 – very important). We perform a number of operations on these variables. These operations are organized into the problems below.
Problem #1
Calculate descriptive statistics (five number summary, mean, mode, range and standard deviation) for each variable and give an interpretation.
The descriptive statistics are calculated in Excel. The results are given in the table below:
On average, respondents say that ease of returning or exchanging goods is quite important in this shopping area (Mean = 4.91, Median = 5). The high quality of goods is a very important factor (Mean = 5.67, Median = 7). Low prices are also a very important factor (Mean = 5.63, Median = 7). A good variety of sizes and styles is considered quite important (Mean = 4.97, Median = 5). It is quite important for sales staff to be helpful and friendly (Mean = 4.75, Median = 5). The convenience of shopping hours is an important factor (Mean = 4.81, Median = 5). Clean stores and surroundings are also important (Mean = 4.98, Median = 5). A lot of bargain sales is an important factor in this shopping area (Mean = 5.01, Median = 5.5).
For every attribute, some respondents considered the factor unimportant in this shopping area (all minimum values equal 1), and some considered it very important (all maximum values equal 7). The first quartile is the score at or below which the lowest 25% of responses fall; the third quartile is the score at or below which the lowest 75% of responses fall. The standard deviation indicates the variability of the answers. The most dispersed answers are for variable #18 (SD = 2.07); the least dispersed are for variable #24 (SD = 1.77).
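The statistics in this problem were computed in Excel, but the same quantities can be sketched in Python for readers who want to reproduce them. The function name describe and the list example_scores are our own illustrative choices, and the scores are made-up values on the 1 to 7 scale, not the survey data. Quartile conventions differ between tools, so the quartiles may not match a given Excel function exactly.

```python
import statistics


def describe(scores):
    """Return the descriptive statistics used in Problem #1 for one variable."""
    ordered = sorted(scores)
    # quantiles(n=4) returns the three cut points Q1, median and Q3.
    # The "inclusive" method treats the smallest and largest observations
    # as the 0th and 100th percentiles; other conventions exist.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "mode": statistics.mode(ordered),
        "range": ordered[-1] - ordered[0],
        "sd": statistics.stdev(ordered),  # sample SD (n - 1 in the denominator)
    }


# Hypothetical responses on the 1-7 scale (NOT the Springdale data).
example_scores = [1, 3, 4, 5, 5, 6, 7, 7, 7, 7]
summary = describe(example_scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Applying describe to each of the eight variables in turn would give one row per variable in a table of descriptive statistics.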
Problem #2
Are there any data points for any of the variables that can be considered outliers? If there are any outliers in any variable, please list them and state for which variable they are an outlier. Use the z-score method to determine any outliers for this question.
We use the z-score method to determine outliers for each variable. We compute the z-score for each value of each variable according to the formula below:
$$z_i = \frac{x_i - \bar{x}}{s}$$
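By convention, any z-score less than -3 or greater than 3 indicates an outlier. The z-scores were calculated in Excel. The calculations did not indicate any outliers in the data. This is consistent with the descriptive statistics: for the high quality of goods, for example, the lowest possible score of 1 gives z = (1 - 5.67)/1.90 ≈ -2.46, which is within the limits.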
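The z-score rule can be sketched in a few lines of Python. This is a minimal illustration, not the Excel workbook used for the paper: the function name z_score_outliers, the threshold parameter, and both sample lists are our own assumptions, and the scores are hypothetical rather than survey responses.

```python
import statistics


def z_score_outliers(scores, threshold=3.0):
    """Return (value, z) pairs whose z-score lies outside +/- threshold."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation s
    if sd == 0:
        # All values are identical, so no value can be an outlier.
        return []
    outliers = []
    for x in scores:
        z = (x - mean) / sd
        if abs(z) > threshold:
            outliers.append((x, z))
    return outliers


# Hypothetical responses on the 1-7 scale (NOT the Springdale data).
example_scores = [1, 3, 4, 5, 5, 6, 7, 7, 7, 7]
print(z_score_outliers(example_scores))

# A made-up list with one extreme value, to show a flagged point.
with_extreme = [5] * 20 + [100]
print(z_score_outliers(with_extreme))
```

With scores bounded between 1 and 7, the first list yields no outliers, while the deliberately extreme value 100 in the second list is flagged.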
Problem #3
Based on the results for question 1, which attributes seem to be the most important and the least important in respondents’ choice of a shopping area? Which items from #1 did you use to decide on the least and most important attributes, and why?
We should look at the measures of central tendency (mean, median and mode) and the measures of variability (standard deviation, range). The most important factors are those with the highest central tendency and the lowest variability. Thus, the most important attributes are the high quality of goods (Median = 7, SD = 1.90) and low prices (Median = 7, SD = 1.93). The least important attribute is the helpfulness of the sales staff (it has the lowest mean, 4.75, with SD = 1.91).
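The ranking above can be reproduced from the means and medians reported in Problem #1. The sketch below is a simplification: it sorts by median and uses the mean only to break ties, and it leaves out the standard deviation because not every SD is quoted in the text. The names reported and ranked are our own.

```python
# (attribute, mean, median) as reported in Problem #1.
reported = [
    ("Easy to return or exchange goods", 4.91, 5.0),
    ("High quality of goods", 5.67, 7.0),
    ("Low prices of goods", 5.63, 7.0),
    ("Good variety of sizes or styles", 4.97, 5.0),
    ("The sales staff is helpful or friendly", 4.75, 5.0),
    ("Convenient shopping hours", 4.81, 5.0),
    ("Clean stores and surroundings", 4.98, 5.0),
    ("A lot of bargain sales", 5.01, 5.5),
]

# Sort from most to least important: median first, mean as the tie-breaker.
ranked = sorted(reported, key=lambda row: (row[2], row[1]), reverse=True)

for label, mean, median in ranked:
    print(f"{label}: median={median}, mean={mean}")
```

Under this ordering the high quality of goods and low prices come first and the helpfulness of the sales staff comes last, which agrees with the conclusion stated above.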
Problem #4
Compute correlation coefficients between variable #19 and variables #21-25. The results are given in the following table:
Pearson’s correlation coefficients indicate a moderate positive linear relationship between IMPQUALI and each of the other variables, because the r-value is between 0.3 and 0.5 for all considered pairs. This means that there is a moderate positive relationship between the scores for the high quality attribute and the scores for the other attributes (good variety of sizes/styles, sales staff helpfulness, convenience of shopping hours, clean stores and bargain sales). Respondents who gave high scores to the high quality of goods attribute tend to give high scores to all of the listed attributes.
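For reference, Pearson’s r can be computed directly from its definition. The sketch below is illustrative only: the function names pearson_r and is_moderate and the two score lists are our own assumptions, and the data are hypothetical, not the survey responses. The 0.3 to 0.5 band for a moderate relationship is the one used in this paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


def is_moderate(r):
    """True when |r| falls in the 0.3-0.5 band this paper calls moderate."""
    return 0.3 <= abs(r) <= 0.5


# Hypothetical 1-7 scores for two attributes (NOT the Springdale data).
quality = [7, 6, 7, 5, 3, 7, 4, 6, 2, 7]
variety = [5, 6, 4, 5, 4, 7, 6, 3, 3, 6]
r = pearson_r(quality, variety)
print(f"r = {r:.2f}, moderate: {is_moderate(r)}")
```

Running pearson_r on IMPQUALI against each of variables #21 to #25 would give the five coefficients discussed above.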