Business Class Capstone Project
Introduction
In this paper, I describe and discuss the application of statistical techniques learned in class to a real-world problem. According to the assignment, I am the chief analyst for a medium-sized company, which I have chosen to be a medium-sized diamond store. An executive gave me a data set of diamond sales and asked me to explore it. The executive wants to know how this data can be used to improve the business.
The data will be analyzed by means of various statistical tools. I will start with descriptive statistics and charts, explaining the main features of the data. Then, a confidence interval for the mean diamond price will be calculated. Finally, a hypothesis test will be performed. In conclusion, I will summarize the paper and give recommendations to the chief executive for business improvement.
Our diamond company wants to enter a new market. In order to compete successfully with other jewelry stores in this market, it is necessary to examine the pricing policy of competitors. The chief executive of our company has collected a data set of 308 diamond sales from similar medium-sized jewelry stores in this market. This data was retrieved from Dr. John Rasp's statistics website ("Dr. John Rasp's Statistics Website - Data Sets For Classroom Use"). The data records the characteristics of each diamond sold, such as weight, price, color, clarity and rating agency. According to the data description, the following variables are used in this research:
IDNO – an identification number of each diamond
Weight – weight of the diamond, in carats
Color – a measure of color purity (D represents the purest color; lesser grades are E, F and so on, in alphabetical order)
Clarity – clarity of a diamond (depending on the absence or presence of minute flaws). “IF” (internally flawless) is the top grade, “VVS1” and “VVS2” mean very very slightly imperfect, “VS1” and “VS2” mean very slightly imperfect
Rater – the rating agency that graded the diamond. Three agencies evaluated these diamonds: GIA – Gemological Institute of America, IGI – International Gemological Institute, HRD – Hoge Raad voor Diamant.
Price – the price of diamonds, in Singapore dollars
Methods
The first step is descriptive statistics. Descriptive statistics are used in any study that involves quantitative indicators, for example in business, social science or medicine. Their task is to systematize and visually represent primary data obtained by experiment or observation. In business, statistics are used everywhere, from the calculation of staff salaries to the analysis of a brand's popularity in a market. Such analytics rest on the elementary concepts of descriptive statistics. Any analysis begins with a sample: a defined set of values that carries information about the parameters of interest.
The central value of a sample is expressed by the arithmetic mean, the median or the mode. Percentiles and quartiles divide the ordered sample into parts and show what percentage of all values lies below or above a certain value. The range is the difference between the largest and the smallest sample values. Variance and standard deviation measure how widely the values are spread around the sample mean.
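As a minimal sketch of the measures described above, the following Python snippet computes them with the standard library. The five prices are made-up illustrative values, not records from the diamond data set, and the function name describe is my own choice.

```python
import statistics

def describe(values):
    """Return the basic descriptive measures for a list of numbers."""
    ordered = sorted(values)
    # Quartiles: the three cut points that split the ordered sample into four parts
    q1, q2, q3 = statistics.quantiles(ordered, n=4)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "mode": statistics.mode(ordered),
        "range": ordered[-1] - ordered[0],
        "quartiles": (q1, q2, q3),
        "variance": statistics.variance(ordered),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(ordered),      # sample standard deviation
    }

# Hypothetical prices in Singapore dollars (illustrative only)
prices = [900, 1500, 1500, 4200, 7400]
summary = describe(prices)
for name, value in summary.items():
    print(name, value)
```

In this small example the mean (3,100) is well above the median (1,500) because one expensive item pulls the mean upward, which is why it is useful to report both.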
The next procedure I will use is interval estimation of the parameter of interest (a confidence interval). A point estimate is a single number on the real line that serves as the value of the unknown population parameter. It is only approximate, because it takes different values for different samples of the same size. So, in some problems, researchers want not only a good point estimate but also a measure of its accuracy and reliability.
Statistics provides two related concepts for this: the confidence interval and the confidence level. A confidence interval is a range of values for a population parameter that contains the true value of the parameter with a given level of confidence. In this paper, I will calculate the confidence interval for the mean diamond price.
The final step of this research paper is to perform a hypothesis test. A statistical hypothesis is a statement about an unknown parameter of the total population that is tested using a sample. Any conclusion derived from statistical observation, research or analysis is based on a finite number of observations, so it is not complete and may not be reliable. It is therefore necessary to state the conclusion of the study (the result of the hypothesis test) at a given level of statistical reliability. Reliability is directly related to the representativeness of the sample, that is, how confidently the data obtained from a sample reflect the relevant parameters of the total population. Reliability is determined by how likely it is that the association detected in the selected sample will be confirmed (rediscovered) in another sample from the same population.
Results
The following tables represent descriptive statistics of the data ("Descriptive Statistics In Excel"):
The mean price of the diamonds sold is $5,019.48 with a standard deviation of $3,403.12. The median (the 50th percentile of the data) is $4,215. The most frequent price (the mode) is $5,122. The values of kurtosis and skewness are close to those of a normal distribution. The cheapest diamond sold had a price of $638 and the most expensive was sold for $16,008.
The mean weight of the diamonds sold is 0.630909 carats with a standard deviation of 0.277183 carats. The median (the 50th percentile of the data) is 0.62 carats. The most frequent weight (the mode) is 1 carat. The smallest diamond sold weighed 0.18 carats and the biggest weighed 1.1 carats.
The frequency tables show the number of diamonds in each category of color, clarity and rating agency.
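A frequency table of this kind can be sketched in a few lines of Python. The list of color grades below is hypothetical and only illustrates the counting; it does not reproduce the counts in the diamond data set, and frequency_table is a helper name I introduce for illustration.

```python
from collections import Counter

def frequency_table(labels):
    """Map each category to (count, share of total), sorted by category name."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (count, count / total) for label, count in sorted(counts.items())}

# Hypothetical color grades (illustrative only)
colors = ["D", "E", "E", "F", "F", "F", "G", "E"]
table = frequency_table(colors)
for label, (count, share) in table.items():
    print(f"{label}: {count} ({share:.1%})")
```

The same function could be applied to the Clarity and Rater columns, since it only assumes a list of category labels.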
I calculate the 95% confidence interval for the mean price of the diamonds sold. The formula for the 95% confidence interval is as follows ("Confidence Intervals"):
$$\bar{x} \pm t^{*} \frac{s}{\sqrt{n}}$$
Here, x-bar is the sample mean, t* is the critical value of the t-distribution with (n-1) degrees of freedom, s is the sample standard deviation and n is the sample size. I use the Excel function CONFIDENCE to calculate the margin of error. Note that CONFIDENCE is based on the normal distribution rather than the t-distribution; with n = 308 the two critical values are nearly identical, so the result is practically unaffected. According to the output, the confidence interval is (4639.43, 5399.54). This interval can be interpreted in the following way: I am 95% confident that the true population mean price of diamonds sold in this market is between $4,639.43 and $5,399.54.
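The calculation can be checked from the summary statistics alone. The sketch below is not the Excel procedure itself; it assumes the normal critical value as a large-sample stand-in for t*, because the Python standard library has no t-distribution, and the function name mean_confidence_interval is my own.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(mean, std_dev, n, confidence=0.95):
    """Confidence interval for a mean, using the normal critical value.

    Assumption: n is large enough that the normal critical value is a
    close approximation to the t critical value with n - 1 degrees of freedom.
    """
    alpha = 1 - confidence
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = critical * std_dev / sqrt(n)           # margin of error
    return mean - margin, mean + margin

# Summary statistics reported above for diamond prices
lower, upper = mean_confidence_interval(mean=5019.48, std_dev=3403.12, n=308)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The result agrees with the reported interval to within about one cent; the small difference comes from the rounding of the mean and standard deviation.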
The final procedure I carry out is a hypothesis test. I want to answer the following research question: Is the price of diamonds associated with their weight? The null hypothesis is that the population correlation between the prices and weights of the diamonds is zero. The alternative hypothesis is that the population correlation between the prices and weights of the diamonds is not zero. I set the level of significance alpha at 5% and calculate the correlation coefficient for the pair of variables Weight and Price. According to the output, the coefficient of correlation is equal to 0.944727. This value is evidence of a strong positive linear association between weight and price (Rumsey). The test has n - 2 = 306 degrees of freedom, and according to the tables of critical r-values, the critical value at the 0.05 level is approximately 0.113 ("Table Of Critical Values: Pearson Correlation - Statistics Solutions"). Since the observed value is higher than the critical value, the null hypothesis is rejected and the coefficient of correlation is significant.
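An equivalent way to run this test is to convert r into a t statistic with n - 2 degrees of freedom, t = r·sqrt(n - 2) / sqrt(1 - r²). The sketch below is a hedged illustration rather than the procedure used above: the weights and prices passed to pearson_r are made-up values, the function names are my own, and the normal critical value is assumed as a large-sample approximation to the t critical value.

```python
from math import sqrt
from statistics import NormalDist

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def correlation_t_statistic(r, n):
    """t statistic for testing a zero population correlation (n - 2 df)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# Hypothetical weights (carats) and prices (illustrative only)
example_r = pearson_r([0.3, 0.5, 0.7, 1.0], [1000, 2500, 5000, 9000])

# Values reported above: r = 0.944727, n = 308
t_stat = correlation_t_statistic(0.944727, 308)
# Assumption: normal critical value approximates the t critical value for 306 df
critical = NormalDist().inv_cdf(0.975)
print(f"t = {t_stat:.1f}, two-sided 5% critical value is about {critical:.2f}")
print("reject H0" if abs(t_stat) > critical else "fail to reject H0")
```

With the reported r and n, the t statistic is roughly 50, far beyond the critical value of about 1.96, which leads to the same decision as the comparison with the critical r-value.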
Conclusion
In this paper, I, as the chief analyst for a medium-sized jewelry store, have applied statistical techniques to a real data set. I have examined the data on diamonds sold in the new market and reached the following conclusions:
With 95% confidence, the mean price of the diamonds sold in this market lies between $4,639.43 and $5,399.54.
The average weight of the diamonds sold in this market is approximately 0.62-0.63 carats.
The weight of the diamonds is strongly and significantly associated with their price. Heavier diamonds are more expensive than lighter ones.
These conclusions can help our chief executive formulate an appropriate business strategy for a successful entry into the new market. Further research can examine the relationship between price and other factors. For example, it would be useful to develop a multiple regression equation that predicts diamond prices from weight, color, clarity and other characteristics.
Works Cited
"Confidence Intervals". Stat.yale.edu. N.p., 2016. Web. 1 Aug. 2016.
"Descriptive Statistics In Excel". Excel-easy.com. N.p., 2016. Web. 1 Aug. 2016.
"Dr. John Rasp's Statistics Website - Data Sets For Classroom Use". Www2.stetson.edu. N.p., 2016. Web. 1 Aug. 2016.
Rumsey, Deborah. "How To Interpret A Correlation Coefficient R - For Dummies". Dummies.com. N.p., 2016. Web. 1 Aug. 2016.
"Table Of Critical Values: Pearson Correlation - Statistics Solutions". Statistics Solutions. N.p., 2016. Web. 1 Aug. 2016.