The purpose of this assignment is to write a statistical report based on the data set provided. The data set describes 100 motion pictures and their characteristics. The following variables are used in the description:
Motion picture – the name of the film
Opening Gross Sales (in millions of the U.S. dollars)
Total Gross Sales (in millions of the U.S. dollars)
Number of Theaters
Weeks in Release
In this short report I provide the five-number summary for each numeric variable: the minimum, 1st quartile, median, 3rd quartile and maximum. These statistics were calculated in Excel and are given in the table below:
The lowest opening gross sales were only $70,000, for “Extremely Loud & Incredibly Close”. The lowest total gross sales were $29,140,000, for “Priest”. The smallest number of theaters in which a picture was shown was 1,038, for “Midnight in Paris”. The shortest release lasted 6 weeks, again for “Priest”.
The 1st quartile is the value below which the lowest 25% of the data fall. Hence, the lowest 25% of pictures had opening gross sales below $13,002,500, the lowest 25% had total gross sales below $39,957,500, the lowest 25% were shown in fewer than 2,854 theaters, and the shortest 25% of releases lasted less than 11.75 weeks.
The median is the middle value of each variable: half of the pictures fall below it and half above. Thus, the typical (“middle”) picture had $19.08 million in opening gross sales, $72.4 million in total gross sales, 3,102.5 theaters and 14.5 weeks in release. The 3rd quartile marks the boundary of the highest 25% of the pictures and can be interpreted similarly.
The highest opening gross sales were $169,190,000, for “Harry Potter and the Deathly Hallows Part 2”. The same picture had the highest total gross sales, $381,010,000, and was shown in the largest number of theaters (4,375). However, the longest release, 43 weeks, belonged to “Midnight in Paris”.
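The report's figures were produced in Excel. As a rough cross-check, a five-number summary can also be sketched in Python. The sketch below assumes Python 3.8 or later and uses the "inclusive" quartile method, which interpolates between data points in a way that should correspond to Excel's QUARTILE function. The function name `five_number_summary` and the sample values are hypothetical and are not taken from the data set.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # n=4 splits the data into quartiles; "inclusive" treats the minimum
    # and maximum as the 0th and 100th percentiles of the data.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical weeks-in-release values, NOT the real data set.
sample_weeks = [6, 9, 11, 12, 14, 15, 17, 20, 43]
print(five_number_summary(sample_weeks))
```

Applied to each numeric column of the real data, a function like this should give values comparable to the Excel table, provided the same quartile method is used in both tools.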
To examine the relationship between total gross sales and each of the other three variables, I calculated a correlation matrix:
Pearson’s correlation coefficient indicates a very strong positive linear relationship between total gross sales and opening gross sales, a strong positive linear relationship between total gross sales and number of theaters, and a weak positive association between total gross sales and weeks in release. Of the three variables, opening gross sales therefore appear to be the best predictor of total gross sales and, in that sense, of a picture’s success.
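For readers who want to see how a single entry of such a correlation matrix is obtained, here is a minimal sketch of Pearson's coefficient written out from its definition. The function name `pearson_r` and the two short lists are hypothetical illustrations, not values from the data set, and the sketch is not the Excel procedure used for the report.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (numerator of r).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable (denominator of r).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical opening and total gross sales in millions of dollars.
opening = [5.0, 12.0, 19.0, 30.0, 60.0]
total = [30.0, 41.0, 70.0, 95.0, 210.0]
print(round(pearson_r(opening, total), 3))
```

A value close to 1 indicates a strong positive linear relationship, a value close to 0 a weak one, which is how the coefficients in the matrix are read.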
To identify outliers, the “1.5 IQR” rule was used. The IQR (interquartile range) is the difference between the 3rd and 1st quartiles. Values less than Q1 minus 1.5*IQR are low outliers, and values greater than Q3 plus 1.5*IQR are high outliers (Taylor). The calculations are given below:
Pictures with opening gross sales higher than $59,877,500, total gross sales higher than $202,676,250 or a release longer than 24.875 weeks are considered high outliers. Pictures shown in fewer than 1,810 theaters are considered low outliers.
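The fence calculation can be sketched in a few lines of Python. The function names `iqr_fences` and `is_outlier` are hypothetical. The example uses weeks in release: the 1st quartile of 11.75 weeks is reported above, while the 3rd quartile of 17 weeks is not quoted in the text and is only the value implied by the reported upper fence of 24.875 weeks, so it should be treated as an assumption.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) under the k*IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """True if value lies below the lower fence or above the upper fence."""
    lower, upper = iqr_fences(q1, q3, k)
    return value < lower or value > upper


# Weeks in release: Q1 is from the report; Q3 = 17 is an ASSUMPTION
# back-calculated from the reported upper fence of 24.875 weeks.
lower, upper = iqr_fences(11.75, 17)
print(lower, upper)

# The longest release mentioned in the report lasted 43 weeks.
print(is_outlier(43, 11.75, 17))
```

Under this assumption the lower fence falls below the shortest release of 6 weeks, which would be consistent with the report listing no low outliers for weeks in release.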
Works Cited
Taylor, Courtney. "What Is the Interquartile Range Rule?" About.com Education. N.p., 2016. Web. 5 Feb. 2016.