Introduction
In this paper we perform a data analysis to show how basic tools of statistics and probability theory are applied to a real-world problem. Our goal is to help a business manager understand the statistical data collected and to provide a meaningful explanation of the results. We use descriptive statistics (measures of central tendency and measures of variation), consider the probability distributions of the variables, calculate confidence intervals for the population means of some variables and apply a statistical test. In addition, we use regression analysis to describe the association between a dependent variable and several independent variables, and we conclude with a summary of the whole study.
Problem Statement
The analysis of statistical data is one of the most important applications of mathematical methods in economic research. Research in mathematics over recent decades has widened the range of economic problems that can be solved, and many of the new methods are based on the accumulation of empirical data. Mathematical methods help to identify relationships between economic factors and to express these relationships analytically. The main methods of quantitative analysis allow us to build an analytical (econometric) model from statistical information, and such a model lets us predict the value of the outcome variable for a given set of input variables.
In the statistical analysis of an economic problem, a hypothesis about a link or relationship between economic variables is formulated first. This hypothesis is then checked by statistical analysis. Depending on the significance level obtained, the results (or the econometric model) are accepted or rejected. If, for any reason, the result turns out to be statistically insignificant, it has no economic or practical value.
Since in most economic problems the initial data are statistical, it is important to use appropriate tools to analyze them. This explains the considerable importance of statistical methods in economics. If an econometric model is adequate and produces reliable forecasts, it is a useful mathematical tool.
In this paper we consider a real-world economic problem from the area of tourism. Our purpose is to help a tourism manager understand the statistical information collected in a survey and make decisions based on the data.
Summary and Explanation of Descriptive Statistics
According to the assignment, we are given a data set of 50 observations. It is a sample of 50 accommodation providers that responded to a survey about the guests staying on their premises on a mid-week night in October. Each observation is described by a number of variables, listed below:
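Category – the type of accommodation (two possible values: hotel or B&B, i.e. bed and breakfast)
Star Rating – the star rating of the hotel or B&B (from 2 to 5)
Number of Beds – the number of beds in the accommodation
Number of Guests – the number of guests staying on the sample night
Number of guests taking Breakfast – the number of guests who ordered breakfast
Number of guests taking Dinner – the number of guests who ordered dinner
Length of Stay for current guests – the length of stay, in days, booked by the current guests
Revenue on sample night – the total revenue of the accommodation on the surveyed October mid-week night
Average Time spent by guest in accommodation (hrs) – the average number of hours a guest spends in the accommodation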
For further analysis we split the data by the categorical variable Category, because we want to explore the main differences between hotels and B&Bs. We therefore begin with descriptive statistics computed separately for the two types of accommodation. Descriptive statistics are an effective tool for understanding the distribution of the variables. They comprise two groups of measures: measures of central tendency (mode, median and mean) and measures of variability (standard deviation, variance and interquartile range). In general, descriptive statistics show where the data are centered and how widely they are dispersed.
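The measures named above can be computed per group with Python's standard `statistics` module. The following is a minimal sketch, not the Excel procedure used in this paper. The helper names `describe` and `describe_by_group` and the sample rows are hypothetical. The sketch assumes sample-based (n − 1) variance, as Excel's Descriptive Statistics tool uses, and the "inclusive" quartile method, which should match Excel's QUARTILE.INC.

```python
from collections import defaultdict
from statistics import mean, median, mode, quantiles, stdev, variance


def describe(values):
    """Measures of central tendency and variability for one group (sample-based)."""
    # 'inclusive' quartiles interpolate linearly between order statistics
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),          # first value encountered if several tie
        "std_dev": stdev(values),      # sample standard deviation (n - 1)
        "variance": variance(values),  # sample variance (n - 1)
        "iqr": q3 - q1,                # interquartile range
    }


def describe_by_group(records, group_key, value_key):
    """Split records by a categorical column and describe one numeric column per group."""
    groups = defaultdict(list)
    for row in records:
        groups[row[group_key]].append(row[value_key])
    return {group: describe(vals) for group, vals in groups.items()}


# Hypothetical rows, NOT the survey data (which has 50 providers and more columns)
records = [
    {"Category": "B&B", "Revenue": 200},
    {"Category": "B&B", "Revenue": 250},
    {"Category": "B&B", "Revenue": 250},
    {"Category": "Hotel", "Revenue": 2000},
    {"Category": "Hotel", "Revenue": 2500},
    {"Category": "Hotel", "Revenue": 2600},
]
summary = describe_by_group(records, "Category", "Revenue")
print(summary["B&B"]["median"], summary["B&B"]["iqr"])  # 250 25.0
```

The same call with `value_key="Star Rating"` or any other numeric column would give the per-category summaries discussed next.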
For B&B accommodations the descriptive summary is given below:
The average star rating of the B&Bs is 3.22222. The average number of beds is 13.33333, and the average number of guests is 7.166667. On average, 3.77778 guests per B&B take breakfast, and no guests take dinner. The average length of stay is 2.388889 days. The average revenue of a B&B on the sample night is 258.22 EURO. On average, a guest spends 6.857778 hours in the accommodation.
For Hotel accommodations the descriptive summary is given below:
The average star rating of the hotels is 3.0625, which is lower than for B&Bs. The average number of beds (104.6875) and the average number of guests (54.53125) are both much higher than for B&Bs. On average, 31.96875 guests per hotel take breakfast and, unlike at B&Bs, 30.53125 guests take dinner. The average length of stay is 2.4375 days. The average revenue of a hotel on the sample night is 2335.25 EURO. On average, a guest spends 7.539063 hours in the accommodation.
Inferential Statistics
Almost all sample means are higher for hotels than for B&Bs; star rating is the only exception. Hotel guests order more breakfasts and dinners and spend more time in their accommodation. Hotels also have more beds and guests on average and earn more revenue than B&Bs.
However, so far it is only an assumption that these differences really exist in the population. To check whether this expectation is well grounded, we need to carry out a statistical test of whether the differences between the group means are significant.
For the purpose of this assignment we pick only a few variables to demonstrate the testing procedure: we compare revenue, star rating and length of stay between B&Bs and hotels. The next step is to choose an appropriate test for comparing sample means. One of the most common parametric tests is the two-sample Student's t-test. Unlike the z-test, it can be used on small samples (n < 30) and does not require the population standard deviation to be known. Its main assumption is that the variables are approximately normally distributed. This means that the data are roughly symmetric around the mean and the probability density function is bell-shaped. The two samples must also be independent. The classic Student's version additionally assumes equal population variances, while Welch's version of the t-test drops this assumption. If the normality assumption is not met, medians should be reported and a non-parametric test, such as the Mann–Whitney U test, should be performed instead.
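Both versions of the two-sample t-test can be sketched in plain Python. This is an illustration under stated assumptions, not the Excel t-Test tool used here. The function names and the revenue figures are hypothetical. The p-value is approximated by integrating the t density numerically with Simpson's rule instead of calling a statistics library. If SciPy is available, `scipy.stats.ttest_ind(x, y, equal_var=False)` runs the Welch test directly.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value P(|T| >= |t|), via Simpson's rule on [0, |t|]."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3  # approximates P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def pooled_t_test(x, y):
    """Classic Student's two-sample t-test (assumes equal variances)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return t, df, two_sided_p(t, df)


def welch_t_test(x, y):
    """Welch's two-sample t-test (does not assume equal variances)."""
    n1, n2 = len(x), len(y)
    se1, se2 = variance(x) / n1, variance(y) / n2  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df, two_sided_p(t, df)


# Hypothetical revenue figures, NOT the survey data
bnb = [210.0, 245.0, 260.0, 300.0, 275.0]
hotel = [1900.0, 2400.0, 2100.0, 2650.0, 2300.0]
alpha = 0.05
t_stat, dof, p_value = welch_t_test(bnb, hotel)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, p = {p_value:.4g}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The last line applies the decision rule at a 5% significance level: the null hypothesis of equal means is rejected only when the p-value falls below alpha.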
Nevertheless, we assume here that Revenue, Star Rating and Length of Stay are approximately normally distributed. For each variable we formulate the null and alternative hypotheses:
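$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$
where $\mu_1$ and $\mu_2$ are the population means of the variable for B&Bs and hotels, respectively.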
We set the level of significance at $\alpha = 0.05$.
We perform the tests in Excel:
The conclusion of each test depends on its p-value. If the p-value is higher than the significance level alpha (0.05), the null hypothesis is not rejected. If it is lower, the null hypothesis is rejected in favor of the alternative. In our calculations we failed to reject the null hypothesis for Length of Stay and Star Rating. We therefore do not have enough evidence to say that the mean values of these variables differ between B&Bs and hotels at the 5% level of significance. However, the null hypothesis for Revenue is rejected: there is a significant difference between the revenues of hotels and B&Bs at the 5% level of significance.
Finally, we develop a multiple linear regression model of the revenue earned by hotels. Revenue is the dependent variable, and several of the other variables serve as independent variables. We use Excel to produce the multiple linear regression summary:
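The coefficients of a multiple linear regression are the least-squares solution of the normal equations $(X^\top X)\,b = X^\top y$. The sketch below solves them in plain Python with Gaussian elimination. It is an illustration under stated assumptions, not Excel's internal algorithm. The function name `ols` and the data are hypothetical. The sketch returns only the coefficients and omits the standard errors, t-statistics and p-values that Excel's Regression tool also reports.

```python
def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations (X'X)b = X'y."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend an intercept column of ones
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


# Hypothetical, noise-free data generated from y = 1 + 2*x1 + 3*x2
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
print([round(c, 6) for c in ols(X, y)])  # [1.0, 2.0, 3.0]
```

Solving the normal equations directly is adequate for a handful of predictors. For ill-conditioned data, a QR-based solver such as `numpy.linalg.lstsq` is numerically safer.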
For the regression we use a 10% level of significance.
According to the ANOVA table (the overall F-test), the coefficients of the model are jointly significant, so the model as a whole is significant. However, not all coefficients are individually significant: Number of Beds, Star Rating and Number of guests taking Dinner are not significant predictors of Revenue. These variables could be excluded from the model and the analysis repeated to find the best form of the relationship. For this example, however, we simply show what the full estimated regression equation looks like:
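The joint and individual significance mentioned above rest on two standard test statistics. The following assumes $n$ hotels, $k = 6$ independent variables (those in the equation below) and coefficient of determination $R^2$. Under the respective null hypotheses, the overall F-statistic from the ANOVA table and the t-statistic for an individual coefficient $\hat\beta_j$ follow these distributions:

```latex
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)} \sim F_{k,\; n-k-1},
\qquad
t_j = \frac{\hat\beta_j}{\mathrm{SE}(\hat\beta_j)} \sim t_{n-k-1}
```

At the 10% level, the model is jointly significant when the p-value of $F$ (reported by Excel as "Significance F") is below 0.10. A single coefficient is significant when the two-sided p-value of its $t_j$ is below 0.10.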
Revenue = -555.232 + 143.974*Star_Rating – 2.09653*Number_of_beds + 51.28331*Number_of_guests – 11.1463*Number_of_guests_taking_Breakfast – 6.16367*Number_of_guests_taking_Dinner +171.0847*Length_of_Stay_for_current_guests
In principle, this model can help a tourism manager predict hotel revenue from given values of the independent variables.
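As an illustration of such a prediction, the estimated equation can be evaluated directly. The coefficients below are copied from the regression equation above. The hotel's characteristics are hypothetical values chosen for the example, not a hotel from the survey. Predictions are only meaningful for inputs within the range of the observed data.

```python
# Coefficients copied from the estimated hotel revenue equation
INTERCEPT = -555.232
COEFFICIENTS = {
    "Star_Rating": 143.974,
    "Number_of_beds": -2.09653,
    "Number_of_guests": 51.28331,
    "Number_of_guests_taking_Breakfast": -11.1463,
    "Number_of_guests_taking_Dinner": -6.16367,
    "Length_of_Stay_for_current_guests": 171.0847,
}


def predict_revenue(hotel: dict) -> float:
    """Predicted revenue on the sample night for one hotel (EURO)."""
    return INTERCEPT + sum(coef * hotel[name] for name, coef in COEFFICIENTS.items())


# Hypothetical hotel, NOT taken from the survey data
example = {
    "Star_Rating": 3,
    "Number_of_beds": 100,
    "Number_of_guests": 50,
    "Number_of_guests_taking_Breakfast": 30,
    "Number_of_guests_taking_Dinner": 30,
    "Length_of_Stay_for_current_guests": 2.5,
}
print(round(predict_revenue(example), 2))  # 2139.62
```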