Introduction
The data presented in this project come from Lendingclub.com (n.d). The website provides a broad record of loan data collected from its members between 2007 and 2012. This period coincides with the global financial crisis, whose impact was far-reaching: it affected both lending rates and the criteria for loan approval. Before the crisis, the lending policies of financial institutions cushioned borrowers, because loans carried low interest rates and were consequently easy to repay. After the crisis, interest rates increased dramatically, making it virtually impossible for many borrowers to service their loans.
Financial markets struggled, home values plummeted and lending requirements became more stringent. Towards the end of 2012, the global economy, and by extension that of the United States, was on the path to recovery. As a result, unemployment fell, home values started rising and stock markets were performing better than they had five years earlier (Sander and Lambert 98). The aim of this project was to determine whether loan approvals varied by state or by credit score. The project also examines whether the rejections are correlated with these variables and whether they follow any pattern.
Research Questions
The following research questions guided this statistical project:
- What is the relationship between residential state and loan rejection?
- What is the relationship between credit score and loan rejection?
- Is there a pattern to the rejection of loans?
Random Sampling
Random sampling is very important in statistics because it helps ensure that the data used in the analysis are representative of the population being studied. Findings from a randomly sampled set of data can therefore be generalized to the population from which the sample was drawn (Brechner 123). The Lendingclub.com (n.d) dataset is very large, so for the purposes of this project a sample of fifty-two subjects was used. To draw this sample at random, I used the random number generator at www.random.org to pick fifty-two subjects from the full population. I preferred this generator to those built into computer programs because its output is not predictable. The generators in computer programs are pseudorandom: they produce numbers from pre-determined mathematical formulas, so their output can in principle be predicted. I judged that this could compromise the representativeness required for this project.
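The sampling step can be sketched in Python as drawing 52 distinct row indices without replacement. This is a minimal sketch, not the procedure actually used (the project used www.random.org): `POPULATION_SIZE` is a hypothetical placeholder for the number of records in the download, and `draw_sample` is an illustrative helper. By default it uses `random.SystemRandom`, which draws on the operating system's entropy source rather than a seeded formula.

```python
import random
from typing import List, Optional

# Hypothetical placeholder: the number of records in the Lending Club
# download is not stated in this project.
POPULATION_SIZE = 10_000
SAMPLE_SIZE = 52  # sample size used in this project


def draw_sample(population_size: int, sample_size: int,
                rng: Optional[random.Random] = None) -> List[int]:
    """Return `sample_size` distinct row indices chosen uniformly at random."""
    if rng is None:
        # SystemRandom draws on the operating system's entropy source,
        # so the selection cannot be reproduced from a seed.
        rng = random.SystemRandom()
    # sample() draws without replacement, so no row is selected twice.
    return sorted(rng.sample(range(population_size), sample_size))


rows = draw_sample(POPULATION_SIZE, SAMPLE_SIZE)
print(f"Selected {len(rows)} rows, e.g. {rows[:5]}")
```

Passing a seeded `random.Random(...)` instead makes the draw reproducible, at the cost of the predictability discussed above.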
Credit Scoring
Credit scoring is very important when assessing someone for a loan. It is used to determine how much a person can borrow, at what interest rate and for how long. The data presented in this project include each applicant's debt-to-income ratio, a measure of credit risk, together with the applicant's FICO score. These figures are used to determine the relationship between credit scores and loan approval.
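Debt-to-income ratio is conventionally computed as total monthly debt payments divided by gross monthly income, expressed as a percentage. The sketch below assumes that standard definition; the exact inputs Lending Club used are not stated here, and the function name and example figures are hypothetical.

```python
def debt_to_income(monthly_debt_payments: float, gross_monthly_income: float) -> float:
    """Debt-to-income ratio in percent, using the standard definition (assumed here)."""
    if gross_monthly_income <= 0:
        raise ValueError("gross monthly income must be positive")
    return 100.0 * monthly_debt_payments / gross_monthly_income


# Hypothetical applicant: $1,200 of monthly debt payments on $4,000 of gross monthly income.
print(f"DTI = {debt_to_income(1200, 4000):.1f}%")  # DTI = 30.0%
```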
Results and Discussion
Using the Statistical Package for the Social Sciences (SPSS), I analyzed the FICO scores for mean, median, mode, standard deviation and variance. The mean FICO score was 597.9:
x̄ = 597.9, SD = 192.6, N = 52.
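The same summary statistics can be computed outside SPSS with Python's standard `statistics` module. The scores below are hypothetical placeholders, not the project's sample; substituting the 52 sampled FICO scores should reproduce the reported mean and standard deviation, assuming SPSS reported the sample (n − 1) standard deviation.

```python
import statistics

# Hypothetical FICO scores for illustration only; substitute the 52 sampled values.
scores = [0, 580, 640, 640, 700, 812]

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    # stdev() and variance() use the sample (n - 1) denominator.
    "standard deviation": statistics.stdev(scores),
    "variance": statistics.variance(scores),
}

print(f"N = {len(scores)}")
for name, value in summary.items():
    print(f"{name}: {float(value):.1f}")
```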
It is important to find the minimum and maximum FICO scores in order to determine the range, since these are the lowest and highest FICO scores among the applicants whose loans were denied. The minimum FICO score in the sample was zero and the maximum was eight hundred and twelve. Generic FICO scores range from 300 to 850 (Rich 69), so a recorded score of zero cannot be a genuine FICO score and may indicate that no score was available for that applicant. Higher scores imply lower credit risk and lower scores imply higher credit risk. Applicants with scores near the bottom of the scale are considered poor credit risks and may be denied a loan, while applicants with scores approaching 850 are considered creditworthy and may be granted loans if other factors are held constant. Since an applicant with a FICO score of 812 was still denied a loan, FICO score was evidently not the only criterion used to assess an applicant's creditworthiness.
Range = 812 - 0 = 812
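The range calculation, together with a check for values outside the 300 to 850 scale cited above, can be sketched as follows. The scale limits come from the text; the function names and example scores are hypothetical, and the out-of-scale check is an illustrative addition rather than a step reported in the project.

```python
FICO_MIN, FICO_MAX = 300, 850  # generic FICO scale cited in the text (Rich 69)


def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)


def outside_scale(scores):
    """Scores that cannot be genuine FICO scores, such as a recorded 0."""
    return [s for s in scores if not FICO_MIN <= s <= FICO_MAX]


scores = [0, 580, 640, 640, 700, 812]  # hypothetical values
invalid = outside_scale(scores)
valid = [s for s in scores if FICO_MIN <= s <= FICO_MAX]

print("Range:", score_range(scores))                    # Range: 812
print("Outside 300-850:", invalid)                      # Outside 300-850: [0]
print("Range of in-scale scores:", score_range(valid))  # Range of in-scale scores: 232
```

Because the range depends only on the two extreme values, a single out-of-scale entry can change it substantially, as the hypothetical example shows.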
This wide range has a significant effect on the statistics in this project. Firstly, it shows how widely the FICO scores vary: the scores are spread far apart rather than clustered around a typical value. Secondly, such high variability widens the confidence intervals around the estimates and makes differences between groups harder to detect at a given confidence level.
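One way to see how variability affects precision is the standard error of the mean, SD/√N, and the interval it implies. The sketch below uses the reported x̄ = 597.9, SD = 192.6 and N = 52, with a normal approximation (z ≈ 1.96) in place of the t critical value for 51 degrees of freedom (about 2.01), so its interval is slightly narrower than an exact t interval. The helper name `mean_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def mean_ci(mean, sd, n, confidence=0.95):
    """Standard error and normal-approximation confidence interval for a mean."""
    se = sd / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return se, mean - z * se, mean + z * se


# Reported sample statistics for the FICO scores.
se, low, high = mean_ci(597.9, 192.6, 52)
print(f"SE = {se:.2f}; 95% CI = ({low:.1f}, {high:.1f})")

# For comparison: the same mean and N with half the standard deviation.
_, low_half, high_half = mean_ci(597.9, 192.6 / 2, 52)
print(f"Half the SD: 95% CI = ({low_half:.1f}, {high_half:.1f})")
```

Halving the standard deviation halves the width of the interval, which is the sense in which widely spread scores reduce the precision of the analysis.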
To answer the question of whether approvals varied by state, it is important to compare the states with one another. The challenge is that the sampled subjects resided in many different states, and some states had only one subject in the sample, which made it difficult to calculate and compare means between states. Only a few states, including New York, Maryland and Texas, had more than one subject sampled. This spread of subjects across states suggests that the state of residence was not a criterion for assessing an applicant's creditworthiness. Additionally, there was no apparent pattern in the rejections with respect to residential state, as there appeared to be no bias towards or against particular states.
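The difficulty with a state-level comparison can be made concrete by grouping the sample by state and computing a mean only where a state has at least two subjects. The state codes and scores below are hypothetical, and the two-subject cutoff (`MIN_PER_STATE`) is an illustrative choice rather than a rule used in the project.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (state, FICO score) pairs; not the project's data.
sample = [("NY", 640), ("NY", 700), ("MD", 580), ("MD", 812),
          ("TX", 0), ("TX", 690), ("OH", 655), ("CA", 720)]

MIN_PER_STATE = 2  # illustrative cutoff for computing a state mean

by_state = defaultdict(list)
for state, score in sample:
    by_state[state].append(score)

state_means = {s: mean(v) for s, v in by_state.items() if len(v) >= MIN_PER_STATE}
singletons = sorted(s for s, v in by_state.items() if len(v) < MIN_PER_STATE)

print("States with a computable mean:", state_means)
print("States with a single subject:", singletons)
```

With most states represented by one subject, only a handful of state means can be formed at all, and each rests on very few observations.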
To establish the relationship between loan rejections and FICO scores, I compared mean FICO scores at a confidence level of 95%. First, I calculated the mean FICO score for each of the six years in which the data was collected. The means for 2007, 2008, 2009, 2010, 2011 and 2012 are presented below.
2007
Sum of FICO scores = 3166, n = 5
x̄ = 633.2
Standard Deviation = 87.17913
Standard Error = 38.98769
2008
Sum of FICO scores = 2907, n = 5
x̄ = 581.4
2009
Sum of FICO scores = 5265, n = 9
x̄ = 585
Standard Deviation = 239.56367
Standard Error = 79.85456
2010
Sum of FICO scores = 3784, n = 7
x̄ = 540.5714
Standard Deviation = 242.45265
Standard Error = 91.63849
2011
Sum of FICO scores = 4487, n = 8
x̄ = 560.875
Standard Deviation = 256.96995
Standard Error = 90.8526
2012
Sum of FICO scores = 11482, n = 18
x̄ = 637.8889
Standard Deviation = 168.31235
Standard Error = 39.6716
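The yearly figures can be cross-checked with a short Python sketch. It reads the first figure given for each year as the total of that year's FICO scores (total ÷ mean gives a whole number of applicants in every year, which supports this reading), recovers n, recomputes the standard error as SD/√n and pools the years. The `YEARS` table and `check_years` function are hypothetical names; 2008 has no standard error because no standard deviation is reported for it.

```python
import math

# Per-year figures as reported: (sum of FICO scores, mean, standard deviation).
YEARS = {
    2007: (3166, 633.2, 87.17913),
    2008: (2907, 581.4, None),  # no standard deviation reported for 2008
    2009: (5265, 585.0, 239.56367),
    2010: (3784, 540.5714, 242.45265),
    2011: (4487, 560.875, 256.96995),
    2012: (11482, 637.8889, 168.31235),
}


def check_years(years):
    """Recover n per year, recompute SE = SD / sqrt(n), and pool the years."""
    rows, total_sum, total_n = {}, 0, 0
    for year, (score_sum, mean, sd) in years.items():
        n = round(score_sum / mean)  # applicants sampled that year
        se = sd / math.sqrt(n) if sd is not None else None
        rows[year] = {"n": n, "se": se}
        total_sum += score_sum
        total_n += n
    return rows, total_n, total_sum / total_n


rows, total_n, overall_mean = check_years(YEARS)
for year, row in rows.items():
    se_text = f"{row['se']:.5f}" if row["se"] is not None else "not reported"
    print(f"{year}: n = {row['n']}, SE = {se_text}")
print(f"All years: N = {total_n}, mean = {overall_mean:.1f}")  # All years: N = 52, mean = 597.9
```

The pooled result agrees with the overall N = 52 and x̄ = 597.9 reported earlier.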
I then compared the yearly means to determine whether mean FICO scores differed between the years in which the data was collected, using a two-tailed t test to compare the means of two years at a time. The null hypothesis was that there was no difference between the mean FICO scores of the two years being compared. The confidence level was 95%, corresponding to a significance level of 0.05. For 2007 and 2008, the p value was p = 0.39395. Since this is greater than 0.05, the null hypothesis of no difference between the mean FICO scores of the two years cannot be rejected.
The same test was applied to the data from 2009 and 2010. The two-tailed t test at a confidence level of 95% (significance level 0.05) gave p = 0.71976. Since the p value was greater than 0.05, the null hypothesis of no difference between the mean FICO scores of the two years cannot be rejected. A further two-tailed t test was carried out on the data from 2011 and 2012. At a confidence level of 95% and a significance level of 0.05, the p value was p = 0.36983. This p value was also greater than 0.05, so again the null hypothesis cannot be rejected.
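The reported p values can be checked from the yearly summary statistics alone, with each year's n recovered as the total of its scores divided by its mean. Working through the numbers suggests the tests were pooled-variance (Student's) two-sample t tests: under that assumption the sketch below reproduces p ≈ 0.720 for 2009 vs 2010 and p ≈ 0.37 for 2011 vs 2012, whereas Welch's unequal-variance version would give a noticeably larger p for 2011 vs 2012. The 2007 vs 2008 test cannot be checked this way because no standard deviation is reported for 2008. The sketch uses only the standard library, computing the two-tailed p value through the regularized incomplete beta function; all function names are hypothetical.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-tailed Student's t test (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t = (mean1 - mean2) / se
    # P(|T| >= |t|) for T ~ t(df) equals I_{df/(df + t^2)}(df/2, 1/2).
    p = regularized_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p


ALPHA = 0.05
# (mean, SD, n) per year; n recovered as sum of scores / mean.
comparisons = {
    "2009 vs 2010": ((585.0, 239.56367, 9), (540.5714, 242.45265, 7)),
    "2011 vs 2012": ((560.875, 256.96995, 8), (637.8889, 168.31235, 18)),
}
for label, (first, second) in comparisons.items():
    t, df, p = pooled_t_test(*first, *second)
    decision = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"{label}: t = {t:.3f}, df = {df}, p = {p:.5f} -> {decision}")
```

Where SciPy is available, `scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=True)` performs the same pooled test.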
Conclusion
Works Cited
Brechner, Robert A. Contemporary Mathematics for Business and Consumers. Mason, Ohio: South-Western Cengage Learning, 2009. Print.
Lending Club Statistics. LendingClub.com, n.d. Web. 10 Oct. 2013. <https://www.lendingclub.com/info/download-data.action>.
Rich, Jason R. Improve and Increase Your Credit Score. New York: Entrepreneur Press, 2013. Print.
Sander, Peter J., and J. J. Lambert. Entrepreneur Magazine's Ultimate Guide to Personal Finance for Entrepreneurs. Irvine, Calif.: Entrepreneur Press, 2007. Print.