Data mining is an analytic process designed to explore large amounts of data in search of consistent patterns and systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. Predictive data mining is the most common form of data mining and the one with the most direct business applications. The process comprises three stages: initial exploration, model building and validation, and deployment [1].
Figure: Example of data mining process [2]
Importance of the Data Mining Process
The data mining process matters because it gives organizations a defined sequence of stages for collecting, preparing, inputting, processing, interpreting, and storing critical information about company management and performance [3]. The process is valuable at both the single-party and multiparty level in a business, because it serves as a framework or model that reduces time and cost. Its outcome enables the responsible knowledge workers to turn data into strategic value by critically analyzing the results.
Stage 1: Exploration
Exploration usually begins with data preparation, which can involve cleaning data, transforming data, selecting subsets of records and, for data sets with a large number of fields (variables), performing preliminary feature selection to bring the number of variables down to a manageable range. Then, depending on the nature of the analytic problem, this first stage may involve anything from a simple choice of straightforward predictors for a regression model to elaborate exploratory analyses that use a wide variety of statistical and graphical methods to identify the most relevant variables [1], [2]. It also helps to determine the general nature or complexity of the models that can be considered in the next stage.
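The preparation and preliminary feature selection described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the field names (`spend`, `noise`, `sales`), and the choice of absolute Pearson correlation as the ranking criterion are all hypothetical and do not come from the cited sources. Real exploration would normally use richer statistical and graphical methods.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)

def clean(records, fields):
    """Keep only records in which every listed field is present (not None)."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]

def rank_predictors(records, predictors, target):
    """Sort predictors by absolute correlation with the target, strongest first."""
    ys = [r[target] for r in records]
    scores = {p: abs(pearson([r[p] for r in records], ys)) for p in predictors}
    return sorted(predictors, key=lambda p: scores[p], reverse=True)

# Hypothetical toy data: 'spend' tracks 'sales' closely, 'noise' does not.
raw = [
    {"spend": 1.0, "noise": 5.0, "sales": 2.1},
    {"spend": 2.0, "noise": 1.0, "sales": 3.9},
    {"spend": 3.0, "noise": None, "sales": 6.2},  # incomplete record, dropped by clean()
    {"spend": 4.0, "noise": 4.0, "sales": 8.1},
    {"spend": 5.0, "noise": 2.0, "sales": 9.8},
]
fields = ["spend", "noise", "sales"]

cleaned = clean(raw, fields)                      # data cleaning
ranking = rank_predictors(cleaned, ["spend", "noise"], "sales")
selected = ranking[:1]                            # keep the single strongest predictor
print(selected)
```

Ranking by correlation only captures linear relationships with the target, so it is best read as a first pass that narrows the field before model building, not as a final selection.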
Stage 2: Model Building and Validation
This stage involves considering various models and choosing the best one based on predictive performance. This may sound like a simple task, but it sometimes involves a very elaborate process. A variety of techniques have been developed to achieve this goal, some of which are based on the so-called “competitive evaluation of models” [3], that is, applying different models to the same data set and then comparing their performance to pick the best. These techniques include boosting, bagging (averaging, voting), meta-learning, and stacking (stacked generalization).
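As a rough illustration of comparing several models on the same data set, the sketch below fits three candidates to one training set and picks the one with the lowest error on a held-out set. Everything here is an assumption made for the example: the synthetic data, the three candidate models (a mean baseline, a least-squares line, and a bagged average of lines fitted on bootstrap resamples), and the use of mean squared error as the performance measure.

```python
import random

def fit_mean(train):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Least-squares line y = a + b*x for a single predictor."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    b = sxy / sxx if sxx else 0.0  # fall back to a flat line if x never varies
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def fit_bagged_line(train, n_models=25, seed=0):
    """Bagging: fit lines on bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    models = [fit_line([rng.choice(train) for _ in train]) for _ in range(n_models)]
    return lambda x: sum(m(x) for m in models) / len(models)

def mse(model, data):
    """Mean squared error of a model on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Hypothetical data: y is roughly 2x + 1 plus Gaussian noise.
rng = random.Random(42)
data = [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in range(30)]
rng.shuffle(data)
train, holdout = data[:20], data[20:]

# Every candidate sees the same training data and is scored on the same holdout set.
candidates = {"mean": fit_mean, "line": fit_line, "bagged_line": fit_bagged_line}
scores = {name: mse(fit(train), holdout) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Scoring on data the models did not see during fitting is what makes the comparison a validation step: a model that merely memorizes the training set gains no advantage on the holdout records.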
Stage 3: Deployment
This final stage takes the model selected as best in the preceding stage and applies it to new data in order to generate predictions or estimates of the expected outcome. Data mining is becoming increasingly common as a business information management tool, where it is expected to reveal knowledge structures that can guide decisions under conditions of limited certainty [4]. Recently, interest has grown notably in developing new analytic techniques, such as classification trees, designed specifically for the problems relevant to data mining. However, data mining is still based on the conceptual principles of statistics, including modeling and traditional Exploratory Data Analysis (EDA), and it shares with them both some components of its general approaches and some specific techniques.
Nevertheless, an important general difference in purpose and focus between data mining and traditional EDA is that data mining is oriented more toward applications than toward the basic nature of the underlying phenomena. In other words, data mining is relatively less concerned with identifying the specific relations between the variables involved. For instance, uncovering the nature of the underlying functions or the specific types of interactive, multivariate dependencies between variables is not the main goal of data mining [5].
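To make the deployment stage concrete, the sketch below stores the parameters of a selected model and later applies them to new records. It is a minimal example under assumptions: the model is a hypothetical one-variable line, the parameter values and the `spend` field are invented for illustration, and JSON is only one of many possible ways to persist a model.

```python
import json

# Hypothetical parameters of the model selected during model building
# (a one-variable line: prediction = intercept + slope * spend).
chosen_model = {"kind": "line", "predictor": "spend", "intercept": 1.0, "slope": 2.0}

# Persist the model so that scoring can run separately from training.
serialized = json.dumps(chosen_model)

def load_model(text):
    """Rebuild a prediction function from the stored parameters."""
    params = json.loads(text)
    if params["kind"] != "line":
        raise ValueError(f"unsupported model kind: {params['kind']}")
    predictor, a, b = params["predictor"], params["intercept"], params["slope"]
    return lambda record: a + b * record[predictor]

def score(records, predict):
    """Attach a prediction to each new record without mutating the input."""
    return [{**r, "predicted": predict(r)} for r in records]

# New, previously unseen records arriving after the model was chosen.
new_records = [{"spend": 3.0}, {"spend": 10.0}]
predict = load_model(serialized)
scored = score(new_records, predict)
print(scored)
```

Keeping the stored parameters separate from the scoring code means the same scoring step can be rerun whenever a better model is selected, without changing how new records are processed.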
References
[1] B. U. Cooper, “Data mining: the process finding useful information from ,” 12-Feb-2016. [Online]. Available at: https://www.reddit.com/r/datamining. [Accessed: 29-Mar-2016].
[2] J. A. Kuang, “Processing Data Mining Objects,” Processing Data Mining Objects, 10-Aug-2013. [Online]. Available at: https://msdn.microsoft.com/en-us/library/cc645741.aspx. [Accessed: 29-Mar-2016].
[3] H. R. Nielsen, “Comments (11),” Flux Capacitor RSS, 08-Feb-2014. [Online]. Available at: http://fluxicon.com/blog/2011/02/how-process-mining-compares-to-data-mining/. [Accessed: 29-Mar-2016].
[4] S. R. Rhodes, “Data Mining: What is Data Mining?,” Data Mining: What is Data Mining?, 12-Apr-2015. [Online]. Available at: http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm. [Accessed: 29-Mar-2016].
[5] J. F. H. Sitashi, “How to process a data mining model,” How to process a data mining model, 23-Apr-2014. [Online]. Available at: https://technet.microsoft.com/en-us/library/aa216682(v=sql.80).aspx. [Accessed: 29-Mar-2016].