Rule discovery
Every database administrator needs to uncover new information about the relations among the data and among the database objects. The process of doing so is called rule discovery. Rule discovery is defined as the search for patterns, regularities and correlations in the target database, using every available information source (Shneiderman & Goodman, 2006). Once these relations and regularities are known, applying optimized queries becomes much easier, and the outcome is clearer and better than when the rules in question are unknown.
Earlier research in this field has identified two broad categories of rules: static rules and dynamic rules.
Static rules are rules that are known not to change during the whole course of query processing. Schema constraints, for example, do not change for a long time, in fact for the lifetime of the schema (Rothnie, 2004). Static rules are easier to deal with because they need only be compiled once; the same rules then always apply.
Dynamic rules, on the other hand, may change as the database changes. Any update to the database may render a dynamic rule invalid (Shneiderman & Goodman, 2006). The applied rules must therefore be recompiled once the database has been updated, and the cost of this recompilation needs to be taken into consideration (Rothnie, 2004).
Note, however, that revalidating the dynamic constraints of a database is a task separate from query optimization itself. It nevertheless contributes to the efficient application of optimized queries, which in turn leads to a better outcome (Raghu & Johannes, 2003).
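The difference between the two categories can be illustrated with a minimal Python sketch. The `Rule` class, the `revalidate` function and the sample rows are hypothetical names chosen for illustration and do not come from the cited works. The sketch only assumes that a static rule is compiled once and never rechecked, while a dynamic rule is rechecked after an update.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass
class Rule:
    name: str
    predicate: Callable[[Row], bool]
    dynamic: bool        # static rules are compiled once and never rechecked
    valid: bool = True


def revalidate(rules: List[Rule], rows: List[Row]) -> int:
    """Recheck only the dynamic rules; return how many rules were rechecked."""
    rechecked = 0
    for rule in rules:
        if not rule.dynamic:
            continue
        rule.valid = all(rule.predicate(row) for row in rows)
        rechecked += 1
    return rechecked


rows = [{"age": 30, "salary": 4000}, {"age": 45, "salary": 5200}]
rules = [
    Rule("age is non-negative", lambda r: r["age"] >= 0, dynamic=False),
    Rule("salary is below 6000", lambda r: r["salary"] < 6000, dynamic=True),
]

# An update arrives that breaks the dynamic rule but not the static one.
rows.append({"age": 52, "salary": 7500})
rechecked = revalidate(rules, rows)
```

The return value of `revalidate` is a crude stand-in for the recompilation cost mentioned above: it grows with the number of dynamic rules, and each recheck scans the data.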
The rule discovery process is either data driven or query driven. Query-driven rules are derived from the queries themselves. In essence, the restriction clauses of queries and the results they return are used to determine the rules (Selinger et al., 2009). If two syntactically different queries produce the same answer, one can be substituted for the other to optimize processing. Rules obtained in this way, however, can only optimize queries that are similar to previously analyzed ones. This is a limitation, because many potential semantic optimizations are left unexplored (Warren, 2005).
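A rough sketch of the query-driven idea follows, using Python's standard `sqlite3` module. The `emp` table, its contents and the `same_answer` helper are assumptions made for this example only. The check compares the answers of two syntactically different queries and treats a match as a candidate rule, not as a proven equivalence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO emp (dept, salary) VALUES (?, ?)",
    [("sales", 3000), ("sales", 3500), ("hr", 4200), ("it", 5100)],
)


def same_answer(connection, query_a, query_b):
    """Return True if both queries return the same rows on the current data."""
    rows_a = sorted(connection.execute(query_a).fetchall())
    rows_b = sorted(connection.execute(query_b).fetchall())
    return rows_a == rows_b


q_by_dept = "SELECT id FROM emp WHERE dept = 'sales'"
q_by_salary = "SELECT id FROM emp WHERE salary < 4000"

# On this data the two restriction clauses select the same employees, which
# suggests the candidate rule: dept = 'sales' if and only if salary < 4000.
candidate_rule = same_answer(conn, q_by_dept, q_by_salary)
```

A rule found this way is dynamic in the sense used above: a single new row, such as an HR employee earning 3800, would invalidate it.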
Data-driven rule discovery uses the distribution of the data to determine which rules hold. Data patterns are the key to discovering and formulating semantic rules, and correlations in the data are also useful in formulating them (Warren, 2005). If, in a table, all values of a particular column are null, an incremental method could be used to fill in the values of that column without any effect on the runtime performance of the database. Data-driven rules play a large part in the optimization of database queries. Under this approach, data can be categorized according to its relationships or any applicable constraint. Alternatively, the data can be partitioned in the database. Either approach makes optimized queries easier to apply.
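As an illustration of the data-driven approach, the sketch below derives a simple range rule for each column from the data and then uses it to decide that a query cannot match any row. The function names and the sample `orders` rows are hypothetical, and value ranges are only one of many patterns such a discovery process might look for.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def discover_ranges(rows: List[Row], columns: List[str]) -> Dict[str, Tuple[int, int]]:
    """Derive a (min, max) rule for each column from the current data."""
    return {
        col: (min(r[col] for r in rows), max(r[col] for r in rows))
        for col in columns
    }


def can_skip_scan(ranges: Dict[str, Tuple[int, int]], column: str, lower_bound: int) -> bool:
    """True if the restriction 'column > lower_bound' cannot match any row."""
    _, highest = ranges[column]
    return lower_bound >= highest


orders = [
    {"quantity": 1, "price": 20},
    {"quantity": 12, "price": 35},
    {"quantity": 40, "price": 15},
]
ranges = discover_ranges(orders, ["quantity", "price"])

# The discovered rule 'quantity <= 40' lets the query 'quantity > 100'
# be answered with an empty result without touching the table.
skip = can_skip_scan(ranges, "quantity", 100)
```

The same discovered ranges could also guide a partitioning of the data, for example by splitting a table at the observed boundaries.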
There is a three-level hierarchy of rules, defined by rule reliability. The levels are:
1. Schema constraint. A rule at this level is true for all data and is static, meaning that it applies to all data at all times.
2. Absolute soft constraint. Like a schema constraint, the rule is true for all data; the difference is that the rule is dynamic. A dynamic rule holds only at a particular time.
3. Statistical soft constraint. The rule is true for most of the data but not all of it. It is also dynamic and holds only at a particular time (Cheung, 2007).
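The three levels can be sketched as a small classifier in Python. The function name, the `declared_in_schema` flag and the `min_support` threshold are assumptions for illustration; the text only says that a statistical soft constraint holds for "most" of the data, so the 0.8 default is an arbitrary choice.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


def classify_rule(
    rows: List[Row],
    predicate: Callable[[Row], bool],
    declared_in_schema: bool,
    min_support: float = 0.8,   # assumed cut-off for "most of the data"
) -> str:
    """Place a rule in the reliability hierarchy using the current data."""
    if declared_in_schema:
        return "schema constraint"            # static, always true
    if not rows:
        return "no rule"
    support = sum(1 for row in rows if predicate(row)) / len(rows)
    if support == 1.0:
        return "absolute soft constraint"     # true for all data, for now
    if support >= min_support:
        return "statistical soft constraint"  # true for most data, for now
    return "no rule"


deliveries = [{"days": d} for d in [1, 2, 3, 4, 5, 6, 7, 2, 3, 12]]

level_a = classify_rule(deliveries, lambda r: r["days"] >= 0, declared_in_schema=True)
level_b = classify_rule(deliveries, lambda r: r["days"] <= 30, declared_in_schema=False)
level_c = classify_rule(deliveries, lambda r: r["days"] <= 7, declared_in_schema=False)
```

Because the two soft levels are computed from the current rows, their classification has to be recomputed after updates, whereas a schema constraint does not.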
Impediments to the implementation of query optimization in databases
Query optimization is very useful in the management of databases. An overall speed-up of up to 50% has been reported from the application of optimized queries (Selinger et al., 2009). Semantic optimization has been widely put forward as the best form of optimization, although all forms of optimization are advantageous. There have, however, been impediments to the effective implementation of query optimization in databases. Authors have given the following reasons why query optimization has not been well utilized in the commercial world.
i. Most query optimization is designed for use in deductive databases, which leads to a relatively high cost of implementation.
ii. Most CPUs are not fast enough to handle the computational complexity of query optimization.
iii. In the past, schema constraints had not yet been used to capture business rules, so opportunities for optimization seemed limited. More recently, schema constraints have come to be used to capture such rules, hence the recent advances in the application of optimized queries.
Rule discovery can also be seen as a contributing impediment to the successful implementation of query optimization. Without a rule discovery phase, only rules that are known a priori can be used. Unknown rules cannot be employed, since nothing has been done to discover them.
Even when a rule has been discovered, its relevance also needs to be considered. Some discovered rules reflect the true nature of the correlations in the data and are therefore valid and relevant. This is not always the case, however (Cheung, 2007). A rule may address a part of the database domain that is of no interest to the user at all, for example when that domain never appears in the queries applied to the database.
Advanced types of optimization
In the past few years, researchers have proposed and investigated a number of advanced types of query optimization. These include semantic query optimization, global query optimization and parametric (or dynamic) query optimization.
Semantic query optimization is concerned mostly with the rewriter module of the optimizer. As the name suggests, semantics are used to rewrite a given query into semantically equivalent alternatives (Severance & Carlis, 2007). Once this has been done, the planner optimizes the rewritten query just as it would any regular query.
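A minimal sketch of such a rewrite step is given below. It assumes that the known rules are upper bounds of the form `column <= bound` and that a query is a conjunction of `(column, operator, constant)` predicates using only `>` and `<=`. The `rewrite` function and this representation are illustrative assumptions, not the interface of any particular optimizer.

```python
from typing import Dict, List, Optional, Tuple

Predicate = Tuple[str, str, int]   # (column, operator, constant)


def rewrite(
    predicates: List[Predicate],
    upper_bounds: Dict[str, int],   # known rules of the form column <= bound
) -> Optional[List[Predicate]]:
    """Return an equivalent, simpler predicate list, or None if the
    known rules show that the query cannot return any row."""
    kept: List[Predicate] = []
    for column, op, value in predicates:
        bound = upper_bounds.get(column)
        if bound is None:
            kept.append((column, op, value))   # no rule known for this column
        elif op == ">" and value >= bound:
            return None                        # contradicts column <= bound
        elif op == "<=" and value >= bound:
            continue                           # implied by the rule, redundant
        else:
            kept.append((column, op, value))
    return kept


rules = {"salary": 6000}   # e.g. a discovered or declared constraint

empty_query = rewrite([("salary", ">", 8000)], rules)
simplified = rewrite([("salary", "<=", 9000), ("age", ">", 30)], rules)
```

In the first call the rewriter can report an empty answer without any data access; in the second it drops a redundant restriction and hands a smaller query to the planner.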
Global query optimization involves optimizing many queries at the same time, whereas generic query optimization focuses on a single query. This type of optimization is mostly used for queries with joins, queries that originate from a single program, queries from multiple concurrent users, or queries in a deductive system (Rothnie, 2004). The main aim of global query optimization is to achieve an optimal execution of the whole set of queries. For an individual query the chosen plan may not be the best one, but the overall result for all the queries will be optimal. The planner module determines the best plan for implementing global query optimization.
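The trade-off between individual and overall optimality can be shown with a toy example. In the sketch below each candidate plan is a set of named tasks, and a task shared by several queries is assumed to be executed only once. The task names, their costs and the exhaustive search over plan combinations are all assumptions chosen to keep the example small.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical cost of each task; a task shared by several plans is paid once.
TASK_COST = {"index_emp": 4, "scan_emp": 10, "sort_emp": 5}

PLANS: Dict[str, List[FrozenSet[str]]] = {
    "Q1": [frozenset({"index_emp"}), frozenset({"scan_emp"})],
    "Q2": [frozenset({"scan_emp", "sort_emp"})],
}


def cost(tasks: FrozenSet[str]) -> int:
    return sum(TASK_COST[t] for t in tasks)


def best_global_plan(plans: Dict[str, List[FrozenSet[str]]]) -> Tuple[int, Dict[str, FrozenSet[str]]]:
    """Try every combination of per-query plans and keep the cheapest total."""
    best_cost, best_choice = None, None
    for combo in product(*plans.values()):
        total = cost(frozenset().union(*combo))   # shared tasks counted once
        if best_cost is None or total < best_cost:
            best_cost, best_choice = total, dict(zip(plans, combo))
    return best_cost, best_choice


total_cost, choice = best_global_plan(PLANS)
local_best_q1 = min(PLANS["Q1"], key=cost)
```

Here the cheapest plan for Q1 on its own is the index lookup (cost 4), but the global optimum picks the scan for Q1 because Q2 needs that scan anyway, giving a total of 15 instead of 19. Exhaustive enumeration grows quickly with the number of queries, so it is used here only for illustration.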
Parametric or dynamic query optimization addresses the problems that arise when parameters change between optimization and execution. Typically, embedded queries are optimized once at compile time and then executed many times at run time, so the parameter values assumed during optimization may no longer hold when the query is executed. Parametric optimization therefore involves developing a plan that ensures the best choice is made. The best choice is one in which optimization is done exhaustively at compile time and all possible values of the parameters at run time are identified (Severance & Carlis, 2007).
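One way to picture this is a plan table that is built once at compile time and consulted on every execution. The sketch below assumes a single parameter, the selectivity of the query's restriction, and two made-up cost models; the bucket boundaries and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Assumed cost of each plan as a function of selectivity (fraction of rows kept).
COST_MODELS: Dict[str, Callable[[float], float]] = {
    "index_lookup": lambda s: 5 + 200 * s,
    "full_scan": lambda s: 50.0,
}


def compile_plan_table(bucket_upper_bounds: List[float]) -> List[Tuple[float, str]]:
    """Compile time: record the cheapest plan for each selectivity bucket,
    evaluating the cost models at the bucket's upper bound."""
    table = []
    for upper in bucket_upper_bounds:
        best = min(COST_MODELS, key=lambda name: COST_MODELS[name](upper))
        table.append((upper, best))
    return table


def choose_plan(table: List[Tuple[float, str]], selectivity: float) -> str:
    """Run time: look up the precomputed plan for the actual parameter value."""
    for upper, plan in table:
        if selectivity <= upper:
            return plan
    return table[-1][1]


plan_table = compile_plan_table([0.1, 0.5, 1.0])
plan_for_rare_value = choose_plan(plan_table, 0.05)
plan_for_common_value = choose_plan(plan_table, 0.8)
```

The expensive comparison of plans happens once, while each execution only performs a cheap lookup. Coarse buckets can pick a slightly worse plan near a boundary, so finer buckets trade compile-time work for run-time accuracy.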
References
Chen, P. S., & Akoka, J. (2006). Optimal design of distributed information systems. IEEE Transactions on Computers, 29 (12), 1068-1080.
Chesnais, A., Gelenbe, E., & Mitrani, I. (2006). On the modeling of parallel access to shared data. Communications of the ACM, 26 (3), 196-202.
Cheung, T.-Y. (2007). A method for equijoin queries in distributed relational databases. IEEE Transactions on Computers, 31 (8), 746-751.
Chiu, D. M., & Ho, Y. C. (2004). A methodology for interpreting tree queries into optimal semijoin expressions. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 1 (2), 169-178.
Davis, L. S., & Kunii, T. L. (2005). Pattern databases. In Data Base Design Techniques II, 12 (4), 357-399.
Dayal, U. (2003). Evaluating queries with quantifiers: A horticultural approach. In Proceedings of the ACM Symposium on Principles of Database Systems (Atlanta, Ga.). ACM, New York, 3 (9), 125-136.
Epstein, A., Stonebraker, M., & Wong, E. (2008). Distributed query processing in a relational data base system. In Proceedings of the ACM-SIGMOD International Conference on Management of Data. ACM, New York, 3 (9), 169-180.
Garcia-Molina, H., Ullman, J., & Widom, J. (2004). Database Systems: The Complete Book. London: Pearson Education.
Raghu, R., & Johannes, G. (2003). Database Management Systems. New York, NY: McGraw Hill.
Rothnie, J. B. (2004). An approach to implementing a relational data management system. In Proceedings of the ACM-SIGMOD Workshop on Data Description, Access, and Control (Ann Arbor, Mich., May 1-3). ACM, 4 (2), 277-294.
Selinger et al. (2009). Access path selection in a relational database management system. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Boston, Mass., May 30-June 1). ACM, 9 (6), 23-34.
Severance, D. G., & Carlis, J. V. (2007). A practical approach to selecting record access paths. ACM Comput. Surv., 3 (4), 259-272.
Shneiderman, B., & Goodman, V. (2006). Batched searching of sequential and tree structured files. ACM Trans. Database Syst., 7 (3), 268-275.
Walker, A. (2007). On retrieval from a small version of a large data base. In Proceedings of the 6th International Conference on Very Large Data Bases (Montreal, Oct. 1-3). IEEE, New York, 3 (9), 47-54.
Warren, D. H. D. (2005). Efficient processing of interactive relational database queries expressed in logic. In Proceedings of the 9th International Conference on Very Large Data Bases. IEEE, New York, 6 (1), 272-281.