Evaluation is the systematic process of determining the worth of a program. It uses scientific methods to assess program design, implementation, and final outcomes. Evaluation is conducted for various reasons. Funders dedicate substantial financial resources to programs and, as such, want to know how effective those programs are in solving the problems they target. Evaluation is also driven by the need to improve program implementation; this is done by identifying flaws in implementation and correcting them. At the end of the program, stakeholders also need to know what the program accomplished so as to justify the need for increased funding. Moreover, leaders and managers of organizations want to make evidence-based managerial and policy decisions informed by data obtained from evaluating past program performance.
The evaluation process is categorized in various ways. Depending on the timing of the evaluation, it can be categorized as either one-shot or ongoing. A one-shot study examines a single aspect of program implementation or a specific outcome, whereas ongoing assessment measures inputs, outputs, and outcomes as the program progresses. Another way of categorizing program evaluation is as either formative or summative. Formative evaluation is meant to improve the way the program is conducted, while summative evaluation measures program outcomes or impacts and is carried out at the end of the program. Depending on the research method used, evaluation can be termed either qualitative or quantitative. The qualitative approach uses inductive, open-ended methods to answer questions about the program, while the quantitative approach uses structured surveys and its data are usually numerical.
Program evaluation can also be classified as process or outcome evaluation. In process evaluation, the evaluator determines whether program activities have been implemented as required and produced the expected outputs. Process evaluation can be conducted periodically throughout the lifetime of the program and starts with a review of the activities and output components of a logic model. Outcome evaluation, on the other hand, assesses the effects of the program on the population of interest by looking at progress on the outcomes the program is intended to address.
Measurement instruments in program evaluation need to be reliable and valid. Reliability refers to the extent to which a measurement instrument gives similar results on repeated measurement of the same condition; the reliability of a measure is thus its consistency. Reliability is strengthened by pretesting data collection instruments and procedures as well as planning for quality control procedures in the field and when processing the final data. Validity, on the other hand, is the extent to which a measure measures what it purports to measure; a valid instrument accurately measures what the evaluator needs to evaluate. In program evaluation, there are two important types of validity: internal validity and external validity. Internal validity concerns whether a program has delivered an outcome and the magnitude of that effect, while external validity has to do with the generalizability of evaluation findings. That is, when findings can be applied beyond the context or sample being studied, they are said to possess generalizability.
RELIABILITY AND VALIDITY IN MEASUREMENTS AND EVALUATION
Reliability and validity of instruments can be enhanced in various ways. To enhance validity, the evaluator should search the literature and use already developed measurement instruments. If no suitable instruments are available, the evaluator should work with content experts to develop his or her own instrument and pilot it before using it in a study. Reliability, on the other hand, can be enhanced by increasing the number of items measuring the same concept in an instrument if reliability is low. If some items in a measure are found to lack discriminating power, they should be removed. Finally, reliability can be enhanced by training observers in observational studies.
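The effect of adding items that measure the same concept can be illustrated with the Spearman-Brown prophecy formula applied to an internal-consistency estimate such as Cronbach's alpha. The sketch below is a minimal, illustrative example in Python; the item scores and the helper functions (cronbach_alpha, spearman_brown, variance) are hypothetical constructions for this example, not part of any particular instrument.

```python
# Illustrative sketch: estimating internal-consistency reliability (Cronbach's
# alpha) and projecting the effect of lengthening a scale with the
# Spearman-Brown prophecy formula. All item scores below are invented.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: one list of scores per item, each of equal length (one score per respondent)."""
    k = len(items)
    n = len(items[0])
    item_vars = [variance(scores) for scores in items]
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(item_vars) / variance(totals))

def spearman_brown(reliability, length_factor):
    """Projected reliability if the scale is lengthened by length_factor
    (e.g. 2.0 means doubling the number of similar items)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical responses: 4 items rated 1-5 by 6 respondents.
items = [
    [4, 3, 5, 2, 4, 3],
    [5, 3, 4, 2, 4, 4],
    [4, 2, 5, 3, 5, 3],
    [3, 3, 4, 2, 4, 4],
]
alpha = cronbach_alpha(items)
print(f"Cronbach's alpha: {alpha:.2f}")
print(f"Projected alpha if the number of items is doubled: {spearman_brown(alpha, 2.0):.2f}")
```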
METHODS OF MEASUREMENT IN EVALUATION
Program evaluation employs a wide range of methods to gather the data needed to understand how the program works. These methods include interviews, questionnaires, focus groups, agency records, and trained observer ratings.
Agency records are data entered into an organization’s records system by a representative of that organization. Agency records typically contain client characteristics, the kinds of services used and the cost of each, the amount of work done, response times, recidivism, and the disposition of work.
Surveys are divided into two categories: interviews and questionnaires. Questionnaires are mostly paper-and-pencil instruments completed by the respondents. Questionnaire items are of two types: open-ended and closed-ended. In open-ended items, respondents supply their own answers, while in closed-ended items, respondents choose from a set of response options. Unlike questionnaires, interviews are more personal: the interviewer works directly with the respondent and can ask follow-up questions. Interviews are generally easier for respondents, especially when opinions are sought.
In focus groups, a trained leader called a facilitator leads a small group in a discussion on a particular topic. This method is used to get opinions on a specific topic of interest. The composition of the group, as well as the group discussion, is carefully planned so as to create an environment in which group members can talk freely. Where program evaluation assesses behaviours that can be categorised, counted, or rated using the eyes or other senses, information can be collected by means of trained observer ratings.
STEPS AND STAGES OF TEST DEVELOPMENT
The test development process involves a number of steps. These steps are:
Definition of Purpose
Definition of Content Domain
Creating Test Blueprint
Writing and Reviewing Test Items
The Pre-test
Detecting and Removing Unfair Questions
Assembling the Test
Analyzing Beta Exam Results
Constructing Equivalent Exam Forms
Establishing the Passing Score
Administering/Scoring Operational Exams
Providing Ongoing Test Maintenance
Definition of Purpose
In defining the purpose of the test, the test developer must state the intended interpretation of the test as well as the intended use of the test results. The intended audience of the test also needs to be well defined: the test developer should list the characteristics of test takers that directly affect how they will respond to the test items, such as their reading level, language, and disabilities. The developer is also expected to weigh measurement quality against measurement cost.
Definition of Content Domain
The test developer should fully identify the content areas that need to be tested. The content domain should be sampled in such a way that the final items represent the target of measurement; this ensures content validity. A test is said to have content validity if it assesses knowledge of the content domain that it was designed to measure. The cognitive demands that the test addresses should also be specified.
Creating Test Blueprint
A test blueprint is a document that shows the content of the test that will be given to the students. It lists the instructional objectives, the questions designed to match these objectives, and the learning domains at which test takers will be assessed. In essence, it is a plan created and used when developing a test.
Creating and Reviewing Test Items
When writing test items, the test developer should clearly identify the objective that each item intends to measure, as well as the type of performance needed to demonstrate possession of the skill. The test developer should also specify the environment in which the test will be taken and create a scoring rubric for the test. After the items are written, they undergo a rigorous review process conducted by subject-matter specialists. Review and revision ensure that each item is clear and unambiguous and, in the case of multiple-choice questions, has only one correct answer.
The Pre-test
After writing and reviewing the test, it is pretested with a sample group that has the same characteristics as the population that will be tested. The results enable the test developers to determine the difficulty of each test item, identify ambiguity, and decide which items need to be revised or removed.
Detecting and Removing Unfair Questions
This involves trained reviewers carefully inspecting each test item, the whole test, and any descriptive material to ensure that language, words, symbols, phrases, and content that could be deemed offensive to any subgroup of test-takers are removed. Questions that may be biased are also removed.
Assembling the Test
Once the review process is done, the reviewed items are placed in an item pool, a repository of all the items available for the examination, from which the test is assembled.
Analyzing Beta Exam Results
The item data undergo a review to determine whether the items performed as intended. The item statistics of interest at this stage are item difficulty and item discrimination, and they should be viewed from a psychometric perspective. Any flawed item should be taken back for additional review. For a successful item analysis, the content experts should be given the number of test-takers choosing each option of an item and the mean score obtained by each group.
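As an illustration of the two statistics named above, the sketch below computes item difficulty (the proportion of test-takers answering an item correctly) and a simple discrimination index (upper-group minus lower-group difficulty) from a 0/1 scored response matrix. The response data and the choice of splitting the group into halves are assumptions made purely for illustration, not part of any specific exam.

```python
# Illustrative sketch: item difficulty and a simple discrimination index from
# a matrix of 0/1 item scores (rows = test-takers, columns = items).
# The response data below are invented.

responses = [
    # item1, item2, item3
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 1],
]

totals = [sum(row) for row in responses]
n = len(responses)

# Split test-takers into an upper and a lower group on total score
# (here simply the top half and bottom half after sorting).
order = sorted(range(n), key=lambda i: totals[i], reverse=True)
upper = order[: n // 2]
lower = order[n - n // 2:]

for item in range(len(responses[0])):
    difficulty = sum(row[item] for row in responses) / n
    p_upper = sum(responses[i][item] for i in upper) / len(upper)
    p_lower = sum(responses[i][item] for i in lower) / len(lower)
    discrimination = p_upper - p_lower
    print(f"Item {item + 1}: difficulty={difficulty:.2f}, "
          f"discrimination={discrimination:.2f}")
```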
Constructing Equivalent Exam Forms
Test developers create multiple forms of an exam for various reasons (e.g. security). When multiple forms are constructed, they should be equivalent from a content and statistical perspective and should also be reliable. Equivalence is established by ensuring that each form aligns with the test blueprint.
Establishing the Passing Score
After exam construction, the passing score should be determined. Pass/fail standards must be developed in line with testing guidelines and should be fair and reasonable. The passing score can be set using normative or absolute standards. Normative standards compare a candidate’s score to those of other test-takers to establish a pass or a fail. Absolute standards, on the other hand, establish a particular level of performance that must be attained, and pass/fail decisions are made by reference to this level.
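To make the contrast concrete, the sketch below sets a hypothetical normative cutoff (passing roughly the top 60% of this group of candidates) and a hypothetical absolute cutoff (70% of the maximum score). The scores, percentages, and variable names are assumptions for illustration only.

```python
# Illustrative sketch: contrasting a normative and an absolute passing score.
# Candidate scores, the pass rate, and the absolute cutoff are invented.

scores = [45, 52, 58, 61, 63, 67, 70, 72, 75, 81]  # hypothetical raw scores
max_score = 100

# Normative standard: pass roughly the top 60% of candidates, i.e. the cutoff
# is the score at the 40th percentile of this group of test-takers.
pass_rate = 0.60
ranked = sorted(scores)
normative_cutoff = ranked[int(len(ranked) * (1 - pass_rate))]

# Absolute standard: a fixed level of performance, e.g. 70% of the maximum.
absolute_cutoff = 0.70 * max_score

for label, cutoff in [("normative", normative_cutoff), ("absolute", absolute_cutoff)]:
    passed = sum(score >= cutoff for score in scores)
    print(f"{label} cutoff = {cutoff}: {passed}/{len(scores)} candidates pass")
```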
Administering and Scoring Exams
After establishing the passing score, the exam is ready to be administered. Administration can be via paper and pencil, especially in group settings, or by computer. The testing environment should be comfortable and free from distractions, and the administrators should adhere to standardized procedures. After administration, the test is scored by certified markers or by computer.
Providing Ongoing Test Maintenance
During the test development cycle, data on the group mean, standard deviation, standard error of measurement, highest and lowest scores, percentage of candidates passing, and exam reliability should be obtained and recorded. These are needed by the certifying organization to monitor the consistency of the test.
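A minimal sketch of the kind of monitoring statistics listed above is given below, assuming a simple list of raw scores, a passing score, and a previously estimated reliability coefficient; the standard error of measurement is taken as SD * sqrt(1 - reliability). All values are hypothetical.

```python
# Illustrative sketch: summary statistics a certifying body might record for
# each administration of an exam. Scores, passing score, and reliability
# are invented values.
import math

scores = [58, 63, 67, 70, 72, 74, 75, 78, 81, 88]  # hypothetical raw scores
passing_score = 70
reliability = 0.88  # e.g. a previously estimated internal-consistency coefficient

n = len(scores)
mean = sum(scores) / n
sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
pct_passing = 100 * sum(s >= passing_score for s in scores) / n

print(f"mean={mean:.1f}, sd={sd:.1f}, SEM={sem:.2f}")
print(f"lowest={min(scores)}, highest={max(scores)}, passing={pct_passing:.0f}%")
```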
NEEDS ASSESSMENT
In program evaluation, a need can be defined as a gap existing between the actual state of affairs and what the program intends to achieve. Needs assessment is a process of identifying the social problems, determining their magnitude, and defining the target population that the program intends to serve and the nature of their needs. Needs assessment enables the evaluator to determine if there is a need for a program. If the need does exist, it also enables the evaluator to determine the most appropriate program services to address the need.
ROLE OF SAMPLING IN RESEARCH AND EVALUATION STUDIES
The sampling process involves drawing a subset from a population of interest. There are two sampling techniques an evaluator can choose from: probability and non-probability sampling. Ideally, the sample should be picked randomly (by probability sampling), meaning that program participants have equal chances of being in the final sample. When participants are selected by non-random (non-probability) means, the sample becomes non-representative of the population and evaluation findings cannot be generalized. Choosing an adequate sample size for statistical methods is also important, as it affects the quality of evaluation findings. When the evaluator gives little attention to sampling, bias can result.
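As a simple illustration of probability sampling, the sketch below draws a simple random sample with Python's standard library, so that every participant has the same chance of selection. The participant list, sample size, and seed are hypothetical choices made for the example.

```python
# Illustrative sketch: drawing a simple random sample (a basic form of
# probability sampling) from a hypothetical list of program participants.
import random

participants = [f"participant_{i:03d}" for i in range(1, 201)]  # hypothetical sampling frame
sample_size = 30

random.seed(42)  # fixed seed only so the example is reproducible
sample = random.sample(participants, sample_size)  # each participant equally likely

print(f"Selected {len(sample)} of {len(participants)} participants")
print(sample[:5])
```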
ELEMENTS OF CONTENT AND STYLE THAT ARE PART OF AN EVALUATION REPORT
Prior to report writing, the evaluation team must establish clear findings, conclusions, and recommendations that address the evaluation questions. They should also decide how best to organize the report in such a way that it conveys these elements effectively. The three main elements of an evaluation report are findings, conclusions, and recommendations.
After the research stage of the evaluation is completed, the evaluation team has gathered enough data to answer the evaluation questions. Regardless of the kind of data collected and the methods of collection, the team’s first task is to convert the raw data into findings. Findings are facts gathered during the evaluation process.
Once the findings have been laid out, the evaluation team draws a conclusion for each evaluation question. Conclusions are the team’s judgements and are based on the findings; there must be a clear and logical relationship between the two elements.
Once the findings are presented, the evaluation team makes recommendations. Recommendations are proposed actions for management. They can be derived from conclusions and findings, but the evaluation team can also draw on its experience, as long as the recommendations are useful. When making recommendations, practicability is paramount.