Test Development, Part One
Test Plan for Measuring Knowledge of Validity and Reliability
Test Format: Multiple-choice test
Test Length: 20 items (30 minutes)
Test Universe: This test is designed to measure the participant's knowledge of the reliability and validity of tests.
KNOWLEDGE OF TERMS AND CONCEPTS
TYPES OF RELIABILITY
Test-retest method
Internal consistency method
Scorer reliability method
Subtotal: 15%
RELIABILITY COEFFICIENTS
Cohen’s kappa
Correlations in two test administrations
Standard error of measurement
Confidence interval
Subtotal: 25%
EVIDENCE OF VALIDITY
Test content
Response process
Internal structure
Relations to other variables
Consequences of testing
Content
Subtotal: 40%
FACTORS THAT AFFECT RELIABILITY
Test design
Test taker
Test scoring
Test administration
Subtotal: 20%
TOTAL: 100%
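The percentage weights in the plan can be checked against the 20-item test length. The following is a minimal sketch, not part of the original plan; the names `BLUEPRINT`, `TEST_LENGTH`, and `allocate_items` are hypothetical and introduced only for illustration.

```python
# Blueprint weights (in percent) copied from the subtotals in the test plan.
BLUEPRINT = {
    "Types of reliability": 15,
    "Reliability coefficients": 25,
    "Evidence of validity": 40,
    "Factors that affect reliability": 20,
}
TEST_LENGTH = 20  # number of items, from the test plan


def allocate_items(weights, test_length):
    """Convert percentage weights into whole item counts per content area."""
    if sum(weights.values()) != 100:
        raise ValueError("blueprint weights must sum to 100%")
    counts = {}
    for area, pct in weights.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        exact, remainder = divmod(pct * test_length, 100)
        if remainder:
            raise ValueError(f"{area}: {pct}% of {test_length} items is not a whole number")
        counts[area] = exact
    return counts


item_counts = allocate_items(BLUEPRINT, TEST_LENGTH)
for area, n in item_counts.items():
    print(f"{area}: {n} items")
```

Under these weights the allocation is 3, 5, 8, and 4 items, which matches the number of items written for each content area in the item list.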
Multiple-choice test items
TYPES OF RELIABILITY
1. Can the participant successfully use the test-retest method to establish reliability?
Yes
No
2. Is the internal consistency of the test sufficient to determine the test's reliability?
Yes
No
3. How accurately does the scorer reliability method measure test reliability?
RELIABILITY COEFFICIENTS
1. How well can the participant use Cohen's kappa to determine a reliability coefficient?
2. Is there a correlation between the first and second test administration?
Yes
No
3. How close, if at all, is the relationship between the first and the second test administration?
Close
No correlation at all
4. What is the standard error of measurement of the test?
Large margin
No standard error
(If there is a standard error, it must be specified)
5. What is the confidence interval of the test coefficients?
Narrow confidence interval (specify figures)
Wide confidence interval (specify figures)
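Items 4 and 5 ask for specific figures. As a hedged illustration, the sketch below uses the standard classical test theory formulas, SEM = SD × sqrt(1 − r) and CI = observed score ± z × SEM. The standard deviation, reliability coefficient, and observed score are hypothetical values chosen for the example, not data from this test, and the function names are introduced here for illustration only.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), where r is the reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(observed, sem, z=1.96):
    """Return (lower, upper) bounds around an observed score; z=1.96 gives about 95%."""
    margin = z * sem
    return observed - margin, observed + margin


# Hypothetical figures for illustration only.
sd = 10.0           # standard deviation of test scores
reliability = 0.84  # e.g. a test-retest correlation
observed = 70.0     # one participant's observed score

sem = standard_error_of_measurement(sd, reliability)
low, high = confidence_interval(observed, sem)
print(f"SEM = {sem:.2f}")
print(f"95% CI = {low:.2f} to {high:.2f}")
```

With these assumed inputs the SEM is 4.00 and the 95% interval runs from 62.16 to 77.84. A higher reliability coefficient shrinks the SEM and therefore narrows the interval.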
EVIDENCE OF VALIDITY
1. What is the participant’s knowledge of the test content?
2. How sound was the response process used by the participant?
3. What is the participant’s understanding of the internal structure?
4. Can the respondent establish relations between the test and other variables?
Yes
No
5. What are the consequences of testing on reliability?
Good
No consequence
Bad
6. What is the respondent’s knowledge of the test content?
Sufficient
Poor
7. Can the criterion-related approach be used to find evidence of test validity?
Yes
No
8. Do the constructs of the test give sufficient evidence of validity?
Yes
No
FACTORS THAT AFFECT RELIABILITY
1. Does the test design affect test reliability?
Yes
No
2. Does the test taker affect the reliability of the test?
Yes
No
3. Does the test scoring method affect the test reliability?
Yes
No
4. Does the manner in which a test is administered affect test reliability?
Part Three: Practical and technical evaluation
Include practical and technical evaluation information about the test, using Appendix A of Foundations of Psychological Testing as a guide.
Practical and technical evaluation
Technical evaluation
The method includes seven subscales. Each subscale represents a type of interaction that individuals have with computers: trust, hand calculator, general attitude, word processing, data entry, computer science, and business operations. The five major subscales used to measure computer anxiety are general attitude, data entry, word processing, business operations, and computer science.
Practical evaluation
The user manual includes a profile sheet on which a respondent's subscale scores and total computer anxiety score are plotted. These profiles give interpretive statements alongside other scale values.
Scores on this test range from 40 to 200. Higher scores indicate greater computer anxiety. There are five categories of computer anxiety:
40-79 confident and relaxed
80-104 generally comfortable and relaxed
105-129 mild anxiety
130-149 tense and anxious
150-200 extremely anxious
The same categories apply at the subscale level, where scores range from 1 to 20.
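The mapping from a total score to an anxiety category can be sketched as a simple lookup. This is an illustrative sketch rather than the scoring procedure from the user manual. It assumes that the fourth band extends up to 149 so that the bands are contiguous, and the names `BAND_STARTS`, `BAND_LABELS`, and `classify_total_score` are hypothetical.

```python
import bisect

# Lower bound of each band, in ascending order.
# Assumption: the "tense and anxious" band runs up to 149 so no score is left uncovered.
BAND_STARTS = [40, 80, 105, 130, 150]
BAND_LABELS = [
    "confident and relaxed",
    "generally comfortable and relaxed",
    "mild anxiety",
    "tense and anxious",
    "extremely anxious",
]


def classify_total_score(score):
    """Return the anxiety category for a total score between 40 and 200."""
    if not 40 <= score <= 200:
        raise ValueError("total score must be between 40 and 200")
    # bisect_right counts how many band starts are <= score.
    index = bisect.bisect_right(BAND_STARTS, score) - 1
    return BAND_LABELS[index]


for total in (40, 79, 104, 129, 150, 200):
    print(total, "->", classify_total_score(total))
```

Storing only the lower bounds keeps the bands from overlapping or leaving gaps, which is easy to get wrong when each band is written as a separate pair of limits.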
Similarities and differences
The tables and the test items are similar in that both use a multiple-choice format. They also differ in several ways. The test items provide a bipolar ranking of adjectives used to rate the level of knowledge in each subarea. Some tables use numerical responses, while others use forms of qualitative data other than bipolar responses. Some table tests, such as the COMPAS method, are completed by the respondents themselves, whereas for these test items the practitioner collects and analyzes the data. These differences affect the measurement of knowledge: numerical responses yield a definite answer, whereas qualitative data must be coded before analysis, and the coding step can reduce the accuracy of the test results.
References
Kurpius, R., & Stanford, M. E. (2006). Testing and measurement: A user-friendly guide. Thousand Oaks, CA: Sage.
McIntire, S. A., & Miller, L. A. (2007). Foundations of psychological testing: A practical approach (2nd ed.). Thousand Oaks, CA: Sage.
Schultz, K. S., & Whitney, D. J. (2005). Measurement theory in action: Case studies and exercises. Thousand Oaks, CA: Sage.