Research in Brief - April 2008 - Volume 108 (4)
Web-based Formative Assessment as Evidence Based Practice in Science Instruction
Tufan Adiguzel & Kimberly J. Vannest
High-stakes content-area assessments are a driving force in how students are instructed in classrooms, yet proficiency in science continues to fall below expectations: on the 2005 National Assessment of Educational Progress (NAEP), only 29% of 4th- and 8th-graders reached proficiency, and 12th-grade proficiency dropped to 18% (Grigg, Lauko, & Brockway, 2006). This may result from the relative mismatch between end-of-year testing and instruction, or from the delay in obtaining data that could alter instructional decision making. Improved formative assessment of learning is part of the solution to this mismatch and delay in science.
Formative assessment has a long history as an evidence-based practice (Black & Wiliam, 1998; Crooks, 1988; Fuchs & Fuchs, 1986; Natriello, 1987) and has demonstrated effectiveness for increasing performance across all academic areas (Sadler, 1998). It gives teachers valid assessment data in a timely manner so that they can adjust instructional methods. It pinpoints the content to be taught or re-taught (Frisbie, Miranda, & Baker, 1993) and is easily adopted when brief formative measures can be integrated within a lesson.
Characteristics of formative assessment include instructional validity (i.e., items referenced directly to recent lesson content), sensitivity (Hambleton, Swaminathan, Algina, & Coulson, 1978), and relative brevity. Two typical types of formative assessment are criterion-referenced testing (CRT) (Berk, 1984; Ebel & Frisbie, 1986; Linn & Gronlund, 1995; Popham, 1978) and progress monitoring toward long-term goals (LTG) (Deno, 1985). Formative assessment is well suited to the regulatory requirement of alignment between test content and instructional content (NCLB, US Dept. of Ed., 2004).
This study presents a web-based, vocabulary-based assessment of science learning that combines the strengths of CRT and LTG monitoring. It was developed over two years to meet the following criteria: (a) domain representation, meaning adequate representation of the science content domain covered by instruction; (b) efficiency, meaning probes should be quick to administer; (c) repeatable construction, meaning parallel forms should be producible by an objective algorithm; (d) standardized administration and reliable scoring; (e) instructional sensitivity (validity), meaning the ability to measure performance change from instruction; and (f) sensitivity to growth over time in progress monitoring. Specifically, both years of the present study focus on instructional sensitivity (validity), that is, on measuring performance change from instruction, through a web-based environment, the Science Key Vocabulary Assessment (SKeVA©) (Vannest, Adiguzel, & Parker, 2007).
SKeVA© is a web-based science assessment system that provides increased exposure to science vocabulary and a better understanding of the Texas Assessment of Knowledge and Skills (TAKS) science material, creates more opportunities for learners to experience science vocabulary, and allows for data-based decision making at the district, school, and classroom levels. SKeVA© draws on the advantages of formative assessment: (a) cumulative skill assessment; (b) a reliable measure of growth or improvement; (c) brief probes for frequent use; and (d) performance feedback for students and teachers.
SKeVA© contains 1,813 Key Vocabulary (KV) items based on 554 KV words, aligned with the Texas Essential Knowledge and Skills (TEKS) or TAKS objectives, matched to several instructional materials, and edited for content and wording by experienced science teachers and coordinators. The SKeVA© system has two test functions: Pre-Post testing and Progress Monitoring testing. The Pre-Post System gives a picture of students' knowledge prior to instruction and of what they have learned after the unit has been taught. The Progress Monitoring System uses repeated measurements (tests) that students can take weekly or bi-weekly; it produces “snapshots” that show how students are progressing and whether they need individualized attention.
Teachers can create both test cycles easily and quickly in the system. On the test creation page, teachers choose the type of test depending on the length of the unit, select the objective(s) or TEKS, and then choose the keywords. When tests are created for the students, SKeVA© randomly draws items from the item bank based on the keyword selection. All students take the same 20-item test in four chunks. Each chunk screen presents five items with six answer choices, so that one answer is left over as a distractor. Students answer the items by filling in blank fields or dragging the keyword onto the item area, and they receive immediate feedback, including a progress graph, at the end. Topics are identified for re-teaching, and the system generates a number of charts and graphs at the individual, class, teacher, school, and district levels for TEKS, objectives, percentile grouping, and keywords.
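The probe assembly described above (random selection by keyword, 20 items in four screens of five, six answer choices per screen with one distractor) can be sketched as follows. This is only an illustrative sketch in Python, not the SKeVA© implementation: the names `Item`, `Chunk`, and `build_probe`, the rule that each keyword appears at most once per probe, and the use of a seed to make a form reproducible are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    keyword: str  # Key Vocabulary word that answers the item (hypothetical field)
    prompt: str   # definition or fill-in-the-blank sentence (hypothetical field)


@dataclass(frozen=True)
class Chunk:
    items: tuple    # the items shown together on one screen
    choices: tuple  # shuffled answer choices, including one distractor


def build_probe(item_bank, selected_keywords, n_items=20, chunk_size=5, seed=None):
    """Assemble one probe; the same seed yields the same form (an assumption)."""
    rng = random.Random(seed)

    # Restrict the bank to the teacher's keyword selection.
    by_keyword = {}
    for item in item_bank:
        if item.keyword in selected_keywords:
            by_keyword.setdefault(item.keyword, []).append(item)
    if len(by_keyword) < n_items or len(by_keyword) <= chunk_size:
        raise ValueError("not enough distinct keywords for this probe")

    # Assumed rule: each keyword is used at most once per probe, so every
    # answer choice on a screen matches exactly one item.
    keywords = rng.sample(sorted(by_keyword), n_items)
    chosen = [rng.choice(by_keyword[k]) for k in keywords]

    chunks = []
    for start in range(0, n_items, chunk_size):
        items = tuple(chosen[start:start + chunk_size])
        used = {it.keyword for it in items}
        # One extra keyword not answered on this screen serves as the distractor.
        distractor = rng.choice(sorted(k for k in by_keyword if k not in used))
        choices = [it.keyword for it in items] + [distractor]
        rng.shuffle(choices)
        chunks.append(Chunk(items, tuple(choices)))
    return chunks
```

With the default arguments, the function returns four chunks of five items, each carrying six answer choices. Seeding the random generator is one simple way to make form construction repeatable, in the spirit of criterion (c).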
SKeVA© was developed with server-side pages written in PHP that handle data submissions and insert the data into a MySQL database. JavaScript was used on the client side to validate users' entries before submission. The content was presented following the respondent-friendly design principles of Dillman (2000) and Lynch and Horton (2002), so that all users have an equal chance regardless of computer literacy and so that the system can be used effectively across the wide variety of computers and browsers used by teachers and students in classrooms.
Students (N = 482) participated in either pre-post testing (first year: n = 290; second year: n = 128) or progress monitoring (second year: n = 64) across the two years. In the first year of the study, students were pre-tested and then post-tested after a unit of instruction, using both equivalent KV probes and multiple-choice items built into SKeVA© in the format of Texas's high-stakes end-of-year science test. During the second development year, pre/post lesson administrations of probes were conducted again, but under different rules for probe creation and with a different KV probe format. Finally, to address sensitivity for progress monitoring, equivalent probes were administered weekly over a five-week period, permitting time-series data analysis.
With item selection, test production, test administration, scoring, and results summary and display fully computerized, a large effect size was found for pre-post testing gains and a moderate effect size for progress monitoring. In the first development year, 77% of students showed measurable improvement from pre- to post-test; this figure increased to 85% for the second-year sample. Across all classes in the second year, students improved by an average of 21 “percent correct” points from pre- to post-test. In progress monitoring, there was a 17-point gain from the first to the last probe (80% to 97%) over the five weeks, which corresponds loosely to the pre/post comparison. The greater gains on KV probes indicate greater sensitivity to instruction.
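The two descriptive statistics reported above, the gain in “percent correct” points and the share of students who improved, are simple to compute. The sketch below uses hypothetical function names and a small made-up set of scores purely for illustration; only the 80% and 97% probe averages come from the study.

```python
def point_gain(first_pct, last_pct):
    """Gain in 'percent correct' points between two administrations."""
    return last_pct - first_pct


def share_improved(pre_scores, post_scores):
    """Percentage of students whose post-test score exceeds their pre-test score."""
    pairs = list(zip(pre_scores, post_scores))
    improved = sum(1 for pre, post in pairs if post > pre)
    return 100.0 * improved / len(pairs)


# Reported progress-monitoring averages: first probe 80%, last probe 97%.
print(point_gain(80, 97))  # 17

# Hypothetical scores for four students (not study data).
pre = [40, 55, 70, 65]
post = [60, 55, 90, 80]
print(share_improved(pre, post))  # 75.0
```

A student whose score is unchanged counts as not improved here; that tie rule is an assumption, since the source does not state how "measurable improvement" was defined.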
A major year-one result was that KV probes showed fewer deteriorating scores than TAKS probes, reflecting greater measured learning on KV probes. The year-two results showed that smaller changes, such as increasing the number of items and decreasing the item chunks, confirmed the usefulness of the KV probe format in pre/post assessment. Overall, the SKeVA© system and its supporting technology were successfully adopted by the teachers and students involved in the study, and its use expanded as its reputation spread. In the third development year, a further criterion, (g) validation against the end-of-year Texas science test, will be addressed by having some classrooms conduct progress monitoring over most of the year and then matching student improvement on the KV probes with attainment on the end-of-year, high-stakes TAKS science test.
References
Berk, R. A. (Ed.). (1984). A guide to criterion referenced test construction. Baltimore: Johns Hopkins University Press.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7-74.
Crooks, T. J. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58, 438-481.
Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.
Dillman, D. (2000). Mail and internet surveys: The tailored design method (2nd ed.). New York: John Wiley.
Ebel, R. L., & Frisbie, D. A. (1986). Essentials of educational measurement (4th ed.). Sydney: Prentice-Hall of Australia.
Frisbie, D. A., Miranda, D. U., & Baker, K. K. (1993). An evaluation of elementary textbook tests as classroom assessment tools. Applied Measurement in Education, 6(1), 21-36.
Fuchs, L. S., & Fuchs, D. (1986). Effects of systematic formative evaluation: A meta-analysis. Exceptional Children, 53, 199-208.
Grigg, W. S., Lauko, M. A., & Brockway, D. M. (2006). The nation's report card: Science 2005 (NCES 2006-466). Washington, DC: U.S. Department of Education, National Center for Education Statistics. Retrieved March 1, 2008, from http://nces.ed.gov/NAEP/pdf/main2005/2006466.pdf
Hambleton, R. K., Swaminathan, H., Algina, J., & Coulson, D. B. (1978). Criterion-referenced testing and measurement: A review of technical issues and developments. Review of Educational Research, 48, 1-47.
Linn, R. L., & Gronlund, N. E. (1995). Measurement and assessment in teaching (7th ed.). Englewood Cliffs, NJ: Prentice-Hall.
Lynch, P., & Horton, S. (2002). Web style guide: Basic design principles for creating web sites (2nd ed.). New Haven, CT: Yale University Press.
Natriello, G. (1987). The impact of evaluation processes on students. Educational Psychologist, 22, 155-175.
No Child Left Behind (NCLB) Act (2004). Retrieved July 22, 2007, from http://www.fairtest.org/joint%20statement%20civil%20rights%20grps%2010-21-04.html
Popham, W. J. (1978). Criterion-referenced measurement. Englewood Cliffs, NJ: Prentice-Hall.
Sadler, D. R. (1998). Formative assessment: Revisiting the territory. Assessment in Education, 5(1), 77-84.
Vannest, K. J., Adiguzel, T., & Parker, R. E. (2007). Science Key Vocabulary Assessment (SKeVA) (Beta Version) [Web-based application]. College Station, TX: Texas A&M University. Retrieved February 26, 2007. Available from http://skeva.tamu.edu/