We need to align the: • Objectives • Instruction • Assessment
Traditional reasons that teachers assess students:
• To diagnose students' strengths and weaknesses • To monitor students' progress • To assign grades to students • To determine instructional effectiveness
Today's reasons for teachers to know about assessment:
• Test results determine public perceptions of educational effectiveness • Students' assessment performances are increasingly seen as part of the teacher evaluation process • As clarifiers of instructional intentions, assessment devices can improve instructional quality
Educational assessment:
is a formal attempt to determine students' status with respect to educational variables of interest
Formative assessment:
classroom-based assessment used by teachers to adjust their ongoing instructional procedures (formal and informal)
Summative assessment:
tests used to make final judgments about students or the quality of a teacher's instruction
High-stakes test
an assessment for which important consequences ride on the test's results
Teachers need to look at the _______ of student performance
growth
Achievement tests
a measurement of the knowledge and/or skills a student possesses. Although often thought to assess what a student has achieved in school, some commercially published achievement tests contain many items measuring a student's out-of-school learning or the student's inherited academic aptitudes
Aptitude tests
a measurement device intended to predict a student's likelihood of success in some future setting, often an academic one
Stability reliability:
consistency of assessment results over time
Reliability equals
Consistency
Alternate-form reliability:
consistency of results of two versions or forms of the same test
Internal consistency reliability
consistency with which a test's items function; the degree to which the items measure the same attribute (see the sketch below)
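All three types of reliability are typically expressed as a coefficient. As a rough, minimal sketch (hypothetical scores and helper names; Python with NumPy), stability and alternate-form reliability are commonly estimated by correlating two sets of scores, and internal consistency by Cronbach's alpha:

```python
import numpy as np

def pearson(x, y):
    """Correlation between two score arrays (one entry per student)."""
    return np.corrcoef(x, y)[0, 1]

def cronbach_alpha(items):
    """items: 2-D array, rows = students, columns = test items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()     # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical scores for five students.
test_time1 = np.array([78, 85, 62, 90, 71])   # first administration
test_time2 = np.array([80, 83, 65, 88, 70])   # same test, two weeks later
form_a = test_time1                            # form A scores
form_b = np.array([76, 88, 60, 92, 73])        # parallel form B scores
item_scores = np.array([[1, 1, 0, 1],          # rows: students, cols: items
                        [1, 1, 1, 1],
                        [0, 1, 0, 0],
                        [1, 1, 1, 1],
                        [1, 0, 0, 1]])

print("Stability (test-retest) r:", pearson(test_time1, test_time2))
print("Alternate-form r:", pearson(form_a, form_b))
print("Internal consistency (Cronbach's alpha):", cronbach_alpha(item_scores))
```

The closer each coefficient is to 1.0, the more consistent the measurement.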
Validity
the degree to which evidence supports the accuracy of the test-based interpretations about students
Validity argument
test makers gather evidence to show that their tests are valid (the tests permit the inferences being claimed)
Content-related evidence of validity
represents the adequacy with which the content of a test represents the content of the curricular aim (does the assessment test what we want the students to know?)
Criterion-related evidence of validity
applies only when an assessment is used to predict how well a student will perform later: 1) an aptitude test (SAT or ACT) is used to predict 2) the subsequent grades earned by a student, 3) on the assumption that those scoring high will do well in college (see the sketch below)
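As a minimal sketch with made-up numbers, criterion-related evidence is usually summarized as a validity coefficient: the correlation between predictor scores and the criterion measure (here, later grades):

```python
import numpy as np

# Hypothetical data: aptitude-test scores (predictor) and the
# first-year college GPAs those students later earned (criterion).
sat_scores = np.array([1050, 1220, 1340, 980, 1480, 1130])
college_gpa = np.array([2.7, 3.1, 3.4, 2.5, 3.8, 2.9])

# The validity coefficient: correlation between predictor and criterion.
r = np.corrcoef(sat_scores, college_gpa)[0, 1]
print(f"Criterion-related validity coefficient: {r:.2f}")
```

A strong positive coefficient supports the predictive inference; a weak one undercuts it.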
Construct-related evidence of validity
supports the existence of a hypothetical construct and indicates an assessment device does, in fact, measure that construct
Assessment bias
the qualities of an assessment instrument that offend or unfairly penalize a group of students based on gender, race, ethnicity, SES, religion, etc.
Empirical Approach
• High-stakes test • Large number of students • Review of items for potential bias • Look at subgroups or specific items • Differential item functioning (DIF; see the sketch below) • Even after an item has been identified as a differentially functioning item, it doesn't mean the item is biased
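To make the DIF logic concrete, here is a simplified, hypothetical sketch (operational programs use formal statistics such as the Mantel-Haenszel procedure): match students on total test score, then compare the item's difficulty across groups within each score stratum:

```python
import numpy as np

def dif_screen(item_correct, total_scores, group, n_strata=4):
    """Average gap in an item's proportion-correct between a reference
    group (group == 0) and a focal group (group == 1), computed within
    strata of students whose total test scores are comparable."""
    edges = np.quantile(total_scores, np.linspace(0, 1, n_strata + 1))
    strata = np.digitize(total_scores, edges[1:-1])  # stratum index 0..n-1
    gaps = []
    for s in range(n_strata):
        in_stratum = strata == s
        ref = item_correct[in_stratum & (group == 0)]
        foc = item_correct[in_stratum & (group == 1)]
        if len(ref) and len(foc):
            gaps.append(ref.mean() - foc.mean())
    return float(np.mean(gaps))

# Hypothetical call: 0/1 correctness on one item, total scores, group labels.
# gap = dif_screen(item_correct, total_scores, group)
# A large average gap flags the item as *differentially functioning*;
# a bias-review committee still decides whether it is actually biased.
```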
Five General Item Writing Commandments
• Be clear in your directions • Be specific • Don't give away clues • Be consistent with answer types • Try to keep items at a level the students will understand
Binary Choice Items
• Statements shouldn't be obviously true or false • Include only one concept in each statement • Use roughly equal numbers of true and false statements • Keep true and false statements about the same length
Multiple Binary Choice Items
• Separate item clusters vividly from one another • Make certain that each item meshes well with the cluster's stimulus material
Multiple Choice
• Don't use "all of the above" • The stem should contain the question/problem • Randomly assign correct answers among positions A-D • Avoid negatively stated stems • "None of the above" may be used to increase difficulty
Matching Items
• Premises go in the left column • Responses go in the right column • Include more responses than premises • Keep each list homogeneous • Keep lists fairly brief • Order responses logically (e.g., alphabetically) • Describe the basis for matching and the number of times responses may be used
Outlier
a score that lies far from the rest of the distribution (not close to the mean)
Frequency distribution
a listing of scores ordered from lowest to highest, showing how often each score occurs (see the sketch below)
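A minimal sketch (invented scores) illustrating both cards at once: tallying a frequency distribution and flagging an outlier as a score far from the mean, here more than two standard deviations away:

```python
from collections import Counter
import statistics

# Hypothetical test scores for a class.
scores = [72, 85, 85, 90, 68, 85, 72, 99, 31, 90]

# Frequency distribution: each score and how often it occurs, low to high.
for score, count in sorted(Counter(scores).items()):
    print(score, "x", count)

# Flag outliers: scores more than two standard deviations from the mean.
mean = statistics.mean(scores)
sd = statistics.stdev(scores)
outliers = [s for s in scores if abs(s - mean) > 2 * sd]
print("Outliers:", outliers)   # 31 lies far below the rest of the scores
```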
Cognitive:
targets are those that deal with a student's intellectual operations; for instance, when the student displays acquired knowledge or demonstrates a thinking skill such as decision making or problem solving.
Affective
targets are those that deal with a student's attitudes, interests, and values, such as the student's self-esteem, risk-taking tendencies, or attitudes toward learning.
Psychomotor
targets are those dealing with a student's large muscle or small muscle skills.
Small-scope curricular aims
are more specific (small grain size); aims that students can achieve in a short while, perhaps in a few days or even in one class session.
Broad-scope curricular aims
are more general (large grain size); aims that seem likely to take weeks or even months to accomplish.
Reliability
the consistency of results produced by measurement devices.
Item-Writing Guidelines for Short-Answer Items
• Usually employ direct questions rather than incomplete statements, particularly for young students • Structure the item so that a response should be concise • Place blanks in the margin for direct questions or near the end of incomplete statements • For incomplete statements, use only one or, at most, two blanks • Make sure blanks for all items are equal in length
Item-Writing Guidelines for Essay Items
• Convey to students a clear idea regarding the extensiveness of the response desired • Construct items so the student's task is explicitly described • Provide students with the approximate time to be expended on each item as well as each item's value • Do not employ optional items • Precursively judge an item's quality by composing, mentally or in writing, a possible response
Guidelines for Scoring Responses to Essay Items
• Score responses holistically and/or analytically • Prepare a tentative scoring key in advance of judging students' responses • Make decisions regarding the importance of the mechanics of writing prior to scoring • Score all responses to one item before scoring responses to the next item • Insofar as possible, evaluate responses anonymously
Question: If a test is valid, it is almost always reliable. Why is this true?
A reliable test yields consistent scores. A valid test measures what it is intended to measure. A test cannot measure what it is intended to measure without doing so consistently, so a valid test is almost always reliable: accuracy implies consistency.
T/F: A classroom test that validly measures appropriate knowledge and/or skills is likely to be reliable because of the strong link between validity and reliability. What is the shortcoming?
DOUBLE CONCEPT
T/F: Test items should never be constructed that fail to display a decisive absence of elements which would have a negative impact on students because of their gender or ethnicity. What is the shortcoming?
NEGATIVES
What is the most synonymous label for "student academic achievement standards"?
Performance Standards.
Which of the following is typically recommended for use with students who have the most serious cognitive disabilities?
Alternate Assessments.
As most educators currently use the expression "content standard," to which of the following is that phrase most equivalent?
A Curricular Aim
Performance assessment
an approach to measuring a student's status based on the way the student completes a specified task. (Examples: writing a letter to the editor, interviews, a class newsletter, persuasive letters, planning a field trip to the moon)
Features of Performance Assessments
• Multiple evaluation criteria: more than one criterion is used to judge the student's response • Prespecified quality standards: the evaluative criteria are set before the work is judged • Judgmental appraisal: human judgment determines the quality of the response
Three Types of Rubrics
• Task-Specific Rubrics • Hypergeneral Rubrics • Skill-Focused Rubrics
Task-Specific Rubrics
Scoring guides whose evaluative criteria deal only with scoring a student's responses to a particular task, not the full range of tasks that might represent the skill being measured
Hypergeneral Rubrics
Scoring guides whose evaluative criteria are described in excessively general terms
Skill-Focused Rubrics
Scoring guides whose evaluative criteria are applicable for judging a student's responses to any suitable skill-measuring task
Five Rules for Skill-Focused Rubrics
• Make sure skill to be assessed is significant • Make certain all of the rubric's evaluative criteria can be addressed instructionally • Employ as few evaluative criteria as possible • Provide a succinct label for each evaluative criterion • Match the length of the rubric to your own tolerance for detail
Working Portfolios
Ongoing collections of a student's work samples focused mostly on the improvement, over time, in the student's self-evaluated skills
Showcase Portfolio
Collections of a student's best work
Three functions of portfolios
• Documentation of student progress • Showcasing student accomplishments • Evaluation of student status
Seven Guidelines for Portfolio Assessment
• Make sure your students "own" their portfolios • Decide on what kinds of work samples to collect • Collect and store work samples • Select criteria by which to evaluate portfolio work samples • Require students to continually evaluate their own portfolio products • Schedule and conduct portfolio conferences • Involve parents in the portfolio assessment process
Formative assessment
is a planned process in which assessment-elicited evidence of students' status is used by teachers to adjust their ongoing instructional procedures or by students to adjust their current learning tactics.
A Process, Not a Test
• Planned process • Instructional strategy • Formative assessment is a process requiring some serious up-front planning if it is going to work
Why is formative assessment, a research-based instructional strategy, not more widely used in our schools?
• Misunderstandings regarding formative assessment • Teachers' tendencies to resist altering their current conduct • The failure of many external accountability tests to accurately mirror the improvements occurring in the classrooms of teachers who use formative assessment
Should assessment FOR learning replace assessment OF learning?
• They are both very important and should be used on different occasions • FOR Learning: formative type of assessment, which occurs during the learning process; common learning that takes place during class; students participating in group discussions • OF Learning: summative type of assessment, which measures what students have learned at the end; final exam; standardized testing
OF Learning
summative type of assessment, which measures what students have learned at the end; final exam; standardized testing
FOR Learning
formative type of assessment, which occurs during the learning process; common learning that takes place during class; students participating in group discussions
If an accountability test produces a statistically significant disparate impact between minority and majority students' performances, it is certain to possess assessment bias.
False
Differential item functioning (DIF) represents today's most common approach to the empirical detection of potentially biased test items.
True
For a test item to be biased, it must offend at least one group of students on the basis of that group-members' personal characteristics such as race, religion, or gender.
False
Even if the individual items in a test are judged to be bias-free, it is possible for the total set of items, in aggregate, to be biased.
True
Typically, judgment-only approaches to the detection of item bias are employed prior to use of empirical bias-detection techniques.
True
Assessment accommodations require the creation of a substantially new test, hopefully equated to the original test.
False
"English Language Learners" (ELLs) are those students who have been identified as capable of responding appropriately to an English-language assessment.
False
The President's Advisory Commission on Educational Excellence for Hispanic Americans issued a May 2000 report asserting that state accountability officials "allow Hispanic youngsters to become invisible inside the very system charged with educating them."
True
If a teacher's classroom test in mathematics deals with content more likely to be familiar to girls than boys, it is likely that the test is biased.
True
Empirical bias-detection techniques, especially DIF-based approaches, will invariably identify more biased items than will even a well-trained and representative bias-review committee.
False
Which of the following is not a recommended item-writing rule for the creation of binary-choice items?
Rarely use statements containing double negatives, although single negatives are acceptable
Which of the following conclusions regarding multiple binary-choice items has not been supported by available research?
These items are a bit less difficult for students than multiple-choice items
Of the following four statements, one is not a guideline to be followed when constructing multiple-choice items. Which statement is it?
To keep stems brief, place most words in an item's alternatives
Which of the following is a generally recommended item-writing rule for matching items?
In the test's directions, describe the basis for matching and the number of times a response can be used
Presented below are four item-writing rules. Which one is a guideline often recommended for the construction of short-answer items?
Typically employ direct questions rather than incomplete statements, especially for young students
Select the one accurate guideline below for teachers who are scoring students' responses to essay items.
Prepare at least a tentative scoring key in advance of judging students' responses to any item
One of the following rules for the construction of essay items is accurate. The other three rules are not. Which is the correct rule?
Construct all essay items so the student's task for each item is unambiguously described
Which of the following rules is often recommended for the generation of matching items?
Employ relatively brief lists, placing the shorter words or phrases at the right
One of the important rules to be followed in creating multiple binary-choice items is that:
Item clusters should be strikingly separated from one another
Because students' parents can ordinarily become heavily involved in portfolio assessment, a teacher's first task is to make sure parents "own" their child's portfolio.
False
Fortunately, well organized teachers do not need to devote much time to the conduct of portfolio conferences.
False
When held, portfolio conferences should not only deal with the evaluation of a student's work products, but should also improve the student's self-evaluation abilities.
True
In general, a wide variety of work products should be included in a portfolio rather than a limited range of work products.
True
In order for students to evaluate their own efforts, the evaluative criteria to be used in judging a portfolio's work products must be identified, then made known to students.
True
Students should rarely be involved in the determination of the evaluative criteria by which a portfolio's products will be appraised.
False
Early in a school year, a teacher who is using a portfolio assessment should make sure the students' parents understand the portfolio process.
True
Parents should become actively involved in reviewing the work products in a child's portfolio.
True
Students should be asked to review their own work products only near the end of the school year so their self-evaluations can be more accurate.
False
Students' work products must be stored in file folders, then placed in a lockable metal file cabinet to prevent unauthorized use of a student's portfolio.
False
Formative assessment is best thought of as:
A process in which assessment-elicited evidence informs adjustment decisions
Which of the following would NOT be a clear instance of summative assessment?
When students employ their performances on classroom tests to decide whether to adjust how they are trying to achieve curricular goals
Formative assessment is most similar to which of the following?
Assessment FOR learning
Which one of the following types of tests is most frequently, but mistakenly, pushed by its developers as formative assessments?
Interim assessments
The most compelling empirical support for formative assessment is that supplied by:
A 1998 research review by Paul Black and Dylan Wiliam of classroom-assessment studies
The chief ingredients of a learning progression are its:
Building blocks
Of the following options, which one is by far the most integral to the implementation of formative assessment in a classroom?
Use of assessment-elicited evidence to make adjustments
Which of the following is NOT an element in a research-supported conception of formative assessment?
Formative assessment should be used only by teachers to adjust their ongoing instructional activities
Which of the following is generally conceded to be a key component of formative assessment?
The framework provided by a learning progression's building blocks
Which of the following is NOT a likely reason that formative assessment is employed less frequently in our schools than the proponents of formative assessment would prefer?
The absence of truly definitive evidence that formative assessment improves students' learning