Reading Reflection #7: Standardized Tests, Measures of Central Tendency, Test-Preparation Practices

Ch13 p. 332 # 1: If you had to use only one of the three individual student interpretation schemes treated in the chapter (percentiles, grade-equivalent scores, and scale scores), which one would it be? Why did you make that choice?

I would probably choose percentiles because they do not just look at numbers, but also compare the results to the scores of similar students. I usually do not like when people are put inside boxes, but I think that it would be helpful to analyze the reliability of a test that I am giving to my students. Percentiles would help me compare the results of different norm groups, and make sure that I do not have any existing bias within a test.

#2: It is sometimes argued that testing companies and state departments of education adopt scale-score reporting methods simply to make it more difficult for everyday citizens to understand how well students perform. Do you think there’s any truth in that criticism?

I think that there could be truth in this criticism, but I hope that it is false. After reading through the explanation of what scale scores are, I completely understand how easily they can be misinterpreted. On the other hand, I like how they represent levels of difficulty. This type of student interpretation scheme gives educators the chance to track student progress. I think that in education, there is more and more flexibility for individual student growth. The scale scores fit this trend and provide an opportunity to be a powerful assessment tool. All this is to say that yes, there is great opportunity for misuse of scale-score reporting methods. However, there is also great opportunity for testing companies and state departments of education to use scale score testing to better represent student learning.

Ch14 p. 350 # 1: Can you think of guidelines, other than the two described in the chapter, to be used in evaluating a classroom teacher’s test-preparation practices? If so, what are they?

I might add that no test-preparation practice should favor one student over another in assistance. I think that this is automatically a truth that teachers ethically live by, but it is also a good reminder. I could easily see some teachers wanting to help certain students and give them an extra push to get them ready for a test. The reason is that so many teachers see the potential in students and so badly want them to reach that potential. Teachers might rationalize giving extra direction to certain students when they need to instead allow students to accomplish tests on their own.

#2: Can you think of any other sorts of test-preparation guidelines that are meaningfully different from the five described in the chapter? If so, using the chapter’s two evaluative guidelines or any new ones you might prefer, how appropriate are such test-preparation practices?

Since most high school classes involve note taking, I think that a test preparation based on students’ class notes would be meaningful. This would provide a challenge because it would mean that students have to take correct notes, but I believe that teachers should be giving instruction on how to do so anyway. To help guide students in the right direction, teachers could give the class an outline of the subjects that will be covered in the test. The students could then create a study guide from the notes that they already have, and put into action the knowledge that they have been learning. I think that this would fit well in the evaluative preparation practices because it would be drawing from classroom content that has already been directed towards the curricular aim of the class. As long as the teacher has taught within the guidelines of professional ethics, the students’ notes would be quite appropriate.

Reading Reflection #6: Test Improvement, Formative Assessment

Ch11 p. 267 #1: Why is it difficult to generate discrimination indices for performance assessments consisting of only one or two fairly elaborate tasks?

I think that it is difficult to generate discrimination indices for these types of performance assessments because of the lack of variety. If you create a test with only one elaborate task on it, it will inevitably be a poor assessment of a classroom full of students with different backgrounds, gifts, and abilities. A performance assessment that is lacking variety in its type and level of difficulty will not be an accurate look at the class as a whole. Since this would not be a holistic look into an entire class of learners, the discrimination indices for this assessment would subsequently be weighted to one side.

#2: If you found there was a conflict between judgmental and empirical evidence regarding the merits of a particular item, which form of evidence would you be inclined to believe?

This is a difficult question, and I believe that the answer would be different based on the classroom and subject at hand. Since I teach visual art classes through the form of technology, a majority of my assessments would not usually fall well into the category of numbers. This is not to say that I am against “numbers” as the reading suggested. However, I do believe that certain assessments fit better in certain classes. My classes are project based and do not provide many situations where there is an “available wrong or right answer” because of how subjective art can be. With that said, I would be more inclined to believe judgmental evidence because it would display a well-rounded viewpoint of a student’s learning in my class. This type of judgmental procedure would also be on multiple chopping blocks from myself, colleagues, and eventually students so that it can be as effective as possible.

Ch12 p. 303 #2: What strategies do you believe would be most effective in encouraging more teachers to adopt the formative-assessment process in their own classroom?

I believe that if teachers were able to see how well formative assessment is focused on student growth, they would jump at the opportunity for such an activity. The reading said that formative assessment is “a process, not a test”, which I think would be an important feature to highlight when encouraging teachers to adopt this approach. Teachers already know that a student’s learning is a journey and not just simply a destination. I think that the idea of “learning progression” would be an important aspect to highlight because it shows that formative assessment aligns with the scaffolding plan that hopefully is already set up in the classroom. This learning progression shows teachers how effective it is to have a target curricular aim, to then define the necessary building blocks to reach that target, and then arrange those building blocks in sequential order.

#4: If you had to choose the single most important impediment that prevents more teachers from employing formative assessment, what would this one impediment be? Do you think it is possible to remove this impediment?

I would say that the impediment that prevents teachers from employing formative assessment is the failure of external accountability tests to accurately mirror the improvements that occur in the classrooms where formative assessment is used. I think that most teachers would be excited to use formative assessment in their classrooms because of the overall positive effect that it can have on students’ learning. However, teachers still need to deliver evidence to the state. If teachers see that the external tests do not display the evidence that formative assessment gives, then they will not be motivated to concentrate on formative assessment. I think that teachers would first focus on getting students ready for high-stakes tests, and if there is time left over (and there is usually not), a teacher might then try out formative assessment. I hope that it is possible to remove this impediment, but I don’t know how. The last teaching conventions that I attended did focus on formative assessment, and I hope that this could be a trend that motivates the academic community to try something new.

Reading Reflection #5: Performance Assessments, Rubrics, Portfolios

Ch8 p. 209 #1: What do you personally consider to be the greatest single strength of performance assessment? How about the greatest single weakness?

I think that the greatest strength of performance assessment is that it allows for authentic student learning. This creates space for students to work with the subject matter in a way that forces them to “apply” their knowledge rather than simply memorizing and information dumping. Since many students are driven by grades, it is a popular habit to cram information right before a test to get a good grade, but afterwards the knowledge is soon lost. Performance assessment forces students to critically think about their learning and interact with information in a real-world context.

I would say that the greatest weakness of performance assessment is that it is very difficult to judge the “adequacy of student’s responses”. Since grades are so important in school and especially high school, they need to be valid. As the reading states, a performance assessment is based on “constructed response measurement procedures” where the student gets to come up with their response rather than selecting it from a test. Since each individual student is coming up with their own response, it is very difficult to create an even grading playing field that will accurately apply to a classroom of differentiated student responses.

#5: Do you prefer holistic or analytic scoring of student’s responses to performance test? And, pray tell, why?

At this point, I prefer holistic scoring of students’ responses to performance tests. As I am growing and learning as a teacher, I realize that this viewpoint could change in the future. The reason that I prefer a holistic approach to scoring performance tests is mainly the style of my classes. I teach technology/visual art classes where the classes are based around projects. These projects have enough flexibility for students to make them their own through their individual passions and interests. With that said, it is very difficult to analytically grade these projects. This is not to say that I lack a scoring rubric with specific evaluative criteria. On the contrary, I start each project by showing students what a complete grade would look like and also what a “going above and beyond expectations” grade might look like. A holistic approach to scoring works better in these classes because there is not always a right or wrong answer.

Ch9 p. 227 #2: Three purposes of portfolio assessment were described in the chapter: documentation of student progress, showcasing student accomplishments, and evaluation of student status. Which of these three purposes do you believe to be most meritorious? And, of course, why?

I think that the purpose of documentation of student progress through portfolio assessment is very impactful for a student’s learning. This teaches students to learn and grow, and to also realize that they are growing. I have seen so many light bulb moments occur in my students when they look at their previous work and realize how far they have come. Students do not always see this growth and can easily lose momentum in their learning because they “feel” that they are not making headway. However, when a teacher can show them evidence of their individual growth, it is highly motivating and encouraging for students.  This documentation of student progress also teaches students to be self-evaluative of their journey in learning, which I think is vital for their progress in the classroom.  This purpose of portfolio assessment is not only effective for a student to see his or her individual progress, but of course it is also highly beneficial for a teacher to assess a student’s growth in learning.

#4: If it is true that portfolios need to be personalized for particular students, is it possible to devise one-size-fits-all criteria for evaluating classroom portfolio work products? Why or why not?

This is a difficult question to answer because I believe that evaluation with consistent criteria can be done, but I don’t think it will always fall into the “fair” category. As I previously mentioned, I teach technology classes where most of the activities done in class are the students learning software programs through projects. When I first started evaluating these projects, I soon realized that they would be very difficult to grade. There were some gifted students who did an amazing job, but had used those gifts to goof off during class and quickly finished the project right before it was due. There were other students who put every ounce of effort into the projects, but did not have the same quality of work. In response to this challenge, I created a 3-part grading rubric that would focus on individual student learning. An example of this is as follows: 10 points for completion of the project’s requirements, 5 points for in-class participation (involvement during lessons), and 5 points for effort (staying on task during project time). This rubric sometimes changes based on the project, but the idea stays the same. I wanted to reward students for their hard work, and “hard work” looks different based on each individual student and their growth in my classroom. I think that an approach like this works well for evaluating classroom portfolio work because it does not compare one student’s work to another. Instead, it focuses on a student’s personal growth, effort in learning, and involvement in the class. After implementing these ideas, I saw a major positive change occur in my students’ dedication to the class, which was very encouraging.
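As a rough illustration, the three-part rubric described above could be tallied as in the short Python sketch below. The point caps (10, 5, and 5) come from the rubric itself; the class name `RubricScore`, its methods, and the two sample students' scores are hypothetical and only meant to show how the totals work out.

```python
from dataclasses import dataclass

# Point caps taken from the rubric described above.
MAX_COMPLETION = 10     # completion of the project's requirements
MAX_PARTICIPATION = 5   # involvement during lessons
MAX_EFFORT = 5          # staying on task during project time


@dataclass
class RubricScore:
    """Hypothetical container for one student's scores on one project."""
    completion: float
    participation: float
    effort: float

    def __post_init__(self):
        # Reject scores outside the range each category allows.
        checks = (
            ("completion", self.completion, MAX_COMPLETION),
            ("participation", self.participation, MAX_PARTICIPATION),
            ("effort", self.effort, MAX_EFFORT),
        )
        for name, value, cap in checks:
            if not 0 <= value <= cap:
                raise ValueError(f"{name} must be between 0 and {cap}")

    def total(self) -> float:
        """Sum of the three categories (out of 20)."""
        return self.completion + self.participation + self.effort

    def percent(self) -> float:
        """Total expressed as a percentage of the 20 available points."""
        max_total = MAX_COMPLETION + MAX_PARTICIPATION + MAX_EFFORT
        return 100 * self.total() / max_total


# Made-up scores for two kinds of students mentioned above.
rushed = RubricScore(completion=10, participation=2, effort=1)
diligent = RubricScore(completion=7, participation=5, effort=5)

print(rushed.total(), rushed.percent())      # 13 65.0
print(diligent.total(), diligent.percent())  # 17 85.0
```

With these made-up numbers, the student who finished strong work at the last minute earns 13 of 20 points, while the student who stayed involved and on task earns 17 of 20, which is the kind of outcome the rubric is meant to produce.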

Reading Reflection #4: Binary Choice, Multiple Choice, Matching, Short Answer and Essay

Ch6 p. 161 #1: If you were asked to take part in a mini-debate about the respective virtues of selected-response items versus constructed response items, what do you think would be your major points if you were supporting the use of selected response test items?

To start off, I would say that selected response items leave more time for students to engage with the test material. With that said, if more students are able to complete more of a test in the time given, then the results would be even more reliable. Selected response test items create an opportunity for students to apply their knowledge by comparing and contrasting the information. This situation not only furthers the students’ learning to better understand the subject matter, but also provides the teacher with a clear assessment of where students might be confused. For example, if students continue to mark the same wrong answer on a multiple choice test, the teacher would then be able to see where students’ misconceptions might have occurred. Finally, the use of selected response test items gives room for a great deal of information to be covered in an assessment. The more material that is given to students to test their understanding, the more results a teacher then has to learn from.

#3: Why do you think that multiple-choice tests have been so widely used in nationally standardized norm-referenced achievement tests during the past half-century?

I think that multiple-choice tests have been used in nationally standardized norm-referenced achievement tests because they are a highly efficient way of collecting a large amount of student data. As the reading explained, this method is also more reliable than other selected-response items. I realize that there are multiple opportunities for holes in multiple-choice tests, but they are still the most dependable option that currently exists for mass distribution. Another reason that I believe multiple-choice tests have been widely used in nationally standardized norm-referenced achievement tests is that they create a consistent grading system. As long as a multiple-choice test has been effectively created with no accidental hints, it will be a reliable base for grading.

Ch7 p. 184 #1: What do you think has been the instructional impact, if any, of the widespread incorporation of student writing samples in the high stakes educational achievement tests used in numerous states?

I would guess that the incorporation of student writing samples in the high stakes educational achievement tests has been effective for some students, but for other students it might have a negative impact. The reason that I bring this up is that some schools may not focus on the students’ use of writing samples. If students have not been given adequate practice and instruction on what a good essay looks like on the platform of a test, then I feel that would have a different effect on the test results. With that said, the implementation of student writing samples in the high stakes educational achievement tests would cause teachers to see a need that would have to be addressed. Teachers would probably then focus a great deal of classroom instruction on how to produce a good writing sample on a test. Writing sample assessments would be incorporated into lesson plans and daily instruction so that students are prepared for the achievement tests.

#2: How would you contrast short-answer items and essay items with respect to their elicited levels of cognitive behavior? Are there differences in the kinds of cognitive demands called for by the two item types? If so, what are they?

After the reading described the different variations of challenges that short-answer items and essay items create, I feel that there is more room available for error with essay items. This is not to say that essay items are not an excellent tool, but it is instead to say that the implementation of essay items must be carefully put forth. I feel that the cognitive demands of essay items cover a wider range because they not only ask students to use their knowledge of the subject matter, but also to use their writing skills (or lack thereof). Should good writing and correct English be expected from seniors? Yes, but should it be expected from freshmen? No, because freshmen have not gone through the 3 years of English that they are required to take in high school. Are some students inherently gifted in writing and would thus perform better in an essay test? Possibly. A short-answer item does not require the student to display good use of sentence structure, but instead has the main focus on the information that is being assessed. This is not to say that I do not expect students to learn and implement English writing skills in school, but instead to show that there are differences in the kinds of cognitive demands called for by short-answer items and essay items.

Reading Reflection #3: Bias, Directions

Ch5 p. 135-136 #1: If you were asked to support a high school graduation test you know would result in more minority than majority youngsters being denied a diploma, could you do it? If so, under what circumstances?

To adequately answer this question, I would first look at the test to see why it would result in more minorities being denied a diploma. Is this outcome being expected because the test contains bias against minorities? Does it contain language and questions that one group of people would better relate to than another? If the answers to these questions continue to be “yes”, then I would not support that high school graduation test. The next step would probably be to talk to my administration and present my opinion. Hopefully they would have ears to listen and then the necessary changes could then be implemented. I think that a challenge would occur if this test were a statewide requirement because the powers that be would be much harder to get a hold of. On the other hand, after careful study of the test, what if no biases were found? If the students have been correctly taught the information but choose not to utilize it, then that is on the student’s shoulders. Under these circumstances, I think that it would be completely fine for students to take this test. However, it seems like this is a fine line that is being walked on between the existence of bias and the absence of bias. I think that the answer is for testing specialists to devote time and effort into taking these tests apart piece by piece, and teachers also need to be aware of these circumstances.

#5: What is your view about how much effort a classroom teacher should devote to bias detection and bias elimination?

I think that teachers should put a great deal of effort into bias detection and bias elimination. America is a diverse nation, and our schools hold so many different groups of people. All students should have an equal opportunity to do well on a test. If a student does badly on an assessment, it should not be because of any bias in the classroom. I think that this starts with teachers being aware of their classroom and unassuming of their students. It takes time and effort to go over assessment devices with a fine-tooth comb, but it is well worth it. If teachers truly want their students to succeed, then they need to give them every possible opportunity to do so. Bias detection and bias elimination are going to help students reach the high expectations that teachers are laying before them; why wouldn’t teachers want to put effort towards that?

#6: What kind of overall educational assessment strategy can you think of that might make the testing of students with disabilities more fair? How about LEP students?

When discussing the educational assessment strategies that teachers should take concerning students with disabilities, I think that it is important to make sure that the assessments are still aligned with the curricular goals for the rest of the class. The text made an important point about the testing of students with disabilities by explaining that teachers need to understand that “…the education of all but a very small group of those children must be aimed at precisely the same curricular targets as the curricular targets teachers have for all other students” (Popham, 2011, p. 124). I believe that educators easily fall short in this area and lower the classroom expectations for students with disabilities or LEP students because they do not want those students to fail. I would challenge teachers to develop assessments that allow for flexibility in their implementation, while the different forms still align with the overall curricular goals. I think that formative assessment lends a helping hand to educators with this challenge. Formative assessment would allow teachers to have the same end goal for an assessment, but offer it in different forms that would work for students with disabilities and LEP students. I think that “fair” is a really hard thing to reach in education. Honestly, it is not all “fair”. If all students are given the same test across the board, then it is not fair for students with learning disabilities and LEP students. If some students are given one test and other students are given a different test with accommodations, that is also technically not fair. I think that we should move away from the focus being on fairness and move it towards answering the question, “How are all of my students learning and growing?”

#7: Can bias in educational assessment devices be totally eliminated? Why or why not?

I believe that steps to eliminate bias in educational assessment devices should always be taken. Since we live in a broken world with flawed humans running things, I honestly do not know if bias will ever be totally eliminated. On the other hand, I do think that we can get pretty close. The more effort that teachers, administration, and assessment specialists put into eliminating bias, the better assessment is going to be. Since society is always changing and growing, I think that it is a bit unrealistic to say that we can eliminate bias both now and forevermore. The text said that the more that teachers work to become “sensitive to the existence of assessment bias and the need to eliminate it”, the closer that we will come to the end goal (Popham, 2011, p. 119). Do I think that we will reach that goal and stay there? No, but do I think that an effort to reach that goal will be highly effective in changing students’ lives for the better? Yes.

Reading Reflection #2: Reliability and Validity

Ch3 p. 80 #3: What kinds of educational assessment procedures do you think should definitely require the assembly of reliability evidence? Why?

I was going over this question for a while to try and decide what my opinion is on this subject. I think that I agree with the author when reliability in the classroom was brought up. I do not think that it is absolutely crucial that a teacher involves all three types of reliability evidence because all three may not fit into every classroom situation. Teachers should focus on knowing what reliability is but do not need to get carried away with every specific test. Teachers should understand the concept so that they can then take the truth of reliable assessment and apply it in their own classrooms. Teachers need to continuously measure students’ growth and they need to understand what works and what doesn’t in terms of assessment. I think that the statewide tests that are a requirement for schools should absolutely require the assembly of reliability evidence. Since so much importance is placed upon student test scores, I think that evidence should always be presented on how reliable the tests are in the first place.

#5: What is your reaction to classification consistency as an approach to the determination of reliability?

I understand the argument with classification consistency that is being placed on the table, but I do not completely agree. I do agree that a test’s reliability and consistency should be based on much more than a one-time test. This is especially true if the test is exempting qualified students from a topic of study because of their scores. However, I think that a test would lose a large portion of its validity when it is taken a second time by the same students who have already gone through the test once. I understand that a student’s achievement on a test is not going to only reflect their knowledge, but also their mood, emotions, physical health, and so much more. However, the evidence of a test’s consistency will be skewed if the results are coming from students who have gone through the same test twice.

Ch4 p. 109 #2: It was suggested that some measurement specialists regard all forms of validity evidence as construct-related validity evidence. Do you agree? If so, or if not, why?

After studying this subject, I think that I agree with the connection between all forms of validity evidence and construct-related validity evidence. The reason for this belief is that construct-related validity evidence is the original idea of creating a hypothesis that then leads to an experiment that then leads to results. The construct-related validity evidence answers the important question of, “Is the test accurately measuring what it was created to measure?” This specific result comes from careful study and planning that is put into a hypothesis before the testing actually begins. The end result is then studied and the test is measured for its effectiveness. All of these components are important elements of the validity evidence that I have studied so far. I agree that there are still some specific differences that make different types of validity evidence stand apart from each other. However, on a larger scale, I think that the overall idea of construct-related validity evidence can be found in the other forms of validity evidence.

#5: What kind(s) of validity evidence do you think classroom teachers need to assemble regarding their classroom assessment devices?

I believe that the necessary validity evidence will be different based on the teacher and the classroom. Since I have started my teaching program, I have noticed that teachers have so many different ways of teaching. In this realization, I have found that different teachers also have differing opinions on which assessment devices they prefer to use in the classroom. I would say that when teachers put together their classroom assessment devices, they probably look to use the content-related evidence of validity in most situations. I think that the construct-related validity evidence would also work, but that the content-related evidence would still be most applicable in classroom settings. The content-related evidence would work in tests and quizzes that assess students’ knowledge of the subject matter. This would allow teachers to see if the test results match up with the curricular aims of the class. I like how the text said, “The only reason teachers should assess their students is to make better educational decisions about those students” (Popham, 2011, p. 87). The content-related validity evidence allows teachers to plan out where they want their class to end up and the steps they need to take to get there.

Reading Reflection #1: Norm-Referenced vs. Criterion Referenced

Ch 1 pg. 26 #5: Do you think the movement to discourage the use of the terms intelligence and aptitude is appropriate? Why?

From the educational circles that I have been a part of in the past, I have seen intelligence become a word that is associated with a gift that has been given to some and not others. I have also seen this word mostly associated with the idea of being “book smart”, a quality usually seen in students who naturally excel in academic settings. With that said, I see that the gifted students who are regarded as intelligent are usually students who have excellent memories to retain information and are able to understand and comprehend information easily. Students who do not have these gifts, but are talented in other ways, can easily feel “stupid or dumb” because they have issues keeping up with traditional education. I believe that students should be reminded that there are different types of intelligence such as social intelligence or creative intelligence. I also do not think that aptitude is the correct word to use because, as the book said, the term tends to create an idea that there could be a glass ceiling in regard to a student’s potential. Students who do not feel that they fit into the typical stereotype of intelligent often feel that they also do not fit well into school, and I think that is a lie that should be cast out of educational systems.

Ch 2 pg. 58. #1: It was argued in the chapter that most classroom assessment tasks call for criterion-referenced approaches to measurement. Do you agree? Why or why not?

I agree that most classroom assessment tasks call for criterion-referenced approaches to measurement because they focus on whether the students accomplished the main aim of the test. I think that the text makes an important point by saying that these assessments are not “tests” of student progress but instead they are “interpretations”. A criterion-referenced assessment will usually be a better approach in the classroom because teachers can easily utilize information about their students’ knowledge base. I feel that it is more helpful for school administrators to see the outcome of norm-referenced measurement because they need to focus on meeting the established requirements.

#2: Why should/shouldn’t classroom teachers simply teach toward the content standards isolated by national groups? If teachers don’t, are they unpatriotic?

I believe that teachers should challenge their students to rise above the norm.  If we tell students that the only standards that they need to meet are inside their own personal bubble, then we are placing a limit on the amount that the students can grow. Students need to realize that they live in a world that is much bigger than themselves. We are not only teaching students important knowledge that they will use for the rest of their lives, but we are also shepherding them to learn how to be a part of this world that they live in. Educators would be improperly setting up students to have a closed mindset if they were only teaching toward the content standards isolated by national groups. I believe that this is not unpatriotic, but instead a realistic viewpoint of the growing world we live in.

#3: If you discovered that your state’s educational accountability tests (a) attempt to measure too many content standards, (b) are based on badly defined content standards, and (c) don’t supply teachers with the per-standard results, how do you think such shortcomings would influence your classroom assessments? How about classroom instruction?

These shortcomings would influence my classroom assessments because they would lower the expectations for my students’ potential. I think that students would feel overwhelmed by being measured against too many content standards or badly defined standards. In this situation, they could easily just stop putting forth effort because they feel that the expectations are unrealistic. The text brought up an interesting point by saying that “high standards” seem to be a holy grail for educators to reach, rather than the focus being on instructional objectives. Americans like results. However, if the focus is so much on the results and not on the process that actually gets us to those results, then classroom instruction will definitely be negatively affected.