Copyright © 2003-2008  The Center for Exercise Physiology.   All Rights Reserved.

 

 

Journal of Professional Exercise Physiology

Vol 6 No 2 February  2008    ISSN 1550-963X

 


Editor-in-Chief:  Larry Birnbaum, PhD, FASEP, EPC
An Internet Electronic Journal Dedicated to
 Exercise Physiology as a Healthcare Profession


Guidelines for Writing Multiple Choice Questions
Larry Birnbaum, PhD, FASEP, EPC
Department of Exercise Physiology
The College of St. Scholastica
Duluth, MN  55811

Assessment is a major issue in academia.  Opinions vary regarding the best means for assessing student achievement, particularly with respect to relevancy, reliability, and validity.  Despite the differences of opinion, written exams that employ multiple choice (MC) questions are standard fare in the college setting.  The MC format is also commonly used for standardized exams, such as licensure and certification exams.  Haladyna and Downing [1] cite a number of reasons for the popularity of MC questions.  Sampling of content is usually superior relative to other formats; MC formats generally produce more content-valid test score interpretations.  Test score reliability can be very high with ample numbers of high quality MC items.  Multiple choice questions can be easily pretested, stored, used, and reused (e.g., computerized item banks).  Tests can be scored objectively and quickly.  Most kinds of content can be tested with the MC format, including higher level thinking.

Given the common use of MC questions, exam writers should be concerned about the reliability and validity of these exams.  Indeed, it is standard practice to train item writers for certification and licensure exams to ensure a satisfactory level of reliability and validity.  There are techniques for determining the validity and reliability of test questions that are beyond the scope of this article.  Put simply, a test question is valid if only students who know and understand the content answer the question correctly, and those who have not mastered the content get the question wrong.  A common criticism of MC questions is that test takers can guess the correct answer.  Guessing is more likely to succeed on poorly written MC questions.  Guidelines have been developed that help minimize student guessing and improve the overall validity and reliability of MC questions.  Haladyna and Downing [1] developed a taxonomy of rules for writing MC questions based on a consensus of 46 authoritative references representing the field of educational measurement.  The validity of the rules was examined from the perspective of empirical research on MC item writing.  Their taxonomy follows with some additional commentary.

General procedural rules:

1.  Use best answer or correct answer format.

2.  Avoid K-type questions (complex multiple choice; e.g., select 1 if a, b, and c are correct; 2 if a and c are correct; 3 if b and d are correct, etc.).  This type of question takes examinees longer to answer, and there is a greater chance of intra-item cluing.

3.  Format responses vertically, not horizontally.

4.  Use correct grammar, punctuation, and spelling.

5.  Minimize examinee reading time.  Questions that require a lot of reading will necessitate more exam time or fewer questions.

6.  Avoid trick questions.  Haladyna [2] lists the following characteristics of trick questions:

a.  The item intentionally tries to deceive, confuse, or mislead.

b.  The content is trivial.

c.  The discrimination among responses is too fine.

d.  The item contains irrelevant window dressing (i.e., excessive, unnecessary verbiage).

e.  There are multiple correct answers.

f.  Principles are presented in ways that were not learned.

g.  The item is too ambiguous.

 

General content rules:

1.  Base each item on an objective.

2.  Focus on a single problem/idea (one type of content).  If more than one type of content is included, and the student gets the question wrong, you cannot be sure which content was missed.

3.  Use vocabulary appropriate for examinees.

4.  Keep items independent of each other (no inter-item cuing).

5.  Avoid overly specific knowledge (too trivial) and overly general knowledge (too vague).

6.  Avoid verbatim phrasing (from text).

7.  Avoid items based on opinions.

8.  Include higher level thinking items (not just recall).

9.  Test for significant material (no trivial items).

 

Guidelines for stems (the question part; not the possible responses):

1.  State the stem in question form, as this is the most direct way of getting to the central idea of the test item.  Open-ended (sentence completion) stems are suitable provided most of the verbiage is in the stem.  One small study supported the use of either format [3].  It has been argued that sentence completion items take more time, make reading comprehension more difficult, and increase test anxiety [2].  Also, a completion item should require only the end of the sentence to be completed.  Avoid placing blanks at the beginning or in the middle of the question or sentence; such items are more difficult to read and require more time [2].

2.  Be sure directions in the stem are clear, and wording lets the examinee know exactly what is being asked.

3.  Avoid window dressing (excessive wording).

4.  Avoid negative phrasing (e.g., except, not).  Some experts believe that negative words have negative effects on students [2].  There is no universal consensus on this rule [4].  If a negative term is used, highlight it in bold, use upper case letters, and/or underline it.

5.  Include the main idea and most of the verbiage in the stem.
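
Stem guideline 4 lends itself to a simple automated check.  The following is a minimal sketch in Python, not a validated tool.  The function name and the word list are hypothetical, chosen only for illustration, and the word list contains just the two negative terms given as examples above.  Because plain text cannot show bold or underlining, the sketch treats only upper case as emphasis.

```python
import re

# Hypothetical word list for illustration; it holds only the two examples
# named in the guideline and should be extended to suit your own items.
NEGATIVE_WORDS = ("except", "not")


def unemphasized_negatives(stem: str) -> list[str]:
    """Return negative words in the stem that are not written in upper case."""
    found = []
    for word in re.findall(r"[A-Za-z]+", stem):
        # A word counts as emphasized only if every letter is upper case.
        if word.lower() in NEGATIVE_WORDS and not word.isupper():
            found.append(word)
    return found
```

For example, unemphasized_negatives("Which of the following is not a valid response?") returns ["not"], while the same stem written with "NOT" returns an empty list.  A flagged stem is a prompt for the item writer to rephrase or emphasize the term, not proof that the item is flawed.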

 

Guidelines for responses:

1.  Use as many responses as feasible; three options are recommended [5].

2.  Place responses in a logical or numerical order (ascending or descending).  This improves efficiency since the test taker does not have to hunt for the correct answer.

3.  Keep responses independent; responses should not overlap.  For example, if ranges of numbers are the responses, the ranges should not overlap.

4.  Keep all responses homogeneous in content.  Responses that are heterogeneous in content may cue the test taker [2].

5.  Keep the length of responses consistent.  The longest response is often the correct answer.

6.  Avoid using “all of the above.”

7.  Avoid using “none of the above” or use it carefully.  About half of the sources reviewed in [4] advise against using it.  “None of the above” is more acceptable than “all of the above”, especially for calculations (e.g., math problems).

8.  Phrase responses in the positive; avoid use of negative terms.

9.  Avoid distractors that can clue the examinee.

a.  Clang associations (i.e., a word or phrase in the stem also appears in a response).

b.  Absurd, ridiculous responses.  Incorrect responses should be conceivable.

c.  Formal prompts.

d.  Overly specific or overly general clues.

10. Avoid giving clues inadvertently by using incorrect grammar.  For example, all responses for sentence completion types of stems must complete the sentence in a grammatically correct fashion.

11. Avoid absolutes (e.g., always, never, totally, absolutely, completely).  These are rarely correct.

12. Be certain there is only one correct response.

13. Use approximately equal numbers of positions for the correct responses (e.g., if the response choices are a, b, c, and d, there should be an equal number of a’s, b’s, c’s, and d’s as correct responses).  This should help reduce the tendency of students to follow patterns [2].

14. Use common errors of students as incorrect responses.

15. Avoid use of technical jargon.

16. Use familiar phrases as incorrect responses.

17. Use true statements that do not correctly answer the question as incorrect responses.

18. Use humor sparingly, depending on the setting.  Humor should not be used in formal testing programs (e.g., certification, licensing) [4].
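
Several of the response guidelines above (3, 5, and 13) are mechanical enough to check with a short script before an exam is finalized.  The following is a minimal sketch in Python under stated assumptions: the function names and the data layout (an answer key as a string of letters, responses as a list of strings, ranges as pairs of numbers with inclusive endpoints) are hypothetical and chosen only for illustration.  The checks flag items for review; they do not judge item quality.

```python
from collections import Counter


def key_position_counts(answer_key: str, options: str = "abcd") -> dict[str, int]:
    """Guideline 13: count how often each position holds the correct answer."""
    counts = Counter(answer_key.lower())
    return {opt: counts.get(opt, 0) for opt in options}


def longest_is_correct(responses: list[str], correct_index: int) -> bool:
    """Guideline 5: True when the correct response is strictly the longest."""
    lengths = [len(r) for r in responses]
    correct_len = lengths[correct_index]
    return all(
        correct_len > n for i, n in enumerate(lengths) if i != correct_index
    )


def overlapping_ranges(ranges: list[tuple[float, float]]) -> list[tuple[int, int]]:
    """Guideline 3: return index pairs of numeric ranges that share a value.

    Endpoints are assumed to be inclusive, so (60, 70) and (70, 80) overlap.
    """
    pairs = []
    for i in range(len(ranges)):
        for j in range(i + 1, len(ranges)):
            lo_i, hi_i = ranges[i]
            lo_j, hi_j = ranges[j]
            if lo_i <= hi_j and lo_j <= hi_i:
                pairs.append((i, j))
    return pairs
```

For example, key_position_counts("abcdabcdaaaa") shows that "a" is the correct answer six times and each other position only twice, which suggests reordering some responses.  overlapping_ranges([(60, 70), (70, 80), (81, 90)]) returns [(0, 1)] because the first two ranges both include 70.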

 

References

1.  Haladyna, T.M., Downing, S.M.  (1989).  A Taxonomy of Multiple-Choice Item-Writing Rules.  Applied Measurement in Education, 2(1), 37-50.

2.  Haladyna, T.M.  (1999).  Developing and Validating Multiple-Choice Test Items, 2nd ed (pp. 76-97).  Mahwah, NJ:  Lawrence Erlbaum Associates.

3.  Sireci, S.G., Wiley, A., Keller, L.A.  (1998).  An Empirical Evaluation of Selected Multiple-Choice Item Writing Guidelines.  Paper presented at the Annual Meeting of the Northeastern Educational Research Association, October 28, 1998, Ellenville, NY.

4.  Haladyna, T.M., Downing, S.M., Rodriguez, M.C.  (2002).  A Review of Multiple-Choice Item-Writing Guidelines for Classroom Assessment.  Applied Measurement in Education, 15(3), 309-334.

5.  Rodriguez, M.C.  (2005).  Three Options are Optimal for Multiple-Choice Items:  A Meta-Analysis of 80 Years of Research.  Educational Measurement:  Issues and Practice, 24(2), 3-13.