The Item In Question

A relatively easy way to assess (as it were) the effectiveness of any assessment is to look at the item format of the questionnaire. Items are the questions that make up any assessment. Item construction is an interesting combination of art and science. Occasionally, someone questions the effectiveness of self-reporting instruments in general. It is certainly possible to have a bad experience with some assessment products, but that is not because the product is “self-reporting.” I had a bad meal in a restaurant once upon a time, but that was not a reason to indict restaurants as a group. Similarly, the effectiveness of assessment instruments depends upon several specific factors. Among these are the constructs chosen, the presentation, the user experience, and primarily, the item construction. When proper attention is paid to these factors, self-reporting instruments can produce accurate data.

Over the many years that psychometricians have worked to refine the science in pursuit of ever more accurate methodologies, item construction has evolved dramatically. The following is a brief history of that evolution, pointing out some of the qualitative issues that marked the more common item types.


One of the earliest assessments offered a list of adjectives. The candidate was asked to select the adjectives that they felt “described them.” Some instruments added a duplicate list of adjectives and asked the candidate to select the ones that their friends would use to describe them.

Points of Concern about Adjective Lists
  • Candidates tend to select those adjectives that are most flattering or that seem most supportive of whatever job is being considered. Example: Sales candidates rarely choose timid or shy, choosing instead persuasive or direct.
  • Adjective checklists do not actually measure traits of people. Instead, they use the person’s choices to sort them into a simple model of personality styles.
Common Examples of Products Using Adjective Checklists
  • Predictive Index
  • Omnia


Recognizing the problems that were inherent in adjective checklists, test developers introduced forced choice items. These generally offer a choice of three or four words or phrases. The candidate then selects the one that MOST describes them and the one that LEAST describes them. The most common version of this kind of item is found in DISC-type instruments. These word groups were first put together by William Marston in 1928. While the words or phrases have changed somewhat over the last eighty years, the fundamental forced choice format has not.

Points of Concern about Forced Choice Items
  • Forced choice items produce ipsative scores. Ipse is Latin for self. It means that forced choice items produce scores that are only meaningful for that one person. They cannot be compared to anyone else’s scores, even if that person has taken the same assessment. The reason is that there is no way of knowing how big the gap is between the MOST and the LEAST choices; it is relative to each individual’s particular situation and personality. One way of understanding this is to imagine asking several individuals if it takes a long time for them to get to where they work. The Wall Street banker who lives in Westchester replies, “No, I can be in Manhattan in an hour.” The executive who lives in the Atlanta suburbs replies, “No, I can be downtown on Peachtree Street in only thirty minutes.” The attorney who lives in Rome, Georgia, replies, “No, I can be in my office in ten or fifteen minutes.” The fourth person asked replies, “No, I just go downstairs to my office.” Each person gave the same “No,” but the differences behind those answers were dramatic. The same is true with forced choice assessments: you cannot meaningfully compare two reports, or two individuals, using forced choice items.
  • Forced choice data cannot be used to create norms for jobs.
  • While it was an early innovation, psychometrics has gone far beyond forced choice methods. Anything that can be done with forced choice products, such as DISC, can be done better with newer types of assessments. Even team building workshops are more effective when newer assessments are used.
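The commute anecdote can be made concrete with a small simulation. The sketch below (plain Python; the trait names, latent intensities, and simplified one-item MOST/LEAST scoring rule are all hypothetical illustrations, not any vendor's actual algorithm) shows that forced choice scoring depends only on the rank order of a person's traits. Two candidates with very different underlying intensities produce identical ipsative profiles, and every profile sums to zero, which is why the scores are only meaningful relative to that one person.

```python
# Simplified forced-choice (ipsative) scoring sketch.
# An item presents four trait words; the candidate marks the one that
# MOST describes them (+1 to that trait) and the one that LEAST does (-1).
# Trait names and latent intensities are hypothetical illustrations.

TRAITS = ["dominant", "influential", "steady", "conscientious"]

def ipsative_scores(latent):
    """Score one candidate: MOST = their strongest trait, LEAST = their weakest."""
    scores = {t: 0 for t in TRAITS}
    most = max(TRAITS, key=lambda t: latent[t])
    least = min(TRAITS, key=lambda t: latent[t])
    scores[most] += 1
    scores[least] -= 1
    return scores

# Two candidates with the same rank order but very different intensities.
banker   = {"dominant": 90, "influential": 60, "steady": 40, "conscientious": 20}
attorney = {"dominant": 51, "influential": 50, "steady": 49, "conscientious": 48}

print(ipsative_scores(banker))    # identical profile...
print(ipsative_scores(attorney))  # ...despite very different absolute levels
print(sum(ipsative_scores(banker).values()))  # always 0: scores are relative
```

Because the profile is driven entirely by within-person rank order, the banker's strong preferences and the attorney's nearly flat ones come out looking the same, which is the comparison problem described above.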
Examples of Products Using Forced Choice Items
  • Performax
  • Other DISC-type products
  • RightPath
  • Personalysis
  • MBTI


True – False items present a statement describing a behavior or a belief, which the candidate is asked to confirm as either True or False for them. True – False items are usually quick and easy to answer. Unfortunately, they share the same issue that cripples forced choice items.

Points of Concern Using True – False Items
  • It is impossible to know the extent of the gap between True and False. For example, the candidate responds “True” to the statement, “I follow the rules.” Do they follow all rules, some rules, one of the rules? The absolute quality of the answers blurs the actual meaning.
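The information loss in the rule-following example can be sketched in a few lines of Python (the rule-following rates and the simple majority threshold below are hypothetical illustrations): a True – False item collapses a whole range of behavior into one bit, so candidates with very different behavior give the same answer.

```python
# True-False items collapse a range of behavior into a single bit.
# Hypothetical sketch: a candidate who follows rules 95% of the time and
# one who follows them 55% of the time both answer "True" to the
# statement "I follow the rules."

def true_false_response(rule_following_rate, threshold=0.5):
    """A candidate answers True if they follow rules more often than not."""
    return rule_following_rate > threshold

strict = 0.95   # follows nearly all rules
casual = 0.55   # follows just over half

print(true_false_response(strict))  # True
print(true_false_response(casual))  # True -- same answer, very different behavior
```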
Examples of Products Using True – False Items
  • Birkman


Items using Likert scales present a statement describing a behavior or a belief, similar to True – False items. The difference is that the candidate is given a range of choices with which to rate their agreement or disagreement with the statement. Example: Strongly Agree, Somewhat Agree, Neither Agree nor Disagree, Somewhat Disagree, or Strongly Disagree. Likert items commonly have 5 – 10 point ranges. The advantage of this format is that it is possible to assess the degree of agreement or disagreement because of the candidate’s increased choices. The more accurate instruments use this format, and so do almost all of the serious business-oriented instruments developed in recent years.
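The scoring advantage described above can be sketched in a few lines of Python (the item responses, the five-point mapping, and the candidates below are hypothetical illustrations): because each response maps onto a fixed numeric range, degrees of agreement can be scored, averaged, and placed on the same scale for every candidate.

```python
# Likert scoring sketch: each response maps onto a fixed numeric value,
# so scores can be summed, averaged, and compared across candidates.
# The response sets below are hypothetical illustrations.

SCALE = {
    "Strongly Disagree": 1,
    "Somewhat Disagree": 2,
    "Neither Agree nor Disagree": 3,
    "Somewhat Agree": 4,
    "Strongly Agree": 5,
}

def scale_score(responses):
    """Average the numeric values (1-5) of a candidate's responses."""
    return sum(SCALE[r] for r in responses) / len(responses)

# Two candidates answering the same three items on one trait scale.
candidate_a = ["Strongly Agree", "Somewhat Agree", "Strongly Agree"]
candidate_b = ["Somewhat Disagree", "Neither Agree nor Disagree", "Somewhat Agree"]

print(scale_score(candidate_a))  # roughly 4.67 -- clearly higher than...
print(scale_score(candidate_b))  # 3.0 -- ...measured on the same scale
```

Unlike ipsative forced-choice scores, these values sit on a common scale, which is what makes norming across candidates and jobs possible.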

Points of Concern Using Likert Items
  • The issues that occur with Likert items relate less to the format than to the actual wording of the statements. Each statement must include only one behavior or belief. For example, here is an item from a product that uses Likert items but contains construction errors in some of them: “I like to work slowly and carefully.” One could work quickly but carefully. One could work slowly but carelessly. Each word has a slightly different connotation, so it is unclear whether one factor or both influenced the response.
  • The most common Likert items have a 5-point range for responses. Some have 6-point or 7-point ranges. When the ranges are greater than 7, the data becomes less meaningful because the standard deviation becomes quite large.
Examples of Products Using Likert Items
  • The Prevue
  • BestWork DATA
  • Profile XT