Don’t Smile About Smile Sheets
John E. Jones, Ph.D.
Organizational Universe Systems
Most training courses include end-of-course questionnaires that ask participants to rate various aspects of the experience. In the training industry, we refer to these instruments as "smile sheets" and often deride them in professional discussions.
So why do we use smile sheets? The six most common reasons:
* Training sponsors want them.
* The training staff wants to know "how’d we do?"
* The ratings look valid.
* People who pay the bills to have their employees developed want instant results; they often want participants to like the experience. (Sometimes they use training as a cosmetic intervention to improve morale.)
* End-of-course questionnaires provide quick answers. They fit in with the short-term mentality of some managers — "profits this quarter".
* The professionals who staff courses also want immediate feedback, even if it is flawed.
One severe limitation of end-of-course feedback is that it violates well-established criteria for the effective exchange of information. In spite of that, trainers and facilitators often have an almost morbid preoccupation with how participants reacted to their courses.
End-of-course questionnaires generate statistics with a pristine, truthful look, especially when the numbers come out of computers. This is the GITO phenomenon: Garbage In, Truth Out. People want end-of-course rating summaries because they look valid. Unfortunately, their usefulness (the chief criterion of validity in this context) is severely limited.
The 26 Faults
Believe it or not, this list of 26 limitations of end-of-course ratings is not exhaustive. That there are at least this many faults should cause evaluators to think skeptically about using such data.
1. Ratings don’t correlate with transfer of training. No available research shows a clear relationship between end-of-course ratings and the extent to which participants apply training on the job.
2. Many raters are unqualified. Training participants do not have the background to supply valid judgments about the effectiveness of a training course. They know what they like, but they are usually uneducated about educational theory and methods.
3. Trainees have uneven comparative histories. Trainees usually have highly varied training experiences with which to compare a given course. What one person considers to be outstanding may appear ordinary to another; both may have valid reasons for their opinions.
4. Data are retrospective. End-of-course ratings suffer from all the faults of retrospective data. What we remember may be highly inaccurate; our judgments about it may vary across time. Asking participants to rate the opening session on the last day is usually useless.
5. Responses are judgmental or subjective. Most end-of-course questionnaires ask respondents to make evaluations. Such ratings are, of course, judgmental and subjective. What one person sees as useful another may see quite differently.
6. Ratings are sensitive to mood. The activities that immediately precede the end-of-course ratings can affect the data. For example, a celebratory atmosphere in the room may improve ratings.
7. Trainees fear reprisal, even when surveys are "anonymous." Some participants almost always leave blank the demographic items on end-of-course questionnaires. In general, the more demographic items a questionnaire includes, the greater the respondents' fear of reprisal.
8. Trainees do not complete surveys. Participants don’t want to complete long questionnaires from which they will not personally benefit. End-of-course evaluations, then, usually compromise in favor of brevity.
9. Ratings are sensitive to wording nuances. If you change a key word, the ratings change. Using loaded language or controversial terms can have serious and unknown effects on patterns of response.
10. Free-form comments are almost always predominantly negative. They receive too much weight, though they often cannot be confirmed. It is better to ask for comments only in the pilot study that precedes a system-wide survey. There, such remarks can be useful in making the final instrument clearer and more comprehensible.
11. Surveys set expectations for change. The act of asking for people’s input raises expectations that something will be done with the results. Training participants often ask later to hear how the staff used their ratings and suggestions. Be wary of asking about things that are not going to change.
12. Surveys are quick, taken at a time when people want to leave. Many training participants approach the task of evaluating a course in a cursory manner. After all, it is not their course. Their motivation is often to "get outta here" rather than to improve the course.
13. Statistical trends depend as much on group composition as on design and delivery. Comparing average ratings across sessions is not an "apples to apples" practice. The particular mix of participants, or even the presence of a difficult participant, can skew ratings.
14. Statistical trends are not comparable when design and delivery change. When a course runs over a long period, its design and delivery should be continuously improved. That makes end-of-course ratings apply to non-comparable experiences; the statistical trends can be misleading.
15. Not all instrument items are equally important, and often some of them do not even match the rating scale. How a respondent interprets an item is a function of her or his experience base; no two training experiences are exactly alike.
16. Ratings of different parts of training (such as the opening, lectures, and hands-on activities) are always uneven and non-synergistic. For example, adults tend to rate lectures lower than they rate experiential training exercises. In addition, rating a presentation is a contaminated act. Reactions to different aspects of the training (the style of the trainer, interest in the content, use of audiovisual aids, and so forth) may commingle in the act of rating.
17. The emphasis is often on excitement. People like to have fun in training courses. Many resent it when the trainer asks them to work hard, even for compelling business reasons. Training that resembles "rest and recreation" often gets high marks. End-of-course ratings may reflect the extent to which people simply "had a good time."
18. Surveys may confuse "like" and "worth." Instruments that use rating scales that include words such as "like" and "satisfying" promote this confusion. What we need is a clear index of the usefulness of the session, not whether it made people smile.
19. Surveys tend to focus on what is wrong. On the basis of end-of-course ratings, training-design specialists may inadvertently change what is really working in the course. If a questionnaire only solicits data for improvement, "critiques" may become negative criticism.
20. The change-back effect is most likely. The most dramatic effect of training is no effect: most training does not work, and most intended results are not demonstrably evident. Unless there are supportive changes in the systems that reward and reinforce personal development, training effects wash out. End-of-course ratings do not reflect this probable washout.
21. Some effects of training are delayed. Some people continue to learn from a course long after it is "completed." Many former trainees remark that "lights went on" for six months after a course. End-of-course ratings do not tap that process.
22. Training can raise unsettling personal questions. When a course "raises the ante" on taking responsibility for being an organizational leader, a trainee may arrive at the closing session with doubts about the future ("Do I want to stay?" "Do I have what it takes?"). Such concerns can depress end-of-course ratings.
23. Self-confrontational experiences can have temporary effects. In courses that include self-disclosure, feedback, risk taking, and confrontation, people may experience a lack of validation of their self-concepts. The experience can be temporarily upsetting and may result in low ratings.
24. Survey "report cards" put pressure on staff. There is a strong tendency to "play the ratings." If training managers use survey results in evaluating staff, trainers may choose not to confront participants and may even slip into entertainment postures. Many trainers see end-of-course ratings as a threat.
25. Negative comments get disproportionate attention. When the numbers come out of the computer, many trainers pay the most attention to the negative results. They remember the one or two "cheap shots" that participants wrote on the questionnaires and discount the positive feedback.
26. Surveys are about judging others' work rather than taking personal responsibility. Perhaps the most serious fault of end-of-course ratings is that they are upside down. Participants earn their salaries while learning and while planning how to apply the learning to their work; the responsibility for learning is theirs. Yet most end-of-course questionnaires encourage them to make the trainer responsible for their learning.
What to Do Instead
With all of these limitations, what is to be done? Here are seven suggestions:
* If you must include end-of-course ratings, clean up your thinking about them. Reflect often and honestly on their faults.
* If your manager evaluates you on the basis of end-of-course ratings, do something dramatically positive just before people fill out the questionnaire. Have participants' supervisors come to the final session, and have each person declare, in front of everyone, the worth of the experience and his or her intended changes. Then gather the data. When they leave, quietly consider the degree of personal integrity in such a strategy.
* Stick with business goals. Make each end-of-course survey item clearly relate to organizational needs.
* Focus on self-responsibility (What did you do here?). Have participants rate the extent to which they took personal responsibility for learning practical things that they clearly commit to use on the job.
* Limit questionnaires. Since you are getting shaky information, why ask for a lot of it? Keep end-of-course instruments brief.
* Set up control charts on key metrics. Specify what you are trying to influence in the organization, set up measurement systems, and track changes. Establish acceptable limits of variation; plan to intervene in ways other than training (a "low-wattage" intervention at best) when measurements fall outside of control limits. (A brief sketch of this approach follows this list.)
* Conduct follow-up studies. Study training effects on the job. Focus on observable behavior whenever possible, and include multiple sources of information.
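To make the control-chart suggestion concrete, here is a minimal sketch of an individuals ("XmR") chart in Python. It is illustrative only, not part of the original article: the weekly defect-rate scenario, the numbers, and the function names are hypothetical assumptions, and the 2.66 constant is the standard XmR factor that converts the average moving range into three-sigma limits.

```python
# Minimal, illustrative XmR (individuals) control chart. All metric names
# and numbers below are hypothetical assumptions, not the author's data.

def control_limits(values):
    """Return (lcl, center, ucl) for an individuals chart. The constant
    2.66 is the standard XmR factor that turns the average moving range
    into three-sigma limits."""
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    return center - 2.66 * avg_mr, center, center + 2.66 * avg_mr

def out_of_control(values, lcl, ucl):
    """Flag measurements outside the control limits: the signal that some
    intervention (not necessarily training) may be warranted."""
    return [(i, v) for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical baseline: weekly defect rates (%) before the course.
baseline = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3, 3.7]
lcl, center, ucl = control_limits(baseline)

# Hypothetical follow-up measurements tracked after the course.
follow_up = [4.0, 3.6, 5.9, 4.1]
for week, value in out_of_control(follow_up, lcl, ucl):
    print(f"week {week}: {value} falls outside [{lcl:.2f}, {ucl:.2f}]")
```

A point outside the limits signals a need to investigate and intervene; as the article notes, more training is often a "low-wattage" response to such a signal.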
I don’t mean to argue that trainers should never use end-of-course evaluations. Rather, I have provided a set of caveats about their limitations. We need more than one method and more than one data source for gathering complete information on the results of training. The effort is worth it, and the payoff can be significant.
Reprinted from Training & Development Journal, December 1990.
Organizational Universe Systems
Post Office Box 38
Valley Center CA 92082
760/749-0811 voice
760/749-8051 fax
http://ous.iex.net
jjones2@san.rr.com