April 16, 1998
Thomas Landauer, (303) 492-2875
Darrell Laham, (303) 499-3664
Peter Foltz, (505) 646-1980
Peter Caughey (Public Relations), (303) 492-4007
COMPUTER SOFTWARE GRADES ESSAYS
JUST AS WELL AS PEOPLE, PROFS ANNOUNCE
New computer software can grade the content of essay exams just as well
as people and could be a major boon in assessing student performance, researchers
at the University of Colorado at Boulder and New Mexico State University
announced today.
"From sixth graders to first-year medical students we get consistently
good results," said Thomas K. Landauer, a CU-Boulder psychology professor
who has worked on the technology behind the program for 10 years. "It's
ready."
The computer software, called Intelligent Essay Assessor, uses mathematical
analysis to measure the quality of knowledge expressed in essays. It is
the only automatic method for scoring the knowledge content of essays that
has been extensively tested and published in peer-reviewed journals.
The system was developed by Landauer; Darrell Laham, a CU-Boulder doctoral
student; and Peter W. Foltz, an assistant professor of psychology at NMSU.
They will discuss the system Thursday, April 16, during the annual meeting
of the American Educational Research Association in San Diego.
"We are continually surprised at how well it works," said Landauer,
who started on the project as director of cognitive science research at
Bellcore in New Jersey.
The grading system has important implications for assessing student writing
and helping students improve their writing, Foltz said. In one of his undergraduate
psychology classes at NMSU last fall, Foltz tested a version of the program.
"Students submitted essays to a web page and received immediate feedback
about the estimated grade for their essays, and suggestions about what was
missing," Foltz said. "Students could revise their essays and
resubmit them as many times as they wanted. The students' essays all improved
with each revision."
Foltz also gave students the choice of having their essays graded by a human
or by the computer. "They all chose to have the computer do the grading,"
he said.
Educators laud essay exams because they provide a better assessment of students'
knowledge than other types of tests. A huge drawback is that the tests are
time-consuming and difficult to grade fairly and accurately, particularly
for large classes or nationally administered exams.
But computer-based evaluations of student writing are becoming increasingly
feasible because of the growing numbers of students who write using computers.
The researchers have applied for a patent on their software.
The new system requires a computer with about 20 times the memory of an
ordinary PC to do the statistical analysis that it needs to "understand"
essays. It uses Latent Semantic Analysis, a new type of artificial intelligence
that is much like a neural network. "In a sense, it tries to mimic
the function of the human brain," Laham said.
First, the program is "fed" information about a topic in the form of
50,000 to 10 million words from on-line textbooks or other sources. It
learns from the text and then assigns a mathematical degree of similarity,
or "distance," between the meaning of each word and every other word.
This allows students to use different words that mean the same thing and
receive the same score. For example, they could use "physician" instead
of "doctor."
The program then evaluates essays in two primary ways. The first is for
a teacher or professor to grade enough essays to provide a good statistical
sample and then use the software to grade the remainder.
"It takes the combination of words in the student essay and computes
its similarity to the combination of words in the comparison essays,"
Laham said. The student's essay then receives the same grade as the
human-graded essay it most closely matches.
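Continuing the toy sketch above, the nearest-match step Laham describes might look like this; the sum-of-word-vectors essay representation and the grading rule are illustrative simplifications, not the system's actual procedure.

```python
# Sketch of the first grading method, reusing lsa_word_vectors,
# similarity and vecs from the previous example. The ungraded essay
# receives the grade of the human-graded essay it most resembles.
import numpy as np

def essay_vector(text, vecs):
    """Combine word vectors into one essay vector, skipping words the
    model never saw in its training text."""
    return np.sum([vecs[w] for w in text.split() if w in vecs], axis=0)

def grade(essay, graded_essays, vecs):
    """graded_essays: list of (text, human_grade) pairs."""
    target = essay_vector(essay, vecs)
    _, best_grade = max(
        graded_essays,
        key=lambda tg: similarity(target, essay_vector(tg[0], vecs)))
    return best_grade

graded = [("the physician examined the patient", "A"),
          ("the essay was graded by software", "C")]
print(grade("the doctor examined the patient", graded, vecs))  # "A"
```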
"The program has perfect consistency in grading - an attribute that
human graders almost never have," Laham said. "The system does
not get bored, rushed, sleepy, impatient or forgetful."
In one test, both the Intelligent Essay Assessor and faculty members graded
essays from 500 psychology students at CU-Boulder. "The correlation
between the two scores was very high - it was the same correlation as if
two humans were reading them," Landauer said.
The software evaluates only knowledge content and is not designed to grade
stylistic considerations like grammar and spelling, the researchers said.
Existing programs can already handle those functions.
A second Intelligent Essay Assessor method compares all the student essays
to a single professor's or expert's essay, a so-called "gold standard."
A third variation can tell students what important subject matter was missing
from their essays and where to find it in the textbook.
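In the same toy setup, the gold-standard variant reduces to scoring each essay by its similarity to a single expert essay; the expert text here is invented for illustration, and any mapping from similarity scores to letter grades would be a further assumption.

```python
# Sketch of the "gold standard" variant, reusing essay_vector and
# similarity from the sketches above. Every student essay is scored
# against one expert essay rather than many pre-graded ones.
expert = "the physician examined the patient"

def gold_standard_score(essay, vecs):
    return similarity(essay_vector(essay, vecs), essay_vector(expert, vecs))

print(gold_standard_score("the doctor examined the patient", vecs))
```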
Previous methods of automatic essay scoring simply counted words and then
analyzed mechanics and aspects of grammatical style, the researchers said.
There is a strong correlation between how much students write and how well
they write, they noted, because students who know a lot tend to write a lot.
The amount of content also counts in the Intelligent Essay Assessor, but
it is measured by concepts, not by the number of words. The researchers
recommend setting an essay word limit to eliminate length as a factor.
Because the system does not analyze surface form, it is possible that someone
could include all the right words in an essay - in random order - and get
a good grade, they said. The system flags unusual essays - for that reason
and others - for a human to check. But the team discovered an even better
safeguard while trying to fool the system.
"If you wrote a good essay and scrambled the words you would get a
good grade," Landauer said. "But try to get the good words without
writing a good essay!
"We've tried to write bad essays and get good grades and we can sometimes
do it if we know the material really well. The easiest way to cheat this
system is to study hard, know the material and write a good essay."
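The toy sketch makes Landauer's point concrete: because the essay representation above ignores word order entirely, scrambling a good essay leaves its score untouched, and producing the right words in the first place remains the hard part.

```python
# Word order never enters essay_vector, so a scrambled copy of a good
# essay gets the same score - fooling the system still requires knowing
# which words belong in a good answer.
import random

original = "the doctor examined the patient"
words = original.split()
random.shuffle(words)
scrambled = " ".join(words)
print(similarity(essay_vector(original, vecs),
                 essay_vector(scrambled, vecs)))  # identical vectors: 1.0
```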
Thomas K. Landauer & Darrell Laham
Photo by Ken Abbott, CU Office of Public Relations