Information processing
“Computer, summarize this text!”
The human brain is no longer needed to summarize
scholarly articles. A computer program can handle the job.
Before our journalist’s eyes, a 15-page scientific document
was abridged in a fraction of a second. The result (coherent,
precise and grammatically correct) took up 12 lines.
The software, still in its experimental stage, is a group
project by students in the Applied Computer Linguistics Research
Laboratory at the University of Montréal, under the supervision
of Guy Lapalme, a professor in the Department of
Information Processing and Operational Research. After the
SumUM package, which produced 10- to 15-line summaries of
scientific articles, Atefeh Farzindar started looking at
jurisprudence texts—not as simple a job, but one that
yields astonishing results. “Currently,” she
says, “we are only working on documents in English,
but nothing prevents us from branching into other languages.”
Of course, computers don't understand the meaning of the
words. As a result, the researchers must adopt various strategies
to “teach” them to write summaries. One option
is to analyze the work of real flesh-and-blood editors. Where
do they pick up information when they summarize a text? In
general, they look at the introduction, the conclusion, the
titles, the captions and the beginnings of paragraphs. The computer
has to take basically the same approach.
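The position-based approach described above can be illustrated with a minimal Python sketch. It is not the laboratory's software: the function names, the naive sentence splitter and the rule of keeping only the first sentence of each body paragraph are assumptions made for illustration.

```python
import re


def split_sentences(paragraph):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", paragraph)
    return [s.strip() for s in parts if s.strip()]


def candidate_sentences(title, paragraphs):
    """Collect text from the positions a human editor tends to consult."""
    candidates = [title]
    if not paragraphs:
        return candidates
    last = len(paragraphs) - 1
    for i, paragraph in enumerate(paragraphs):
        sentences = split_sentences(paragraph)
        if not sentences:
            continue
        if i == 0 or i == last:
            # Introduction and conclusion: keep every sentence.
            candidates.extend(sentences)
        else:
            # Body paragraphs: keep only the opening sentence.
            candidates.append(sentences[0])
    return candidates
```

Only the sentences returned here would go on to further analysis, which is how such a filter could reduce the quantity of text to process.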
Applied to computers, this method can reduce the quantity
of text to analyze. The computer performs statistical calculations
that flag words occurring with abnormal frequency, determine
whether particular words are always associated with other
words, and identify words that appear to be keywords. The software
saves these significant expressions and then casts them in
correct language, reproducing them in a predetermined format.
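As a rough illustration of the statistics mentioned above, the following Python sketch counts word frequencies and adjacent word pairs. It is a simplified stand-in, not the researchers' method: the tiny stop-word list, the function names and the use of raw counts instead of a comparison with expected frequencies are all assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "that", "was", "for", "on", "by"}


def tokenize(text):
    # Lowercase the text and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def keywords(text, top_n=3):
    """Return the most frequent non-stop-words as candidate keywords."""
    counts = Counter(w for w in tokenize(text) if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def word_pairs(text):
    """Count adjacent word pairs, a crude proxy for word association."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))
```

For example, `keywords("The court heard the appeal. The court dismissed the appeal.", 2)` picks out "court" and "appeal", the two words that recur in that text.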
Atefeh Farzindar is working in conjunction with the Centre
for Research into Public Law in the Faculty of Law, which
is providing her with a large quantity of digital documents.
While the software is already partly functional, it remains to be
determined which information absolutely has to appear in the summary.
Algorithms also have to be developed that will enable the
computer to differentiate between expressions like “telephone
call” and “call to the bar.”