August 12, 2006,
Motivation: Evaluation
of Machine Translation (MT) has proven elusive throughout the entire history of
computing. This problem arises in
part from the inherent difficulty in assessing human language translation in general
and in part from the different and often competing evaluation goals of the MT
stakeholders. In the last 5 years,
the advent of rapidly applied, automatic evaluation measures has shown promise
in creating the expectation of a common benchmark for all MT approaches. But the coverage, consistency, and
reliability of those measures remain problematic for much of the MT
community. This workshop will
invite its participants to engage in a series of activities intended to widen
horizons, challenge deeply held views, and take up questions that may appear
wildly outside the bounds that the discourse has kept to so
far. The goal is to increase
awareness of the issues and difficulties, confront the presumptions that may
have implicitly hindered us so far, and motivate us to come up with new
evaluation approaches that are more actionable, more accurate, and more
informative.
Who Should Attend: All MT users, procurers,
researchers, developers, investors, and anyone else who must make decisions
about some aspect of machine translation approaches, implementation, or lifecycle. We
welcome persons new to the issues of MT evaluation, as well as those who have
experience in designing and conducting evaluations.
Participation and Format: This workshop will include invited speakers, presentation of brief
position papers by participants, and hands-on exercises in evaluation, all
intended to reveal and highlight extreme positions. All participants will be expected to
take part in the exercises, which will probe some of the inherent difficulty
in evaluation, while comparing contemporary methods over the same translation
corpora. We encourage each
participant, as well, to prepare a position statement on evaluation issues such
as those listed below, in the form of a brief PowerPoint presentation (3-4
slides). We will allow presentation of these, time permitting, and will make
them available to the participants online.
We ask participants to address specific problems in evaluation; in
particular, we would like each participant to engage one of the following
questions:
Organizers:
John White (Systran Software)
Keith Miller (MITRE Corporation)
Advisory Committee:
Flo Reeder (MITRE)
Michelle Vanni (Army Research Laboratory)
Kathi Taylor (US Government)
Elaine Marsh (Naval
Invited Speakers and/or Panelists:
The Workshop is pleased to offer a point-counterpoint
by two associates of the MITRE Corporation, Florence Reeder and John
Henderson. Each has a unique and
contrasting perspective on the efficacy of linguistic judgment-based evaluation
and empirical evaluation. Their
presentations will demonstrate the divergence of opinion on MT evaluation
techniques, providing ammunition for the events to follow in the workshop.
Schedule:
08:30 - 08:45 | Welcome and Overview
08:45 - 09:45 | Point and Counterpoint - Linguistics versus Statistics
09:45 - 10:00 | Hands-on Task - Description and Data
10:00 - 10:15 | Break
10:15 - 11:30 | Presentation of Position Papers
11:30 - 12:00 | Hands-on Exercise
12:00 - 13:30 | Lunch
13:30 - 14:30 | Hands-on Exercise continued
14:30 - 15:00 | Team Compilation and Analysis
15:00 - 15:15 | Break
15:15 - 16:45 | Exercise read outs, synthesis
16:45 - 17:00 | Conclusions: Future work