MANAGEMENT OF AUTOMATIC TRANSLATION IN DISCUSSION LIST SERVICES

FUNREDES
http://funredes.org
April 2005

INTRODUCTION

Funredes has experimented with automatic translation programs to support multilingual communication in virtual spaces since as early as 1998 [1], and formulated the first version of its EMEC project in March 1997.

The EMEC project (http://funredes.org/emec) represents the most complete service that has been conceived and tested for the management of multilingual virtual spaces. The EMEC service includes management of the signal-to-noise ratio and a high-level management of automatic translation, which requires a serious investment proportional to the number and size of the messages in the discussion list [2].

While the first stage of the MISTICA project (http://funredes.org/mistica) was a wonderful opportunity to test and evaluate the methodology, with a positive outcome, it has not been possible to obtain additional support to formulate and test an enhanced EMEC2 methodology, nor has any request for such services been received.

However, the need for a basic level of support for multilingualism in electronic conferences has been growing steadily, although conditions are not yet mature for acknowledging the investment that such services deserve.

Under those conditions, the demand for the simplest way to manage automatic translation (pure automatic translation without added value) deserved consideration, and Funredes was naturally led to develop software support to systematize the process of embedding automatic translation output into mailing lists. The lack of affordable Linux-based translation software has been a bottleneck.

In 2003, Funredes decided to automate the tedious process of creating the translated message by cutting and pasting the output of the translation program into the list server manager, thus avoiding dependency on human intervention (except, obviously, for monitoring and maintenance).

Experience has shown that even for the simplest management of automatic translation, different options exist that make the process more or less user-friendly and, in turn, more or less costly to develop and maintain. This document aims to give an idea of the different options and their corresponding costs. Another document will present the internals of the service.

FIRST A WARNING

The term automatic translation is misleading. People who expect a real translation service are likely to be frustrated and disappointed. What is offered is a rough aid to mutual understanding. The idea is that by reading the output of the translation software AND the original text, it is possible to enhance the level of understanding. As a matter of fact, if people followed the guidelines for expression (see http://www.funredes.org/mistica/english/emec/method_emec/presentation.html) and wrote in clear, short, unambiguous sentences, avoiding grammatical and spelling mistakes, passive constructions and idiomatic style, most would be amazed by the quality of the output. However, daily experience shows that most people tend not to follow the rules and, furthermore, that netiquette is not gaining ground. This makes the use of the service questionable in non-moderated environments, where the service amplifies the noise of an already low signal-to-noise ratio.

PROGRESSION OF SERVICE - PROGRESSION OF COST

Note: all costs mentioned hereafter are only Funredes' best estimates to date, based on its own salary and administrative scales. They are neither the result of a pricing study nor guaranteed figures.

The cost equation has several elements:

-  the software development cost;

-  the maintenance cost, which includes fixing the software and maintaining the unstable interface with web translation services;

-  the operational cost, which consists of the human resources whose action is required for the services to be performed.

The reason why it is so difficult to address the issue of translation services publicly is that there is a wide range of possibilities associated with a wide range of costs, with a ratio of 1 to 1000! This represents almost a continuum of possibilities, and it is hard to express as a simple choice in a menu. Furthermore, there are many possible options for the level of service, which in turn add to the complexity of the cost equation. So far, Funredes has opted for a formula that tries to recover the initial development investment and to cover its maintenance and operational costs. Under this approach, the services are billed yearly, by project, on the basis of 1000, 2000 or 5000 US$ depending on the type of service.

The following tables try to show the range of possibilities.

FUNCTION: DESCRIPTION

Top level professional translation: In this scenario, a high-level professional group translates each message into each of the selected languages. Obviously, attached documents are not included.
Professional translation: Same scenario, with professionals asking for a lower price per page.
EMEC service: See http://funredes.org/emec for details.
Send software translation products in parallel lists organized by language: Each user decides in which language(s) he/she wants to receive the contributions. Each message holds the original contribution and the translation.
Send products in a unique parallel list with concatenated products: Subscribers who want translations subscribe to a special list which holds the original message followed by the sequence of all translations into each of the processed languages. The others just receive the original.
Send products in a unique list with concatenated products: In this scenario, everybody has to read the original followed by the translations.
Store products in memory web pages per language: A web page is associated with each language and holds the sequence of messages; with one click the message is displayed. We usually use the Hypermail standard [3].
Store concatenated products in a memory web page: The same, but with a unique web page holding all the messages, with their translations by language in sequence.
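
To make the idea of "concatenated products" more concrete, here is a minimal Python sketch of how a message body could be assembled from the original text followed by its machine translations. The function names, the section labels and the separator layout are illustrative assumptions, not Funredes' actual implementation; the translations are assumed to have been produced beforehand by whatever translation software is in use.

```python
def build_concatenated_body(original, source_lang, translations):
    """Return one message body: the original text, then each translation.

    `translations` maps a language code to the machine-translated text.
    The labels and separators used here are hypothetical.
    """
    sections = [f"[ORIGINAL - {source_lang}]\n{original.strip()}"]
    # Sort by language code so the order is the same in every message.
    for lang in sorted(translations):
        if lang == source_lang:
            continue  # never repeat the original as its own translation
        sections.append(
            f"[AUTOMATIC TRANSLATION - {lang}]\n{translations[lang].strip()}"
        )
    return ("\n\n" + "-" * 40 + "\n\n").join(sections) + "\n"


def build_per_language_bodies(original, source_lang, translations):
    """Variant for parallel lists organized by language: one body per
    target language, holding the original and a single translation."""
    return {
        lang: build_concatenated_body(original, source_lang, {lang: text})
        for lang, text in translations.items()
        if lang != source_lang
    }
```

The first function corresponds to the "unique list with concatenated products" scenarios, the second to the "parallel lists organized by language" scenario; in both cases the reader always sees the original next to the translation.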

TYPE OF COSTS


MESSAGE: The cost is tied to each message and its length, as in human translation or in added-value services such as output revision.
DEVELOPMENT: The cost of programming the integration of the list manager with the translation software.
MAINTENANCE: The cost of maintaining the program, fixing bugs and adapting to changes in an unstable web interface. Practically independent of the type of service, and only weakly dependent on the number of managed services.
OPERATIONAL: The cost of managing the service, such as subscription management, monitoring of the quality of service, user support... Grows linearly with the number of services.

FUNCTION | MESSAGE COST | DEVELOP. COST | MAINTEN. COST | OPERAT. COST

High level professional translation | 100 US$ per page per language | 0 | 0 | 500 US$ per month
Professional translation | 25 US$ per page per language | 0 | 0 | 500 US$ per month
Non professional translation | 5 US$ per page per language | | | 500 US$ per month
EMEC | 2.5 US$ per page per language | 10,000 US$ | 1000 US$ per month | 5000 US$ per month
Send products in parallel lists organized by language | 0 | 3000 US$ | 200 US$ per month | 500 US$ per month
Send products in a unique parallel list with concatenated products | 0 | 2000 US$ | 200 US$ per month | 200 US$ per month
Send products in a unique list with concatenated products | 0 | 1000 US$ | 200 US$ per month | 200 US$ per month
Store products in memory web pages per language | 0 | 1000 US$ (not done) | 200 US$ per month | 100 US$ per month
Store concatenated products in a memory web page | 0 | 200 US$ (not done) | 200 US$ per month | 50 US$ per month

The two extremes of the table show that the cost of translating a single message into a single language can vary from 100 US$ to 0.10 US$...
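
The four cost types can be combined into a rough first-year estimate. The Python sketch below is only an illustration: the additive formula (one-off development cost, plus twelve months of maintenance and operation, plus a per-page charge) and the traffic figures in the example are assumptions, while the unit costs are taken from the table.

```python
from dataclasses import dataclass


@dataclass
class ServiceCost:
    """Unit costs in US$ for one type of service (values from the table)."""
    per_page_per_language: float
    development: float            # one-off
    maintenance_per_month: float
    operation_per_month: float


def first_year_cost(cost, pages_per_month=0, languages=1):
    """Estimate the first-year cost in US$ under the assumed additive model."""
    message_part = cost.per_page_per_language * pages_per_month * languages * 12
    recurring = (cost.maintenance_per_month + cost.operation_per_month) * 12
    return cost.development + recurring + message_part


professional = ServiceCost(25, 0, 0, 500)
unique_list = ServiceCost(0, 1000, 200, 200)

# Hypothetical traffic: 150 one-page messages per month, 3 target languages.
print(first_year_cost(professional, pages_per_month=150, languages=3))
print(first_year_cost(unique_list, pages_per_month=150, languages=3))
```

Under these assumptions, the human translation scenario is dominated by the per-page charge, whereas the automated scenario is dominated by fixed monthly costs that do not grow with traffic.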

SUPPORTED LANGUAGES

So far Funredes has experimented with English, French, Portuguese and Spanish. We have been looking for software for Haitian Creole and have been in close discussion with Atamiri [4], an original concept developed by the Bolivian Ivan Guzman de Rojas that allows direct translation without pivoting through English, with a view to translating some indigenous languages and obtaining a Unix-based program.

We have been experimenting with GlobalLink [5], in the framework of an agreement with FPH [6] and with the BabelFish [7] web interface.

The best solution, still to be found, would be an affordable Unix-based translation program that avoids having to interface with the web. Atamiri is so far the best candidate, but some terminological development is still required for many of the supported languages.

ADDITIONAL OPTIONS

Adding value with output revision (strongly recommended): implies a cost per message on the order of 0.33 US$ per language. A person reads each message and fixes the most visible mistakes of the translation software.

Adding value with input revision: implies a cost per message on the order of 1 US$ per language. A person revises each message and rewrites it so that the translation program produces more reliable output. This basically consists of applying the writing rules already referenced.

Dehtmlization of the input.
In moderated lists where the bounce mechanism produces an unreadable message when HTML is used, this service removes the HTML code so as to yield a readable and processable message.
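
As an illustration of what such a dehtmlization step might look like, here is a minimal Python sketch based on the standard library's html.parser module. It is not the Funredes implementation: the function name, the set of tags treated as line breaks and the whitespace clean-up are assumptions, and a real mail message would first need its HTML part extracted from the MIME structure.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies."""

    # Assumed set of tags that should start a new line in the plain text.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "blockquote"}
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def dehtmlize(html_text):
    """Return the readable text of an HTML fragment, one block per line."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    raw_lines = "".join(parser.parts).splitlines()
    # Collapse runs of whitespace and drop lines left empty by the markup.
    cleaned = (" ".join(line.split()) for line in raw_lines)
    return "\n".join(line for line in cleaned if line)
```

Keeping one block per line matters here because translation software generally works sentence by sentence, so paragraph boundaries should survive the clean-up.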

Processing before or after moderation.
Depending on the situation, the manager of a given discussion list may prefer the process to run after moderation or before it. In the second case, the moderator can carry out the output revision.

Management of external source list
If the original list is managed on a different server than Funredes' and/or with a different list manager [8], this could make the process more difficult.

Management of external target list
If the output list is on a different server and/or uses a different list manager, this will definitely make the process more difficult and require some extra programming.

LISTS MANAGED BY FUNREDES WITH AUTOMATIC TRANSLATION

NAME | TYPE | COMMENT

MISTICA/EMEC | EMEC | 1999
MISTICA | Parallel concatenated | Moderated, 4 languages
SALSA | Unique concatenated | Moderated, 4 languages
CARDICIS | Parallel concatenated | Moderated, 4 languages
CARDIS | Parallel concatenated | Non moderated, 4 languages
BOHIO | Unique concatenated | Moderated, 2 languages
EDM | Parallel concatenated | Non moderated, 4 languages, external source list, internal output list
CIVIC | To be defined | External source list, external output list
M3EDT | Parallel one language | Non moderated
I-Jumelage | Several lists |
WSIS Plenary | Hypermail concatenated | Non moderated

WSIS CIVIL SOCIETY PLENARY
The WSIS civil society Plenary discussion list is at http://mailman.greennet.org.uk/public/plenary/. It is a quite active list, with an average of more than 5 contributions per day, which can rise above 15 on particularly busy days.
It is a non-moderated space with an average-to-low level of netiquette discipline (replies generally quote the whole original message without any attempt to trim or adapt it).
This space is multicultural by nature and there is a pending requirement to offer some level of support for multilingualism. Funredes is an active participant and has considered adapting its automatic translation services from the beginning; however, unless the cost is covered, this operation is hardly possible for Funredes, a non-profit organization with no institutional support which is funded only by projects. Recently, Robert Guerra made a public call, and Funredes applied a bit more imagination and conceived a low-cost solution that avoids managing any parallel list. The planned solution consists of opening a Hypermail archive with the sequence of translated mails at http://wsis.funredes.org/plenary. Obviously, one Hypermail archive per language would be a friendlier solution, but Funredes cannot sustain the associated cost without external support.
The requirement from the WSIS Civil Society discussion list has led Funredes to open, free of charge for one year, a new, simpler and cheaper way to manage automatic translation through a single sequential memory archive. Meanwhile, we will look for support to offer a better service, and in particular will raise the issue at the next WSIS conference, to be held in Bamako [9], where Funredes' President is one of the speakers.

Footnotes:

[1] With an online conference about Caribbean culture called [email protected], which is still maintained at http://funredes.org/salsa.

[2] A figure of 10 US$ per message was reached after starting at 20 US$ and optimizing the process thanks to a supporting management platform with a PHP-based database.

[3] http://www.hypermail.org/

[4] http://www.atamiri.cc/es/AtamiriSolution/

[5] http://www.globalinktranslations.com/

[6] http://www.fph.ch/

[7] http://babelfish.altavista.com/ which makes use of Systran translation software.

[8] Funredes uses Majordomo and has added some add-on functions to ease the administrative process. See http://www.greatcircle.com/majordomo/

[9] http://portal.unesco.org/ci/admin/ev.php?URL_ID=17688&URL_DO=DO_TOPIC&URL_SECTION=201