Abstract
The increased use of computer-assisted translation (CAT) technology in business settings, with growing workloads, collaborative work, and short deadlines, gives rise to errors and to the need for quality assurance (QA). The research has three operational aims: 1) to present a methodological framework for QA analysis, 2) to perform a comparative evaluation of four QA tools, and 3) to justify the introduction of QA into the CAT process. The research includes building a translation memory, extracting terminology, and creating a terminology base. Errors are categorized according to the Multidimensional Quality Metrics (MQM) framework. The error level is calculated from detected, falsely detected, and undetected errors. Weights are assigned to errors (minor, major, or critical), penalties are calculated, and a quality estimate for the translation memory is given. Results show that the process is prone to errors due to differences in error detection, harmonization, and error counting. Data analysis of detected errors supports further data-driven decisions related to the quality of the output and the improved efficacy of the translation business process.
Introduction
Computer-assisted translation (CAT) and machine translation technology are increasingly used in various business environments, such as multilingual companies, societies, industry, entertainment, educational institutions, and international events. Language service providers (LSPs) that offer translation services face competitive markets and digital transformation.
Digital transformation has changed the whole business translation process, introducing CAT and machine translation technology, new jobs, task distribution, collaborative work, the creation and sharing of digital resources, and the education of employees. In this competitive business environment, CAT technology, used separately or integrated with machine translation, has gained considerable importance.
An increasing amount of work, short deadlines, and collaboration on the same project give rise to a growing number of errors in the translation process using CAT technology. Human verification of errors would be an extremely tedious, time-consuming, and subjective task that is itself prone to errors. For this reason, quality assurance (QA) tools, which help to detect errors and may also enable their categorization and counting, can considerably contribute to the analysis of errors. Data analysis of error types could lead to further relevant decisions related to the translation business process, such as the building of language resources (e.g. lists of forbidden or preferred terms, or lists of abbreviations for translation), checks of style formats depending on the language (e.g. number, currency, and time formats), the setup of QA tool profiles, and the creation of regular expressions detecting specific language errors, but also to the reorganization of business processes (e.g. introducing new jobs, such as language engineer and project manager, redistributing tasks, or segmenting complex tasks).
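As an illustration of a regular-expression check for number formats, the following minimal sketch compares the numbers found in a source segment with those in its target segment, ignoring locale-specific thousands and decimal separators. The pattern, the function names, and the example segments are hypothetical and are not taken from any of the evaluated QA tools.

```python
import re

# Hypothetical pattern: a run of digits, optionally followed by groups
# introduced by a comma or a period (thousands or decimal separators).
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def normalize(number: str) -> str:
    """Drop separators so that 1,234.50 and 1.234,50 compare as equal."""
    return re.sub(r"[.,]", "", number)


def number_mismatches(source: str, target: str) -> list[str]:
    """Return the source numbers whose digits do not appear in the target."""
    target_numbers = {normalize(n) for n in NUMBER_RE.findall(target)}
    return [
        n for n in NUMBER_RE.findall(source)
        if normalize(n) not in target_numbers
    ]


if __name__ == "__main__":
    # Same value written with different locale conventions: no mismatch.
    print(number_mismatches("The price is 1,234.50 USD.",
                            "Der Preis beträgt 1.234,50 USD."))
    # Transposed digits in the target: the source number is reported.
    print(number_mismatches("Delivery in 15 days.",
                            "Lieferung in 51 Tagen."))
```

A check of this kind only flags candidate errors. It deliberately treats differing separators as acceptable, so a separate locale-specific rule would be needed to verify that the target uses the expected format.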
In a business environment, data analysis has become an indispensable segment of any business process related to quality issues. This research can be used to improve the quality of the output product (here translation), identify weak points, and reorganize the business translation process. Data analysis of errors obtained by QA tools can serve to improve the efficiency of the CAT process and the quality of its output.
QA tools enable verification not only of the target text, but also of translation memories consisting of pairs of source and target segments, as well as of compliance with client demands. Some QA tools offer additional functions, such as language setup, control of content, layout, orthography, and cultural formatting differences, or checks of terminology compliance with end-user demands. QA tools can provide significant help, but they differ in their setup characteristics, types of detected errors, ways of counting errors, and integration possibilities. For this reason, the main goal of this research is to present the role of QA tools in error analysis, which can serve as a source for data analysis of errors made in the CAT process and lead to future decisions. Existing research on QA in CAT technology is mainly oriented to the production level, concentrating on speed, analysis of error types, mistranslations, or post-editing, while research on QA of translation memories, including source and target segments, is scarce. As a translation memory represents the foundation for building high-quality resources used in CAT and machine translation technology, QA of translation memories directly affects the quality of the translated text.
The general aim of this research is to present the role of QA tools and data analysis of errors made in the CAT process. The paper has three aims: i) to present a methodological framework for quality estimation of the translation memory, which represents the basis for CAT technology; ii) to perform a comparative evaluation of four QA tools with the aim of harmonizing error types; and iii) to justify the introduction of QA tools into the digitally transformed process of translation, supported by CAT technology.
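The quality estimation outlined in the abstract (severity weights, penalties, and counts of detected, falsely detected, and undetected errors) can be sketched roughly as follows. The weight values, the per-segment normalization, and the use of precision and recall to summarize a tool's detection performance are assumptions made for illustration only. They are not the formulas used in this chapter.

```python
from collections import Counter

# Hypothetical severity weights; the values are assumed for illustration.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


def penalty(error_severities):
    """Sum the weights of all errors, given one severity label per error."""
    counts = Counter(error_severities)
    return sum(SEVERITY_WEIGHTS[sev] * n for sev, n in counts.items())


def quality_score(error_severities, segment_count):
    """Assumed normalization: 1 minus penalty points per segment, floored at 0."""
    if segment_count <= 0:
        raise ValueError("segment_count must be positive")
    return max(0.0, 1.0 - penalty(error_severities) / segment_count)


def detection_rates(true_detected, false_detected, not_detected):
    """Precision and recall of a QA tool from its error counts."""
    flagged = true_detected + false_detected
    actual = true_detected + not_detected
    precision = true_detected / flagged if flagged else 0.0
    recall = true_detected / actual if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    errors = ["minor", "minor", "major", "critical"]
    print(penalty(errors))                # total penalty points
    print(quality_score(errors, 100))     # score for a 100-segment memory
    print(detection_rates(8, 2, 2))       # (precision, recall)
```

Separating the penalty from the detection rates keeps two questions apart: how good the translation memory is, and how reliable a given QA tool is at finding its errors.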
Key Terms in this Chapter
Multidimensional Quality Metrics (MQM) Framework: Provides a list of over 100 error types, classified into main categories (adequacy, fluency, verity, terminology, locale convention, design, internationalization) and their subcategories, used to harmonize error types.
Quality Assurance (QA): The process that identifies differences in translation between two languages, such as differences in terminology, use of forbidden terms, layout differences, cultural differences (the writing of numbers, time, and currency), signs, names, and segments that are not translated (e.g. URL addresses), in order to provide a high-quality translation.
Translation Memory: A textual database containing parallel segments (sentences, clauses, phrases, terminology, numbers) in source and target languages. A translation memory is the basis for building resources in computer-assisted translation tools and for building machine translation systems.
Alignment: The process of creating segment pairs in source and target languages, used to create a translation memory as a fundamental resource for computer-assisted translation and machine translation.
Error Type: Refers to the various types of errors (here in the machine translation/computer-assisted translation process), categorized mainly according to the MQM framework.
Integrated Translation Software: Computer software that integrates computer-assisted translation, a terminology base, and machine translation, possibly with other technologies, such as speech technology, optical character recognition (OCR), summarization, etc.
Computer-Assisted Translation (CAT): The use of interactive computer software in the translation process, which enables the retrieval of existing similar sentences from the translation memory when translating a new document. It includes the use of translation memory, terminology base, and alignment modules, but can become integrated translation software.
Machine Translation: The use of computer software to translate from one natural language into another, performed by automatic machine translation or by integrated machine translation, which can include various technologies (e.g. automatic machine translation integrated with speech technologies, computer-assisted translation, optical character recognition (OCR), etc.).