LTB ver. 2.46 Update 1

Lionbridge Linguistic ToolBox ver. 2.46 Update 1 - Release Notes

May 24th, 2012

Contents

Changes

Defects fixed

Support Files Updates

Core Configuration

Term and Punctuation Rules

Entity Extractor Rules

Documentation

Known Issues

We are pleased to announce Linguistic ToolBox ver. 2.46 Update 1. This release fixes two defects related to the Segment Checker component and includes new and updated support files. Please pay special attention to the new core and standard rules for Polish and Slovak.

If you have any technical questions or identify any bugs, please contact us.

Changes

· The following improvements to the parser, implemented in ver. 2.46, have been reverted due to unexpected behaviour:

o Situations where there is more than one tag within a <ph> group.

o Situations where there is more than one tag within an internal tag in .rtf files.

o The pattern <ph><tag>text1</tag><tag2>text2</tag2></ph> is now interpreted as a whole and protected as ^TAG^.

· The General Information tab has been updated to reflect the latest changes in the LTB software options.

Defects fixed

· Inconsistencies caused only by changes in word order among segments were not being reported. Issues of this type are now reported in the Issue column as: “Inconsistent translation: Different word order or spacing.”

· The “Ignore list” for the Check if source equals target was not being saved correctly in the configuration file.
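
As an illustration of the first fix, the Python sketch below shows one way a check of this kind could work. It is not LTB's implementation: the function name find_word_order_inconsistencies and the list of (source, target) tuples are assumptions made for the example. It groups targets by identical source text and flags targets that differ as strings but contain the same words.

```python
from collections import defaultdict
from itertools import combinations

ISSUE = "Inconsistent translation: Different word order or spacing."


def find_word_order_inconsistencies(pairs):
    """Flag identical sources whose targets differ only in word order or spacing.

    `pairs` is a hypothetical list of (source, target) segment tuples.
    Returns a list of (source, target_a, target_b, issue) tuples.
    """
    # Collect the distinct targets seen for each source segment.
    targets_by_source = defaultdict(list)
    for source, target in pairs:
        if target not in targets_by_source[source]:
            targets_by_source[source].append(target)

    issues = []
    for source, targets in targets_by_source.items():
        for a, b in combinations(targets, 2):
            # Distinct strings with equal sorted word lists differ only
            # in the order of their words or in the whitespace between them.
            if sorted(a.split()) == sorted(b.split()):
                issues.append((source, a, b, ISSUE))
    return issues
```

Targets that use different words for the same source are a separate kind of inconsistency and are deliberately not flagged by this sketch.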

Support Files Updates

Core Configuration

· Updated: The file LTB_Core_Config.zip has been updated with these changes:

· Term and Punctuation:

o New Core PL STANDARD Number Format [UL] EN-PL.rul (new)

o New Core EN STANDARD Number Format 1,000.0 to 1 000,0 [UL].rul (updated)

o Core ALL STANDARD URLs [UL].rul (updated)

o Core ALL STANDARD Entity References.rul (updated)

Term and Punctuation Rules

· New rules (8):

o New Core PL STANDARD Number Format [UL] EN-PL.rul

New core rule for number checks. Checks that numeric values appear in the target segment in the correct format (with thousands or decimal separators) and that any numeric value present in the target segment is also present in the source. This rule has been updated to carry out more exhaustive checks and to include language-specific information to make the rules more accurate and to reduce false positives.

o NN STANDARD Common English words left in Target EN-NN.rul

Detects instances of common words in English that have been left untranslated in the target.

o PL CUSTOM Microsoft Style Guide EN-PL.rul

Checks for compliance with the Microsoft Style Guide for Polish.

o PL STANDARD Common Errors ALL-PL.rul

Detects instances of common stylistic mistakes for Polish.

o PL STANDARD Negative Affirmative EN-PL.rul

Detects instances where affirmative sentences may have been translated as negative and vice versa.

o PL STANDARD Spelling Errors ALL-PL.rul

Checks for common typos and spelling mistakes in Polish.

o SK STANDARD Country Names EN-SK.rul

Checks that country names have been correctly translated from English into Slovak.

o SK STANDARD False Friends EN-SK.rul

Detects instances of common false friends.

· Updated rules (8):

o Core ALL STANDARD Entity References.rul

Updated to improve the matching.

o Core ALL STANDARD URLs [UL].rul

Updated to fix an issue reported by a user where the rule stopped processing certain segments.

o New Core EN STANDARD Number Format 1,000.0 to 1 000,0 [UL].rul

Updated to remove Polish from the list of target languages.

o ALL STANDARD Strict Punctuation.rul

Updated to fix an issue related to multiline mode.

o FR STANDARD Grammar ALL-FR.rul

o FR-CA STANDARD False Friends EN-FRCA.rul

Updated with new entries and modified to avoid false positives.

o NB STANDARD Common English words left in Target EN-NB.rul

Updated to remove some false positives.

o ZH-TW STANDARD Common Typos ALL-ZHTW.rul
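
The number format rules in this section check that numbers in the source reappear in the target with the expected separators. As a rough illustration of the conversion named in New Core EN STANDARD Number Format 1,000.0 to 1 000,0 [UL].rul, the Python sketch below converts English-style numbers and reports those missing from the target. The regular expression and the function names to_target_format and missing_numbers are assumptions made for this example; the actual rules cover many more cases and also check the target against the source.

```python
import re

# Matches English-style numbers such as 1,000.0, 12,345,678 or 3.5.
EN_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def to_target_format(number):
    """Convert a 1,000.0 style number to the 1 000,0 style."""
    # Thousands separator: comma -> space. Decimal separator: period -> comma.
    return number.replace(",", " ").replace(".", ",")


def missing_numbers(source, target):
    """Return expected target-format numbers that do not appear in `target`."""
    # Treat non-breaking spaces in the target as ordinary spaces.
    normalized_target = target.replace("\u00a0", " ")
    expected = [to_target_format(n) for n in EN_NUMBER.findall(source)]
    # Plain substring test: good enough for a sketch, too loose for real use.
    return [n for n in expected if n not in normalized_target]
```

An empty result means every source number was found in the target in the converted format; any returned value is a candidate issue for a reviewer to inspect.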

Entity Extractor Rules

· New rules (6):

o ALL STANDARD Duplicated Words [UL].rul

Checks for duplicated words in the source. Use DupWordsPhrases_Blockers.txt list to define exceptions. (NOTE: Entries in the list file are CASE-SENSITIVE; be sure to enter any common case variants needed.) Use this rule in the Term and Punctuation component to take advantage of the User lists functionality.

o ALL STANDARD CaMel-Alphanumeric [UL].rul

Checks for instances of "CaMel" words and alphanumeric expressions in the source. Uses external lists Camel_Alphanumeric_Blockers.txt and ordinal_suffixes.txt to block matches in source segments. Use this rule in the Term and Punctuation component to take advantage of the User lists functionality.

o ALL STANDARD Strict - Acronyms [UL].rul

Checks for instances of acronyms in the source. Use source_strict_acronym_blockers.txt and target_strict_acronym_blockers.txt to prevent matches. Use this rule in the Term and Punctuation component to take advantage of the User lists functionality.

o ALL STANDARD URLs [UL].rul

Checks for instances of any URL or e-mail address in the source segments. External lists define exceptions: 1. Use Whole_String_SOURCE_URL_Blockers.txt to specify the exact, full item matched by the LTB rule that should be excluded. 2. Use Partial_String_SOURCE_URL_Blockers.txt to specify any portion of a URL/address that should be blocked. This blocks ANY partial match so use with CARE! Use this rule in the Term and Punctuation component to take advantage of the User lists functionality.

o DE STANDARD Segmentation Checker.rul

Checks for the most common types of segmentation issues in files prepared for translation: articles found at the end of the segment, instances of em dash that may suggest a sub-segmentation is required, instances of letters in initial uppercase not at the beginning of the sentence and possible sentence fragments (begins with lowercase, ends with period).

o ES STANDARD Segmentation Checker.rul

Checks for the most common types of segmentation issues in files prepared for translation: articles found at the end of the segment, instances of letters in initial uppercase not at the beginning of the sentence and possible sentence fragments (begins with lowercase, ends with period).

· Deleted rules (5):

o ALL STANDARD Acronyms.rul (this rule has now been replaced with ALL STANDARD Strict - Acronyms [UL].rul)

o ALL STANDARD CamelWords.rul (this rule has now been replaced with ALL STANDARD CaMel-Alphanumeric [UL].rul)

o ALL STANDARD E-Mail Addresses.rul (this rule has now been replaced with ALL STANDARD URLs [UL].rul, which also detects e-mail addresses)

o ALL STANDARD Monolingual Duplicated Words.rul (this rule has now been replaced with ALL STANDARD Duplicated Words [UL].rul)

o ALL STANDARD URLs.rul (this rule has now been replaced with ALL STANDARD URLs [UL].rul)
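
The two exception lists described for ALL STANDARD URLs [UL].rul behave quite differently, and the Python sketch below shows why the partial list needs care. It is only an illustration under stated assumptions: the function names is_blocked and filter_matches are hypothetical, and the blocker lists are assumed to be already loaded as plain strings, one entry per line of Whole_String_SOURCE_URL_Blockers.txt and Partial_String_SOURCE_URL_Blockers.txt.

```python
def is_blocked(match, whole_blockers, partial_blockers):
    """Return True if a matched URL or e-mail address should be excluded."""
    # Whole-string blockers exclude only an exact, full match.
    if match in whole_blockers:
        return True
    # Partial-string blockers exclude any match that contains the entry,
    # so a short entry can suppress far more than intended.
    return any(entry in match for entry in partial_blockers)


def filter_matches(matches, whole_blockers, partial_blockers):
    """Keep only the matches that no blocker excludes."""
    return [m for m in matches
            if not is_blocked(m, whole_blockers, partial_blockers)]


matches = [
    "http://www.example.com",
    "http://www.example.com/help",
    "support@example.org",
]
# The whole-string entry removes one item; the partial entry removes the address.
kept = filter_matches(matches, {"http://www.example.com"}, ["example.org"])
# A broad partial entry such as "example" removes every match.
kept_broad = filter_matches(matches, set(), ["example"])
```

In this example kept holds only http://www.example.com/help, while kept_broad is empty.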

Documentation

· The Help file has not been updated with the changes in this release.

· Information about the functionality and usage of the rules installed with the Term and Punctuation has not been updated with changes related to this release: http://autoupdate.lionbridge.com/ltb/_LTB_Installation_Rules_Description.html

· Previous Release Notes are available at: http://autoupdate.lionbridge.com/ltb/PreviousReleaseNotes.htm

Known Issues

http://autoupdate.lionbridge.com/ltb/KnownIssues.htm