This is one of the 52 terms in The Language of Localization, published by XML Press in 2017. The contributor for this term is Dave Ruane.

What is it?

A defined list of grouped symbols used for digital communication.

Why is it important?

All digital text, in any language, belongs to a particular character set. Programs and platforms expect a specific character set so that they can correctly process and render each character of the text.

Why does a business professional need to know this?

In its simplest form, a character set is a mapping (table) between text characters and the binary numbers that a computer or other digital device understands. For example, the three letters “A, B, C” are stored as “01000001, 01000010, 01000011” by a computer using the ASCII character set (one of the early character sets), with each code written here as an 8-bit byte.
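The mapping described above can be checked with a minimal Python sketch. The variable names are illustrative only; `str.encode` and `format` are standard Python built-ins.

```python
# Look up the ASCII code of each character and show it as an 8-bit binary string.
text = "ABC"
codes = [format(byte, "08b") for byte in text.encode("ascii")]

print(codes)  # ['01000001', '01000010', '01000011']
```

Encoding a character that ASCII does not define (for example, "é") with `"ascii"` raises a `UnicodeEncodeError`, which is one way the limits of an early character set show up in practice.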

As the need for global software arose in the 1980s and 1990s, computer scientists devised digital character sets that could manage character complexity and the thousands of characters in languages such as Chinese. Some character sets assigned a single byte to each character, and others used two or more bytes per character. Vendor- and platform-specific character sets also became common. As a result, similar character sets could assign different values to the same character, so characters would be rendered incorrectly if processed using the mapping for the wrong character set.
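As a hedged illustration of conflicting vendor- and platform-specific mappings, the sketch below decodes one byte value under three single-byte character sets. The choice of byte (0xE9) and of the three character sets is an assumption made for the example; the codec names are those of Python's standard library.

```python
# One raw byte, interpreted under three different single-byte character sets.
raw = bytes([0xE9])

# ISO 8859-1 (Latin-1), the original IBM PC code page, and classic Mac OS Roman.
charsets = ["latin-1", "cp437", "mac_roman"]
decoded = {name: raw.decode(name) for name in charsets}

for name, char in decoded.items():
    print(f"{name:10} 0xE9 -> {char!r} (U+{ord(char):04X})")
```

Each character set returns a different character for the same byte, which is why text must be decoded with the mapping it was written in.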

If an application uses a specific character set, the user’s device must recognize and support that same character set. Confirming this is part of the due diligence for publishing globally.

For this reason, software localization and development engineers must understand character sets. Issues with character sets can be the bane of their lives, especially when character corruption occurs – for example, when translated software strings are moved across platforms that support different character sets or character encodings (e.g. from UNIX to Windows).
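The kind of corruption described above can be reproduced in a few lines. This is a sketch under stated assumptions: the string "café" is a hypothetical translated string, UTF-8 stands in for the encoding used on the source platform, and Windows-1252 (`cp1252`) for the one assumed on the target.

```python
original = "café"                     # hypothetical translated string
stored = original.encode("utf-8")     # bytes as written on the source platform
garbled = stored.decode("cp1252")     # wrong mapping applied on the target platform

print(garbled)                        # cafÃ©

# The bytes themselves are intact, so reversing the wrong step recovers the text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)                       # café
```

The repair works here only because every byte involved has a defined character in Windows-1252. In general, the safer fix is to record and pass along the encoding with the strings rather than to guess afterwards.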

Today, more harmonization exists in this area with the proliferation of Unicode (which assigns a unique number to every character in nearly every language) and its various character encodings, such as UTF-8 and UTF-16. A character set can have multiple character encodings, but each encoding can relate to only one character set.
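To make the distinction between a character set and its encodings concrete, the following sketch takes one Unicode character and serializes it with three Unicode encodings. The euro sign is an arbitrary choice for illustration.

```python
char = "€"
code_point = ord(char)  # the single number Unicode assigns to this character

# Three encodings of the same character set (big-endian variants, no byte order mark).
encodings = ["utf-8", "utf-16-be", "utf-32-be"]
encoded = {name: char.encode(name) for name in encodings}

print(f"U+{code_point:04X}")
for name, data in encoded.items():
    print(f"{name:10} {data.hex(' ')}")
```

The code point stays the same (U+20AC) while the byte sequences differ in length and content: three bytes in UTF-8, two in UTF-16, and four in UTF-32.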