Unicode
Latest revision as of 22:10, 5 January 2025




Computing industry standard for consistent encoding, representation, and handling of text
Unicode is a computing industry standard designed to ensure that text and symbols from all the world's writing systems are consistently encoded, represented, and handled by computers. The standard is maintained by the Unicode Consortium, a non-profit organization.
History
Development of Unicode began in 1987, and the first version of the Unicode Standard was published in 1991. The goal was to overcome the limitations of earlier character encodings, such as ASCII and the many national and vendor-specific encodings, none of which could represent the full range of characters used in the world's languages.
Design Principles
Unicode is based on several key design principles:
- **Universal Character Set**: Unicode aims to include every character used in the world's writing systems.
- **Efficiency**: Unicode text is designed to be simple to store, parse, and process.
- **Unification**: Characters shared across languages are unified into a single code point where possible (for example, Han ideographs common to Chinese, Japanese, and Korean). Look-alike characters from different scripts, such as Latin A and Cyrillic А, remain separate code points.
Encoding Forms
Unicode defines three encoding forms, each of which maps a code point to a sequence of fixed-size code units:
- UTF-8: A variable-width encoding that uses one to four bytes (8-bit code units) for each code point.
- UTF-16: A variable-width encoding that uses one or two 16-bit code units (two or four bytes) for each code point.
- UTF-32: A fixed-width encoding that uses a single 32-bit code unit (four bytes) for each code point.
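The difference in width can be illustrated with a short Python sketch. The helper name `encoded_lengths` and the choice of sample characters are assumptions made for illustration; the big-endian codec variants are used only so that Python does not prepend a byte order mark, which would otherwise inflate the counts.

```python
# Sample characters, written as escapes so the source stays ASCII-only:
# U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (emoji).
SAMPLES = ["\u0041", "\u00e9", "\u20ac", "\U0001F600"]


def encoded_lengths(ch: str) -> dict:
    """Return the number of bytes one character occupies in each encoding form."""
    # The "-be" codecs fix the byte order, so no byte order mark is added.
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

Under these assumptions, the UTF-8 length grows from one to four bytes across the samples, UTF-16 uses two bytes until the code point lies above U+FFFF, and UTF-32 always uses four.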
Character Properties
Each Unicode character has a set of properties that define its behavior in text processing. These properties include:
- **General Category**: Classifies the character by type (e.g., letter, digit, punctuation).
- **Combining Class**: A numeric value for characters that combine with others, such as diacritics; it determines how a sequence of combining marks is ordered.
- **Bidirectional Class**: Determines how the character is ordered for display when left-to-right and right-to-left text are mixed.
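These properties can be inspected with Python's standard `unicodedata` module, as in the following sketch. The helper name `describe` and the sample characters are illustrative assumptions; the values returned depend on the version of the Unicode Character Database bundled with the interpreter, although they are stable for long-established characters such as these.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few Unicode character properties for a single character."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": unicodedata.category(ch),   # e.g. "Lu", "Nd", "Mn"
        "combining_class": unicodedata.combining(ch),   # 0 for non-combining
        "bidi_class": unicodedata.bidirectional(ch),    # e.g. "L", "R", "EN"
    }


# Latin capital A, digit five, combining acute accent, Hebrew letter alef.
for ch in ["A", "5", "\u0301", "\u05d0"]:
    print(f"U+{ord(ch):04X}", describe(ch))
```

The combining acute accent (U+0301) is the only sample with a non-zero combining class, and the Hebrew letter is the only one with a right-to-left bidirectional class.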
Unicode Blocks
Unicode code points are grouped into blocks, which are contiguous named ranges organized by script or usage. Examples include:
- Basic Latin: U+0000 to U+007F
- Cyrillic: U+0400 to U+04FF
- CJK Unified Ideographs: U+4E00 to U+9FFF
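Because each block is a contiguous range, finding the block of a character reduces to a range check on its code point. The sketch below covers only the three example blocks listed above; the table `BLOCKS` and the function `block_of` are hypothetical names, and a complete lookup would need the full block list published with the standard.

```python
# (first code point, last code point, block name) for the three examples only.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]


def block_of(ch: str):
    """Return the block name for a character, or None if it is not in the table."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


print(block_of("A"))       # U+0041
print(block_of("\u0416"))  # U+0416, a Cyrillic letter
print(block_of("\u4e2d"))  # U+4E2D, a CJK ideograph
print(block_of("\u00e9"))  # U+00E9, outside the three example blocks
```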
Applications
Unicode is widely used in software, including:
- Operating systems: Most modern operating systems support Unicode.
- Web browsers: Unicode is essential for displaying web pages in multiple languages.
- Programming languages: Many programming languages support Unicode. For example, Python 3 strings are sequences of Unicode code points, and Java strings are sequences of UTF-16 code units.
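As a small illustration of language-level support, the following Python 3 sketch converts between characters and code points using the built-in `ord` and `chr` functions. The helper name `code_points` is an assumption made for this example.

```python
def code_points(text: str) -> list:
    """List the code points of a string in the conventional U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# A string literal may contain any Unicode character, directly or as an escape.
word = "na\u00efve"               # the third letter is U+00EF
print(code_points(word))
print(chr(0x20AC) == "\u20ac")    # chr() is the inverse of ord()
print(len("\U0001F600"))          # length is counted in code points
```

In Python 3 the length of a string is its number of code points, so a character outside the Basic Multilingual Plane still counts as one element.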
Unicode Consortium
The Unicode Consortium is responsible for the development and maintenance of the Unicode Standard. It collaborates with other standards organizations, notably the International Organization for Standardization (ISO), whose ISO/IEC 10646 standard is kept synchronized with Unicode, to ensure compatibility and interoperability.