| A |
A character set containing
characters from around the world. |
Unicode is a character set, containing characters
from around the world, that was created by the
Unicode Consortium, a group consisting mainly of
American private corporations.
It is not a standard defined by a public body,
but roughly the same content is specified by the
international standard ISO/IEC 10646-1; the
equivalent Japanese domestic standard is JIS X
0221.
Unicode is characterized by representing the
characters of the world's major languages in 16
bits (see the note below). (A bit is the basic
unit of information and is either 0 or 1.) For
this reason, it is not compatible with
traditional character codes that use 7 or 8 bits
to describe characters.
Unicode is a character set that contains many
characters from around the world, but using a
Unicode-based system does not mean that all the
world's characters can be used. In fact, a font
containing all the characters specified by
Unicode does not exist (without such a font, the
characters can be neither displayed nor printed).
In addition, because writing rules differ between
languages (for Japanese, these include
end-of-line processing, etc.), application
software capable of handling those rules is also
necessary.
In reality, Unicode is used to handle one
particular language's characters better, rather
than to use several different languages
simultaneously.
For example, in Japanese, the benefits of using
Unicode rather than Shift JIS are as follows.
Unicode has been
criticized for not allowing the simultaneous use
of Japanese and Chinese typefaces, but this is
not necessarily true. Characters that differ only
minimally in appearance were unified according to
certain rules, so in some cases a Japanese
character and a Chinese character were given the
same character code. However, location and
language information should be specified
separately from the character code, and if that
information is provided correctly, typefaces from
different languages cannot be confused with each
other.
Note: Some
sources state that Unicode uses 16 bits per
character; this is incorrect. Just as JIS X 0208
expresses one character using two pieces of 7-bit
or 8-bit data, Unicode may express one character
as a combination of several pieces of 16-bit
data.
|
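
The note in the entry above, that one character may take several pieces of 16-bit data, can be checked with a short Python sketch. It assumes the UTF-16 encoding form and uses surrogate pairs as one instance of such a combination; the entry itself names neither. The helper name `utf16_code_units` and the two sample characters are illustrative choices, not part of the original text. The last lines also show that the Shift JIS bytes for the same character differ from the Unicode value, in line with the entry's remark on incompatibility with traditional codes.

```python
def utf16_code_units(ch):
    """Return the 16-bit code units that encode `ch` in UTF-16 (big-endian, no BOM)."""
    data = ch.encode("utf-16-be")
    # Group the encoded bytes two at a time into 16-bit integers.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+3042 (hiragana "a") fits in a single 16-bit unit.
hiragana_a = "\u3042"
# U+20B9F (a kanji outside the 16-bit range) needs two 16-bit units.
rare_kanji = "\U00020B9F"

print([hex(u) for u in utf16_code_units(hiragana_a)])  # ['0x3042']
print([hex(u) for u in utf16_code_units(rare_kanji)])  # ['0xd842', '0xdf9f']

# The same hiragana in Shift JIS is a different two-byte sequence,
# so the two codes cannot be mixed without conversion.
print(hiragana_a.encode("shift_jis").hex())  # 82a0
```

Counting code units rather than characters matters wherever a fixed 16-bit width is assumed, for example when computing string lengths or slicing buffers.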