##Adobe File Version: 1.000
#=======================================================================
#   FTP file name:  CHINSIMP.TXT
#
#   Contents:       Map (external version) from Mac OS Chinese
#                   Simplified encoding to Unicode 2.1
#
#   Copyright:      (c) 1996-1999 by Apple Computer, Inc., all rights
#                   reserved.
#
#   Contact:        charsets@apple.com
#
#   Changes:
#
#       b02  1999-Sep-22    Update contact e-mail address. Matches
#                           internal utom<b1>, ufrm<b3>, and Text
#                           Encoding Converter version 1.5.
#       n08  1998-Feb-05    Just rewrite initial header comments and
#                           reorder into single list with all one-byte
#                           characters at the beginning; no mapping
#                           changes. Matches internal utom<n7>, ufrm<n8>
#                           and Text Encoding Converter version 1.3.
#       n05  1996-Aug-22    Matches internal ufrm<n1>.
#       n00  1996-Aug-01
#
# Standard header:
# ----------------
#
#   Apple, the Apple logo, and Macintosh are trademarks of Apple
#   Computer, Inc., registered in the United States and other countries.
#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
#   throughout this document, "Macintosh" can be used to refer to
#   Macintosh computers and "Unicode" can be used to refer to the
#   Unicode standard.
#
#   Apple makes no warranty or representation, either express or
#   implied, with respect to these tables, their quality, accuracy, or
#   fitness for a particular purpose. In no event will Apple be liable
#   for direct, indirect, special, incidental, or consequential damages 
#   resulting from any defect or inaccuracy in this document or the
#   accompanying tables.
#
#   These mapping tables and character lists are subject to change.
#   The latest tables should be available from the following:
#
#   <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#   <ftp://dev.apple.com/devworld/Technical_Documentation/Misc._Standards/>
#
#   For general information about Mac OS encodings and these mapping
#   tables, see the file "README.TXT".
#
# Format:
# -------
#
#   Three tab-separated columns;
#   '#' begins a comment which continues to the end of the line.
#     Column #1 is the Mac OS Chinese Simplified code (in hex as 0xNN
#       or 0xNNNN)
#     Column #2 is the corresponding Unicode or Unicode sequence (in
#       hex as 0xNNNN or 0xNNNN+0xNNNN). Sequences of up to 2
#       Unicode characters are used here.
#     Column #3 is a comment containing the Unicode name.
#       In some cases an additional comment follows the Unicode name.
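#
#   A minimal sketch of reading one table line in the format described
#   above, written in Python. The function name parse_mapping_line and the
#   sample line are illustrative assumptions, not part of the Apple tables;
#   the sample is built from the 0xA8BF example in the notes further down.

```python
def parse_mapping_line(line):
    """Parse one table line into (mac_code, unicode_code_points), or None.

    Returns None for blank lines and for lines that hold only a comment.
    """
    # '#' begins a comment that continues to the end of the line, so
    # column 3 (the Unicode name) is dropped together with other comments.
    body = line.split("#", 1)[0].strip()
    if not body:
        return None
    columns = body.split("\t")
    if len(columns) < 2:
        raise ValueError(f"expected two data columns, got: {line!r}")
    # Column 1: Mac OS code as 0xNN or 0xNNNN.
    mac_code = int(columns[0], 16)
    # Column 2: 0xNNNN, or a '+'-joined sequence such as 0xNNNN+0xNNNN.
    code_points = tuple(int(part, 16) for part in columns[1].split("+"))
    return mac_code, code_points


# Illustrative line (assumed layout, not copied from the table itself).
sample = "0xA8BF\t0x006E+0x0300\t# LATIN SMALL LETTER N + COMBINING GRAVE ACCENT"
parsed = parse_mapping_line(sample)
```

#   Keeping column 2 as a tuple lets single characters and two-character
#   sequences share one representation.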
#
#   The entries are in Mac OS Chinese Simplified code order. All
#   one-byte characters are at the beginning.
#
#   Some of these mappings require the use of corporate characters.
#   See the file "CORPCHAR.TXT" and notes below.
#
#   Control character mappings are not shown in this table, following
#   the conventions of the standard UTC mapping tables. However, the
#   Mac OS Chinese Simplified encoding uses the standard control
#   characters at 0x00-0x1F and 0x7F.
#
# Notes on Mac OS Chinese Simplified:
# -----------------------------------
#
#   This table covers the Mac OS Chinese Simplified encoding used in
#   Mac OS versions 7.1 and later, including the Chinese Language Kit.
#   The Mac OS Chinese Simplified encoding is based on EUC-CN, but it
#   changes the high-byte range and adds a few characters.
#
#   For Mac OS Chinese Simplified, two-byte characters have
#   first/lead/high byte in the range 0xA1-0xFC, and second/trail/low
#   byte in the range 0xA1-0xFE.
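#
#   The byte ranges above are enough to segment a byte string into
#   characters. The Python sketch below assumes that any byte outside the
#   lead-byte range 0xA1-0xFC is a one-byte character; the function name
#   split_characters is hypothetical.

```python
def split_characters(data):
    """Split bytes into one-byte and two-byte Mac OS Chinese Simplified codes."""
    codes = []
    i = 0
    while i < len(data):
        lead = data[i]
        if 0xA1 <= lead <= 0xFC:
            # A lead byte must be followed by a trail byte in 0xA1-0xFE.
            if i + 1 >= len(data) or not (0xA1 <= data[i + 1] <= 0xFE):
                raise ValueError(
                    f"missing or invalid trail byte after 0x{lead:02X} at offset {i}"
                )
            codes.append((lead << 8) | data[i + 1])
            i += 2
        else:
            # Everything else is treated as a one-byte character.
            codes.append(lead)
            i += 1
    return codes


codes = split_characters(b"A\xa3\xac\x80\xff")
```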
#
# 1. Standard EUC-CN
#
#    This includes one-byte characters, which are usually the ASCII set. In
#    addition, it includes two-byte characters with both bytes in the range
#    0xA1-0xFE. The two-byte characters are from GB 2312-80, but their code
#    points are transformed from the GB 2312 range 0x2121-0x7E7E by adding
#    0x8080.
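#
#    As a worked example of the 0x8080 offset, the Python sketch below
#    converts a GB 2312 code point to its EUC-CN form. It assumes each
#    GB 2312 byte lies in 0x21-0x7E (the 94 x 94 grid); the function name
#    gb2312_to_euc_cn is hypothetical. The cross-check uses Python's
#    standard "gb2312" codec, which implements plain EUC-CN and not the
#    Mac OS variant.

```python
def gb2312_to_euc_cn(gb_code):
    """Convert a two-byte GB 2312 code point to EUC-CN by adding 0x8080."""
    high = gb_code >> 8
    low = gb_code & 0xFF
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError(f"not a GB 2312 code point: 0x{gb_code:04X}")
    return gb_code + 0x8080


# GB 2312 0x232C becomes EUC-CN 0xA3AC, the fullwidth comma.
euc_code = gb2312_to_euc_cn(0x232C)

# Cross-check against the standard EUC-CN codec shipped with Python.
decoded = euc_code.to_bytes(2, "big").decode("gb2312")
```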
#
#    EUC-CN includes the following ranges:
#    - 0xA1A1-0xA9EF, various punctuation, symbol, number, separator, and
#    letter characters. Not all the code points in this range are defined
#    by GB 2312.
#    - 0xB0A1-0xF7FE, "ideographic" characters (Hanzi).
#
# 2. Mac OS Chinese Simplified additions
#
#    a)  Two-byte changes and additions
#
#      Mac OS Chinese Simplified shortens the high-byte range so the
#      first/lead/high bytes of two-byte characters are limited to
#      the range 0xA1-0xFC.
#
#      The additions use code points that are undefined in EUC-CN, and fall
#      into two groups. Both are actually standard extensions to GB 2312.
#      - 0xA6D9-0xA6F5, forms for vertical text
#      - 0xA8BB-0xA8C0, pinyin extensions for Cantonese, etc. These are
#        from the GB 6345.1-1986 extension to GB 2312-80.
#
#    b)  One-byte additions
#
#        0x80  LATIN SMALL LETTER U WITH DIAERESIS, alternate
#        0x81  height-metric character (see below)
#        0x82  width-metric character (see below)
#        0xA0  NO-BREAK SPACE
#        0xFD  COPYRIGHT SIGN
#        0xFE  TRADE MARK SIGN
#        0xFF  HORIZONTAL ELLIPSIS
#
#      The two characters at 0x81 and 0x82 are somewhat special. These
#      are one-byte characters whose glyphs have the same metrics as the
#      glyphs for the two-byte characters. This way application developers
#      can use QuickDraw functions such as CharWidth to determine the
#      metrics of the two-byte character glyphs in a particular font.
#       0x81  a character whose glyph has the height of a two-byte
#             character glyph.
#       0x82  a character whose glyph has the advance width of a two-
#             byte character glyph. Note: For old-style (FBIT/FDEF)
#             bitmap fonts, the width of this glyph is *half* the width
#             of the two-byte character glyphs.
#
# Unicode mapping issues and notes:
# ---------------------------------
#
# 1. Mapping the Apple additions
#
#    The goals in the mappings provided here are:
#    - Ensure roundtrip mapping from every character in the Mac OS Chinese
#    Simplified encoding to Unicode and back
#    - Use standard Unicode characters as much as possible, to maximize
#    interchangeability of the resulting Unicode text. Whenever possible,
#    avoid having content carried by private-use characters.
#
#    Some of the characters in the Mac OS Chinese Simplified Apple additions
#    do not correspond to distinct, single Unicode characters. To map these
#    and satisfy both goals above, we employ various strategies.
#
#    a)  Map a single Mac OS Chinese Simplified character to a sequence of
#    standard Unicode characters
#
#    For example, the character 0xA8BF in the Apple additions is a
#    small n with grave accent. There is currently no single Unicode
#    character for this. However, it can be mapped to 0x006E+0x0300,
#    LATIN SMALL LETTER N + COMBINING GRAVE ACCENT.
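#
#    A short Python sketch of strategy (a): one Mac OS code decodes to two
#    Unicode characters, so the output can be longer than the input. The
#    one-entry SEQUENCE_TABLE and the function name decode_codes are
#    hypothetical; a real decoder would load the full table.

```python
import unicodedata

# Hypothetical one-entry excerpt taken from the 0xA8BF example above.
SEQUENCE_TABLE = {0xA8BF: (0x006E, 0x0300)}


def decode_codes(mac_codes, table=SEQUENCE_TABLE):
    """Map Mac OS codes to a Unicode string; one code may yield two characters."""
    return "".join(chr(cp) for code in mac_codes for cp in table[code])


text = decode_codes([0xA8BF])
# Look up the standard Unicode names of the resulting characters.
names = [unicodedata.name(ch) for ch in text]
```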
#
#    b)  Use private use characters in combination with standard Unicode
#    characters to mark variants of the standard Unicode character.
#
#    Apple has defined a block of 32 corporate characters as "transcoding
#    hints." These are used in combination with standard Unicode characters
#    to force them to be treated in a special way for mapping to other
#    encodings; they have no other effect. Sixteen of these transcoding
#    hints are "grouping hints" - they indicate that the next 2-4 Unicode
#    characters should be treated as a single entity for transcoding. The
#    other sixteen transcoding hints are "variant tags" - they are like
#    combining characters, and can follow a standard Unicode character (or a sequence
#    consisting of a base character and other combining characters) to
#    cause it to be treated in a special way for transcoding. These always
#    terminate a combining-character sequence.
#
#    The transcoding hints used in this mapping table are two
#    variant tags, 0xF87E and 0xF87F. Since these are combined with
#    standard Unicode characters, some characters in Mac OS Chinese
#    Simplified encoding map to a sequence of two Unicode characters instead
#    of a single Unicode character. For example, the Mac OS Chinese Simplified
#    character at 0xA6D9 is a vertical-text form of the FULLWIDTH COMMA
#    (the standard mapping is for the horizontal form at 0xA3AC). So 0xA6D9
#    is mapped to 0xFF0C (FULLWIDTH COMMA) + 0xF87E (a variant tag).
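#
#    Mapping back from Unicode has to prefer the longest matching sequence,
#    so that a base character followed by a variant tag is not split into
#    two lookups. The Python sketch below illustrates this with a
#    hypothetical three-entry reverse table built from the examples in
#    these notes; the names REVERSE_TABLE and encode_text are assumptions.

```python
# Hypothetical excerpt of a Unicode-to-Mac OS reverse table.
REVERSE_TABLE = {
    (0xFF0C,): 0xA3AC,          # FULLWIDTH COMMA, horizontal form
    (0xFF0C, 0xF87E): 0xA6D9,   # FULLWIDTH COMMA + variant tag, vertical form
    (0x006E, 0x0300): 0xA8BF,   # small n + combining grave accent
}


def encode_text(text, reverse=REVERSE_TABLE):
    """Encode a Unicode string to Mac OS codes using longest-match lookup."""
    code_points = [ord(ch) for ch in text]
    max_len = max(len(key) for key in reverse)
    codes = []
    i = 0
    while i < len(code_points):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(max_len, len(code_points) - i), 0, -1):
            key = tuple(code_points[i:i + n])
            if key in reverse:
                codes.append(reverse[key])
                i += n
                break
        else:
            raise ValueError(f"no mapping for U+{code_points[i]:04X} at index {i}")
    return codes


codes = encode_text("\uFF0C\uF87E\uFF0C")
```

#    With this ordering the tagged comma round-trips to 0xA6D9 while the
#    untagged comma still maps to 0xA3AC.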
#
#    c)  Use private use characters by themselves to map characters which
#    have no relationship to any standard Unicode character.
#
#    We define two corporate-zone characters for this purpose:
#
#      0xF880  height-metric character for double-byte fonts
#      0xF881  width-metric character for double-byte fonts
#
# 2. Mapping the basic EUC-CN characters
#
#    The mappings for GB 2312-1980 are based on the GB 2312 mapping table
#    provided by the Unicode Consortium (UTC), dated 6 December 1993,
#    which was created by Glenn Adams and John Jenkins. That table is
#    Copyright 1991-1993 by Unicode, Inc.
#
#    Some of the non-Hanzi mappings were changed from the UTC mappings.
#    There were three reasons for this:
#    - To be more consistent with the GBK mappings from the China
#    standards organization (GBK is supposed to include all of GB 2312).
#    - If the UTC table mapped the GB character to a "fullwidth" version
#    but there was no mapping to the "basic" version, then the mapping was
#    changed to the "basic" version. This is more consistent with the ...