##Adobe File Version: 1.000
#=======================================================================
#   FTP file name:  CHINSIMP.TXT
#
#   Contents:       Map (external version) from Mac OS Chinese
#                   Simplified encoding to Unicode 2.1
#
#   Copyright:      (c) 1996-1999 by Apple Computer, Inc., all rights
#                   reserved.
#
#   Contact:        charsets@apple.com
#
#   Changes:
#
#       b02  1999-Sep-22    Update contact e-mail address. Matches
#                           internal utom<b1>, ufrm<b3>, and Text
#                           Encoding Converter version 1.5.
#       n08  1998-Feb-05    Just rewrite initial header comments and
#                           reorder into single list with all one-byte
#                           characters at the beginning; no mapping
#                           changes. Matches internal utom<n7>, ufrm<n8>
#                           and Text Encoding Converter version 1.3.
#       n05  1996-Aug-22    Matches internal ufrm<n1>.
#       n00  1996-Aug-01
#
# Standard header:
# ----------------
#
#   Apple, the Apple logo, and Macintosh are trademarks of Apple
#   Computer, Inc., registered in the United States and other countries.
#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
#   throughout this document, "Macintosh" can be used to refer to
#   Macintosh computers and "Unicode" can be used to refer to the
#   Unicode standard.
#
#   Apple makes no warranty or representation, either express or
#   implied, with respect to these tables, their quality, accuracy, or
#   fitness for a particular purpose. In no event will Apple be liable
#   for direct, indirect, special, incidental, or consequential damages
#   resulting from any defect or inaccuracy in this document or the
#   accompanying tables.
#
#   These mapping tables and character lists are subject to change.
#   The latest tables should be available from the following:
#
#   <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#   <ftp://dev.apple.com/devworld/Technical_Documentation/Misc._Standards/>
#
#   For general information about Mac OS encodings and these mapping
#   tables, see the file "README.TXT".
#
# Format:
# -------
#
#   Three tab-separated columns;
#   '#' begins a comment which continues to the end of the line.
#     Column #1 is the Mac OS Chinese Simplified code (in hex as 0xNN
#       or 0xNNNN)
#     Column #2 is the corresponding Unicode or Unicode sequence (in
#       hex as 0xNNNN or 0xNNNN+0xNNNN). Sequences of up to 2
#       Unicode characters are used here.
#     Column #3 is a comment containing the Unicode name.
#       In some cases an additional comment follows the Unicode name.
#
#   The entries are in Mac OS Chinese Simplified code order. All
#   one-byte characters are at the beginning.
#
#   Some of these mappings require the use of corporate characters.
#   See the file "CORPCHAR.TXT" and notes below.
#
#   Control character mappings are not shown in this table, following
#   the conventions of the standard UTC mapping tables. However, the
#   Mac OS Chinese Simplified encoding uses the standard control
#   characters at 0x00-0x1F and 0x7F.
#
# Notes on Mac OS Chinese Simplified:
# -----------------------------------
#
#   This table covers the Mac OS Chinese Simplified encoding used in
#   Mac OS versions 7.1 and later, including the Chinese Language Kit.
#   The Mac OS Chinese Simplified encoding is based on EUC-CN, but it
#   changes the high-byte range and adds a few characters.
#
#   For Mac OS Chinese Simplified, two-byte characters have
#   first/lead/high byte in the range 0xA1-0xFC, and second/trail/low
#   byte in the range 0xA1-0xFE.
#
#   1. Standard EUC-CN
#
#   This includes one-byte characters, which are usually the ASCII set. In
#   addition, it includes two-byte characters with both bytes in the range
#   0xA1-0xFE. The two-byte characters are from GB 2312-80, but their code
#   points are transformed from the GB 2312 range 0x2121-0x7E7E by adding
#   0x8080 (so that 0x2121 becomes 0xA1A1 and 0x7E7E becomes 0xFEFE).
#
#   EUC-CN includes the following ranges:
#   - 0xA1A1-0xA9EF, various punctuation, symbol, number, separator, and
#     letter characters. Not all the code points in this range are defined
#     by GB 2312.
#   - 0xB0A1-0xF7FE, "ideographic" characters (Hanzi).
#
#   2. Mac OS Chinese Simplified additions
#
#   a) Two-byte changes and additions
#
#   Mac OS Chinese Simplified shortens the high-byte range so the
#   first/lead/high bytes of two-byte characters are limited to
#   the range 0xA1-0xFC.
#
#   The additions use code points that are undefined in EUC-CN, and fall
#   into two groups. Both are actually standard extensions to GB 2312.
#   - 0xA6D9-0xA6F5, forms for vertical text
#   - 0xA8BB-0xA8C0, pinyin extensions for Cantonese, etc. These are
#     from the GB 6345.1-1986 extension to GB 2312-80.
#
#   b) One-byte additions
#
#   0x80  LATIN SMALL LETTER U WITH DIAERESIS, alternate
#   0x81  height-metric character (see below)
#   0x82  width-metric character (see below)
#   0xA0  NO-BREAK SPACE
#   0xFD  COPYRIGHT SIGN
#   0xFE  TRADE MARK SIGN
#   0xFF  HORIZONTAL ELLIPSIS
#
#   The two characters at 0x81 and 0x82 are somewhat special. These
#   are one-byte characters whose glyphs have the same metrics as the
#   glyphs for the two-byte characters. This way application developers
#   can use QuickDraw functions such as CharWidth to determine the
#   metrics of the two-byte character glyphs in a particular font.
#   0x81  a character whose glyph has the height of a two-byte
#         character glyph.
#   0x82  a character whose glyph has the advance width of a two-
#         byte character glyph. Note: For old-style (FBIT/FDEF)
#         bitmap fonts, the width of this glyph is *half* the width
#         of the two-byte character glyphs.
#
# Unicode mapping issues and notes:
# ---------------------------------
#
#   1. Mapping the Apple additions
#
#   The goals in the mappings provided here are:
#   - Ensure roundtrip mapping from every character in the Mac OS Chinese
#     Simplified encoding to Unicode and back
#   - Use standard Unicode characters as much as possible, to maximize
#     interchangeability of the resulting Unicode text. Whenever possible,
#     avoid having content carried by private-use characters.
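#
#   The two goals above can be checked mechanically. The following is a
#   minimal Python sketch, not part of the Apple tables: the function
#   names (is_round_trippable, uses_private_use) and the three-entry
#   sample dictionary are illustrative assumptions. The sample entries
#   (0xA3AC, 0xA6D9, 0xA8BF) are the ones this header itself describes.

```python
# Sketch only: function names and the tiny sample table are hypothetical.

def is_round_trippable(table):
    """Return True if no two Mac OS codes map to the same Unicode sequence.

    A table that passes this check can be inverted without losing
    information, which is what the roundtrip goal requires.
    """
    return len(set(table.values())) == len(table)


def uses_private_use(text):
    """Return True if any code point lies in the BMP Private Use Area."""
    return any(0xE000 <= ord(ch) <= 0xF8FF for ch in text)


# Mac OS Chinese Simplified code -> Unicode string, taken from this header.
sample = {
    0xA3AC: "\uFF0C",        # FULLWIDTH COMMA (horizontal form)
    0xA6D9: "\uFF0C\uF87E",  # FULLWIDTH COMMA + variant tag (vertical form)
    0xA8BF: "\u006E\u0300",  # LATIN SMALL LETTER N + COMBINING GRAVE ACCENT
}

# The same table with the variant tag dropped from 0xA6D9.
without_tag = dict(sample)
without_tag[0xA6D9] = "\uFF0C"

print(is_round_trippable(sample))
print(is_round_trippable(without_tag))
print([hex(code) for code, text in sample.items() if uses_private_use(text)])
```

#   In this sketch the table without the variant tag fails the check,
#   because 0xA3AC and 0xA6D9 would both map to U+FF0C; only the 0xA6D9
#   entry relies on a private-use code point.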
#
#   Some of the characters in the Mac OS Chinese Simplified Apple additions
#   do not correspond to distinct, single Unicode characters. To map these
#   and satisfy both goals above, we employ various strategies.
#
#   a) Map a single Mac OS Chinese Simplified character to a sequence of
#   standard Unicode characters
#
#   For example, the character 0xA8BF in the Apple additions is a
#   small n with grave accent. There is currently no single Unicode
#   character for this. However, it can be mapped to 0x006E+0x0300,
#   LATIN SMALL LETTER N + COMBINING GRAVE ACCENT.
#
#   b) Use private use characters in combination with standard Unicode
#   characters to mark variants of the standard Unicode character.
#
#   Apple has defined a block of 32 corporate characters as "transcoding
#   hints." These are used in combination with standard Unicode characters
#   to force them to be treated in a special way for mapping to other
#   encodings; they have no other effect. Sixteen of these transcoding
#   hints are "grouping hints" - they indicate that the next 2-4 Unicode
#   characters should be treated as a single entity for transcoding. The
#   other sixteen transcoding hints are "variant tags" - they are like
#   combining characters, and can follow a standard Unicode character (or
#   a sequence consisting of a base character and other combining
#   characters) to cause it to be treated in a special way for
#   transcoding. These always terminate a combining-character sequence.
#
#   The transcoding hints used in this mapping table are two variant
#   tags, 0xF87E and 0xF87F. Since these are combined with standard
#   Unicode characters, some characters in Mac OS Chinese Simplified
#   encoding map to a sequence of two Unicode characters instead of a
#   single Unicode character. For example, the Mac OS Chinese Simplified
#   character at 0xA6D9 is a vertical-text form of the FULLWIDTH COMMA
#   (the standard mapping is for the horizontal form at 0xA3AC). So 0xA6D9
#   is mapped to 0xFF0C (FULLWIDTH COMMA) + 0xF87E (a variant tag).
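#
#   Because column #2 may hold either a single value or a 0xNNNN+0xNNNN
#   sequence, a reader of this table has to split on '+'. The following
#   is a minimal Python sketch under stated assumptions: parse_entry is
#   a hypothetical helper, not an Apple or Unicode API, and the comment
#   text in the sample lines is illustrative rather than copied from
#   the table body.

```python
# Sketch only: parse_entry and its return shape are illustrative assumptions.

def parse_entry(line):
    """Parse one table line into (mac_code, unicode_string, comment).

    Returns None for blank lines and for lines that contain only a comment.
    """
    # '#' begins a comment which continues to the end of the line.
    body, _, comment = line.partition("#")
    fields = body.split()  # columns are tab-separated; split() handles tabs
    if len(fields) < 2:
        return None
    mac_code = int(fields[0], 16)  # 0xNN or 0xNNNN
    # Column #2 is 0xNNNN or 0xNNNN+0xNNNN (a sequence of up to 2 here).
    code_points = [int(part, 16) for part in fields[1].split("+")]
    text = "".join(chr(cp) for cp in code_points)
    return mac_code, text, comment.strip()


lines = [
    "# a comment-only line",
    "0xA6D9\t0xFF0C+0xF87E\t# FULLWIDTH COMMA, vertical form",
    "0xA8BF\t0x006E+0x0300\t# LATIN SMALL LETTER N + COMBINING GRAVE ACCENT",
]

for line in lines:
    entry = parse_entry(line)
    if entry is not None:
        mac_code, text, comment = entry
        print(hex(mac_code), [hex(ord(ch)) for ch in text], comment)
```

#   Keeping the variant tag 0xF87E in the parsed string, rather than
#   stripping it, is what lets 0xA6D9 stay distinct from 0xA3AC when the
#   table is inverted for Unicode-to-Mac OS conversion.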
#
#   c) Use private use characters by themselves to map characters which
#   have no relationship to any standard Unicode character.
#
#   We define two corporate-zone characters for this purpose:
#
#   0xF880  height-metric character for double-byte fonts
#   0xF881  width-metric character for double-byte fonts
#
#   2. Mapping the basic EUC-CN characters
#
#   The mappings for GB 2312-1980 are based on the GB 2312 mapping table
#   provided by the Unicode Consortium (UTC), dated 6 December 1993,
#   which was created by Glenn Adams and John Jenkins. That table is
#   Copyright 1991-1993 by Unicode, Inc.
#
#   Some of the non-Hanzi mappings were changed from the UTC mappings.
#   There were three reasons for this:
#   - To be more consistent with the GBK mappings from the China
#     standards organization (GBK is supposed to include all of GB 2312).
#   - If the UTC table mapped the GB character to a "fullwidth" version
#     but there was no mapping to the "basic" version, then the mapping was
#     changed to the "basic" version. This is more consistent with the ...