##Adobe File Version: 1.000
#=======================================================================
#   FTP file name:  FARSI.TXT
#
#   Contents:       Map (external version) from Mac OS Farsi
#                   character set to Unicode 2.1
#
#   Copyright:      (c) 1997-1999 by Apple Computer, Inc., all rights
#                   reserved.
#
#   Contact:        charsets@apple.com
#
#   Changes:
#
#       b02  1999-Sep-22    Update contact e-mail address. Matches
#                           internal utom<b1>, ufrm<b1>, and Text
#                           Encoding Converter version 1.5.
#       n04  1998-Feb-05    Show required Unicode character
#                           directionality in a different way. Matches
#                           internal utom<n3>, ufrm<n9>, and Text
#                           Encoding Converter version 1.3. Update
#                           header comments; include information on
#                           loose mapping of digits, and changes to
#                           mapping for the TrueType variant.
#       n01  1997-Jul-17    First version. Matches internal utom<n1>,
#                           ufrm<n2>.
#
# Standard header:
# ----------------
#
#   Apple, the Apple logo, and Macintosh are trademarks of Apple
#   Computer, Inc., registered in the United States and other countries.
#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
#   throughout this document, "Macintosh" can be used to refer to
#   Macintosh computers and "Unicode" can be used to refer to the
#   Unicode standard.
#
#   Apple makes no warranty or representation, either express or
#   implied, with respect to these tables, their quality, accuracy, or
#   fitness for a particular purpose. In no event will Apple be liable
#   for direct, indirect, special, incidental, or consequential damages
#   resulting from any defect or inaccuracy in this document or the
#   accompanying tables.
#
#   These mapping tables and character lists are subject to change.
#   The latest tables should be available from the following:
#
#   <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#   <ftp://dev.apple.com/devworld/Technical_Documentation/Misc._Standards/>
#
#   For general information about Mac OS encodings and these mapping
#   tables, see the file "README.TXT".
#
# Format:
# -------
#
#   Three tab-separated columns;
#   '#' begins a comment which continues to the end of the line.
#     Column #1 is the Mac OS Farsi code (in hex as 0xNN)
#     Column #2 is the corresponding Unicode (in hex as 0xNNNN),
#       possibly preceded by a tag indicating required directionality
#       (i.e. <LR>+0xNNNN or <RL>+0xNNNN).
#     Column #3 is a comment containing the Unicode name.
#
#   The entries are in Mac OS Farsi code order.
#
#   Control character mappings are not shown in this table, following
#   the conventions of the standard UTC mapping tables. However, the
#   Mac OS Farsi character set uses the standard control characters at
#   0x00-0x1F and 0x7F.
#
# Notes on Mac OS Farsi:
# ----------------------
#
#   1. General
#
#   The Mac OS Farsi character set is used for the Farsi (Persian)
#   localizations, and for the Persian support in the Arabic Language
#   Kit.
#
#   The Mac OS Farsi character set is based on the Mac OS Arabic
#   character set. The main difference is in the right-to-left digits
#   0xB0-0xB9: For Mac OS Arabic these correspond to right-left
#   versions of the Unicode ARABIC-INDIC DIGITs 0660-0669; for
#   Mac OS Farsi these correspond to right-left versions of the
#   Unicode EXTENDED ARABIC-INDIC DIGITs 06F0-06F9. The other
#   difference is in the nature of the font variants.
#
#   For more information, see the comments in the mapping table for
#   Mac OS Arabic.
#
#   Mac OS Farsi characters 0xEB-0xF2 are non-spacing/combining marks.
#
#   2. Directional characters and roundtrip fidelity
#
#   The Mac OS Arabic character set (on which Mac OS Farsi is based)
#   was developed in 1986-1987. At that time the bidirectional line
#   layout algorithm used in the Mac OS Arabic system was fairly simple;
#   it used only a few direction classes (instead of the 13 or so now
#   used in the Unicode bidirectional algorithm). In order to permit
#   users to handle some tricky layout problems, certain punctuation
#   and symbol characters have duplicate code points, one with a
#   left-right direction attribute and the other with a right-left
#   direction attribute. This is true in Mac OS Farsi too.
#
#   For example, plus sign is encoded at 0x2B with a left-right
#   attribute, and at 0xAB with a right-left attribute. However, there
#   is only one PLUS SIGN character in Unicode. This leads to some
#   interesting problems when mapping between Mac OS Farsi and Unicode;
#   see below.
#
#   A related problem is that even when a particular character is
#   encoded only once in Mac OS Farsi, it may have a different
#   direction attribute than the corresponding Unicode character.
#
#   For example, the Mac OS Farsi character at 0x93 is HORIZONTAL
#   ELLIPSIS with strong right-left direction. However, the Unicode
#   character HORIZONTAL ELLIPSIS has direction class neutral.
#
#   3. Behavior of ASCII-range numbers
#
#   Mac OS Farsi also has two sets of digit codes.
#
#   The digits at 0x30-0x39 may be displayed using either European
#   digit shapes or Persian digit shapes, depending on context. If there
#   is a "strong European" character such as a Latin letter on either
#   side of a sequence consisting of digits 0x30-0x39 and possibly comma
#   0x2C or period 0x2E, then all of these characters will be displayed
#   using European shapes. (This will happen even if there are neutral
#   characters between the digits and the strong European character.)
#   Otherwise, the digits will be displayed using Persian shapes, the
#   comma will be displayed as Arabic thousands separator, and the
#   period as Arabic decimal separator. In any case, 0x2C, 0x2E, and
#   0x30-0x39 are always left-right.
#
#   The digits at 0xB0-0xB9 are always displayed using Persian digit
#   shapes, and moreover, these digits always have strong right-left
#   directionality. These are mainly intended for special layout
#   purposes such as part numbers, etc.
#
#   4. Font variants
#
#   The table in this file gives the Unicode mappings for the standard
#   Mac OS Farsi encoding. This encoding is supported by the Tehran font
#   (the system font for Farsi), and is the encoding supported by the
#   text processing utilities. However, the other Farsi fonts actually
#   implement a somewhat different encoding; this affects nine code
#   points including 0xAA and 0xC0 (which are also affected by font
#   variants in Mac OS Arabic). For these nine code points the standard
#   Mac OS Farsi encoding has the following mappings:
#     0x8B -> 0x06BA       ARABIC LETTER NOON GHUNNA (Urdu)
#     0xA4 -> <RL>+0x0024  DOLLAR SIGN, right-left
#     0xAA -> <RL>+0x002A  ASTERISK, right-left
#     0xC0 -> <RL>+0x274A  EIGHT TEARDROP-SPOKED PROPELLER ASTERISK,
#                          right-left
#     0xF4 -> 0x0679       ARABIC LETTER TTEH (Urdu)
#     0xF7 -> 0x06A4       ARABIC LETTER VEH (for transliteration)
#     0xF9 -> 0x0688       ARABIC LETTER DDAL (Urdu)
#     0xFA -> 0x0691       ARABIC LETTER RREH (Urdu)
#     0xFF -> 0x06D2       ARABIC LETTER YEH BARREE (Urdu)
#
#   The TrueType variant is used for the Farsi TrueType fonts: Ashfahan,
#   Amir, Kamran, Mashad, NadeemFarsi. It differs from the standard
#   variant in the following ways:
#     0x8B -> 0xF882  Arabic ligature "peace on him" (corporate char.)
#     0xA4 -> 0xF86B+0x0631+0x064A+0x0627+0x0644  Arabic ligature rial,
#             currency sign (uses transcoding hint, see below)
#     0xAA -> 0x00D7  MULTIPLICATION SIGN (RL)
#     0xC0 -> 0x002A  ASTERISK (RL)
#     0xF4 -> 0x00B0  DEGREE SIGN (RL)
#     0xF7 -> 0xFDFA  ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
#     0xF9 -> 0x25CF  BLACK CIRCLE (RL)
#     0xFA -> 0x25A0  BLACK SQUARE (RL)
#     0xFF -> 0x25B2  BLACK UP-POINTING TRIANGLE (RL)
#
# Unicode mapping issues and notes:
# ---------------------------------
#
#   1. Matching the direction of Mac OS Farsi characters
#
#   When Mac OS Farsi encodes a character twice but with different
#   direction attributes for the two code points - as in the case of
#   plus sign mentioned above - we need a way to map both Mac OS Farsi
#   code points to Unicode and back again without loss of information.
#   With the plus sign, for example, mapping one of the Mac OS Farsi
#   characters to a code in the Unicode corporate use zone is
#   undesirable, since both of the plus sign characters are likely to
#   be used in text that is interchanged.
#
#   The problem is solved with the use of direction override characters
#   and direction-dependent mappings. When mapping from Mac OS Farsi
#   to Unicode, we use direction overrides as necessary to force the
#   direction of the resulting Unicode characters.
#
#   The required direction is indicated by a direction tag in the
#   mappings. A tag of <LR> means the corresponding Unicode character
#   must have a strong left-right context, and a tag of <RL> indicates
#   a right-left context.
#
#   For example, the mapping of 0x2B is given as <LR>+0x002B; the
#   mapping of 0xAB is given as <RL>+0x002B. If we map an isolated
#   instance of 0x2B to Unicode, it should be mapped as follows (LRO
#   indicates LEFT-RIGHT OVERRIDE, PDF indicates POP DIRECTION
#   FORMATTING):
#
#     0x2B -> 0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
#
#   When mapping sev...