X-Git-Url: https://code.delx.au/gnu-emacs/blobdiff_plain/207f11935755236b21ca4d3fe6b19206e0a9ed33..eed3b46ca184b5bca1dc341e3204f1539b831104:/admin/notes/unicode
diff --git a/admin/notes/unicode b/admin/notes/unicode
index 13971ef18c..65df2166f2 100644
--- a/admin/notes/unicode
+++ b/admin/notes/unicode
@@ -1,6 +1,6 @@
 -*-mode: text; coding: utf-8;-*-
 
-Copyright (C) 2002-2015 Free Software Foundation, Inc.
+Copyright (C) 2002-2016 Free Software Foundation, Inc.
 See the end of the file for license conditions.
 
 Importing a new Unicode Standard version into Emacs
@@ -10,8 +10,11 @@ Emacs uses the following files from the Unicode Character Database
 (a.k.a. "UCD):
 
   . UnicodeData.txt
+  . Blocks.txt
   . BidiMirroring.txt
+  . BidiBrackets.txt
   . IVD_Sequences.txt
+  . NormalizationTest.txt
 
 First, these files need to be copied into admin/unidata/, and then
 Emacs should be rebuilt for them to take effect. Rebuilding Emacs
@@ -27,19 +30,37 @@ so will cause aborts in redisplay.
 
 Next, review the changes in UnicodeData.txt vs the previous version
 used by Emacs. Any changes, be it introduction of new scripts or
-addition of codepoints to existing scripts, need corresponding changes
-in the data used for filling char-script-table, see characters.el
-around line 1300. Other databases and settings in characters.el, such
-as the data for char-width-table, might also need changes.
+addition of codepoints to existing scripts, might need corresponding
+changes in the data used for filling the category-table, case-table,
+and char-width-table. The additional scripts should cause automatic
+updates in charscript.el, but it is a good idea to look at the results
+and see if any changes in admin/unidata/blocks.awk are required.
 
 Any new scripts added by UnicodeData.txt will also need updates to
-script-representative-chars defined in fontset.el. Other databases in
-fontset.el might also need to be updated as needed.
+script-representative-chars defined in fontset.el, and also the list
+of OTF script tags in otf-script-alist, whose source is on this page:
+
+  https://www.microsoft.com/typography/otspec/scripttags.htm
+
+Other databases in fontset.el might also need to be updated as needed.
+
+The function 'ucs-names', defined in lisp/international/mule-cmds.el,
+might need to be updated because it knows about used and unused ranges
+of Unicode codepoints, which a new release of the Unicode Standard
+could change.
+
+Finally, test normalization functions against NormalizationTests.txt,
+in the test/ directory run:
+
+  make lisp/international/ucs-normalize-tests
+
+See commentary in test/lisp/international/ucs-normalize-tests.el
+regarding failing lines.
 
 Problems, fixmes and other unicode-related issues
 -------------------------------------------------------------
 
-Notes by fx to record various things of variable importance. handa
+Notes by fx to record various things of variable importance. Handa
 needs to check them -- don't take too seriously, especially with
 regard to completeness.
 
@@ -97,11 +118,11 @@ regard to completeness.
 
  * iso-2022 charsets get unified on i/o.
 
-   With the change on 2003-01-06, decoding routines put `charset'
-   property to decoded text, and iso-2022 encoder pay attention
+   With the change on 2003-01-06, decoding routines put the 'charset'
+   property onto decoded text, and iso-2022 encoder pay attention
    to it. Thus, for instance, reading and writing by
    iso-2022-7bit preserve the original designation sequences.
-   The property name `preferred-charset' may be better?
+   The property name 'preferred-charset' may be better?
 
    We may have to utilize this property to decide a font.
 
@@ -119,8 +140,6 @@ regard to completeness.
  * Need multibyte text in menus, e.g. for the above. (Not specific to
    Unicode -- see Emacs etc/TODO, but now mostly works with gtk.)
 
- * There's currently no support for Unicode normalization.
-
  * Populate char-width-table correctly for Unicode characters and
    worry about what happens when double-width charsets covering
    non-CJK characters are unified.
@@ -167,8 +186,8 @@ nontrivial changes to the build process.
         leim/CXTERM-DIC/QJ.tit
         leim/CXTERM-DIC/SW.tit
         leim/CXTERM-DIC/TONEPY.tit
-        leim/MISC-DIC/pinyin.map
         leim/MISC-DIC/CTLau.html
+        leim/MISC-DIC/pinyin.map
         leim/MISC-DIC/ziranma.cin
 
  * cp850
@@ -226,7 +245,6 @@ nontrivial changes to the build process.
         lisp/leim/quail/japanese.el
         lisp/leim/quail/py-punct.el
         lisp/leim/quail/pypunct-b5.el
-        lisp/term/x-win.el
 
         This file contains just Chinese characters, and has same problem.
         Also, it contains characters that cannot be encoded in UTF-8.
@@ -237,12 +255,33 @@ nontrivial changes to the build process.
 
         These files contain characters that cannot be encoded in UTF-8.
 
-        lisp/language/tibetan.el
-        lisp/language/tibet-util.el
+        lisp/language/ethio-util.el
+        lisp/language/ethiopic.el
         lisp/language/ind-util.el
+        lisp/language/tibet-util.el
+        lisp/language/tibetan.el
         lisp/leim/quail/ethiopic.el
         lisp/leim/quail/tibetan.el
 
+ * binary files
+
+        These files contain binary data, and are not text files.
+        Some of the entries in this list are patterns, and stand for any
+        files with the listed extension.
+
+        *.gz
+        *.icns
+        *.ico
+        *.pbm
+        *.pdf
+        *.png
+        *.sig
+        etc/e/eterm-color
+        etc/package-keyring.gpg
+        msdos/emacs.pif
+        nextstep/GNUstep/Emacs.base/Resources/emacs.tiff
+        nt/icons/hand.cur
+
 
 This file is part of GNU Emacs.
 