X-Git-Url: https://code.delx.au/gnu-emacs/blobdiff_plain/aeae5b21b2c7cbf187ee0fef4ddbbe9d5e774b71..c4e8cde8c6cea5ab85abbac10626bd5c1fe5a6af:/README.unicode

diff --git a/README.unicode b/README.unicode
index 1cef32bddc..79613c9c6e 100644
--- a/README.unicode
+++ b/README.unicode
@@ -1,4 +1,4 @@
- -*-text-*-
+ -*-mode: text; coding: latin-1;-*-
 
 Problems, fixmes and other issues in the emacs-unicode branch
 -------------------------------------------------------------
@@ -15,12 +15,17 @@ existing support and the extra stuff at
    (Editing support is mostly orthogonal to the internal representation.)
 
  * SINGLE_BYTE_CHAR_P returns true for Latin-1 characters, which has
-   undesirable effects.
+   undesirable effects. E.g.:
+   (multibyte-string-p (let ((s "x")) (aset s 0 ?£) s)) => nil
+   (multibyte-string-p (concat [?£])) => nil
+   (text-char-description ?£) => "M-#"
+
+   These examples are all fixed by the change of 2002-10-14, but
+   there still exist questionable uses of SINGLE_BYTE_CHAR_P in the
+   code (keymap.c and print.c).
 
  * Rationalize character syntax and its relationship to the Unicode
-   database. Specifically, the latin-N.el files aren't consistent for
-   common characters (and obviously have redundancies except in
-   unibyte mode).
+   database. (Applies mainly to symbol and punctuation syntax.)
 
  * Fontset handling and customization needs work. We want to relate
    fonts to scripts, probably based on the Unicode blocks. The
@@ -33,29 +38,27 @@ existing support and the extra stuff at
    ISO10646 fonts, Emacs checks their repertories to avoid such fonts
    that don't have a glyph for a specific character.
 
+   fx has worked on fontset customization, but was stymied by
+   basic problems with the way the default face is dealt with
+   (and something else, I think). This needs revisiting.
+
  * Work is also needed on charset and coding system priorities.
 
  * The relevant bits of latin1-disp.el need porting (and probably
    re-naming/updating). See also cyril-util.el.
 
- * Quail files need more work now the encoding is irrelevant.
+ * Quail files need more work now the encoding is largely irrelevant.
 
  * What to do with the old coding categories stuff?
 
- * Syntax for symbols &c in characters.el needs looking at.
-
  * The preferred-coding-system property of charsets should probably be
    junked unless it can be made more useful now.
 
- * find-coding-systems-for-charsets needs re-writing or removing.
-
  * find-multibyte-characters needs looking at.
 
- * Implement Korean cp949/UHC and any other important missing
+ * Implement Korean cp949/UHC, BIG5-HKSCS and any other important missing
    charsets.
 
- * Check up on definitions of tcvn and alternativnj.
-
  * Lazy-load tables for unify-charset somehow?
 
    Actually, Emacs clear out all charset maps and unify-map just
@@ -65,10 +68,22 @@ existing support and the extra stuff at
 
  * Translation tables for {en,de}code currently aren't supported.
 
+   This should be fixed by the changes of 2002-10-14.
+
  * Defining CCL coding systems currently doesn't work.
 
+   This should be fixed by the changes of 2003-01-30.
+
  * iso-2022 charsets get unified on i/o.
 
+   With the change on 2003-01-06, decoding routines put a `charset'
+   property on decoded text, and the iso-2022 encoder pays attention
+   to it. Thus, for instance, reading and writing with
+   iso-2022-7bit preserve the original designation sequences.
+   Perhaps the property name `preferred-charset' would be better?
+
+   We may have to use this property to decide on a font.
+
  * Revisit locale processing: look at treating the language and
    charset parts separately. (Language should affect things like
    speling and calendar, but that's not a Unicode issue.)
@@ -79,18 +94,11 @@ existing support and the extra stuff at
 
  * Bidi is a separate issue with no support currently.
 
- * DTRT with X keysyms. We should get the right unicode for a given
-   keysym, not decode raw bytes in some ill-defined coding system.
-   (fx has some data on keysyms v. unicodes.)
-
  * We need tabular input methods, e.g.
    for maths symbols. (Not specific to Unicode.)
 
  * Need multibyte text in menus, e.g. for the above. (Not specific to
-   Unicode.)
-
- * Still can't have case pairs which have different byte lengths --
-   can that be fixed for Turkish, at least?
+   Unicode -- see Emacs etc/TODO, but now mostly works with gtk.)
 
  * There's currently no support for Unicode normalization.
 
@@ -107,9 +115,17 @@ existing support and the extra stuff at
    files generated by 20.2 and the primer are still not loadable. Is it
    really worth working on it?
 
- * Encoding issues in babyl files/rmail need sorting out.
+ * Rmail won't work with non-ASCII text. Encoding issues for Babyl
+   files need sorting out, but rms says Babyl will go before this is
+   released.
 
  * Gnus still needs some attention, and we need to get changes accepted
    by Gnus maintainers...
 
+ * There are type errors lurking, e.g. in
+   Fcheck_coding_systems_region. Define ENABLE_CHECKING to find them.
+
  * You can grep the code for lots of fixmes.
+
+ * Old auto-save files, and similar files, such as Gnus drafts,
+   containing non-ASCII characters probably won't be re-read correctly.