1 /* Coding system handler (conversion, detection, and etc).
2 Copyright (C) 1995,97,1998,2002,2003 Electrotechnical Laboratory, JAPAN.
3 Licensed to the Free Software Foundation.
4 Copyright (C) 2001,2002,2003 Free Software Foundation, Inc.
6 This file is part of GNU Emacs.
8 GNU Emacs is free software; you can redistribute it and/or modify
9 it under the terms of the GNU General Public License as published by
10 the Free Software Foundation; either version 2, or (at your option)
13 GNU Emacs is distributed in the hope that it will be useful,
14 but WITHOUT ANY WARRANTY; without even the implied warranty of
15 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16 GNU General Public License for more details.
18 You should have received a copy of the GNU General Public License
19 along with GNU Emacs; see the file COPYING. If not, write to
20 the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
21 Boston, MA 02111-1307, USA. */
23 /*** TABLE OF CONTENTS ***
27 2. Emacs' internal format (emacs-mule) handlers
29 4. Shift-JIS and BIG5 handlers
31 6. End-of-line handlers
32 7. C library functions
33 8. Emacs Lisp library functions
38 /*** 0. General comments ***/
41 /*** GENERAL NOTE on CODING SYSTEMS ***
43 A coding system is an encoding mechanism for one or more character
44 sets. Here's a list of coding systems which Emacs can handle. When
45 we say "decode", it means converting some other coding system to
46 Emacs' internal format (emacs-mule), and when we say "encode",
47 it means converting the coding system emacs-mule to some other
50 0. Emacs' internal format (emacs-mule)
52 Emacs itself holds a multi-lingual character in buffers and strings
53 in a special format. Details are described in section 2.
57 The most famous coding system for multiple character sets. X's
58 Compound Text, various EUCs (Extended Unix Code), and coding
59 systems used in Internet communication such as ISO-2022-JP are
60 all variants of ISO2022. Details are described in section 3.
62 2. SJIS (or Shift-JIS or MS-Kanji-Code)
64 A coding system to encode character sets: ASCII, JISX0201, and
65 JISX0208. Widely used for PC's in Japan. Details are described in
70 A coding system to encode the character sets ASCII and Big5. Widely
71 used for Chinese (mainly in Taiwan and Hong Kong). Details are
72 described in section 4. In this file, when we write "BIG5"
73 (all uppercase), we mean the coding system, and when we write
74 "Big5" (capitalized), we mean the character set.
78 A coding system for text containing random 8-bit code. Emacs does
79 no code conversion on such text except for end-of-line format.
83 If a user wants to read/write text encoded in a coding system not
84 listed above, he can supply a decoder and an encoder for it as CCL
85 (Code Conversion Language) programs. Emacs executes the CCL program
86 while reading/writing.
88 Emacs represents a coding system by a Lisp symbol that has a property
89 `coding-system'. But, before actually using the coding system, the
90 information about it is set in a structure of type `struct
91 coding_system' for rapid processing. See section 6 for more details.
95 /*** GENERAL NOTES on END-OF-LINE FORMAT ***
97 How end-of-line of text is encoded depends on the operating system.
98 For instance, Unix's format is just one byte of `line-feed' code,
99 whereas DOS's format is two-byte sequence of `carriage-return' and
100 `line-feed' codes. MacOS's format is usually one byte of
103 Since text character encoding and end-of-line encoding are
104 independent, any coding system described above can have any
105 end-of-line format. So Emacs has information about end-of-line
106 format in each coding-system. See section 6 for more details.
110 /*** GENERAL NOTES on `detect_coding_XXX ()' functions ***
112 These functions check if a text between SRC and SRC_END is encoded
113 in the coding system category XXX. Each returns an integer value in
114 which appropriate flag bits for the category XXX are set. The flag
115 bits are defined in macros CODING_CATEGORY_MASK_XXX. Below is the
116 template for these functions. If MULTIBYTEP is nonzero, 8-bit codes
117 of the range 0x80..0x9F are in multibyte form. */
120 detect_coding_emacs_mule (src
, src_end
, multibytep
)
121 unsigned char *src
, *src_end
;
128 /*** GENERAL NOTES on `decode_coding_XXX ()' functions ***
130 These functions decode SRC_BYTES length of unibyte text at SOURCE
131 encoded in CODING to Emacs' internal format. The resulting
132 multibyte text goes to a place pointed to by DESTINATION, the length
133 of which should not exceed DST_BYTES.
135 These functions set the information about original and decoded texts
136 in the members `produced', `produced_char', `consumed', and
137 `consumed_char' of the structure *CODING. They also set the member
138 `result' to one of CODING_FINISH_XXX indicating how the decoding
141 DST_BYTES zero means that the source area and destination area are
142 overlapped, which means that we can produce a decoded text until it
143 reaches the head of the not-yet-decoded source text.
145 Below is a template for these functions. */
148 decode_coding_XXX (coding
, source
, destination
, src_bytes
, dst_bytes
)
149 struct coding_system
*coding
;
150 unsigned char *source
, *destination
;
151 int src_bytes
, dst_bytes
;
157 /*** GENERAL NOTES on `encode_coding_XXX ()' functions ***
159 These functions encode SRC_BYTES length text at SOURCE from Emacs'
160 internal multibyte format to CODING. The resulting unibyte text
161 goes to a place pointed to by DESTINATION, the length of which
162 should not exceed DST_BYTES.
164 These functions set the information about original and encoded texts
165 in the members `produced', `produced_char', `consumed', and
166 `consumed_char' of the structure *CODING. They also set the member
167 `result' to one of CODING_FINISH_XXX indicating how the encoding
170 DST_BYTES zero means that the source area and destination area are
171 overlapped, which means that we can produce encoded text until it
172 reaches at the head of the not-yet-encoded source text.
174 Below is a template for these functions. */
177 encode_coding_XXX (coding
, source
, destination
, src_bytes
, dst_bytes
)
178 struct coding_system
*coding
;
179 unsigned char *source
, *destination
;
180 int src_bytes
, dst_bytes
;
186 /*** COMMONLY USED MACROS ***/
188 /* The following two macros ONE_MORE_BYTE and TWO_MORE_BYTES safely
189 get one, two, and three bytes from the source text respectively.
190 If there are not enough bytes in the source, they jump to
191 `label_end_of_loop'. The caller should set variables `coding',
192 `src' and `src_end' to appropriate pointer in advance. These
193 macros are called from decoding routines `decode_coding_XXX', thus
194 it is assumed that the source text is unibyte. */
196 #define ONE_MORE_BYTE(c1) \
198 if (src >= src_end) \
200 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
201 goto label_end_of_loop; \
206 #define TWO_MORE_BYTES(c1, c2) \
208 if (src + 1 >= src_end) \
210 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
211 goto label_end_of_loop; \
218 /* Like ONE_MORE_BYTE, but 8-bit bytes of data at SRC are in multibyte
219 form if MULTIBYTEP is nonzero. */
221 #define ONE_MORE_BYTE_CHECK_MULTIBYTE(c1, multibytep) \
223 if (src >= src_end) \
225 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
226 goto label_end_of_loop; \
229 if (multibytep && c1 == LEADING_CODE_8_BIT_CONTROL) \
230 c1 = *src++ - 0x20; \
233 /* Set C to the next character at the source text pointed by `src'.
234 If there are not enough characters in the source, jump to
235 `label_end_of_loop'. The caller should set variables `coding'
236 `src', `src_end', and `translation_table' to appropriate pointers
237 in advance. This macro is used in encoding routines
238 `encode_coding_XXX', thus it assumes that the source text is in
239 multibyte form except for 8-bit characters. 8-bit characters are
240 in multibyte form if coding->src_multibyte is nonzero, else they
241 are represented by a single byte. */
243 #define ONE_MORE_CHAR(c) \
245 int len = src_end - src; \
249 coding->result = CODING_FINISH_INSUFFICIENT_SRC; \
250 goto label_end_of_loop; \
252 if (coding->src_multibyte \
253 || UNIBYTE_STR_AS_MULTIBYTE_P (src, len, bytes)) \
254 c = STRING_CHAR_AND_LENGTH (src, len, bytes); \
256 c = *src, bytes = 1; \
257 if (!NILP (translation_table)) \
258 c = translate_char (translation_table, c, -1, 0, 0); \
263 /* Produce a multibyte form of character C to `dst'. Jump to
264 `label_end_of_loop' if there's not enough space at `dst'.
266 If we are now in the middle of a composition sequence, the decoded
267 character may be ALTCHAR (for the current composition). In that
268 case, the character goes to coding->cmp_data->data instead of
271 This macro is used in decoding routines. */
273 #define EMIT_CHAR(c) \
275 if (! COMPOSING_P (coding) \
276 || coding->composing == COMPOSITION_RELATIVE \
277 || coding->composing == COMPOSITION_WITH_RULE) \
279 int bytes = CHAR_BYTES (c); \
280 if ((dst + bytes) > (dst_bytes ? dst_end : src)) \
282 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
283 goto label_end_of_loop; \
285 dst += CHAR_STRING (c, dst); \
286 coding->produced_char++; \
289 if (COMPOSING_P (coding) \
290 && coding->composing != COMPOSITION_RELATIVE) \
292 CODING_ADD_COMPOSITION_COMPONENT (coding, c); \
293 coding->composition_rule_follows \
294 = coding->composing != COMPOSITION_WITH_ALTCHARS; \
299 #define EMIT_ONE_BYTE(c) \
301 if (dst >= (dst_bytes ? dst_end : src)) \
303 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
304 goto label_end_of_loop; \
309 #define EMIT_TWO_BYTES(c1, c2) \
311 if (dst + 2 > (dst_bytes ? dst_end : src)) \
313 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
314 goto label_end_of_loop; \
316 *dst++ = c1, *dst++ = c2; \
319 #define EMIT_BYTES(from, to) \
321 if (dst + (to - from) > (dst_bytes ? dst_end : src)) \
323 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
324 goto label_end_of_loop; \
331 /*** 1. Preamble ***/
344 #include "composite.h"
348 #include "intervals.h"
350 #include "termhooks.h"
352 #else /* not emacs */
356 #endif /* not emacs */
358 Lisp_Object Qcoding_system
, Qeol_type
;
359 Lisp_Object Qbuffer_file_coding_system
;
360 Lisp_Object Qpost_read_conversion
, Qpre_write_conversion
;
361 Lisp_Object Qno_conversion
, Qundecided
;
362 Lisp_Object Qcoding_system_history
;
363 Lisp_Object Qsafe_chars
;
364 Lisp_Object Qvalid_codes
;
366 extern Lisp_Object Qinsert_file_contents
, Qwrite_region
;
367 Lisp_Object Qcall_process
, Qcall_process_region
, Qprocess_argument
;
368 Lisp_Object Qstart_process
, Qopen_network_stream
;
369 Lisp_Object Qtarget_idx
;
371 /* If a symbol has this property, evaluate the value to define the
372 symbol as a coding system. */
373 Lisp_Object Qcoding_system_define_form
;
375 Lisp_Object Vselect_safe_coding_system_function
;
377 int coding_system_require_warning
;
379 /* Mnemonic string for each format of end-of-line. */
380 Lisp_Object eol_mnemonic_unix
, eol_mnemonic_dos
, eol_mnemonic_mac
;
381 /* Mnemonic string to indicate format of end-of-line is not yet
383 Lisp_Object eol_mnemonic_undecided
;
385 /* Format of end-of-line decided by system. This is CODING_EOL_LF on
386 Unix, CODING_EOL_CRLF on DOS/Windows, and CODING_EOL_CR on Mac. */
391 /* Information about which coding system is safe for which chars.
392 The value has the form (GENERIC-LIST . NON-GENERIC-ALIST).
394 GENERIC-LIST is a list of generic coding systems which can encode
397 NON-GENERIC-ALIST is an alist of non generic coding systems vs the
398 corresponding char table that contains safe chars. */
399 Lisp_Object Vcoding_system_safe_chars
;
401 Lisp_Object Vcoding_system_list
, Vcoding_system_alist
;
403 Lisp_Object Qcoding_system_p
, Qcoding_system_error
;
405 /* Coding system emacs-mule and raw-text are for converting only
406 end-of-line format. */
407 Lisp_Object Qemacs_mule
, Qraw_text
;
411 /* Coding-systems are handed between Emacs Lisp programs and C internal
412 routines by the following three variables. */
413 /* Coding-system for reading files and receiving data from process. */
414 Lisp_Object Vcoding_system_for_read
;
415 /* Coding-system for writing files and sending data to process. */
416 Lisp_Object Vcoding_system_for_write
;
417 /* Coding-system actually used in the latest I/O. */
418 Lisp_Object Vlast_coding_system_used
;
420 /* A vector of length 256 which contains information about special
421 Latin codes (especially for dealing with Microsoft codes). */
422 Lisp_Object Vlatin_extra_code_table
;
424 /* Flag to inhibit code conversion of end-of-line format. */
425 int inhibit_eol_conversion
;
427 /* Flag to inhibit ISO2022 escape sequence detection. */
428 int inhibit_iso_escape_detection
;
430 /* Flag to make buffer-file-coding-system inherit from process-coding. */
431 int inherit_process_coding_system
;
433 /* Coding system to be used to encode text for terminal display when
434 terminal coding system is nil. */
435 struct coding_system safe_terminal_coding
;
437 /* Default coding system to be used to write a file. */
438 struct coding_system default_buffer_file_coding
;
440 Lisp_Object Vfile_coding_system_alist
;
441 Lisp_Object Vprocess_coding_system_alist
;
442 Lisp_Object Vnetwork_coding_system_alist
;
444 Lisp_Object Vlocale_coding_system
;
448 Lisp_Object Qcoding_category
, Qcoding_category_index
;
450 /* List of symbols `coding-category-xxx' ordered by priority. */
451 Lisp_Object Vcoding_category_list
;
453 /* Table of coding categories (Lisp symbols). */
454 Lisp_Object Vcoding_category_table
;
456 /* Table of names of symbol for each coding-category. */
457 char *coding_category_name
[CODING_CATEGORY_IDX_MAX
] = {
458 "coding-category-emacs-mule",
459 "coding-category-sjis",
460 "coding-category-iso-7",
461 "coding-category-iso-7-tight",
462 "coding-category-iso-8-1",
463 "coding-category-iso-8-2",
464 "coding-category-iso-7-else",
465 "coding-category-iso-8-else",
466 "coding-category-ccl",
467 "coding-category-big5",
468 "coding-category-utf-8",
469 "coding-category-utf-16-be",
470 "coding-category-utf-16-le",
471 "coding-category-raw-text",
472 "coding-category-binary"
475 /* Table of pointers to coding systems corresponding to each coding
477 struct coding_system
*coding_system_table
[CODING_CATEGORY_IDX_MAX
];
479 /* Table of coding category masks. Nth element is a mask for a coding
480 category of which priority is Nth. */
482 int coding_priorities
[CODING_CATEGORY_IDX_MAX
];
484 /* Flag to tell if we look up translation table on character code
486 Lisp_Object Venable_character_translation
;
487 /* Standard translation table to look up on decoding (reading). */
488 Lisp_Object Vstandard_translation_table_for_decode
;
489 /* Standard translation table to look up on encoding (writing). */
490 Lisp_Object Vstandard_translation_table_for_encode
;
492 Lisp_Object Qtranslation_table
;
493 Lisp_Object Qtranslation_table_id
;
494 Lisp_Object Qtranslation_table_for_decode
;
495 Lisp_Object Qtranslation_table_for_encode
;
497 /* Alist of charsets vs revision number. */
498 Lisp_Object Vcharset_revision_alist
;
500 /* Default coding systems used for process I/O. */
501 Lisp_Object Vdefault_process_coding_system
;
503 /* Char table for translating Quail and self-inserting input. */
504 Lisp_Object Vtranslation_table_for_input
;
506 /* Global flag to tell that we can't call post-read-conversion and
507 pre-write-conversion functions. Usually the value is zero, but it
508 is set to 1 temporarily while such functions are running. This is
509 to avoid infinite recursive call. */
510 static int inhibit_pre_post_conversion
;
512 Lisp_Object Qchar_coding_system
;
514 /* Return `safe-chars' property of CODING_SYSTEM (symbol). Don't check
518 coding_safe_chars (coding_system
)
519 Lisp_Object coding_system
;
521 Lisp_Object coding_spec
, plist
, safe_chars
;
523 coding_spec
= Fget (coding_system
, Qcoding_system
);
524 plist
= XVECTOR (coding_spec
)->contents
[3];
525 safe_chars
= Fplist_get (XVECTOR (coding_spec
)->contents
[3], Qsafe_chars
);
526 return (CHAR_TABLE_P (safe_chars
) ? safe_chars
: Qt
);
529 #define CODING_SAFE_CHAR_P(safe_chars, c) \
530 (EQ (safe_chars, Qt) || !NILP (CHAR_TABLE_REF (safe_chars, c)))
533 /*** 2. Emacs internal format (emacs-mule) handlers ***/
535 /* Emacs' internal format for representation of multiple character
536 sets is a kind of multi-byte encoding, i.e. characters are
537 represented by variable-length sequences of one-byte codes.
539 ASCII characters and control characters (e.g. `tab', `newline') are
540 represented by one-byte sequences which are their ASCII codes, in
541 the range 0x00 through 0x7F.
543 8-bit characters of the range 0x80..0x9F are represented by
544 two-byte sequences of LEADING_CODE_8_BIT_CONTROL and (their 8-bit
547 8-bit characters of the range 0xA0..0xFF are represented by
548 one-byte sequences which are their 8-bit code.
550 The other characters are represented by a sequence of `base
551 leading-code', optional `extended leading-code', and one or two
552 `position-code's. The length of the sequence is determined by the
553 base leading-code. Leading-code takes the range 0x81 through 0x9D,
554 whereas extended leading-code and position-code take the range 0xA0
555 through 0xFF. See `charset.h' for more details about leading-code
558 --- CODE RANGE of Emacs' internal format ---
562 eight-bit-control LEADING_CODE_8_BIT_CONTROL + 0xA0..0xBF
563 eight-bit-graphic 0xA0..0xBF
564 ELSE 0x81..0x9D + [0xA0..0xFF]+
565 ---------------------------------------------
567 As this is the internal character representation, the format is
568 usually not used externally (i.e. in a file or in a data sent to a
569 process). But, it is possible to have a text externally in this
570 format (i.e. by encoding by the coding system `emacs-mule').
572 In that case, a sequence of one-byte codes has a slightly different
575 Firstly, all characters in eight-bit-control are represented by
576 one-byte sequences which are their 8-bit code.
578 Next, character composition data are represented by the byte
579 sequence of the form: 0x80 METHOD BYTES CHARS COMPONENT ...,
581 METHOD is 0xF0 plus one of composition method (enum
584 BYTES is 0xA0 plus the byte length of these composition data,
586 CHARS is 0xA0 plus the number of characters composed by these
589 COMPONENTs are characters of multibyte form or composition
590 rules encoded by two-byte of ASCII codes.
592 In addition, for backward compatibility, the following formats are
593 also recognized as composition data on decoding.
596 0x80 0xFF MSEQ RULE MSEQ RULE ... MSEQ
599 MSEQ is a multibyte form but in these special format:
600 ASCII: 0xA0 ASCII_CODE+0x80,
601 other: LEADING_CODE+0x20 FOLLOWING-BYTE ...,
602 RULE is a one byte code of the range 0xA0..0xF0 that
603 represents a composition rule.
606 enum emacs_code_class_type emacs_code_class
[256];
608 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
609 Check if a text is encoded in Emacs' internal format. If it is,
610 return CODING_CATEGORY_MASK_EMACS_MULE, else return 0. */
613 detect_coding_emacs_mule (src
, src_end
, multibytep
)
614 unsigned char *src
, *src_end
;
619 /* Dummy for ONE_MORE_BYTE. */
620 struct coding_system dummy_coding
;
621 struct coding_system
*coding
= &dummy_coding
;
625 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
);
633 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
);
642 if (c
== ISO_CODE_ESC
|| c
== ISO_CODE_SI
|| c
== ISO_CODE_SO
)
645 else if (c
>= 0x80 && c
< 0xA0)
648 /* Old leading code for a composite character. */
652 unsigned char *src_base
= src
- 1;
655 if (!UNIBYTE_STR_AS_MULTIBYTE_P (src_base
, src_end
- src_base
,
658 src
= src_base
+ bytes
;
663 return CODING_CATEGORY_MASK_EMACS_MULE
;
667 /* Record the starting position START and METHOD of one composition. */
669 #define CODING_ADD_COMPOSITION_START(coding, start, method) \
671 struct composition_data *cmp_data = coding->cmp_data; \
672 int *data = cmp_data->data + cmp_data->used; \
673 coding->cmp_data_start = cmp_data->used; \
675 data[1] = cmp_data->char_offset + start; \
676 data[3] = (int) method; \
677 cmp_data->used += 4; \
680 /* Record the ending position END of the current composition. */
682 #define CODING_ADD_COMPOSITION_END(coding, end) \
684 struct composition_data *cmp_data = coding->cmp_data; \
685 int *data = cmp_data->data + coding->cmp_data_start; \
686 data[0] = cmp_data->used - coding->cmp_data_start; \
687 data[2] = cmp_data->char_offset + end; \
690 /* Record one COMPONENT (alternate character or composition rule). */
692 #define CODING_ADD_COMPOSITION_COMPONENT(coding, component) \
694 coding->cmp_data->data[coding->cmp_data->used++] = component; \
695 if (coding->cmp_data->used - coding->cmp_data_start \
696 == COMPOSITION_DATA_MAX_BUNCH_LENGTH) \
698 CODING_ADD_COMPOSITION_END (coding, coding->produced_char); \
699 coding->composing = COMPOSITION_NO; \
704 /* Get one byte from a data pointed by SRC and increment SRC. If SRC
705 is not less than SRC_END, return -1 without incrementing Src. */
707 #define SAFE_ONE_MORE_BYTE() (src >= src_end ? -1 : *src++)
710 /* Decode a character represented as a component of composition
711 sequence of Emacs 20 style at SRC. Set C to that character, store
712 its multibyte form sequence at P, and set P to the end of that
713 sequence. If no valid character is found, set C to -1. */
715 #define DECODE_EMACS_MULE_COMPOSITION_CHAR(c, p) \
719 c = SAFE_ONE_MORE_BYTE (); \
722 if (CHAR_HEAD_P (c)) \
724 else if (c == 0xA0) \
726 c = SAFE_ONE_MORE_BYTE (); \
735 else if (BASE_LEADING_CODE_P (c - 0x20)) \
737 unsigned char *p0 = p; \
741 bytes = BYTES_BY_CHAR_HEAD (c); \
744 c = SAFE_ONE_MORE_BYTE (); \
749 if (UNIBYTE_STR_AS_MULTIBYTE_P (p0, p - p0, bytes) \
750 || (coding->flags /* We are recovering a file. */ \
751 && p0[0] == LEADING_CODE_8_BIT_CONTROL \
752 && ! CHAR_HEAD_P (p0[1]))) \
753 c = STRING_CHAR (p0, bytes); \
762 /* Decode a composition rule represented as a component of composition
763 sequence of Emacs 20 style at SRC. Set C to the rule. If not
764 valid rule is found, set C to -1. */
766 #define DECODE_EMACS_MULE_COMPOSITION_RULE(c) \
768 c = SAFE_ONE_MORE_BYTE (); \
770 if (c < 0 || c >= 81) \
774 gref = c / 9, nref = c % 9; \
775 c = COMPOSITION_ENCODE_RULE (gref, nref); \
780 /* Decode composition sequence encoded by `emacs-mule' at the source
781 pointed by SRC. SRC_END is the end of source. Store information
782 of the composition in CODING->cmp_data.
784 For backward compatibility, decode also a composition sequence of
785 Emacs 20 style. In that case, the composition sequence contains
786 characters that should be extracted into a buffer or string. Store
787 those characters at *DESTINATION in multibyte form.
789 If we encounter an invalid byte sequence, return 0.
790 If we encounter an insufficient source or destination, or
791 insufficient space in CODING->cmp_data, return 1.
792 Otherwise, return consumed bytes in the source.
796 decode_composition_emacs_mule (coding
, src
, src_end
,
797 destination
, dst_end
, dst_bytes
)
798 struct coding_system
*coding
;
799 unsigned char *src
, *src_end
, **destination
, *dst_end
;
802 unsigned char *dst
= *destination
;
803 int method
, data_len
, nchars
;
804 unsigned char *src_base
= src
++;
805 /* Store components of composition. */
806 int component
[COMPOSITION_DATA_MAX_BUNCH_LENGTH
];
808 /* Store multibyte form of characters to be composed. This is for
809 Emacs 20 style composition sequence. */
810 unsigned char buf
[MAX_COMPOSITION_COMPONENTS
* MAX_MULTIBYTE_LENGTH
];
811 unsigned char *bufp
= buf
;
812 int c
, i
, gref
, nref
;
814 if (coding
->cmp_data
->used
+ COMPOSITION_DATA_MAX_BUNCH_LENGTH
815 >= COMPOSITION_DATA_SIZE
)
817 coding
->result
= CODING_FINISH_INSUFFICIENT_CMP
;
822 if (c
- 0xF0 >= COMPOSITION_RELATIVE
823 && c
- 0xF0 <= COMPOSITION_WITH_RULE_ALTCHARS
)
828 with_rule
= (method
== COMPOSITION_WITH_RULE
829 || method
== COMPOSITION_WITH_RULE_ALTCHARS
);
833 || src_base
+ data_len
> src_end
)
839 for (ncomponent
= 0; src
< src_base
+ data_len
; ncomponent
++)
841 /* If it is longer than this, it can't be valid. */
842 if (ncomponent
>= COMPOSITION_DATA_MAX_BUNCH_LENGTH
)
845 if (ncomponent
% 2 && with_rule
)
847 ONE_MORE_BYTE (gref
);
849 ONE_MORE_BYTE (nref
);
851 c
= COMPOSITION_ENCODE_RULE (gref
, nref
);
856 if (UNIBYTE_STR_AS_MULTIBYTE_P (src
, src_end
- src
, bytes
)
857 || (coding
->flags
/* We are recovering a file. */
858 && src
[0] == LEADING_CODE_8_BIT_CONTROL
859 && ! CHAR_HEAD_P (src
[1])))
860 c
= STRING_CHAR (src
, bytes
);
865 component
[ncomponent
] = c
;
870 /* This may be an old Emacs 20 style format. See the comment at
871 the section 2 of this file. */
872 while (src
< src_end
&& !CHAR_HEAD_P (*src
)) src
++;
874 && !(coding
->mode
& CODING_MODE_LAST_BLOCK
))
875 goto label_end_of_loop
;
881 method
= COMPOSITION_RELATIVE
;
882 for (ncomponent
= 0; ncomponent
< MAX_COMPOSITION_COMPONENTS
;)
884 DECODE_EMACS_MULE_COMPOSITION_CHAR (c
, bufp
);
887 component
[ncomponent
++] = c
;
895 method
= COMPOSITION_WITH_RULE
;
897 DECODE_EMACS_MULE_COMPOSITION_CHAR (c
, bufp
);
902 ncomponent
< MAX_COMPOSITION_COMPONENTS
* 2 - 1;)
904 DECODE_EMACS_MULE_COMPOSITION_RULE (c
);
907 component
[ncomponent
++] = c
;
908 DECODE_EMACS_MULE_COMPOSITION_CHAR (c
, bufp
);
911 component
[ncomponent
++] = c
;
915 nchars
= (ncomponent
+ 1) / 2;
921 if (buf
== bufp
|| dst
+ (bufp
- buf
) <= (dst_bytes
? dst_end
: src
))
923 CODING_ADD_COMPOSITION_START (coding
, coding
->produced_char
, method
);
924 for (i
= 0; i
< ncomponent
; i
++)
925 CODING_ADD_COMPOSITION_COMPONENT (coding
, component
[i
]);
926 CODING_ADD_COMPOSITION_END (coding
, coding
->produced_char
+ nchars
);
929 unsigned char *p
= buf
;
930 EMIT_BYTES (p
, bufp
);
931 *destination
+= bufp
- buf
;
932 coding
->produced_char
+= nchars
;
934 return (src
- src_base
);
940 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
943 decode_coding_emacs_mule (coding
, source
, destination
, src_bytes
, dst_bytes
)
944 struct coding_system
*coding
;
945 unsigned char *source
, *destination
;
946 int src_bytes
, dst_bytes
;
948 unsigned char *src
= source
;
949 unsigned char *src_end
= source
+ src_bytes
;
950 unsigned char *dst
= destination
;
951 unsigned char *dst_end
= destination
+ dst_bytes
;
952 /* SRC_BASE remembers the start position in source in each loop.
953 The loop will be exited when there's not enough source code, or
954 when there's not enough destination area to produce a
956 unsigned char *src_base
;
958 coding
->produced_char
= 0;
959 while ((src_base
= src
) < src_end
)
961 unsigned char tmp
[MAX_MULTIBYTE_LENGTH
], *p
;
968 if (coding
->eol_type
== CODING_EOL_CR
)
970 else if (coding
->eol_type
== CODING_EOL_CRLF
)
980 coding
->produced_char
++;
983 else if (*src
== '\n')
985 if ((coding
->eol_type
== CODING_EOL_CR
986 || coding
->eol_type
== CODING_EOL_CRLF
)
987 && coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
989 coding
->result
= CODING_FINISH_INCONSISTENT_EOL
;
990 goto label_end_of_loop
;
993 coding
->produced_char
++;
996 else if (*src
== 0x80 && coding
->cmp_data
)
998 /* Start of composition data. */
999 int consumed
= decode_composition_emacs_mule (coding
, src
, src_end
,
1003 goto label_end_of_loop
;
1004 else if (consumed
> 0)
1009 bytes
= CHAR_STRING (*src
, tmp
);
1013 else if (UNIBYTE_STR_AS_MULTIBYTE_P (src
, src_end
- src
, bytes
)
1014 || (coding
->flags
/* We are recovering a file. */
1015 && src
[0] == LEADING_CODE_8_BIT_CONTROL
1016 && ! CHAR_HEAD_P (src
[1])))
1025 bytes
= BYTES_BY_CHAR_HEAD (*src
);
1027 for (i
= 1; i
< bytes
; i
++)
1030 if (CHAR_HEAD_P (c
))
1035 bytes
= CHAR_STRING (*src_base
, tmp
);
1044 if (dst
+ bytes
>= (dst_bytes
? dst_end
: src
))
1046 coding
->result
= CODING_FINISH_INSUFFICIENT_DST
;
1049 while (bytes
--) *dst
++ = *p
++;
1050 coding
->produced_char
++;
1053 coding
->consumed
= coding
->consumed_char
= src_base
- source
;
1054 coding
->produced
= dst
- destination
;
1058 /* Encode composition data stored at DATA into a special byte sequence
1059 starting by 0x80. Update CODING->cmp_data_start and maybe
1060 CODING->cmp_data for the next call. */
1062 #define ENCODE_COMPOSITION_EMACS_MULE(coding, data) \
1064 unsigned char buf[1024], *p0 = buf, *p; \
1065 int len = data[0]; \
1069 buf[1] = 0xF0 + data[3]; /* METHOD */ \
1070 buf[3] = 0xA0 + (data[2] - data[1]); /* COMPOSED-CHARS */ \
1072 if (data[3] == COMPOSITION_WITH_RULE \
1073 || data[3] == COMPOSITION_WITH_RULE_ALTCHARS) \
1075 p += CHAR_STRING (data[4], p); \
1076 for (i = 5; i < len; i += 2) \
1079 COMPOSITION_DECODE_RULE (data[i], gref, nref); \
1080 *p++ = 0x20 + gref; \
1081 *p++ = 0x20 + nref; \
1082 p += CHAR_STRING (data[i + 1], p); \
1087 for (i = 4; i < len; i++) \
1088 p += CHAR_STRING (data[i], p); \
1090 buf[2] = 0xA0 + (p - buf); /* COMPONENTS-BYTES */ \
1092 if (dst + (p - buf) + 4 > (dst_bytes ? dst_end : src)) \
1094 coding->result = CODING_FINISH_INSUFFICIENT_DST; \
1095 goto label_end_of_loop; \
1099 coding->cmp_data_start += data[0]; \
1100 if (coding->cmp_data_start == coding->cmp_data->used \
1101 && coding->cmp_data->next) \
1103 coding->cmp_data = coding->cmp_data->next; \
1104 coding->cmp_data_start = 0; \
1109 static void encode_eol
P_ ((struct coding_system
*, const unsigned char *,
1110 unsigned char *, int, int));
1113 encode_coding_emacs_mule (coding
, source
, destination
, src_bytes
, dst_bytes
)
1114 struct coding_system
*coding
;
1115 unsigned char *source
, *destination
;
1116 int src_bytes
, dst_bytes
;
1118 unsigned char *src
= source
;
1119 unsigned char *src_end
= source
+ src_bytes
;
1120 unsigned char *dst
= destination
;
1121 unsigned char *dst_end
= destination
+ dst_bytes
;
1122 unsigned char *src_base
;
1127 Lisp_Object translation_table
;
1129 translation_table
= Qnil
;
1131 /* Optimization for the case that there's no composition. */
1132 if (!coding
->cmp_data
|| coding
->cmp_data
->used
== 0)
1134 encode_eol (coding
, source
, destination
, src_bytes
, dst_bytes
);
1138 char_offset
= coding
->cmp_data
->char_offset
;
1139 data
= coding
->cmp_data
->data
+ coding
->cmp_data_start
;
1144 /* If SRC starts a composition, encode the information about the
1145 composition in advance. */
1146 if (coding
->cmp_data_start
< coding
->cmp_data
->used
1147 && char_offset
+ coding
->consumed_char
== data
[1])
1149 ENCODE_COMPOSITION_EMACS_MULE (coding
, data
);
1150 char_offset
= coding
->cmp_data
->char_offset
;
1151 data
= coding
->cmp_data
->data
+ coding
->cmp_data_start
;
1155 if (c
== '\n' && (coding
->eol_type
== CODING_EOL_CRLF
1156 || coding
->eol_type
== CODING_EOL_CR
))
1158 if (coding
->eol_type
== CODING_EOL_CRLF
)
1159 EMIT_TWO_BYTES ('\r', c
);
1161 EMIT_ONE_BYTE ('\r');
1163 else if (SINGLE_BYTE_CHAR_P (c
))
1165 if (coding
->flags
&& ! ASCII_BYTE_P (c
))
1167 /* As we are auto saving, retain the multibyte form for
1169 unsigned char buf
[MAX_MULTIBYTE_LENGTH
];
1170 int bytes
= CHAR_STRING (c
, buf
);
1173 EMIT_ONE_BYTE (buf
[0]);
1175 EMIT_TWO_BYTES (buf
[0], buf
[1]);
1181 EMIT_BYTES (src_base
, src
);
1182 coding
->consumed_char
++;
1185 coding
->consumed
= src_base
- source
;
1186 coding
->produced
= coding
->produced_char
= dst
- destination
;
1191 /*** 3. ISO2022 handlers ***/
1193 /* The following note describes the coding system ISO2022 briefly.
1194 Since the intention of this note is to help understand the
1195 functions in this file, some parts are NOT ACCURATE or are OVERLY
1196 SIMPLIFIED. For thorough understanding, please refer to the
1197 original document of ISO2022. This is equivalent to the standard
1198 ECMA-35, obtainable from <URL:http://www.ecma.ch/> (*).
1200 ISO2022 provides many mechanisms to encode several character sets
1201 in 7-bit and 8-bit environments. For 7-bit environments, all text
1202 is encoded using bytes less than 128. This may make the encoded
1203 text a little bit longer, but the text passes more easily through
1204 several types of gateway, some of which strip off the MSB (Most
1207 There are two kinds of character sets: control character sets and
1208 graphic character sets. The former contain control characters such
1209 as `newline' and `escape' to provide control functions (control
1210 functions are also provided by escape sequences). The latter
1211 contain graphic characters such as 'A' and '-'. Emacs recognizes
1212 two control character sets and many graphic character sets.
1214 Graphic character sets are classified into one of the following
1215 four classes, according to the number of bytes (DIMENSION) and
1216 number of characters in one dimension (CHARS) of the set:
1217 - DIMENSION1_CHARS94
1218 - DIMENSION1_CHARS96
1219 - DIMENSION2_CHARS94
1220 - DIMENSION2_CHARS96
1222 In addition, each character set is assigned an identification tag,
1223 unique for each set, called the "final character" (denoted as <F>
1224 hereafter). The <F> of each character set is decided by ECMA(*)
1225 when it is registered in ISO. The code range of <F> is 0x30..0x7F
1226 (0x30..0x3F are for private use only).
1228 Note (*): ECMA = European Computer Manufacturers Association
1230 Here are examples of graphic character sets [NAME(<F>)]:
1231 o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ...
1232 o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ...
1233 o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ...
1234 o DIMENSION2_CHARS96 -- none for the moment
1236 A code area (1 byte=8 bits) is divided into 4 areas, C0, GL, C1, and GR.
1237 C0 [0x00..0x1F] -- control character plane 0
1238 GL [0x20..0x7F] -- graphic character plane 0
1239 C1 [0x80..0x9F] -- control character plane 1
1240 GR [0xA0..0xFF] -- graphic character plane 1
1242 A control character set is directly designated and invoked to C0 or
1243 C1 by an escape sequence. The most common case is that:
1244 - ISO646's control character set is designated/invoked to C0, and
1245 - ISO6429's control character set is designated/invoked to C1,
1246 and usually these designations/invocations are omitted in encoded
1247 text. In a 7-bit environment, only C0 can be used, and a control
1248 character for C1 is encoded by an appropriate escape sequence to
1249 fit into the environment. All control characters for C1 are
1250 defined to have corresponding escape sequences.
1252 A graphic character set is at first designated to one of four
1253 graphic registers (G0 through G3), then these graphic registers are
1254 invoked to GL or GR. These designations and invocations can be
1255 done independently. The most common case is that G0 is invoked to
1256 GL, G1 is invoked to GR, and ASCII is designated to G0. Usually
1257 these invocations and designations are omitted in encoded text.
1258 In a 7-bit environment, only GL can be used.
1260 When a graphic character set of CHARS94 is invoked to GL, codes
1261 0x20 and 0x7F of the GL area work as control characters SPACE and
1262 DEL respectively, and codes 0xA0 and 0xFF of the GR area should not
1265 There are two ways of invocation: locking-shift and single-shift.
1266 With locking-shift, the invocation lasts until the next different
1267 invocation, whereas with single-shift, the invocation affects the
1268 following character only and doesn't affect the locking-shift
1269 state. Invocations are done by the following control characters or
1272 ----------------------------------------------------------------------
1273 abbrev function cntrl escape seq description
1274 ----------------------------------------------------------------------
1275 SI/LS0 (shift-in) 0x0F none invoke G0 into GL
1276 SO/LS1 (shift-out) 0x0E none invoke G1 into GL
1277 LS2 (locking-shift-2) none ESC 'n' invoke G2 into GL
1278 LS3 (locking-shift-3) none ESC 'o' invoke G3 into GL
1279 LS1R (locking-shift-1 right) none ESC '~' invoke G1 into GR (*)
1280 LS2R (locking-shift-2 right) none ESC '}' invoke G2 into GR (*)
1281 LS3R (locking-shift 3 right) none ESC '|' invoke G3 into GR (*)
1282 SS2 (single-shift-2) 0x8E ESC 'N' invoke G2 for one char
1283 SS3 (single-shift-3) 0x8F ESC 'O' invoke G3 for one char
1284 ----------------------------------------------------------------------
1285 (*) These are not used by any known coding system.
1287 Control characters for these functions are defined by macros
1288 ISO_CODE_XXX in `coding.h'.
1290 Designations are done by the following escape sequences:
1291 ----------------------------------------------------------------------
1292 escape sequence description
1293 ----------------------------------------------------------------------
1294 ESC '(' <F> designate DIMENSION1_CHARS94<F> to G0
1295 ESC ')' <F> designate DIMENSION1_CHARS94<F> to G1
1296 ESC '*' <F> designate DIMENSION1_CHARS94<F> to G2
1297 ESC '+' <F> designate DIMENSION1_CHARS94<F> to G3
1298 ESC ',' <F> designate DIMENSION1_CHARS96<F> to G0 (*)
1299 ESC '-' <F> designate DIMENSION1_CHARS96<F> to G1
1300 ESC '.' <F> designate DIMENSION1_CHARS96<F> to G2
1301 ESC '/' <F> designate DIMENSION1_CHARS96<F> to G3
1302 ESC '$' '(' <F> designate DIMENSION2_CHARS94<F> to G0 (**)
1303 ESC '$' ')' <F> designate DIMENSION2_CHARS94<F> to G1
1304 ESC '$' '*' <F> designate DIMENSION2_CHARS94<F> to G2
1305 ESC '$' '+' <F> designate DIMENSION2_CHARS94<F> to G3
1306 ESC '$' ',' <F> designate DIMENSION2_CHARS96<F> to G0 (*)
1307 ESC '$' '-' <F> designate DIMENSION2_CHARS96<F> to G1
1308 ESC '$' '.' <F> designate DIMENSION2_CHARS96<F> to G2
1309 ESC '$' '/' <F> designate DIMENSION2_CHARS96<F> to G3
1310 ----------------------------------------------------------------------
1312 In this list, "DIMENSION1_CHARS94<F>" means a graphic character set
1313 of dimension 1, chars 94, and final character <F>, etc...
1315 Note (*): Although these designations are not allowed in ISO2022,
1316 Emacs accepts them on decoding, and produces them on encoding
1317 CHARS96 character sets in a coding system which is characterized as
1318 7-bit environment, non-locking-shift, and non-single-shift.
1320 Note (**): If <F> is '@', 'A', or 'B', the intermediate character
1321 '(' can be omitted. We refer to this as "short-form" hereafter.
1323 Now you may notice that there are a lot of ways of encoding the
1324 same multilingual text in ISO2022. Actually, there exist many
1325 coding systems such as Compound Text (used in X11's inter client
1326 communication, ISO-2022-JP (used in Japanese Internet), ISO-2022-KR
1327 (used in Korean Internet), EUC (Extended UNIX Code, used in Asian
1328 localized platforms), and all of these are variants of ISO2022.
1330 In addition to the above, Emacs handles two more kinds of escape
1331 sequences: ISO6429's direction specification and Emacs' private
1332 sequence for specifying character composition.
1334 ISO6429's direction specification takes the following form:
1335 o CSI ']' -- end of the current direction
1336 o CSI '0' ']' -- end of the current direction
1337 o CSI '1' ']' -- start of left-to-right text
1338 o CSI '2' ']' -- start of right-to-left text
1339 The control character CSI (0x9B: control sequence introducer) is
1340 abbreviated to the escape sequence ESC '[' in a 7-bit environment.
1342 Character composition specification takes the following form:
1343 o ESC '0' -- start relative composition
1344 o ESC '1' -- end composition
1345 o ESC '2' -- start rule-base composition (*)
1346 o ESC '3' -- start relative composition with alternate chars (**)
1347 o ESC '4' -- start rule-base composition with alternate chars (**)
1348 Since these are not standard escape sequences of any ISO standard,
1349 the use of them with these meanings is restricted to Emacs only.
1351 (*) This form is used only in Emacs 20.5 and older versions,
1352 but the newer versions can safely decode it.
1353 (**) This form is used only in Emacs 21.1 and newer versions,
1354 and the older versions can't decode it.
1356 Here's a list of example usages of these composition escape
1357 sequences (categorized by `enum composition_method').
1359 COMPOSITION_RELATIVE:
1360 ESC 0 CHAR [ CHAR ] ESC 1
1361 COMPOSITION_WITH_RULE:
1362 ESC 2 CHAR [ RULE CHAR ] ESC 1
1363 COMPOSITION_WITH_ALTCHARS:
1364 ESC 3 ALTCHAR [ ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1
1365 COMPOSITION_WITH_RULE_ALTCHARS:
1366 ESC 4 ALTCHAR [ RULE ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1 */
1368 enum iso_code_class_type iso_code_class
[256];
1370 #define CHARSET_OK(idx, charset, c) \
1371 (coding_system_table[idx] \
1372 && (charset == CHARSET_ASCII \
1373 || (safe_chars = coding_safe_chars (coding_system_table[idx]->symbol), \
1374 CODING_SAFE_CHAR_P (safe_chars, c))) \
1375 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding_system_table[idx], \
1377 != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
1379 #define SHIFT_OUT_OK(idx) \
1380 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0)
1382 #define COMPOSITION_OK(idx) \
1383 (coding_system_table[idx]->composing != COMPOSITION_DISABLED)
1385 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
1386 Check if a text is encoded in ISO2022. If it is, return an
1387 integer in which appropriate flag bits any of:
1388 CODING_CATEGORY_MASK_ISO_7
1389 CODING_CATEGORY_MASK_ISO_7_TIGHT
1390 CODING_CATEGORY_MASK_ISO_8_1
1391 CODING_CATEGORY_MASK_ISO_8_2
1392 CODING_CATEGORY_MASK_ISO_7_ELSE
1393 CODING_CATEGORY_MASK_ISO_8_ELSE
1394 are set. If a code which should never appear in ISO2022 is found,
1398 detect_coding_iso2022 (src
, src_end
, multibytep
)
1399 unsigned char *src
, *src_end
;
1402 int mask
= CODING_CATEGORY_MASK_ISO
;
1404 int reg
[4], shift_out
= 0, single_shifting
= 0;
1406 /* Dummy for ONE_MORE_BYTE. */
1407 struct coding_system dummy_coding
;
1408 struct coding_system
*coding
= &dummy_coding
;
1409 Lisp_Object safe_chars
;
1411 reg
[0] = CHARSET_ASCII
, reg
[1] = reg
[2] = reg
[3] = -1;
1412 while (mask
&& src
< src_end
)
1414 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
);
1419 if (inhibit_iso_escape_detection
)
1421 single_shifting
= 0;
1422 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
);
1423 if (c
>= '(' && c
<= '/')
1425 /* Designation sequence for a charset of dimension 1. */
1426 ONE_MORE_BYTE_CHECK_MULTIBYTE (c1
, multibytep
);
1427 if (c1
< ' ' || c1
>= 0x80
1428 || (charset
= iso_charset_table
[0][c
>= ','][c1
]) < 0)
1429 /* Invalid designation sequence. Just ignore. */
1431 reg
[(c
- '(') % 4] = charset
;
1435 /* Designation sequence for a charset of dimension 2. */
1436 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
);
1437 if (c
>= '@' && c
<= 'B')
1438 /* Designation for JISX0208.1978, GB2312, or JISX0208. */
1439 reg
[0] = charset
= iso_charset_table
[1][0][c
];
1440 else if (c
>= '(' && c
<= '/')
1442 ONE_MORE_BYTE_CHECK_MULTIBYTE (c1
, multibytep
);
1443 if (c1
< ' ' || c1
>= 0x80
1444 || (charset
= iso_charset_table
[1][c
>= ','][c1
]) < 0)
1445 /* Invalid designation sequence. Just ignore. */
1447 reg
[(c
- '(') % 4] = charset
;
1450 /* Invalid designation sequence. Just ignore. */
1453 else if (c
== 'N' || c
== 'O')
1455 /* ESC <Fe> for SS2 or SS3. */
1456 mask
&= CODING_CATEGORY_MASK_ISO_7_ELSE
;
1459 else if (c
>= '0' && c
<= '4')
1461 /* ESC <Fp> for start/end composition. */
1462 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7
))
1463 mask_found
|= CODING_CATEGORY_MASK_ISO_7
;
1465 mask
&= ~CODING_CATEGORY_MASK_ISO_7
;
1466 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT
))
1467 mask_found
|= CODING_CATEGORY_MASK_ISO_7_TIGHT
;
1469 mask
&= ~CODING_CATEGORY_MASK_ISO_7_TIGHT
;
1470 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_1
))
1471 mask_found
|= CODING_CATEGORY_MASK_ISO_8_1
;
1473 mask
&= ~CODING_CATEGORY_MASK_ISO_8_1
;
1474 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_2
))
1475 mask_found
|= CODING_CATEGORY_MASK_ISO_8_2
;
1477 mask
&= ~CODING_CATEGORY_MASK_ISO_8_2
;
1478 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7_ELSE
))
1479 mask_found
|= CODING_CATEGORY_MASK_ISO_7_ELSE
;
1481 mask
&= ~CODING_CATEGORY_MASK_ISO_7_ELSE
;
1482 if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_ELSE
))
1483 mask_found
|= CODING_CATEGORY_MASK_ISO_8_ELSE
;
1485 mask
&= ~CODING_CATEGORY_MASK_ISO_8_ELSE
;
1489 /* Invalid escape sequence. Just ignore. */
1492 /* We found a valid designation sequence for CHARSET. */
1493 mask
&= ~CODING_CATEGORY_MASK_ISO_8BIT
;
1494 c
= MAKE_CHAR (charset
, 0, 0);
1495 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7
, charset
, c
))
1496 mask_found
|= CODING_CATEGORY_MASK_ISO_7
;
1498 mask
&= ~CODING_CATEGORY_MASK_ISO_7
;
1499 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT
, charset
, c
))
1500 mask_found
|= CODING_CATEGORY_MASK_ISO_7_TIGHT
;
1502 mask
&= ~CODING_CATEGORY_MASK_ISO_7_TIGHT
;
1503 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_ELSE
, charset
, c
))
1504 mask_found
|= CODING_CATEGORY_MASK_ISO_7_ELSE
;
1506 mask
&= ~CODING_CATEGORY_MASK_ISO_7_ELSE
;
1507 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_8_ELSE
, charset
, c
))
1508 mask_found
|= CODING_CATEGORY_MASK_ISO_8_ELSE
;
1510 mask
&= ~CODING_CATEGORY_MASK_ISO_8_ELSE
;
1514 if (inhibit_iso_escape_detection
)
1516 single_shifting
= 0;
1519 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE
)
1520 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE
)))
1522 /* Locking shift out. */
1523 mask
&= ~CODING_CATEGORY_MASK_ISO_7BIT
;
1524 mask_found
|= CODING_CATEGORY_MASK_ISO_SHIFT
;
1529 if (inhibit_iso_escape_detection
)
1531 single_shifting
= 0;
1534 /* Locking shift in. */
1535 mask
&= ~CODING_CATEGORY_MASK_ISO_7BIT
;
1536 mask_found
|= CODING_CATEGORY_MASK_ISO_SHIFT
;
1541 single_shifting
= 0;
1545 int newmask
= CODING_CATEGORY_MASK_ISO_8_ELSE
;
1547 if (inhibit_iso_escape_detection
)
1549 if (c
!= ISO_CODE_CSI
)
1551 if (coding_system_table
[CODING_CATEGORY_IDX_ISO_8_1
]->flags
1552 & CODING_FLAG_ISO_SINGLE_SHIFT
)
1553 newmask
|= CODING_CATEGORY_MASK_ISO_8_1
;
1554 if (coding_system_table
[CODING_CATEGORY_IDX_ISO_8_2
]->flags
1555 & CODING_FLAG_ISO_SINGLE_SHIFT
)
1556 newmask
|= CODING_CATEGORY_MASK_ISO_8_2
;
1557 single_shifting
= 1;
1559 if (VECTORP (Vlatin_extra_code_table
)
1560 && !NILP (XVECTOR (Vlatin_extra_code_table
)->contents
[c
]))
1562 if (coding_system_table
[CODING_CATEGORY_IDX_ISO_8_1
]->flags
1563 & CODING_FLAG_ISO_LATIN_EXTRA
)
1564 newmask
|= CODING_CATEGORY_MASK_ISO_8_1
;
1565 if (coding_system_table
[CODING_CATEGORY_IDX_ISO_8_2
]->flags
1566 & CODING_FLAG_ISO_LATIN_EXTRA
)
1567 newmask
|= CODING_CATEGORY_MASK_ISO_8_2
;
1570 mask_found
|= newmask
;
1577 single_shifting
= 0;
1582 single_shifting
= 0;
1583 if (VECTORP (Vlatin_extra_code_table
)
1584 && !NILP (XVECTOR (Vlatin_extra_code_table
)->contents
[c
]))
1588 if (coding_system_table
[CODING_CATEGORY_IDX_ISO_8_1
]->flags
1589 & CODING_FLAG_ISO_LATIN_EXTRA
)
1590 newmask
|= CODING_CATEGORY_MASK_ISO_8_1
;
1591 if (coding_system_table
[CODING_CATEGORY_IDX_ISO_8_2
]->flags
1592 & CODING_FLAG_ISO_LATIN_EXTRA
)
1593 newmask
|= CODING_CATEGORY_MASK_ISO_8_2
;
1595 mask_found
|= newmask
;
1602 mask
&= ~(CODING_CATEGORY_MASK_ISO_7BIT
1603 | CODING_CATEGORY_MASK_ISO_7_ELSE
);
1604 mask_found
|= CODING_CATEGORY_MASK_ISO_8_1
;
1605 /* Check the length of succeeding codes of the range
1606 0xA0..0FF. If the byte length is odd, we exclude
1607 CODING_CATEGORY_MASK_ISO_8_2. We can check this only
1608 when we are not single shifting. */
1609 if (!single_shifting
1610 && mask
& CODING_CATEGORY_MASK_ISO_8_2
)
1615 while (src
< src_end
)
1617 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
);
1623 if (i
& 1 && src
< src_end
)
1624 mask
&= ~CODING_CATEGORY_MASK_ISO_8_2
;
1626 mask_found
|= CODING_CATEGORY_MASK_ISO_8_2
;
1628 /* This means that we have read one extra byte. */
1636 return (mask
& mask_found
);
1639 /* Decode a character of which charset is CHARSET, the 1st position
1640 code is C1, the 2nd position code is C2, and return the decoded
1641 character code. If the variable `translation_table' is non-nil,
1642 returned the translated code. */
1644 #define DECODE_ISO_CHARACTER(charset, c1, c2) \
1645 (NILP (translation_table) \
1646 ? MAKE_CHAR (charset, c1, c2) \
1647 : translate_char (translation_table, -1, charset, c1, c2))
1649 /* Set designation state into CODING. */
1650 #define DECODE_DESIGNATION(reg, dimension, chars, final_char) \
1654 if (final_char < '0' || final_char >= 128) \
1655 goto label_invalid_code; \
1656 charset = ISO_CHARSET_TABLE (make_number (dimension), \
1657 make_number (chars), \
1658 make_number (final_char)); \
1659 c = MAKE_CHAR (charset, 0, 0); \
1661 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) == reg \
1662 || CODING_SAFE_CHAR_P (safe_chars, c))) \
1664 if (coding->spec.iso2022.last_invalid_designation_register == 0 \
1666 && charset == CHARSET_ASCII) \
1668 /* We should insert this designation sequence as is so \
1669 that it is surely written back to a file. */ \
1670 coding->spec.iso2022.last_invalid_designation_register = -1; \
1671 goto label_invalid_code; \
1673 coding->spec.iso2022.last_invalid_designation_register = -1; \
1674 if ((coding->mode & CODING_MODE_DIRECTION) \
1675 && CHARSET_REVERSE_CHARSET (charset) >= 0) \
1676 charset = CHARSET_REVERSE_CHARSET (charset); \
1677 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
1681 coding->spec.iso2022.last_invalid_designation_register = reg; \
1682 goto label_invalid_code; \
1686 /* Allocate a memory block for storing information about compositions.
1687 The block is chained to the already allocated blocks. */
1690 coding_allocate_composition_data (coding
, char_offset
)
1691 struct coding_system
*coding
;
1694 struct composition_data
*cmp_data
1695 = (struct composition_data
*) xmalloc (sizeof *cmp_data
);
1697 cmp_data
->char_offset
= char_offset
;
1699 cmp_data
->prev
= coding
->cmp_data
;
1700 cmp_data
->next
= NULL
;
1701 if (coding
->cmp_data
)
1702 coding
->cmp_data
->next
= cmp_data
;
1703 coding
->cmp_data
= cmp_data
;
1704 coding
->cmp_data_start
= 0;
1705 coding
->composing
= COMPOSITION_NO
;
1708 /* Handle composition start sequence ESC 0, ESC 2, ESC 3, or ESC 4.
1709 ESC 0 : relative composition : ESC 0 CHAR ... ESC 1
1710 ESC 2 : rulebase composition : ESC 2 CHAR RULE CHAR RULE ... CHAR ESC 1
1711 ESC 3 : altchar composition : ESC 3 ALT ... ESC 0 CHAR ... ESC 1
1712 ESC 4 : alt&rule composition : ESC 4 ALT RULE .. ALT ESC 0 CHAR ... ESC 1
1715 #define DECODE_COMPOSITION_START(c1) \
1717 if (coding->composing == COMPOSITION_DISABLED) \
1719 *dst++ = ISO_CODE_ESC; \
1720 *dst++ = c1 & 0x7f; \
1721 coding->produced_char += 2; \
1723 else if (!COMPOSING_P (coding)) \
1725 /* This is surely the start of a composition. We must be sure \
1726 that coding->cmp_data has enough space to store the \
1727 information about the composition. If not, terminate the \
1728 current decoding loop, allocate one more memory block for \
1729 coding->cmp_data in the caller, then start the decoding \
1730 loop again. We can't allocate memory here directly because \
1731 it may cause buffer/string relocation. */ \
1732 if (!coding->cmp_data \
1733 || (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH \
1734 >= COMPOSITION_DATA_SIZE)) \
1736 coding->result = CODING_FINISH_INSUFFICIENT_CMP; \
1737 goto label_end_of_loop; \
1739 coding->composing = (c1 == '0' ? COMPOSITION_RELATIVE \
1740 : c1 == '2' ? COMPOSITION_WITH_RULE \
1741 : c1 == '3' ? COMPOSITION_WITH_ALTCHARS \
1742 : COMPOSITION_WITH_RULE_ALTCHARS); \
1743 CODING_ADD_COMPOSITION_START (coding, coding->produced_char, \
1744 coding->composing); \
1745 coding->composition_rule_follows = 0; \
1749 /* We are already handling a composition. If the method is \
1750 the following two, the codes following the current escape \
1751 sequence are actual characters stored in a buffer. */ \
1752 if (coding->composing == COMPOSITION_WITH_ALTCHARS \
1753 || coding->composing == COMPOSITION_WITH_RULE_ALTCHARS) \
1755 coding->composing = COMPOSITION_RELATIVE; \
1756 coding->composition_rule_follows = 0; \
1761 /* Handle composition end sequence ESC 1. */
1763 #define DECODE_COMPOSITION_END(c1) \
1765 if (! COMPOSING_P (coding)) \
1767 *dst++ = ISO_CODE_ESC; \
1769 coding->produced_char += 2; \
1773 CODING_ADD_COMPOSITION_END (coding, coding->produced_char); \
1774 coding->composing = COMPOSITION_NO; \
1778 /* Decode a composition rule from the byte C1 (and maybe one more byte
1779 from SRC) and store one encoded composition rule in
1780 coding->cmp_data. */
1782 #define DECODE_COMPOSITION_RULE(c1) \
1786 if (c1 < 81) /* old format (before ver.21) */ \
1788 int gref = (c1) / 9; \
1789 int nref = (c1) % 9; \
1790 if (gref == 4) gref = 10; \
1791 if (nref == 4) nref = 10; \
1792 rule = COMPOSITION_ENCODE_RULE (gref, nref); \
1794 else if (c1 < 93) /* new format (after ver.21) */ \
1796 ONE_MORE_BYTE (c2); \
1797 rule = COMPOSITION_ENCODE_RULE (c1 - 81, c2 - 32); \
1799 CODING_ADD_COMPOSITION_COMPONENT (coding, rule); \
1800 coding->composition_rule_follows = 0; \
1804 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
1807 decode_coding_iso2022 (coding
, source
, destination
, src_bytes
, dst_bytes
)
1808 struct coding_system
*coding
;
1809 unsigned char *source
, *destination
;
1810 int src_bytes
, dst_bytes
;
1812 unsigned char *src
= source
;
1813 unsigned char *src_end
= source
+ src_bytes
;
1814 unsigned char *dst
= destination
;
1815 unsigned char *dst_end
= destination
+ dst_bytes
;
1816 /* Charsets invoked to graphic plane 0 and 1 respectively. */
1817 int charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
1818 int charset1
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 1);
1819 /* SRC_BASE remembers the start position in source in each loop.
1820 The loop will be exited when there's not enough source code
1821 (within macro ONE_MORE_BYTE), or when there's not enough
1822 destination area to produce a character (within macro
1824 unsigned char *src_base
;
1826 Lisp_Object translation_table
;
1827 Lisp_Object safe_chars
;
1829 safe_chars
= coding_safe_chars (coding
->symbol
);
1831 if (NILP (Venable_character_translation
))
1832 translation_table
= Qnil
;
1835 translation_table
= coding
->translation_table_for_decode
;
1836 if (NILP (translation_table
))
1837 translation_table
= Vstandard_translation_table_for_decode
;
1840 coding
->result
= CODING_FINISH_NORMAL
;
1849 /* We produce no character or one character. */
1850 switch (iso_code_class
[c1
])
1852 case ISO_0x20_or_0x7F
:
1853 if (COMPOSING_P (coding
) && coding
->composition_rule_follows
)
1855 DECODE_COMPOSITION_RULE (c1
);
1858 if (charset0
< 0 || CHARSET_CHARS (charset0
) == 94)
1860 /* This is SPACE or DEL. */
1861 charset
= CHARSET_ASCII
;
1864 /* This is a graphic character, we fall down ... */
1866 case ISO_graphic_plane_0
:
1867 if (COMPOSING_P (coding
) && coding
->composition_rule_follows
)
1869 DECODE_COMPOSITION_RULE (c1
);
1875 case ISO_0xA0_or_0xFF
:
1876 if (charset1
< 0 || CHARSET_CHARS (charset1
) == 94
1877 || coding
->flags
& CODING_FLAG_ISO_SEVEN_BITS
)
1878 goto label_invalid_code
;
1879 /* This is a graphic character, we fall down ... */
1881 case ISO_graphic_plane_1
:
1883 goto label_invalid_code
;
1888 if (COMPOSING_P (coding
))
1889 DECODE_COMPOSITION_END ('1');
1891 /* All ISO2022 control characters in this class have the
1892 same representation in Emacs internal format. */
1894 && (coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
1895 && (coding
->eol_type
== CODING_EOL_CR
1896 || coding
->eol_type
== CODING_EOL_CRLF
))
1898 coding
->result
= CODING_FINISH_INCONSISTENT_EOL
;
1899 goto label_end_of_loop
;
1901 charset
= CHARSET_ASCII
;
1905 if (COMPOSING_P (coding
))
1906 DECODE_COMPOSITION_END ('1');
1907 goto label_invalid_code
;
1909 case ISO_carriage_return
:
1910 if (COMPOSING_P (coding
))
1911 DECODE_COMPOSITION_END ('1');
1913 if (coding
->eol_type
== CODING_EOL_CR
)
1915 else if (coding
->eol_type
== CODING_EOL_CRLF
)
1918 if (c1
!= ISO_CODE_LF
)
1924 charset
= CHARSET_ASCII
;
1928 if (! (coding
->flags
& CODING_FLAG_ISO_LOCKING_SHIFT
)
1929 || CODING_SPEC_ISO_DESIGNATION (coding
, 1) < 0)
1930 goto label_invalid_code
;
1931 CODING_SPEC_ISO_INVOCATION (coding
, 0) = 1;
1932 charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
1936 if (! (coding
->flags
& CODING_FLAG_ISO_LOCKING_SHIFT
))
1937 goto label_invalid_code
;
1938 CODING_SPEC_ISO_INVOCATION (coding
, 0) = 0;
1939 charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
1942 case ISO_single_shift_2_7
:
1943 case ISO_single_shift_2
:
1944 if (! (coding
->flags
& CODING_FLAG_ISO_SINGLE_SHIFT
))
1945 goto label_invalid_code
;
1946 /* SS2 is handled as an escape sequence of ESC 'N' */
1948 goto label_escape_sequence
;
1950 case ISO_single_shift_3
:
1951 if (! (coding
->flags
& CODING_FLAG_ISO_SINGLE_SHIFT
))
1952 goto label_invalid_code
;
1953 /* SS2 is handled as an escape sequence of ESC 'O' */
1955 goto label_escape_sequence
;
1957 case ISO_control_sequence_introducer
:
1958 /* CSI is handled as an escape sequence of ESC '[' ... */
1960 goto label_escape_sequence
;
1964 label_escape_sequence
:
1965 /* Escape sequences handled by Emacs are invocation,
1966 designation, direction specification, and character
1967 composition specification. */
1970 case '&': /* revision of following character set */
1972 if (!(c1
>= '@' && c1
<= '~'))
1973 goto label_invalid_code
;
1975 if (c1
!= ISO_CODE_ESC
)
1976 goto label_invalid_code
;
1978 goto label_escape_sequence
;
1980 case '$': /* designation of 2-byte character set */
1981 if (! (coding
->flags
& CODING_FLAG_ISO_DESIGNATION
))
1982 goto label_invalid_code
;
1984 if (c1
>= '@' && c1
<= 'B')
1985 { /* designation of JISX0208.1978, GB2312.1980,
1987 DECODE_DESIGNATION (0, 2, 94, c1
);
1989 else if (c1
>= 0x28 && c1
<= 0x2B)
1990 { /* designation of DIMENSION2_CHARS94 character set */
1992 DECODE_DESIGNATION (c1
- 0x28, 2, 94, c2
);
1994 else if (c1
>= 0x2C && c1
<= 0x2F)
1995 { /* designation of DIMENSION2_CHARS96 character set */
1997 DECODE_DESIGNATION (c1
- 0x2C, 2, 96, c2
);
2000 goto label_invalid_code
;
2001 /* We must update these variables now. */
2002 charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
2003 charset1
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 1);
2006 case 'n': /* invocation of locking-shift-2 */
2007 if (! (coding
->flags
& CODING_FLAG_ISO_LOCKING_SHIFT
)
2008 || CODING_SPEC_ISO_DESIGNATION (coding
, 2) < 0)
2009 goto label_invalid_code
;
2010 CODING_SPEC_ISO_INVOCATION (coding
, 0) = 2;
2011 charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
2014 case 'o': /* invocation of locking-shift-3 */
2015 if (! (coding
->flags
& CODING_FLAG_ISO_LOCKING_SHIFT
)
2016 || CODING_SPEC_ISO_DESIGNATION (coding
, 3) < 0)
2017 goto label_invalid_code
;
2018 CODING_SPEC_ISO_INVOCATION (coding
, 0) = 3;
2019 charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
2022 case 'N': /* invocation of single-shift-2 */
2023 if (! (coding
->flags
& CODING_FLAG_ISO_SINGLE_SHIFT
)
2024 || CODING_SPEC_ISO_DESIGNATION (coding
, 2) < 0)
2025 goto label_invalid_code
;
2026 charset
= CODING_SPEC_ISO_DESIGNATION (coding
, 2);
2028 if (c1
< 0x20 || (c1
>= 0x80 && c1
< 0xA0))
2029 goto label_invalid_code
;
2032 case 'O': /* invocation of single-shift-3 */
2033 if (! (coding
->flags
& CODING_FLAG_ISO_SINGLE_SHIFT
)
2034 || CODING_SPEC_ISO_DESIGNATION (coding
, 3) < 0)
2035 goto label_invalid_code
;
2036 charset
= CODING_SPEC_ISO_DESIGNATION (coding
, 3);
2038 if (c1
< 0x20 || (c1
>= 0x80 && c1
< 0xA0))
2039 goto label_invalid_code
;
2042 case '0': case '2': case '3': case '4': /* start composition */
2043 DECODE_COMPOSITION_START (c1
);
2046 case '1': /* end composition */
2047 DECODE_COMPOSITION_END (c1
);
2050 case '[': /* specification of direction */
2051 if (coding
->flags
& CODING_FLAG_ISO_NO_DIRECTION
)
2052 goto label_invalid_code
;
2053 /* For the moment, nested direction is not supported.
2054 So, `coding->mode & CODING_MODE_DIRECTION' zero means
2055 left-to-right, and nonzero means right-to-left. */
2059 case ']': /* end of the current direction */
2060 coding
->mode
&= ~CODING_MODE_DIRECTION
;
2062 case '0': /* end of the current direction */
2063 case '1': /* start of left-to-right direction */
2066 coding
->mode
&= ~CODING_MODE_DIRECTION
;
2068 goto label_invalid_code
;
2071 case '2': /* start of right-to-left direction */
2074 coding
->mode
|= CODING_MODE_DIRECTION
;
2076 goto label_invalid_code
;
2080 goto label_invalid_code
;
2085 if (COMPOSING_P (coding
))
2086 DECODE_COMPOSITION_END ('1');
2090 /* CTEXT extended segment:
2091 ESC % / [0-4] M L --ENCODING-NAME-- \002 --BYTES--
2092 We keep these bytes as is for the moment.
2093 They may be decoded by post-read-conversion. */
2098 ONE_MORE_BYTE (dim
);
2101 size
= ((M
- 128) * 128) + (L
- 128);
2102 required
= 8 + size
* 2;
2103 if (dst
+ required
> (dst_bytes
? dst_end
: src
))
2104 goto label_end_of_loop
;
2105 *dst
++ = ISO_CODE_ESC
;
2110 dst
+= CHAR_STRING (M
, dst
), produced_chars
++;
2111 dst
+= CHAR_STRING (L
, dst
), produced_chars
++;
2115 dst
+= CHAR_STRING (c1
, dst
), produced_chars
++;
2117 coding
->produced_char
+= produced_chars
;
2121 unsigned char *d
= dst
;
2124 /* XFree86 extension for embedding UTF-8 in CTEXT:
2125 ESC % G --UTF-8-BYTES-- ESC % @
2126 We keep these bytes as is for the moment.
2127 They may be decoded by post-read-conversion. */
2128 if (d
+ 6 > (dst_bytes
? dst_end
: src
))
2129 goto label_end_of_loop
;
2130 *d
++ = ISO_CODE_ESC
;
2134 while (d
+ 1 < (dst_bytes
? dst_end
: src
))
2137 if (c1
== ISO_CODE_ESC
2138 && src
+ 1 < src_end
2145 d
+= CHAR_STRING (c1
, d
), produced_chars
++;
2147 if (d
+ 3 > (dst_bytes
? dst_end
: src
))
2148 goto label_end_of_loop
;
2149 *d
++ = ISO_CODE_ESC
;
2153 coding
->produced_char
+= produced_chars
+ 3;
2156 goto label_invalid_code
;
2160 if (! (coding
->flags
& CODING_FLAG_ISO_DESIGNATION
))
2161 goto label_invalid_code
;
2162 if (c1
>= 0x28 && c1
<= 0x2B)
2163 { /* designation of DIMENSION1_CHARS94 character set */
2165 DECODE_DESIGNATION (c1
- 0x28, 1, 94, c2
);
2167 else if (c1
>= 0x2C && c1
<= 0x2F)
2168 { /* designation of DIMENSION1_CHARS96 character set */
2170 DECODE_DESIGNATION (c1
- 0x2C, 1, 96, c2
);
2173 goto label_invalid_code
;
2174 /* We must update these variables now. */
2175 charset0
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 0);
2176 charset1
= CODING_SPEC_ISO_PLANE_CHARSET (coding
, 1);
2181 /* Now we know CHARSET and 1st position code C1 of a character.
2182 Produce a multibyte sequence for that character while getting
2183 2nd position code C2 if necessary. */
2184 if (CHARSET_DIMENSION (charset
) == 2)
2187 if (c1
< 0x80 ? c2
< 0x20 || c2
>= 0x80 : c2
< 0xA0)
2188 /* C2 is not in a valid range. */
2189 goto label_invalid_code
;
2191 c
= DECODE_ISO_CHARACTER (charset
, c1
, c2
);
2197 if (COMPOSING_P (coding
))
2198 DECODE_COMPOSITION_END ('1');
2205 coding
->consumed
= coding
->consumed_char
= src_base
- source
;
2206 coding
->produced
= dst
- destination
;
2211 /* ISO2022 encoding stuff. */
2214 It is not enough to say just "ISO2022" on encoding, we have to
2215 specify more details. In Emacs, each ISO2022 coding system
2216 variant has the following specifications:
2217 1. Initial designation to G0 through G3.
2218 2. Allows short-form designation?
2219 3. ASCII should be designated to G0 before control characters?
2220 4. ASCII should be designated to G0 at end of line?
2221 5. 7-bit environment or 8-bit environment?
2222 6. Use locking-shift?
2223 7. Use Single-shift?
2224 And the following two are only for Japanese:
2225 8. Use ASCII in place of JIS0201-1976-Roman?
2226 9. Use JISX0208-1983 in place of JISX0208-1978?
2227 These specifications are encoded in `coding->flags' as flag bits
2228 defined by macros CODING_FLAG_ISO_XXX. See `coding.h' for more
2232 /* Produce codes (escape sequence) for designating CHARSET to graphic
2233 register REG at DST, and increment DST. If <final-char> of CHARSET is
2234 '@', 'A', or 'B' and the coding system CODING allows, produce
2235 designation sequence of short-form. */
2237 #define ENCODE_DESIGNATION(charset, reg, coding) \
2239 unsigned char final_char = CHARSET_ISO_FINAL_CHAR (charset); \
2240 char *intermediate_char_94 = "()*+"; \
2241 char *intermediate_char_96 = ",-./"; \
2242 int revision = CODING_SPEC_ISO_REVISION_NUMBER(coding, charset); \
2244 if (revision < 255) \
2246 *dst++ = ISO_CODE_ESC; \
2248 *dst++ = '@' + revision; \
2250 *dst++ = ISO_CODE_ESC; \
2251 if (CHARSET_DIMENSION (charset) == 1) \
2253 if (CHARSET_CHARS (charset) == 94) \
2254 *dst++ = (unsigned char) (intermediate_char_94[reg]); \
2256 *dst++ = (unsigned char) (intermediate_char_96[reg]); \
2261 if (CHARSET_CHARS (charset) == 94) \
2263 if (! (coding->flags & CODING_FLAG_ISO_SHORT_FORM) \
2265 || final_char < '@' || final_char > 'B') \
2266 *dst++ = (unsigned char) (intermediate_char_94[reg]); \
2269 *dst++ = (unsigned char) (intermediate_char_96[reg]); \
2271 *dst++ = final_char; \
2272 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
2275 /* The following two macros produce codes (control character or escape
2276 sequence) for ISO2022 single-shift functions (single-shift-2 and
2279 #define ENCODE_SINGLE_SHIFT_2 \
2281 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
2282 *dst++ = ISO_CODE_ESC, *dst++ = 'N'; \
2284 *dst++ = ISO_CODE_SS2; \
2285 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
2288 #define ENCODE_SINGLE_SHIFT_3 \
2290 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
2291 *dst++ = ISO_CODE_ESC, *dst++ = 'O'; \
2293 *dst++ = ISO_CODE_SS3; \
2294 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
2297 /* The following four macros produce codes (control character or
2298 escape sequence) for ISO2022 locking-shift functions (shift-in,
2299 shift-out, locking-shift-2, and locking-shift-3). */
2301 #define ENCODE_SHIFT_IN \
2303 *dst++ = ISO_CODE_SI; \
2304 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; \
2307 #define ENCODE_SHIFT_OUT \
2309 *dst++ = ISO_CODE_SO; \
2310 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; \
2313 #define ENCODE_LOCKING_SHIFT_2 \
2315 *dst++ = ISO_CODE_ESC, *dst++ = 'n'; \
2316 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; \
2319 #define ENCODE_LOCKING_SHIFT_3 \
2321 *dst++ = ISO_CODE_ESC, *dst++ = 'o'; \
2322 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; \
2325 /* Produce codes for a DIMENSION1 character whose character set is
2326 CHARSET and whose position-code is C1. Designation and invocation
2327 sequences are also produced in advance if necessary. */
2329 #define ENCODE_ISO_CHARACTER_DIMENSION1(charset, c1) \
2331 if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
2333 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
2334 *dst++ = c1 & 0x7F; \
2336 *dst++ = c1 | 0x80; \
2337 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
2340 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
2342 *dst++ = c1 & 0x7F; \
2345 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
2347 *dst++ = c1 | 0x80; \
2351 /* Since CHARSET is not yet invoked to any graphic planes, we \
2352 must invoke it, or, at first, designate it to some graphic \
2353 register. Then repeat the loop to actually produce the \
2355 dst = encode_invocation_designation (charset, coding, dst); \
2358 /* Produce codes for a DIMENSION2 character whose character set is
2359 CHARSET and whose position-codes are C1 and C2. Designation and
2360 invocation codes are also produced in advance if necessary. */
2362 #define ENCODE_ISO_CHARACTER_DIMENSION2(charset, c1, c2) \
2364 if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
2366 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
2367 *dst++ = c1 & 0x7F, *dst++ = c2 & 0x7F; \
2369 *dst++ = c1 | 0x80, *dst++ = c2 | 0x80; \
2370 CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
2373 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
2375 *dst++ = c1 & 0x7F, *dst++= c2 & 0x7F; \
2378 else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
2380 *dst++ = c1 | 0x80, *dst++= c2 | 0x80; \
2384 /* Since CHARSET is not yet invoked to any graphic planes, we \
2385 must invoke it, or, at first, designate it to some graphic \
2386 register. Then repeat the loop to actually produce the \
2388 dst = encode_invocation_designation (charset, coding, dst); \
2391 #define ENCODE_ISO_CHARACTER(c) \
2393 int charset, c1, c2; \
2395 SPLIT_CHAR (c, charset, c1, c2); \
2396 if (CHARSET_DEFINED_P (charset)) \
2398 if (CHARSET_DIMENSION (charset) == 1) \
2400 if (charset == CHARSET_ASCII \
2401 && coding->flags & CODING_FLAG_ISO_USE_ROMAN) \
2402 charset = charset_latin_jisx0201; \
2403 ENCODE_ISO_CHARACTER_DIMENSION1 (charset, c1); \
2407 if (charset == charset_jisx0208 \
2408 && coding->flags & CODING_FLAG_ISO_USE_OLDJIS) \
2409 charset = charset_jisx0208_1978; \
2410 ENCODE_ISO_CHARACTER_DIMENSION2 (charset, c1, c2); \
2422 /* Instead of encoding character C, produce one or two `?'s. */
2424 #define ENCODE_UNSAFE_CHARACTER(c) \
2426 ENCODE_ISO_CHARACTER (CODING_REPLACEMENT_CHARACTER); \
2427 if (CHARSET_WIDTH (CHAR_CHARSET (c)) > 1) \
2428 ENCODE_ISO_CHARACTER (CODING_REPLACEMENT_CHARACTER); \
2432 /* Produce designation and invocation codes at a place pointed by DST
2433 to use CHARSET. The element `spec.iso2022' of *CODING is updated.
2437 encode_invocation_designation (charset
, coding
, dst
)
2439 struct coding_system
*coding
;
2442 int reg
; /* graphic register number */
2444 /* At first, check designations. */
2445 for (reg
= 0; reg
< 4; reg
++)
2446 if (charset
== CODING_SPEC_ISO_DESIGNATION (coding
, reg
))
2451 /* CHARSET is not yet designated to any graphic registers. */
2452 /* At first check the requested designation. */
2453 reg
= CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding
, charset
);
2454 if (reg
== CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION
)
2455 /* Since CHARSET requests no special designation, designate it
2456 to graphic register 0. */
2459 ENCODE_DESIGNATION (charset
, reg
, coding
);
2462 if (CODING_SPEC_ISO_INVOCATION (coding
, 0) != reg
2463 && CODING_SPEC_ISO_INVOCATION (coding
, 1) != reg
)
2465 /* Since the graphic register REG is not invoked to any graphic
2466 planes, invoke it to graphic plane 0. */
2469 case 0: /* graphic register 0 */
2473 case 1: /* graphic register 1 */
2477 case 2: /* graphic register 2 */
2478 if (coding
->flags
& CODING_FLAG_ISO_SINGLE_SHIFT
)
2479 ENCODE_SINGLE_SHIFT_2
;
2481 ENCODE_LOCKING_SHIFT_2
;
2484 case 3: /* graphic register 3 */
2485 if (coding
->flags
& CODING_FLAG_ISO_SINGLE_SHIFT
)
2486 ENCODE_SINGLE_SHIFT_3
;
2488 ENCODE_LOCKING_SHIFT_3
;
2496 /* Produce 2-byte codes for encoded composition rule RULE. */
2498 #define ENCODE_COMPOSITION_RULE(rule) \
2501 COMPOSITION_DECODE_RULE (rule, gref, nref); \
2502 *dst++ = 32 + 81 + gref; \
2503 *dst++ = 32 + nref; \
2506 /* Produce codes for indicating the start of a composition sequence
2507 (ESC 0, ESC 3, or ESC 4). DATA points to an array of integers
2508 which specify information about the composition. See the comment
2509 in coding.h for the format of DATA. */
2511 #define ENCODE_COMPOSITION_START(coding, data) \
2513 coding->composing = data[3]; \
2514 *dst++ = ISO_CODE_ESC; \
2515 if (coding->composing == COMPOSITION_RELATIVE) \
2519 *dst++ = (coding->composing == COMPOSITION_WITH_ALTCHARS \
2521 coding->cmp_data_index = coding->cmp_data_start + 4; \
2522 coding->composition_rule_follows = 0; \
2526 /* Produce codes for indicating the end of the current composition. */
2528 #define ENCODE_COMPOSITION_END(coding, data) \
2530 *dst++ = ISO_CODE_ESC; \
2532 coding->cmp_data_start += data[0]; \
2533 coding->composing = COMPOSITION_NO; \
2534 if (coding->cmp_data_start == coding->cmp_data->used \
2535 && coding->cmp_data->next) \
2537 coding->cmp_data = coding->cmp_data->next; \
2538 coding->cmp_data_start = 0; \
2542 /* Produce composition start sequence ESC 0. Here, this sequence
2543 doesn't mean the start of a new composition but means that we have
2544 just produced components (alternate chars and composition rules) of
2545 the composition and the actual text follows in SRC. */
2547 #define ENCODE_COMPOSITION_FAKE_START(coding) \
2549 *dst++ = ISO_CODE_ESC; \
2551 coding->composing = COMPOSITION_RELATIVE; \
2554 /* The following three macros produce codes for indicating direction
2556 #define ENCODE_CONTROL_SEQUENCE_INTRODUCER \
2558 if (coding->flags == CODING_FLAG_ISO_SEVEN_BITS) \
2559 *dst++ = ISO_CODE_ESC, *dst++ = '['; \
2561 *dst++ = ISO_CODE_CSI; \
2564 #define ENCODE_DIRECTION_R2L \
2565 ENCODE_CONTROL_SEQUENCE_INTRODUCER (dst), *dst++ = '2', *dst++ = ']'
2567 #define ENCODE_DIRECTION_L2R \
2568 ENCODE_CONTROL_SEQUENCE_INTRODUCER (dst), *dst++ = '0', *dst++ = ']'
2570 /* Produce codes for designation and invocation to reset the graphic
2571 planes and registers to initial state. */
2572 #define ENCODE_RESET_PLANE_AND_REGISTER \
2575 if (CODING_SPEC_ISO_INVOCATION (coding, 0) != 0) \
2577 for (reg = 0; reg < 4; reg++) \
2578 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg) >= 0 \
2579 && (CODING_SPEC_ISO_DESIGNATION (coding, reg) \
2580 != CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg))) \
2581 ENCODE_DESIGNATION \
2582 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg), reg, coding); \
2585 /* Produce designation sequences of charsets in the line started from
2586 SRC to a place pointed by DST, and return updated DST.
2588 If the current block ends before any end-of-line, we may fail to
2589 find all the necessary designations. */
2591 static unsigned char *
2592 encode_designation_at_bol (coding
, translation_table
, src
, src_end
, dst
)
2593 struct coding_system
*coding
;
2594 Lisp_Object translation_table
;
2595 unsigned char *src
, *src_end
, *dst
;
2597 int charset
, c
, found
= 0, reg
;
2598 /* Table of charsets to be designated to each graphic register. */
2601 for (reg
= 0; reg
< 4; reg
++)
2610 charset
= CHAR_CHARSET (c
);
2611 reg
= CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding
, charset
);
2612 if (reg
!= CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION
&& r
[reg
] < 0)
2622 for (reg
= 0; reg
< 4; reg
++)
2624 && CODING_SPEC_ISO_DESIGNATION (coding
, reg
) != r
[reg
])
2625 ENCODE_DESIGNATION (r
[reg
], reg
, coding
);
2631 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */
2634 encode_coding_iso2022 (coding
, source
, destination
, src_bytes
, dst_bytes
)
2635 struct coding_system
*coding
;
2636 unsigned char *source
, *destination
;
2637 int src_bytes
, dst_bytes
;
2639 unsigned char *src
= source
;
2640 unsigned char *src_end
= source
+ src_bytes
;
2641 unsigned char *dst
= destination
;
2642 unsigned char *dst_end
= destination
+ dst_bytes
;
2643 /* Since the maximum bytes produced by each loop is 20, we subtract 19
2644 from DST_END to assure overflow checking is necessary only at the
2646 unsigned char *adjusted_dst_end
= dst_end
- 19;
2647 /* SRC_BASE remembers the start position in source in each loop.
2648 The loop will be exited when there's not enough source text to
2649 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
2650 there's not enough destination area to produce encoded codes
2651 (within macro EMIT_BYTES). */
2652 unsigned char *src_base
;
2654 Lisp_Object translation_table
;
2655 Lisp_Object safe_chars
;
2657 if (coding
->flags
& CODING_FLAG_ISO_SAFE
)
2658 coding
->mode
|= CODING_MODE_INHIBIT_UNENCODABLE_CHAR
;
2660 safe_chars
= coding_safe_chars (coding
->symbol
);
2662 if (NILP (Venable_character_translation
))
2663 translation_table
= Qnil
;
2666 translation_table
= coding
->translation_table_for_encode
;
2667 if (NILP (translation_table
))
2668 translation_table
= Vstandard_translation_table_for_encode
;
2671 coding
->consumed_char
= 0;
2677 if (dst
>= (dst_bytes
? adjusted_dst_end
: (src
- 19)))
2679 coding
->result
= CODING_FINISH_INSUFFICIENT_DST
;
2683 if (coding
->flags
& CODING_FLAG_ISO_DESIGNATE_AT_BOL
2684 && CODING_SPEC_ISO_BOL (coding
))
2686 /* We have to produce designation sequences if any now. */
2687 dst
= encode_designation_at_bol (coding
, translation_table
,
2689 CODING_SPEC_ISO_BOL (coding
) = 0;
2692 /* Check composition start and end. */
2693 if (coding
->composing
!= COMPOSITION_DISABLED
2694 && coding
->cmp_data_start
< coding
->cmp_data
->used
)
2696 struct composition_data
*cmp_data
= coding
->cmp_data
;
2697 int *data
= cmp_data
->data
+ coding
->cmp_data_start
;
2698 int this_pos
= cmp_data
->char_offset
+ coding
->consumed_char
;
2700 if (coding
->composing
== COMPOSITION_RELATIVE
)
2702 if (this_pos
== data
[2])
2704 ENCODE_COMPOSITION_END (coding
, data
);
2705 cmp_data
= coding
->cmp_data
;
2706 data
= cmp_data
->data
+ coding
->cmp_data_start
;
2709 else if (COMPOSING_P (coding
))
2711 /* COMPOSITION_WITH_ALTCHARS or COMPOSITION_WITH_RULE_ALTCHAR */
2712 if (coding
->cmp_data_index
== coding
->cmp_data_start
+ data
[0])
2713 /* We have consumed components of the composition.
2714 What follows in SRC is the composition's base
2716 ENCODE_COMPOSITION_FAKE_START (coding
);
2719 int c
= cmp_data
->data
[coding
->cmp_data_index
++];
2720 if (coding
->composition_rule_follows
)
2722 ENCODE_COMPOSITION_RULE (c
);
2723 coding
->composition_rule_follows
= 0;
2727 if (coding
->mode
& CODING_MODE_INHIBIT_UNENCODABLE_CHAR
2728 && ! CODING_SAFE_CHAR_P (safe_chars
, c
))
2729 ENCODE_UNSAFE_CHARACTER (c
);
2731 ENCODE_ISO_CHARACTER (c
);
2732 if (coding
->composing
== COMPOSITION_WITH_RULE_ALTCHARS
)
2733 coding
->composition_rule_follows
= 1;
2738 if (!COMPOSING_P (coding
))
2740 if (this_pos
== data
[1])
2742 ENCODE_COMPOSITION_START (coding
, data
);
2750 /* Now encode the character C. */
2751 if (c
< 0x20 || c
== 0x7F)
2755 if (! (coding
->mode
& CODING_MODE_SELECTIVE_DISPLAY
))
2757 if (coding
->flags
& CODING_FLAG_ISO_RESET_AT_CNTL
)
2758 ENCODE_RESET_PLANE_AND_REGISTER
;
2762 /* fall down to treat '\r' as '\n' ... */
2767 if (coding
->flags
& CODING_FLAG_ISO_RESET_AT_EOL
)
2768 ENCODE_RESET_PLANE_AND_REGISTER
;
2769 if (coding
->flags
& CODING_FLAG_ISO_INIT_AT_BOL
)
2770 bcopy (coding
->spec
.iso2022
.initial_designation
,
2771 coding
->spec
.iso2022
.current_designation
,
2772 sizeof coding
->spec
.iso2022
.initial_designation
);
2773 if (coding
->eol_type
== CODING_EOL_LF
2774 || coding
->eol_type
== CODING_EOL_UNDECIDED
)
2775 *dst
++ = ISO_CODE_LF
;
2776 else if (coding
->eol_type
== CODING_EOL_CRLF
)
2777 *dst
++ = ISO_CODE_CR
, *dst
++ = ISO_CODE_LF
;
2779 *dst
++ = ISO_CODE_CR
;
2780 CODING_SPEC_ISO_BOL (coding
) = 1;
2784 if (coding
->flags
& CODING_FLAG_ISO_RESET_AT_CNTL
)
2785 ENCODE_RESET_PLANE_AND_REGISTER
;
2789 else if (ASCII_BYTE_P (c
))
2790 ENCODE_ISO_CHARACTER (c
);
2791 else if (SINGLE_BYTE_CHAR_P (c
))
2796 else if (coding
->mode
& CODING_MODE_INHIBIT_UNENCODABLE_CHAR
2797 && ! CODING_SAFE_CHAR_P (safe_chars
, c
))
2798 ENCODE_UNSAFE_CHARACTER (c
);
2800 ENCODE_ISO_CHARACTER (c
);
2802 coding
->consumed_char
++;
2806 coding
->consumed
= src_base
- source
;
2807 coding
->produced
= coding
->produced_char
= dst
- destination
;
2811 /*** 4. SJIS and BIG5 handlers ***/
2813 /* Although SJIS and BIG5 are not ISO coding systems, they are used
2814 quite widely. So, for the moment, Emacs supports them in the bare
2815 C code. But, in the future, they may be supported only by CCL. */
2817 /* SJIS is a coding system encoding three character sets: ASCII, right
2818 half of JISX0201-Kana, and JISX0208. An ASCII character is encoded
2819 as is. A character of charset katakana-jisx0201 is encoded by
2820 "position-code + 0x80". A character of charset japanese-jisx0208
2821 is encoded in 2-byte but two position-codes are divided and shifted
2822 so that it fits in the range below.
2824 --- CODE RANGE of SJIS ---
2825 (character set) (range)
2827 KATAKANA-JISX0201 0xA1 .. 0xDF
2828 JISX0208 (1st byte) 0x81 .. 0x9F and 0xE0 .. 0xEF
2829 (2nd byte) 0x40 .. 0x7E and 0x80 .. 0xFC
2830 -------------------------------
2834 /* BIG5 is a coding system encoding two character sets: ASCII and
2835 Big5. An ASCII character is encoded as is. Big5 is a two-byte
2836 character set and is encoded in two bytes.
2838 --- CODE RANGE of BIG5 ---
2839 (character set) (range)
2841 Big5 (1st byte) 0xA1 .. 0xFE
2842 (2nd byte) 0x40 .. 0x7E and 0xA1 .. 0xFE
2843 --------------------------
2845 Since the number of characters in Big5 is larger than maximum
2846 characters in Emacs' charset (96x96), it can't be handled as one
2847 charset. So, in Emacs, Big5 is divided into two: `charset-big5-1'
2848 and `charset-big5-2'. Both are DIMENSION2 and CHARS94. The former
2849 contains frequently used characters and the latter contains less
2850 frequently used characters. */
2852 /* Macros to decode or encode a character of Big5 in BIG5. B1 and B2
2853 are the 1st and 2nd position-codes of Big5 in BIG5 coding system.
2854 C1 and C2 are the 1st and 2nd position-codes of Emacs' internal
2855 format. CHARSET is `charset_big5_1' or `charset_big5_2'. */
2857 /* Number of Big5 characters which have the same code in 1st byte. */
2858 #define BIG5_SAME_ROW (0xFF - 0xA1 + 0x7F - 0x40)
2860 #define DECODE_BIG5(b1, b2, charset, c1, c2) \
2863 = (b1 - 0xA1) * BIG5_SAME_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62); \
2865 charset = charset_big5_1; \
2868 charset = charset_big5_2; \
2869 temp -= (0xC9 - 0xA1) * BIG5_SAME_ROW; \
2871 c1 = temp / (0xFF - 0xA1) + 0x21; \
2872 c2 = temp % (0xFF - 0xA1) + 0x21; \
2875 #define ENCODE_BIG5(charset, c1, c2, b1, b2) \
2877 unsigned int temp = (c1 - 0x21) * (0xFF - 0xA1) + (c2 - 0x21); \
2878 if (charset == charset_big5_2) \
2879 temp += BIG5_SAME_ROW * (0xC9 - 0xA1); \
2880 b1 = temp / BIG5_SAME_ROW + 0xA1; \
2881 b2 = temp % BIG5_SAME_ROW; \
2882 b2 += b2 < 0x3F ? 0x40 : 0x62; \
2885 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2886 Check if a text is encoded in SJIS. If it is, return
2887 CODING_CATEGORY_MASK_SJIS, else return 0. */
2890 detect_coding_sjis (src
, src_end
, multibytep
)
2891 unsigned char *src
, *src_end
;
2895 /* Dummy for ONE_MORE_BYTE. */
2896 struct coding_system dummy_coding
;
2897 struct coding_system
*coding
= &dummy_coding
;
2901 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
);
2904 if (c
== 0x80 || c
== 0xA0 || c
> 0xEF)
2906 if (c
<= 0x9F || c
>= 0xE0)
2908 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
);
2909 if (c
< 0x40 || c
== 0x7F || c
> 0xFC)
2914 return CODING_CATEGORY_MASK_SJIS
;
2917 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2918 Check if a text is encoded in BIG5. If it is, return
2919 CODING_CATEGORY_MASK_BIG5, else return 0. */
2922 detect_coding_big5 (src
, src_end
, multibytep
)
2923 unsigned char *src
, *src_end
;
2927 /* Dummy for ONE_MORE_BYTE. */
2928 struct coding_system dummy_coding
;
2929 struct coding_system
*coding
= &dummy_coding
;
2933 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
);
2936 if (c
< 0xA1 || c
> 0xFE)
2938 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
);
2939 if (c
< 0x40 || (c
> 0x7F && c
< 0xA1) || c
> 0xFE)
2943 return CODING_CATEGORY_MASK_BIG5
;
2946 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
2947 Check if a text is encoded in UTF-8. If it is, return
2948 CODING_CATEGORY_MASK_UTF_8, else return 0. */
2950 #define UTF_8_1_OCTET_P(c) ((c) < 0x80)
2951 #define UTF_8_EXTRA_OCTET_P(c) (((c) & 0xC0) == 0x80)
2952 #define UTF_8_2_OCTET_LEADING_P(c) (((c) & 0xE0) == 0xC0)
2953 #define UTF_8_3_OCTET_LEADING_P(c) (((c) & 0xF0) == 0xE0)
2954 #define UTF_8_4_OCTET_LEADING_P(c) (((c) & 0xF8) == 0xF0)
2955 #define UTF_8_5_OCTET_LEADING_P(c) (((c) & 0xFC) == 0xF8)
2956 #define UTF_8_6_OCTET_LEADING_P(c) (((c) & 0xFE) == 0xFC)
2959 detect_coding_utf_8 (src
, src_end
, multibytep
)
2960 unsigned char *src
, *src_end
;
2964 int seq_maybe_bytes
;
2965 /* Dummy for ONE_MORE_BYTE. */
2966 struct coding_system dummy_coding
;
2967 struct coding_system
*coding
= &dummy_coding
;
2971 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
);
2972 if (UTF_8_1_OCTET_P (c
))
2974 else if (UTF_8_2_OCTET_LEADING_P (c
))
2975 seq_maybe_bytes
= 1;
2976 else if (UTF_8_3_OCTET_LEADING_P (c
))
2977 seq_maybe_bytes
= 2;
2978 else if (UTF_8_4_OCTET_LEADING_P (c
))
2979 seq_maybe_bytes
= 3;
2980 else if (UTF_8_5_OCTET_LEADING_P (c
))
2981 seq_maybe_bytes
= 4;
2982 else if (UTF_8_6_OCTET_LEADING_P (c
))
2983 seq_maybe_bytes
= 5;
2989 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
);
2990 if (!UTF_8_EXTRA_OCTET_P (c
))
2994 while (seq_maybe_bytes
> 0);
2998 return CODING_CATEGORY_MASK_UTF_8
;
3001 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
3002 Check if a text is encoded in UTF-16 Big Endian (endian == 1) or
3003 Little Endian (otherwise). If it is, return
3004 CODING_CATEGORY_MASK_UTF_16_BE or CODING_CATEGORY_MASK_UTF_16_LE,
3007 #define UTF_16_INVALID_P(val) \
3008 (((val) == 0xFFFE) \
3009 || ((val) == 0xFFFF))
3011 #define UTF_16_HIGH_SURROGATE_P(val) \
3012 (((val) & 0xD800) == 0xD800)
3014 #define UTF_16_LOW_SURROGATE_P(val) \
3015 (((val) & 0xDC00) == 0xDC00)
3018 detect_coding_utf_16 (src
, src_end
, multibytep
)
3019 unsigned char *src
, *src_end
;
3022 unsigned char c1
, c2
;
3023 /* Dummy for ONE_MORE_BYTE_CHECK_MULTIBYTE. */
3024 struct coding_system dummy_coding
;
3025 struct coding_system
*coding
= &dummy_coding
;
3027 ONE_MORE_BYTE_CHECK_MULTIBYTE (c1
, multibytep
);
3028 ONE_MORE_BYTE_CHECK_MULTIBYTE (c2
, multibytep
);
3030 if ((c1
== 0xFF) && (c2
== 0xFE))
3031 return CODING_CATEGORY_MASK_UTF_16_LE
;
3032 else if ((c1
== 0xFE) && (c2
== 0xFF))
3033 return CODING_CATEGORY_MASK_UTF_16_BE
;
3039 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".
3040 If SJIS_P is 1, decode SJIS text, else decode BIG5 test. */
3043 decode_coding_sjis_big5 (coding
, source
, destination
,
3044 src_bytes
, dst_bytes
, sjis_p
)
3045 struct coding_system
*coding
;
3046 unsigned char *source
, *destination
;
3047 int src_bytes
, dst_bytes
;
3050 unsigned char *src
= source
;
3051 unsigned char *src_end
= source
+ src_bytes
;
3052 unsigned char *dst
= destination
;
3053 unsigned char *dst_end
= destination
+ dst_bytes
;
3054 /* SRC_BASE remembers the start position in source in each loop.
3055 The loop will be exited when there's not enough source code
3056 (within macro ONE_MORE_BYTE), or when there's not enough
3057 destination area to produce a character (within macro
3059 unsigned char *src_base
;
3060 Lisp_Object translation_table
;
3062 if (NILP (Venable_character_translation
))
3063 translation_table
= Qnil
;
3066 translation_table
= coding
->translation_table_for_decode
;
3067 if (NILP (translation_table
))
3068 translation_table
= Vstandard_translation_table_for_decode
;
3071 coding
->produced_char
= 0;
3074 int c
, charset
, c1
, c2
= 0;
3081 charset
= CHARSET_ASCII
;
3086 if (coding
->eol_type
== CODING_EOL_CRLF
)
3092 /* To process C2 again, SRC is subtracted by 1. */
3095 else if (coding
->eol_type
== CODING_EOL_CR
)
3099 && (coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
3100 && (coding
->eol_type
== CODING_EOL_CR
3101 || coding
->eol_type
== CODING_EOL_CRLF
))
3103 coding
->result
= CODING_FINISH_INCONSISTENT_EOL
;
3104 goto label_end_of_loop
;
3112 if (c1
== 0x80 || c1
== 0xA0 || c1
> 0xEF)
3113 goto label_invalid_code
;
3114 if (c1
<= 0x9F || c1
>= 0xE0)
3116 /* SJIS -> JISX0208 */
3118 if (c2
< 0x40 || c2
== 0x7F || c2
> 0xFC)
3119 goto label_invalid_code
;
3120 DECODE_SJIS (c1
, c2
, c1
, c2
);
3121 charset
= charset_jisx0208
;
3124 /* SJIS -> JISX0201-Kana */
3125 charset
= charset_katakana_jisx0201
;
3130 if (c1
< 0xA0 || c1
> 0xFE)
3131 goto label_invalid_code
;
3133 if (c2
< 0x40 || (c2
> 0x7E && c2
< 0xA1) || c2
> 0xFE)
3134 goto label_invalid_code
;
3135 DECODE_BIG5 (c1
, c2
, charset
, c1
, c2
);
3139 c
= DECODE_ISO_CHARACTER (charset
, c1
, c2
);
3151 coding
->consumed
= coding
->consumed_char
= src_base
- source
;
3152 coding
->produced
= dst
- destination
;
3156 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".
3157 This function can encode charsets `ascii', `katakana-jisx0201',
3158 `japanese-jisx0208', `chinese-big5-1', and `chinese-big5-2'. We
3159 are sure that all these charsets are registered as official charset
3160 (i.e. do not have extended leading-codes). Characters of other
3161 charsets are produced without any encoding. If SJIS_P is 1, encode
3162 SJIS text, else encode BIG5 text. */
3165 encode_coding_sjis_big5 (coding
, source
, destination
,
3166 src_bytes
, dst_bytes
, sjis_p
)
3167 struct coding_system
*coding
;
3168 unsigned char *source
, *destination
;
3169 int src_bytes
, dst_bytes
;
3172 unsigned char *src
= source
;
3173 unsigned char *src_end
= source
+ src_bytes
;
3174 unsigned char *dst
= destination
;
3175 unsigned char *dst_end
= destination
+ dst_bytes
;
3176 /* SRC_BASE remembers the start position in source in each loop.
3177 The loop will be exited when there's not enough source text to
3178 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
3179 there's not enough destination area to produce encoded codes
3180 (within macro EMIT_BYTES). */
3181 unsigned char *src_base
;
3182 Lisp_Object translation_table
;
3184 if (NILP (Venable_character_translation
))
3185 translation_table
= Qnil
;
3188 translation_table
= coding
->translation_table_for_encode
;
3189 if (NILP (translation_table
))
3190 translation_table
= Vstandard_translation_table_for_encode
;
3195 int c
, charset
, c1
, c2
;
3200 /* Now encode the character C. */
3201 if (SINGLE_BYTE_CHAR_P (c
))
3206 if (!(coding
->mode
& CODING_MODE_SELECTIVE_DISPLAY
))
3213 if (coding
->eol_type
== CODING_EOL_CRLF
)
3215 EMIT_TWO_BYTES ('\r', c
);
3218 else if (coding
->eol_type
== CODING_EOL_CR
)
3226 SPLIT_CHAR (c
, charset
, c1
, c2
);
3229 if (charset
== charset_jisx0208
3230 || charset
== charset_jisx0208_1978
)
3232 ENCODE_SJIS (c1
, c2
, c1
, c2
);
3233 EMIT_TWO_BYTES (c1
, c2
);
3235 else if (charset
== charset_katakana_jisx0201
)
3236 EMIT_ONE_BYTE (c1
| 0x80);
3237 else if (charset
== charset_latin_jisx0201
)
3239 else if (coding
->mode
& CODING_MODE_INHIBIT_UNENCODABLE_CHAR
)
3241 EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER
);
3242 if (CHARSET_WIDTH (charset
) > 1)
3243 EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER
);
3246 /* There's no way other than producing the internal
3248 EMIT_BYTES (src_base
, src
);
3252 if (charset
== charset_big5_1
|| charset
== charset_big5_2
)
3254 ENCODE_BIG5 (charset
, c1
, c2
, c1
, c2
);
3255 EMIT_TWO_BYTES (c1
, c2
);
3257 else if (coding
->mode
& CODING_MODE_INHIBIT_UNENCODABLE_CHAR
)
3259 EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER
);
3260 if (CHARSET_WIDTH (charset
) > 1)
3261 EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER
);
3264 /* There's no way other than producing the internal
3266 EMIT_BYTES (src_base
, src
);
3269 coding
->consumed_char
++;
3273 coding
->consumed
= src_base
- source
;
3274 coding
->produced
= coding
->produced_char
= dst
- destination
;
3278 /*** 5. CCL handlers ***/
3280 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
3281 Check if a text is encoded in a coding system of which
3282 encoder/decoder are written in CCL program. If it is, return
3283 CODING_CATEGORY_MASK_CCL, else return 0. */
3286 detect_coding_ccl (src
, src_end
, multibytep
)
3287 unsigned char *src
, *src_end
;
3290 unsigned char *valid
;
3292 /* Dummy for ONE_MORE_BYTE. */
3293 struct coding_system dummy_coding
;
3294 struct coding_system
*coding
= &dummy_coding
;
3296 /* No coding system is assigned to coding-category-ccl. */
3297 if (!coding_system_table
[CODING_CATEGORY_IDX_CCL
])
3300 valid
= coding_system_table
[CODING_CATEGORY_IDX_CCL
]->spec
.ccl
.valid_codes
;
3303 ONE_MORE_BYTE_CHECK_MULTIBYTE (c
, multibytep
);
3308 return CODING_CATEGORY_MASK_CCL
;
3312 /*** 6. End-of-line handlers ***/
3314 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
3317 decode_eol (coding
, source
, destination
, src_bytes
, dst_bytes
)
3318 struct coding_system
*coding
;
3319 unsigned char *source
, *destination
;
3320 int src_bytes
, dst_bytes
;
3322 unsigned char *src
= source
;
3323 unsigned char *dst
= destination
;
3324 unsigned char *src_end
= src
+ src_bytes
;
3325 unsigned char *dst_end
= dst
+ dst_bytes
;
3326 Lisp_Object translation_table
;
3327 /* SRC_BASE remembers the start position in source in each loop.
3328 The loop will be exited when there's not enough source code
3329 (within macro ONE_MORE_BYTE), or when there's not enough
3330 destination area to produce a character (within macro
3332 unsigned char *src_base
;
3335 translation_table
= Qnil
;
3336 switch (coding
->eol_type
)
3338 case CODING_EOL_CRLF
:
3353 && (coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
))
3355 coding
->result
= CODING_FINISH_INCONSISTENT_EOL
;
3356 goto label_end_of_loop
;
3369 if (coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
3371 coding
->result
= CODING_FINISH_INCONSISTENT_EOL
;
3372 goto label_end_of_loop
;
3381 default: /* no need for EOL handling */
3391 coding
->consumed
= coding
->consumed_char
= src_base
- source
;
3392 coding
->produced
= dst
- destination
;
3396 /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". Encode
3397 format of end-of-line according to `coding->eol_type'. It also
3398 convert multibyte form 8-bit characters to unibyte if
3399 CODING->src_multibyte is nonzero. If `coding->mode &
3400 CODING_MODE_SELECTIVE_DISPLAY' is nonzero, code '\r' in source text
3401 also means end-of-line. */
3404 encode_eol (coding
, source
, destination
, src_bytes
, dst_bytes
)
3405 struct coding_system
*coding
;
3406 const unsigned char *source
;
3407 unsigned char *destination
;
3408 int src_bytes
, dst_bytes
;
3410 const unsigned char *src
= source
;
3411 unsigned char *dst
= destination
;
3412 const unsigned char *src_end
= src
+ src_bytes
;
3413 unsigned char *dst_end
= dst
+ dst_bytes
;
3414 Lisp_Object translation_table
;
3415 /* SRC_BASE remembers the start position in source in each loop.
3416 The loop will be exited when there's not enough source text to
3417 analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
3418 there's not enough destination area to produce encoded codes
3419 (within macro EMIT_BYTES). */
3420 const unsigned char *src_base
;
3423 int selective_display
= coding
->mode
& CODING_MODE_SELECTIVE_DISPLAY
;
3425 translation_table
= Qnil
;
3426 if (coding
->src_multibyte
3427 && *(src_end
- 1) == LEADING_CODE_8_BIT_CONTROL
)
3431 coding
->result
= CODING_FINISH_INSUFFICIENT_SRC
;
3434 if (coding
->eol_type
== CODING_EOL_CRLF
)
3436 while (src
< src_end
)
3442 else if (c
== '\n' || (c
== '\r' && selective_display
))
3443 EMIT_TWO_BYTES ('\r', '\n');
3453 if (!dst_bytes
|| src_bytes
<= dst_bytes
)
3455 safe_bcopy (src
, dst
, src_bytes
);
3461 if (coding
->src_multibyte
3462 && *(src
+ dst_bytes
- 1) == LEADING_CODE_8_BIT_CONTROL
)
3464 safe_bcopy (src
, dst
, dst_bytes
);
3465 src_base
= src
+ dst_bytes
;
3466 dst
= destination
+ dst_bytes
;
3467 coding
->result
= CODING_FINISH_INSUFFICIENT_DST
;
3469 if (coding
->eol_type
== CODING_EOL_CR
)
3471 for (tmp
= destination
; tmp
< dst
; tmp
++)
3472 if (*tmp
== '\n') *tmp
= '\r';
3474 else if (selective_display
)
3476 for (tmp
= destination
; tmp
< dst
; tmp
++)
3477 if (*tmp
== '\r') *tmp
= '\n';
3480 if (coding
->src_multibyte
)
3481 dst
= destination
+ str_as_unibyte (destination
, dst
- destination
);
3483 coding
->consumed
= src_base
- source
;
3484 coding
->produced
= dst
- destination
;
3485 coding
->produced_char
= coding
->produced
;
3489 /*** 7. C library functions ***/
3491 /* In Emacs Lisp, a coding system is represented by a Lisp symbol which
3492 has a property `coding-system'. The value of this property is a
3493 vector of length 5 (called the coding-vector). Among elements of
3494 this vector, the first (element[0]) and the fifth (element[4])
3495 carry important information for decoding/encoding. Before
3496 decoding/encoding, this information should be set in fields of a
3497 structure of type `coding_system'.
3499 The value of the property `coding-system' can be a symbol of another
3500 subsidiary coding-system. In that case, Emacs gets coding-vector
3503 `element[0]' contains information to be set in `coding->type'. The
3504 value and its meaning is as follows:
3506 0 -- coding_type_emacs_mule
3507 1 -- coding_type_sjis
3508 2 -- coding_type_iso2022
3509 3 -- coding_type_big5
3510 4 -- coding_type_ccl encoder/decoder written in CCL
3511 nil -- coding_type_no_conversion
3512 t -- coding_type_undecided (automatic conversion on decoding,
3513 no-conversion on encoding)
3515 `element[4]' contains information to be set in `coding->flags' and
3516 `coding->spec'. The meaning varies by `coding->type'.
3518 If `coding->type' is `coding_type_iso2022', element[4] is a vector
3519 of length 32 (of which the first 13 sub-elements are used now).
3520 Meanings of these sub-elements are:
3522 sub-element[N] where N is 0 through 3: to be set in `coding->spec.iso2022'
3523 If the value is an integer of valid charset, the charset is
3524 assumed to be designated to graphic register N initially.
3526 If the value is minus, it is a minus value of charset which
3527 reserves graphic register N, which means that the charset is
3528 not designated initially but should be designated to graphic
3529 register N just before encoding a character in that charset.
3531 If the value is nil, graphic register N is never used on
3534 sub-element[N] where N is 4 through 11: to be set in `coding->flags'
3535 Each value takes t or nil. See the section ISO2022 of
3536 `coding.h' for more information.
3538 If `coding->type' is `coding_type_big5', element[4] is t to denote
3539 BIG5-ETen or nil to denote BIG5-HKU.
3541 If `coding->type' takes the other value, element[4] is ignored.
3543 Emacs Lisp's coding systems also carry information about format of
3544 end-of-line in a value of property `eol-type'. If the value is
3545 integer, 0 means CODING_EOL_LF, 1 means CODING_EOL_CRLF, and 2
3546 means CODING_EOL_CR. If it is not integer, it should be a vector
3547 of subsidiary coding systems of which property `eol-type' has one
3548 of the above values.
3552 /* Extract information for decoding/encoding from CODING_SYSTEM_SYMBOL
3553 and set it in CODING. If CODING_SYSTEM_SYMBOL is invalid, CODING
3554 is setup so that no conversion is necessary and return -1, else
3558 setup_coding_system (coding_system
, coding
)
3559 Lisp_Object coding_system
;
3560 struct coding_system
*coding
;
3562 Lisp_Object coding_spec
, coding_type
, eol_type
, plist
;
3565 /* At first, zero clear all members. */
3566 bzero (coding
, sizeof (struct coding_system
));
3568 /* Initialize some fields required for all kinds of coding systems. */
3569 coding
->symbol
= coding_system
;
3570 coding
->heading_ascii
= -1;
3571 coding
->post_read_conversion
= coding
->pre_write_conversion
= Qnil
;
3572 coding
->composing
= COMPOSITION_DISABLED
;
3573 coding
->cmp_data
= NULL
;
3575 if (NILP (coding_system
))
3576 goto label_invalid_coding_system
;
3578 coding_spec
= Fget (coding_system
, Qcoding_system
);
3580 if (!VECTORP (coding_spec
)
3581 || XVECTOR (coding_spec
)->size
!= 5
3582 || !CONSP (XVECTOR (coding_spec
)->contents
[3]))
3583 goto label_invalid_coding_system
;
3585 eol_type
= inhibit_eol_conversion
? Qnil
: Fget (coding_system
, Qeol_type
);
3586 if (VECTORP (eol_type
))
3588 coding
->eol_type
= CODING_EOL_UNDECIDED
;
3589 coding
->common_flags
= CODING_REQUIRE_DETECTION_MASK
;
3591 else if (XFASTINT (eol_type
) == 1)
3593 coding
->eol_type
= CODING_EOL_CRLF
;
3594 coding
->common_flags
3595 = CODING_REQUIRE_DECODING_MASK
| CODING_REQUIRE_ENCODING_MASK
;
3597 else if (XFASTINT (eol_type
) == 2)
3599 coding
->eol_type
= CODING_EOL_CR
;
3600 coding
->common_flags
3601 = CODING_REQUIRE_DECODING_MASK
| CODING_REQUIRE_ENCODING_MASK
;
3604 coding
->eol_type
= CODING_EOL_LF
;
3606 coding_type
= XVECTOR (coding_spec
)->contents
[0];
3607 /* Try short cut. */
3608 if (SYMBOLP (coding_type
))
3610 if (EQ (coding_type
, Qt
))
3612 coding
->type
= coding_type_undecided
;
3613 coding
->common_flags
|= CODING_REQUIRE_DETECTION_MASK
;
3616 coding
->type
= coding_type_no_conversion
;
3617 /* Initialize this member. Any thing other than
3618 CODING_CATEGORY_IDX_UTF_16_BE and
3619 CODING_CATEGORY_IDX_UTF_16_LE are ok because they have
3620 special treatment in detect_eol. */
3621 coding
->category_idx
= CODING_CATEGORY_IDX_EMACS_MULE
;
3626 /* Get values of coding system properties:
3627 `post-read-conversion', `pre-write-conversion',
3628 `translation-table-for-decode', `translation-table-for-encode'. */
3629 plist
= XVECTOR (coding_spec
)->contents
[3];
3630 /* Pre & post conversion functions should be disabled if
3631 inhibit_eol_conversion is nonzero. This is the case that a code
3632 conversion function is called while those functions are running. */
3633 if (! inhibit_pre_post_conversion
)
3635 coding
->post_read_conversion
= Fplist_get (plist
, Qpost_read_conversion
);
3636 coding
->pre_write_conversion
= Fplist_get (plist
, Qpre_write_conversion
);
3638 val
= Fplist_get (plist
, Qtranslation_table_for_decode
);
3640 val
= Fget (val
, Qtranslation_table_for_decode
);
3641 coding
->translation_table_for_decode
= CHAR_TABLE_P (val
) ? val
: Qnil
;
3642 val
= Fplist_get (plist
, Qtranslation_table_for_encode
);
3644 val
= Fget (val
, Qtranslation_table_for_encode
);
3645 coding
->translation_table_for_encode
= CHAR_TABLE_P (val
) ? val
: Qnil
;
3646 val
= Fplist_get (plist
, Qcoding_category
);
3649 val
= Fget (val
, Qcoding_category_index
);
3651 coding
->category_idx
= XINT (val
);
3653 goto label_invalid_coding_system
;
3656 goto label_invalid_coding_system
;
3658 /* If the coding system has non-nil `composition' property, enable
3659 composition handling. */
3660 val
= Fplist_get (plist
, Qcomposition
);
3662 coding
->composing
= COMPOSITION_NO
;
3664 switch (XFASTINT (coding_type
))
3667 coding
->type
= coding_type_emacs_mule
;
3668 coding
->common_flags
3669 |= CODING_REQUIRE_DECODING_MASK
| CODING_REQUIRE_ENCODING_MASK
;
3670 if (!NILP (coding
->post_read_conversion
))
3671 coding
->common_flags
|= CODING_REQUIRE_DECODING_MASK
;
3672 if (!NILP (coding
->pre_write_conversion
))
3673 coding
->common_flags
|= CODING_REQUIRE_ENCODING_MASK
;
3677 coding
->type
= coding_type_sjis
;
3678 coding
->common_flags
3679 |= CODING_REQUIRE_DECODING_MASK
| CODING_REQUIRE_ENCODING_MASK
;
3683 coding
->type
= coding_type_iso2022
;
3684 coding
->common_flags
3685 |= CODING_REQUIRE_DECODING_MASK
| CODING_REQUIRE_ENCODING_MASK
;
3687 Lisp_Object val
, temp
;
3689 int i
, charset
, reg_bits
= 0;
3691 val
= XVECTOR (coding_spec
)->contents
[4];
3693 if (!VECTORP (val
) || XVECTOR (val
)->size
!= 32)
3694 goto label_invalid_coding_system
;
3696 flags
= XVECTOR (val
)->contents
;
3698 = ((NILP (flags
[4]) ? 0 : CODING_FLAG_ISO_SHORT_FORM
)
3699 | (NILP (flags
[5]) ? 0 : CODING_FLAG_ISO_RESET_AT_EOL
)
3700 | (NILP (flags
[6]) ? 0 : CODING_FLAG_ISO_RESET_AT_CNTL
)
3701 | (NILP (flags
[7]) ? 0 : CODING_FLAG_ISO_SEVEN_BITS
)
3702 | (NILP (flags
[8]) ? 0 : CODING_FLAG_ISO_LOCKING_SHIFT
)
3703 | (NILP (flags
[9]) ? 0 : CODING_FLAG_ISO_SINGLE_SHIFT
)
3704 | (NILP (flags
[10]) ? 0 : CODING_FLAG_ISO_USE_ROMAN
)
3705 | (NILP (flags
[11]) ? 0 : CODING_FLAG_ISO_USE_OLDJIS
)
3706 | (NILP (flags
[12]) ? 0 : CODING_FLAG_ISO_NO_DIRECTION
)
3707 | (NILP (flags
[13]) ? 0 : CODING_FLAG_ISO_INIT_AT_BOL
)
3708 | (NILP (flags
[14]) ? 0 : CODING_FLAG_ISO_DESIGNATE_AT_BOL
)
3709 | (NILP (flags
[15]) ? 0 : CODING_FLAG_ISO_SAFE
)
3710 | (NILP (flags
[16]) ? 0 : CODING_FLAG_ISO_LATIN_EXTRA
)
3713 /* Invoke graphic register 0 to plane 0. */
3714 CODING_SPEC_ISO_INVOCATION (coding
, 0) = 0;
3715 /* Invoke graphic register 1 to plane 1 if we can use full 8-bit. */
3716 CODING_SPEC_ISO_INVOCATION (coding
, 1)
3717 = (coding
->flags
& CODING_FLAG_ISO_SEVEN_BITS
? -1 : 1);
3718 /* Not single shifting at first. */
3719 CODING_SPEC_ISO_SINGLE_SHIFTING (coding
) = 0;
3720 /* Beginning of buffer should also be regarded as bol. */
3721 CODING_SPEC_ISO_BOL (coding
) = 1;
3723 for (charset
= 0; charset
<= MAX_CHARSET
; charset
++)
3724 CODING_SPEC_ISO_REVISION_NUMBER (coding
, charset
) = 255;
3725 val
= Vcharset_revision_alist
;
3728 charset
= get_charset_id (Fcar_safe (XCAR (val
)));
3730 && (temp
= Fcdr_safe (XCAR (val
)), INTEGERP (temp
))
3731 && (i
= XINT (temp
), (i
>= 0 && (i
+ '@') < 128)))
3732 CODING_SPEC_ISO_REVISION_NUMBER (coding
, charset
) = i
;
3736 /* Checks FLAGS[REG] (REG = 0, 1, 2 3) and decide designations.
3737 FLAGS[REG] can be one of below:
3738 integer CHARSET: CHARSET occupies register I,
3739 t: designate nothing to REG initially, but can be used
3741 list of integer, nil, or t: designate the first
3742 element (if integer) to REG initially, the remaining
3743 elements (if integer) is designated to REG on request,
3744 if an element is t, REG can be used by any charsets,
3745 nil: REG is never used. */
3746 for (charset
= 0; charset
<= MAX_CHARSET
; charset
++)
3747 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding
, charset
)
3748 = CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION
;
3749 for (i
= 0; i
< 4; i
++)
3751 if ((INTEGERP (flags
[i
])
3752 && (charset
= XINT (flags
[i
]), CHARSET_VALID_P (charset
)))
3753 || (charset
= get_charset_id (flags
[i
])) >= 0)
3755 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding
, i
) = charset
;
3756 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding
, charset
) = i
;
3758 else if (EQ (flags
[i
], Qt
))
3760 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding
, i
) = -1;
3762 coding
->flags
|= CODING_FLAG_ISO_DESIGNATION
;
3764 else if (CONSP (flags
[i
]))
3769 coding
->flags
|= CODING_FLAG_ISO_DESIGNATION
;
3770 if ((INTEGERP (XCAR (tail
))
3771 && (charset
= XINT (XCAR (tail
)),
3772 CHARSET_VALID_P (charset
)))
3773 || (charset
= get_charset_id (XCAR (tail
))) >= 0)
3775 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding
, i
) = charset
;
3776 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding
, charset
) =i
;
3779 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding
, i
) = -1;
3781 while (CONSP (tail
))
3783 if ((INTEGERP (XCAR (tail
))
3784 && (charset
= XINT (XCAR (tail
)),
3785 CHARSET_VALID_P (charset
)))
3786 || (charset
= get_charset_id (XCAR (tail
))) >= 0)
3787 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding
, charset
)
3789 else if (EQ (XCAR (tail
), Qt
))
3795 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding
, i
) = -1;
3797 CODING_SPEC_ISO_DESIGNATION (coding
, i
)
3798 = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding
, i
);
3801 if (reg_bits
&& ! (coding
->flags
& CODING_FLAG_ISO_LOCKING_SHIFT
))
3803 /* REG 1 can be used only by locking shift in 7-bit env. */
3804 if (coding
->flags
& CODING_FLAG_ISO_SEVEN_BITS
)
3806 if (! (coding
->flags
& CODING_FLAG_ISO_SINGLE_SHIFT
))
3807 /* Without any shifting, only REG 0 and 1 can be used. */
3812 for (charset
= 0; charset
<= MAX_CHARSET
; charset
++)
3814 if (CHARSET_DEFINED_P (charset
)
3815 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding
, charset
)
3816 == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION
))
3818 /* There exist some default graphic registers to be
3821 /* We had better avoid designating a charset of
3822 CHARS96 to REG 0 as far as possible. */
3823 if (CHARSET_CHARS (charset
) == 96)
3824 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding
, charset
)
3826 ? 1 : (reg_bits
& 4 ? 2 : (reg_bits
& 8 ? 3 : 0)));
3828 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding
, charset
)
3830 ? 0 : (reg_bits
& 2 ? 1 : (reg_bits
& 4 ? 2 : 3)));
3834 coding
->common_flags
|= CODING_REQUIRE_FLUSHING_MASK
;
3835 coding
->spec
.iso2022
.last_invalid_designation_register
= -1;
3839 coding
->type
= coding_type_big5
;
3840 coding
->common_flags
3841 |= CODING_REQUIRE_DECODING_MASK
| CODING_REQUIRE_ENCODING_MASK
;
3843 = (NILP (XVECTOR (coding_spec
)->contents
[4])
3844 ? CODING_FLAG_BIG5_HKU
3845 : CODING_FLAG_BIG5_ETEN
);
3849 coding
->type
= coding_type_ccl
;
3850 coding
->common_flags
3851 |= CODING_REQUIRE_DECODING_MASK
| CODING_REQUIRE_ENCODING_MASK
;
3853 val
= XVECTOR (coding_spec
)->contents
[4];
3855 || setup_ccl_program (&(coding
->spec
.ccl
.decoder
),
3857 || setup_ccl_program (&(coding
->spec
.ccl
.encoder
),
3859 goto label_invalid_coding_system
;
3861 bzero (coding
->spec
.ccl
.valid_codes
, 256);
3862 val
= Fplist_get (plist
, Qvalid_codes
);
3867 for (; CONSP (val
); val
= XCDR (val
))
3871 && XINT (this) >= 0 && XINT (this) < 256)
3872 coding
->spec
.ccl
.valid_codes
[XINT (this)] = 1;
3873 else if (CONSP (this)
3874 && INTEGERP (XCAR (this))
3875 && INTEGERP (XCDR (this)))
3877 int start
= XINT (XCAR (this));
3878 int end
= XINT (XCDR (this));
3880 if (start
>= 0 && start
<= end
&& end
< 256)
3881 while (start
<= end
)
3882 coding
->spec
.ccl
.valid_codes
[start
++] = 1;
3887 coding
->common_flags
|= CODING_REQUIRE_FLUSHING_MASK
;
3888 coding
->spec
.ccl
.cr_carryover
= 0;
3889 coding
->spec
.ccl
.eight_bit_carryover
[0] = 0;
3893 coding
->type
= coding_type_raw_text
;
3897 goto label_invalid_coding_system
;
3901 label_invalid_coding_system
:
3902 coding
->type
= coding_type_no_conversion
;
3903 coding
->category_idx
= CODING_CATEGORY_IDX_BINARY
;
3904 coding
->common_flags
= 0;
3905 coding
->eol_type
= CODING_EOL_LF
;
3906 coding
->pre_write_conversion
= coding
->post_read_conversion
= Qnil
;
3910 /* Free memory blocks allocated for storing composition information. */
3913 coding_free_composition_data (coding
)
3914 struct coding_system
*coding
;
3916 struct composition_data
*cmp_data
= coding
->cmp_data
, *next
;
3920 /* Memory blocks are chained. At first, rewind to the first, then,
3921 free blocks one by one. */
3922 while (cmp_data
->prev
)
3923 cmp_data
= cmp_data
->prev
;
3926 next
= cmp_data
->next
;
3930 coding
->cmp_data
= NULL
;
3933 /* Set `char_offset' member of all memory blocks pointed by
3934 coding->cmp_data to POS. */
3937 coding_adjust_composition_offset (coding
, pos
)
3938 struct coding_system
*coding
;
3941 struct composition_data
*cmp_data
;
3943 for (cmp_data
= coding
->cmp_data
; cmp_data
; cmp_data
= cmp_data
->next
)
3944 cmp_data
->char_offset
= pos
;
3947 /* Setup raw-text or one of its subsidiaries in the structure
3948 coding_system CODING according to the already setup value eol_type
3949 in CODING. CODING should be setup for some coding system in
3953 setup_raw_text_coding_system (coding
)
3954 struct coding_system
*coding
;
3956 if (coding
->type
!= coding_type_raw_text
)
3958 coding
->symbol
= Qraw_text
;
3959 coding
->type
= coding_type_raw_text
;
3960 if (coding
->eol_type
!= CODING_EOL_UNDECIDED
)
3962 Lisp_Object subsidiaries
;
3963 subsidiaries
= Fget (Qraw_text
, Qeol_type
);
3965 if (VECTORP (subsidiaries
)
3966 && XVECTOR (subsidiaries
)->size
== 3)
3968 = XVECTOR (subsidiaries
)->contents
[coding
->eol_type
];
3970 setup_coding_system (coding
->symbol
, coding
);
3975 /* Emacs has a mechanism to automatically detect a coding system if it
3976 is one of Emacs' internal format, ISO2022, SJIS, and BIG5. But,
3977 it's impossible to distinguish some coding systems accurately
3978 because they use the same range of codes. So, at first, coding
3979 systems are categorized into 7, those are:
3981 o coding-category-emacs-mule
3983 The category for a coding system which has the same code range
3984 as Emacs' internal format. Assigned the coding-system (Lisp
3985 symbol) `emacs-mule' by default.
3987 o coding-category-sjis
3989 The category for a coding system which has the same code range
3990 as SJIS. Assigned the coding-system (Lisp
3991 symbol) `japanese-shift-jis' by default.
3993 o coding-category-iso-7
3995 The category for a coding system which has the same code range
3996 as ISO2022 of 7-bit environment. This doesn't use any locking
3997 shift and single shift functions. This can encode/decode all
3998 charsets. Assigned the coding-system (Lisp symbol)
3999 `iso-2022-7bit' by default.
4001 o coding-category-iso-7-tight
4003 Same as coding-category-iso-7 except that this can
4004 encode/decode only the specified charsets.
4006 o coding-category-iso-8-1
4008 The category for a coding system which has the same code range
4009 as ISO2022 of 8-bit environment and graphic plane 1 used only
4010 for DIMENSION1 charset. This doesn't use any locking shift
4011 and single shift functions. Assigned the coding-system (Lisp
4012 symbol) `iso-latin-1' by default.
4014 o coding-category-iso-8-2
4016 The category for a coding system which has the same code range
4017 as ISO2022 of 8-bit environment and graphic plane 1 used only
4018 for DIMENSION2 charset. This doesn't use any locking shift
4019 and single shift functions. Assigned the coding-system (Lisp
4020 symbol) `japanese-iso-8bit' by default.
4022 o coding-category-iso-7-else
4024 The category for a coding system which has the same code range
4025 as ISO2022 of 7-bit environment but uses locking shift or
4026 single shift functions. Assigned the coding-system (Lisp
4027 symbol) `iso-2022-7bit-lock' by default.
4029 o coding-category-iso-8-else
4031 The category for a coding system which has the same code range
4032 as ISO2022 of 8-bit environment but uses locking shift or
4033 single shift functions. Assigned the coding-system (Lisp
4034 symbol) `iso-2022-8bit-ss2' by default.
4036 o coding-category-big5
4038 The category for a coding system which has the same code range
4039 as BIG5. Assigned the coding-system (Lisp symbol)
4040 `cn-big5' by default.
4042 o coding-category-utf-8
4044 The category for a coding system which has the same code range
4045 as UTF-8 (cf. RFC3629). Assigned the coding-system (Lisp
4046 symbol) `utf-8' by default.
4048 o coding-category-utf-16-be
4050 The category for a coding system in which a text has an
4051 Unicode signature (cf. Unicode Standard) in the order of BIG
4052 endian at the head. Assigned the coding-system (Lisp symbol)
4053 `utf-16-be' by default.
4055 o coding-category-utf-16-le
4057 The category for a coding system in which a text has an
4058 Unicode signature (cf. Unicode Standard) in the order of
4059 LITTLE endian at the head. Assigned the coding-system (Lisp
4060 symbol) `utf-16-le' by default.
4062 o coding-category-ccl
4064 The category for a coding system of which encoder/decoder is
4065 written in CCL programs. The default value is nil, i.e., no
4066 coding system is assigned.
4068 o coding-category-binary
4070 The category for a coding system not categorized in any of the
4071 above. Assigned the coding-system (Lisp symbol)
4072 `no-conversion' by default.
4074 Each of them is a Lisp symbol and the value is an actual
4075 `coding-system' (this is also a Lisp symbol) assigned by a user.
4076 What Emacs does actually is to detect a category of coding system.
4077 Then, it uses a `coding-system' assigned to it. If Emacs can't
4078 decide a single possible category, it selects a category of the
4079 highest priority. Priorities of categories are also specified by a
4080 user in a Lisp variable `coding-category-list'.
4085 int ascii_skip_code
[256];
4087 /* Detect how a text of length SRC_BYTES pointed by SOURCE is encoded.
4088 If it detects possible coding systems, return an integer in which
4089 appropriate flag bits are set. Flag bits are defined by macros
4090 CODING_CATEGORY_MASK_XXX in `coding.h'. If PRIORITIES is non-NULL,
4091 it should point the table `coding_priorities'. In that case, only
4092 the flag bit for a coding system of the highest priority is set in
4093 the returned value. If MULTIBYTEP is nonzero, 8-bit codes of the
4094 range 0x80..0x9F are in multibyte form.
4096 How many ASCII characters are at the head is returned as *SKIP. */
4099 detect_coding_mask (source
, src_bytes
, priorities
, skip
, multibytep
)
4100 unsigned char *source
;
4101 int src_bytes
, *priorities
, *skip
;
4104 register unsigned char c
;
4105 unsigned char *src
= source
, *src_end
= source
+ src_bytes
;
4106 unsigned int mask
, utf16_examined_p
, iso2022_examined_p
;
4109 /* At first, skip all ASCII characters and control characters except
4110 for three ISO2022 specific control characters. */
4111 ascii_skip_code
[ISO_CODE_SO
] = 0;
4112 ascii_skip_code
[ISO_CODE_SI
] = 0;
4113 ascii_skip_code
[ISO_CODE_ESC
] = 0;
4115 label_loop_detect_coding
:
4116 while (src
< src_end
&& ascii_skip_code
[*src
]) src
++;
4117 *skip
= src
- source
;
4120 /* We found nothing other than ASCII. There's nothing to do. */
4124 /* The text seems to be encoded in some multilingual coding system.
4125 Now, try to find in which coding system the text is encoded. */
4128 /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */
4129 /* C is an ISO2022 specific control code of C0. */
4130 mask
= detect_coding_iso2022 (src
, src_end
, multibytep
);
4133 /* No valid ISO2022 code follows C. Try again. */
4135 if (c
== ISO_CODE_ESC
)
4136 ascii_skip_code
[ISO_CODE_ESC
] = 1;
4138 ascii_skip_code
[ISO_CODE_SO
] = ascii_skip_code
[ISO_CODE_SI
] = 1;
4139 goto label_loop_detect_coding
;
4143 for (i
= 0; i
< CODING_CATEGORY_IDX_MAX
; i
++)
4145 if (mask
& priorities
[i
])
4146 return priorities
[i
];
4148 return CODING_CATEGORY_MASK_RAW_TEXT
;
4155 if (multibytep
&& c
== LEADING_CODE_8_BIT_CONTROL
)
4160 /* C is the first byte of SJIS character code,
4161 or a leading-code of Emacs' internal format (emacs-mule),
4162 or the first byte of UTF-16. */
4163 try = (CODING_CATEGORY_MASK_SJIS
4164 | CODING_CATEGORY_MASK_EMACS_MULE
4165 | CODING_CATEGORY_MASK_UTF_16_BE
4166 | CODING_CATEGORY_MASK_UTF_16_LE
);
4168 /* Or, if C is a special latin extra code,
4169 or is an ISO2022 specific control code of C1 (SS2 or SS3),
4170 or is an ISO2022 control-sequence-introducer (CSI),
4171 we should also consider the possibility of ISO2022 codings. */
4172 if ((VECTORP (Vlatin_extra_code_table
)
4173 && !NILP (XVECTOR (Vlatin_extra_code_table
)->contents
[c
]))
4174 || (c
== ISO_CODE_SS2
|| c
== ISO_CODE_SS3
)
4175 || (c
== ISO_CODE_CSI
4178 || ((*src
== '0' || *src
== '1' || *src
== '2')
4179 && src
+ 1 < src_end
4180 && src
[1] == ']')))))
4181 try |= (CODING_CATEGORY_MASK_ISO_8_ELSE
4182 | CODING_CATEGORY_MASK_ISO_8BIT
);
4185 /* C is a character of ISO2022 in graphic plane right,
4186 or a SJIS's 1-byte character code (i.e. JISX0201),
4187 or the first byte of BIG5's 2-byte code,
4188 or the first byte of UTF-8/16. */
4189 try = (CODING_CATEGORY_MASK_ISO_8_ELSE
4190 | CODING_CATEGORY_MASK_ISO_8BIT
4191 | CODING_CATEGORY_MASK_SJIS
4192 | CODING_CATEGORY_MASK_BIG5
4193 | CODING_CATEGORY_MASK_UTF_8
4194 | CODING_CATEGORY_MASK_UTF_16_BE
4195 | CODING_CATEGORY_MASK_UTF_16_LE
);
4197 /* Or, we may have to consider the possibility of CCL. */
4198 if (coding_system_table
[CODING_CATEGORY_IDX_CCL
]
4199 && (coding_system_table
[CODING_CATEGORY_IDX_CCL
]
4200 ->spec
.ccl
.valid_codes
)[c
])
4201 try |= CODING_CATEGORY_MASK_CCL
;
4204 utf16_examined_p
= iso2022_examined_p
= 0;
4207 for (i
= 0; i
< CODING_CATEGORY_IDX_MAX
; i
++)
4209 if (!iso2022_examined_p
4210 && (priorities
[i
] & try & CODING_CATEGORY_MASK_ISO
))
4212 mask
|= detect_coding_iso2022 (src
, src_end
, multibytep
);
4213 iso2022_examined_p
= 1;
4215 else if (priorities
[i
] & try & CODING_CATEGORY_MASK_SJIS
)
4216 mask
|= detect_coding_sjis (src
, src_end
, multibytep
);
4217 else if (priorities
[i
] & try & CODING_CATEGORY_MASK_UTF_8
)
4218 mask
|= detect_coding_utf_8 (src
, src_end
, multibytep
);
4219 else if (!utf16_examined_p
4220 && (priorities
[i
] & try &
4221 CODING_CATEGORY_MASK_UTF_16_BE_LE
))
4223 mask
|= detect_coding_utf_16 (src
, src_end
, multibytep
);
4224 utf16_examined_p
= 1;
4226 else if (priorities
[i
] & try & CODING_CATEGORY_MASK_BIG5
)
4227 mask
|= detect_coding_big5 (src
, src_end
, multibytep
);
4228 else if (priorities
[i
] & try & CODING_CATEGORY_MASK_EMACS_MULE
)
4229 mask
|= detect_coding_emacs_mule (src
, src_end
, multibytep
);
4230 else if (priorities
[i
] & try & CODING_CATEGORY_MASK_CCL
)
4231 mask
|= detect_coding_ccl (src
, src_end
, multibytep
);
4232 else if (priorities
[i
] & CODING_CATEGORY_MASK_RAW_TEXT
)
4233 mask
|= CODING_CATEGORY_MASK_RAW_TEXT
;
4234 else if (priorities
[i
] & CODING_CATEGORY_MASK_BINARY
)
4235 mask
|= CODING_CATEGORY_MASK_BINARY
;
4236 if (mask
& priorities
[i
])
4237 return priorities
[i
];
4239 return CODING_CATEGORY_MASK_RAW_TEXT
;
4241 if (try & CODING_CATEGORY_MASK_ISO
)
4242 mask
|= detect_coding_iso2022 (src
, src_end
, multibytep
);
4243 if (try & CODING_CATEGORY_MASK_SJIS
)
4244 mask
|= detect_coding_sjis (src
, src_end
, multibytep
);
4245 if (try & CODING_CATEGORY_MASK_BIG5
)
4246 mask
|= detect_coding_big5 (src
, src_end
, multibytep
);
4247 if (try & CODING_CATEGORY_MASK_UTF_8
)
4248 mask
|= detect_coding_utf_8 (src
, src_end
, multibytep
);
4249 if (try & CODING_CATEGORY_MASK_UTF_16_BE_LE
)
4250 mask
|= detect_coding_utf_16 (src
, src_end
, multibytep
);
4251 if (try & CODING_CATEGORY_MASK_EMACS_MULE
)
4252 mask
|= detect_coding_emacs_mule (src
, src_end
, multibytep
);
4253 if (try & CODING_CATEGORY_MASK_CCL
)
4254 mask
|= detect_coding_ccl (src
, src_end
, multibytep
);
4256 return (mask
| CODING_CATEGORY_MASK_RAW_TEXT
| CODING_CATEGORY_MASK_BINARY
);
4259 /* Detect how a text of length SRC_BYTES pointed by SRC is encoded.
4260 The information of the detected coding system is set in CODING. */
4263 detect_coding (coding
, src
, src_bytes
)
4264 struct coding_system
*coding
;
4265 const unsigned char *src
;
4272 val
= Vcoding_category_list
;
4273 mask
= detect_coding_mask (src
, src_bytes
, coding_priorities
, &skip
,
4274 coding
->src_multibyte
);
4275 coding
->heading_ascii
= skip
;
4279 /* We found a single coding system of the highest priority in MASK. */
4281 while (mask
&& ! (mask
& 1)) mask
>>= 1, idx
++;
4283 idx
= CODING_CATEGORY_IDX_RAW_TEXT
;
4285 val
= SYMBOL_VALUE (XVECTOR (Vcoding_category_table
)->contents
[idx
]);
4287 if (coding
->eol_type
!= CODING_EOL_UNDECIDED
)
4291 tmp
= Fget (val
, Qeol_type
);
4293 val
= XVECTOR (tmp
)->contents
[coding
->eol_type
];
4296 /* Setup this new coding system while preserving some slots. */
4298 int src_multibyte
= coding
->src_multibyte
;
4299 int dst_multibyte
= coding
->dst_multibyte
;
4301 setup_coding_system (val
, coding
);
4302 coding
->src_multibyte
= src_multibyte
;
4303 coding
->dst_multibyte
= dst_multibyte
;
4304 coding
->heading_ascii
= skip
;
4308 /* Detect how end-of-line of a text of length SRC_BYTES pointed by
4309 SOURCE is encoded. Return one of CODING_EOL_LF, CODING_EOL_CRLF,
4310 CODING_EOL_CR, and CODING_EOL_UNDECIDED.
4312 How many non-eol characters are at the head is returned as *SKIP. */
4314 #define MAX_EOL_CHECK_COUNT 3
4317 detect_eol_type (source
, src_bytes
, skip
)
4318 unsigned char *source
;
4319 int src_bytes
, *skip
;
4321 unsigned char *src
= source
, *src_end
= src
+ src_bytes
;
4323 int total
= 0; /* How many end-of-lines are found so far. */
4324 int eol_type
= CODING_EOL_UNDECIDED
;
4329 while (src
< src_end
&& total
< MAX_EOL_CHECK_COUNT
)
4332 if (c
== '\n' || c
== '\r')
4335 *skip
= src
- 1 - source
;
4338 this_eol_type
= CODING_EOL_LF
;
4339 else if (src
>= src_end
|| *src
!= '\n')
4340 this_eol_type
= CODING_EOL_CR
;
4342 this_eol_type
= CODING_EOL_CRLF
, src
++;
4344 if (eol_type
== CODING_EOL_UNDECIDED
)
4345 /* This is the first end-of-line. */
4346 eol_type
= this_eol_type
;
4347 else if (eol_type
!= this_eol_type
)
4349 /* The found type is different from what found before. */
4350 eol_type
= CODING_EOL_INCONSISTENT
;
4357 *skip
= src_end
- source
;
4361 /* Like detect_eol_type, but detect EOL type in 2-octet
4362 big-endian/little-endian format for coding systems utf-16-be and
4366 detect_eol_type_in_2_octet_form (source
, src_bytes
, skip
, big_endian_p
)
4367 unsigned char *source
;
4368 int src_bytes
, *skip
, big_endian_p
;
4370 unsigned char *src
= source
, *src_end
= src
+ src_bytes
;
4371 unsigned int c1
, c2
;
4372 int total
= 0; /* How many end-of-lines are found so far. */
4373 int eol_type
= CODING_EOL_UNDECIDED
;
4384 while ((src
+ 1) < src_end
&& total
< MAX_EOL_CHECK_COUNT
)
4386 c1
= (src
[msb
] << 8) | (src
[lsb
]);
4389 if (c1
== '\n' || c1
== '\r')
4392 *skip
= src
- 2 - source
;
4396 this_eol_type
= CODING_EOL_LF
;
4400 if ((src
+ 1) >= src_end
)
4402 this_eol_type
= CODING_EOL_CR
;
4406 c2
= (src
[msb
] << 8) | (src
[lsb
]);
4408 this_eol_type
= CODING_EOL_CRLF
, src
+= 2;
4410 this_eol_type
= CODING_EOL_CR
;
4414 if (eol_type
== CODING_EOL_UNDECIDED
)
4415 /* This is the first end-of-line. */
4416 eol_type
= this_eol_type
;
4417 else if (eol_type
!= this_eol_type
)
4419 /* The found type is different from what found before. */
4420 eol_type
= CODING_EOL_INCONSISTENT
;
4427 *skip
= src_end
- source
;
4431 /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC
4432 is encoded. If it detects an appropriate format of end-of-line, it
4433 sets the information in *CODING. */
4436 detect_eol (coding
, src
, src_bytes
)
4437 struct coding_system
*coding
;
4438 const unsigned char *src
;
4445 switch (coding
->category_idx
)
4447 case CODING_CATEGORY_IDX_UTF_16_BE
:
4448 eol_type
= detect_eol_type_in_2_octet_form (src
, src_bytes
, &skip
, 1);
4450 case CODING_CATEGORY_IDX_UTF_16_LE
:
4451 eol_type
= detect_eol_type_in_2_octet_form (src
, src_bytes
, &skip
, 0);
4454 eol_type
= detect_eol_type (src
, src_bytes
, &skip
);
4458 if (coding
->heading_ascii
> skip
)
4459 coding
->heading_ascii
= skip
;
4461 skip
= coding
->heading_ascii
;
4463 if (eol_type
== CODING_EOL_UNDECIDED
)
4465 if (eol_type
== CODING_EOL_INCONSISTENT
)
4468 /* This code is suppressed until we find a better way to
4469 distinguish raw text file and binary file. */
4471 /* If we have already detected that the coding is raw-text, the
4472 coding should actually be no-conversion. */
4473 if (coding
->type
== coding_type_raw_text
)
4475 setup_coding_system (Qno_conversion
, coding
);
4478 /* Else, let's decode only text code anyway. */
4480 eol_type
= CODING_EOL_LF
;
4483 val
= Fget (coding
->symbol
, Qeol_type
);
4484 if (VECTORP (val
) && XVECTOR (val
)->size
== 3)
4486 int src_multibyte
= coding
->src_multibyte
;
4487 int dst_multibyte
= coding
->dst_multibyte
;
4488 struct composition_data
*cmp_data
= coding
->cmp_data
;
4490 setup_coding_system (XVECTOR (val
)->contents
[eol_type
], coding
);
4491 coding
->src_multibyte
= src_multibyte
;
4492 coding
->dst_multibyte
= dst_multibyte
;
4493 coding
->heading_ascii
= skip
;
4494 coding
->cmp_data
= cmp_data
;
4498 #define CONVERSION_BUFFER_EXTRA_ROOM 256
4500 #define DECODING_BUFFER_MAG(coding) \
4501 (coding->type == coding_type_iso2022 \
4503 : (coding->type == coding_type_ccl \
4504 ? coding->spec.ccl.decoder.buf_magnification \
4507 /* Return maximum size (bytes) of a buffer enough for decoding
4508 SRC_BYTES of text encoded in CODING. */
4511 decoding_buffer_size (coding
, src_bytes
)
4512 struct coding_system
*coding
;
4515 return (src_bytes
* DECODING_BUFFER_MAG (coding
)
4516 + CONVERSION_BUFFER_EXTRA_ROOM
);
4519 /* Return maximum size (bytes) of a buffer enough for encoding
4520 SRC_BYTES of text to CODING. */
4523 encoding_buffer_size (coding
, src_bytes
)
4524 struct coding_system
*coding
;
4529 if (coding
->type
== coding_type_ccl
)
4531 magnification
= coding
->spec
.ccl
.encoder
.buf_magnification
;
4532 if (coding
->eol_type
== CODING_EOL_CRLF
)
4535 else if (CODING_REQUIRE_ENCODING (coding
))
4540 return (src_bytes
* magnification
+ CONVERSION_BUFFER_EXTRA_ROOM
);
4543 /* Working buffer for code conversion. */
4544 struct conversion_buffer
4546 int size
; /* size of data. */
4547 int on_stack
; /* 1 if allocated by alloca. */
4548 unsigned char *data
;
4551 /* Allocate LEN bytes of memory for BUF (struct conversion_buffer). */
4552 #define allocate_conversion_buffer(buf, len) \
4554 if (len < MAX_ALLOCA) \
4556 buf.data = (unsigned char *) alloca (len); \
4561 buf.data = (unsigned char *) xmalloc (len); \
4567 /* Double the allocated memory for *BUF. */
4569 extend_conversion_buffer (buf
)
4570 struct conversion_buffer
*buf
;
4574 unsigned char *save
= buf
->data
;
4575 buf
->data
= (unsigned char *) xmalloc (buf
->size
* 2);
4576 bcopy (save
, buf
->data
, buf
->size
);
4581 buf
->data
= (unsigned char *) xrealloc (buf
->data
, buf
->size
* 2);
4586 /* Free the allocated memory for BUF if it is not on stack. */
4588 free_conversion_buffer (buf
)
4589 struct conversion_buffer
*buf
;
4596 ccl_coding_driver (coding
, source
, destination
, src_bytes
, dst_bytes
, encodep
)
4597 struct coding_system
*coding
;
4598 unsigned char *source
, *destination
;
4599 int src_bytes
, dst_bytes
, encodep
;
4601 struct ccl_program
*ccl
4602 = encodep
? &coding
->spec
.ccl
.encoder
: &coding
->spec
.ccl
.decoder
;
4603 unsigned char *dst
= destination
;
4605 ccl
->suppress_error
= coding
->suppress_error
;
4606 ccl
->last_block
= coding
->mode
& CODING_MODE_LAST_BLOCK
;
4609 /* On encoding, EOL format is converted within ccl_driver. For
4610 that, setup proper information in the structure CCL. */
4611 ccl
->eol_type
= coding
->eol_type
;
4612 if (ccl
->eol_type
==CODING_EOL_UNDECIDED
)
4613 ccl
->eol_type
= CODING_EOL_LF
;
4614 ccl
->cr_consumed
= coding
->spec
.ccl
.cr_carryover
;
4615 ccl
->eight_bit_control
= coding
->dst_multibyte
;
4618 ccl
->eight_bit_control
= 1;
4619 ccl
->multibyte
= coding
->src_multibyte
;
4620 if (coding
->spec
.ccl
.eight_bit_carryover
[0] != 0)
4622 /* Move carryover bytes to DESTINATION. */
4623 unsigned char *p
= coding
->spec
.ccl
.eight_bit_carryover
;
4626 coding
->spec
.ccl
.eight_bit_carryover
[0] = 0;
4628 dst_bytes
-= dst
- destination
;
4631 coding
->produced
= (ccl_driver (ccl
, source
, dst
, src_bytes
, dst_bytes
,
4632 &(coding
->consumed
))
4633 + dst
- destination
);
4637 coding
->produced_char
= coding
->produced
;
4638 coding
->spec
.ccl
.cr_carryover
= ccl
->cr_consumed
;
4640 else if (!ccl
->eight_bit_control
)
4642 /* The produced bytes forms a valid multibyte sequence. */
4643 coding
->produced_char
4644 = multibyte_chars_in_text (destination
, coding
->produced
);
4645 coding
->spec
.ccl
.eight_bit_carryover
[0] = 0;
4649 /* On decoding, the destination should always multibyte. But,
4650 CCL program might have been generated an invalid multibyte
4651 sequence. Here we make such a sequence valid as
4654 = dst_bytes
? dst_bytes
: source
+ coding
->consumed
- destination
;
4656 if ((coding
->consumed
< src_bytes
4657 || !ccl
->last_block
)
4658 && coding
->produced
>= 1
4659 && destination
[coding
->produced
- 1] >= 0x80)
4661 /* We should not convert the tailing 8-bit codes to
4662 multibyte form even if they doesn't form a valid
4663 multibyte sequence. They may form a valid sequence in
4667 if (destination
[coding
->produced
- 1] < 0xA0)
4669 else if (coding
->produced
>= 2)
4671 if (destination
[coding
->produced
- 2] >= 0x80)
4673 if (destination
[coding
->produced
- 2] < 0xA0)
4675 else if (coding
->produced
>= 3
4676 && destination
[coding
->produced
- 3] >= 0x80
4677 && destination
[coding
->produced
- 3] < 0xA0)
4683 BCOPY_SHORT (destination
+ coding
->produced
- carryover
,
4684 coding
->spec
.ccl
.eight_bit_carryover
,
4686 coding
->spec
.ccl
.eight_bit_carryover
[carryover
] = 0;
4687 coding
->produced
-= carryover
;
4690 coding
->produced
= str_as_multibyte (destination
, bytes
,
4692 &(coding
->produced_char
));
4695 switch (ccl
->status
)
4697 case CCL_STAT_SUSPEND_BY_SRC
:
4698 coding
->result
= CODING_FINISH_INSUFFICIENT_SRC
;
4700 case CCL_STAT_SUSPEND_BY_DST
:
4701 coding
->result
= CODING_FINISH_INSUFFICIENT_DST
;
4704 case CCL_STAT_INVALID_CMD
:
4705 coding
->result
= CODING_FINISH_INTERRUPT
;
4708 coding
->result
= CODING_FINISH_NORMAL
;
4711 return coding
->result
;
4714 /* Decode EOL format of the text at PTR of BYTES length destructively
4715 according to CODING->eol_type. This is called after the CCL
4716 program produced a decoded text at PTR. If we do CRLF->LF
4717 conversion, update CODING->produced and CODING->produced_char. */
4720 decode_eol_post_ccl (coding
, ptr
, bytes
)
4721 struct coding_system
*coding
;
4725 Lisp_Object val
, saved_coding_symbol
;
4726 unsigned char *pend
= ptr
+ bytes
;
4729 /* Remember the current coding system symbol. We set it back when
4730 an inconsistent EOL is found so that `last-coding-system-used' is
4731 set to the coding system that doesn't specify EOL conversion. */
4732 saved_coding_symbol
= coding
->symbol
;
4734 coding
->spec
.ccl
.cr_carryover
= 0;
4735 if (coding
->eol_type
== CODING_EOL_UNDECIDED
)
4737 /* Here, to avoid the call of setup_coding_system, we directly
4738 call detect_eol_type. */
4739 coding
->eol_type
= detect_eol_type (ptr
, bytes
, &dummy
);
4740 if (coding
->eol_type
== CODING_EOL_INCONSISTENT
)
4741 coding
->eol_type
= CODING_EOL_LF
;
4742 if (coding
->eol_type
!= CODING_EOL_UNDECIDED
)
4744 val
= Fget (coding
->symbol
, Qeol_type
);
4745 if (VECTORP (val
) && XVECTOR (val
)->size
== 3)
4746 coding
->symbol
= XVECTOR (val
)->contents
[coding
->eol_type
];
4748 coding
->mode
|= CODING_MODE_INHIBIT_INCONSISTENT_EOL
;
4751 if (coding
->eol_type
== CODING_EOL_LF
4752 || coding
->eol_type
== CODING_EOL_UNDECIDED
)
4754 /* We have nothing to do. */
4757 else if (coding
->eol_type
== CODING_EOL_CRLF
)
4759 unsigned char *pstart
= ptr
, *p
= ptr
;
4761 if (! (coding
->mode
& CODING_MODE_LAST_BLOCK
)
4762 && *(pend
- 1) == '\r')
4764 /* If the last character is CR, we can't handle it here
4765 because LF will be in the not-yet-decoded source text.
4766 Record that the CR is not yet processed. */
4767 coding
->spec
.ccl
.cr_carryover
= 1;
4769 coding
->produced_char
--;
4776 if (ptr
+ 1 < pend
&& *(ptr
+ 1) == '\n')
4783 if (coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
4784 goto undo_eol_conversion
;
4788 else if (*ptr
== '\n'
4789 && coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
4790 goto undo_eol_conversion
;
4795 undo_eol_conversion
:
4796 /* We have faced with inconsistent EOL format at PTR.
4797 Convert all LFs before PTR back to CRLFs. */
4798 for (p
--, ptr
--; p
>= pstart
; p
--)
4801 *ptr
-- = '\n', *ptr
-- = '\r';
4805 /* If carryover is recorded, cancel it because we don't
4806 convert CRLF anymore. */
4807 if (coding
->spec
.ccl
.cr_carryover
)
4809 coding
->spec
.ccl
.cr_carryover
= 0;
4811 coding
->produced_char
++;
4815 coding
->eol_type
= CODING_EOL_LF
;
4816 coding
->symbol
= saved_coding_symbol
;
4820 /* As each two-byte sequence CRLF was converted to LF, (PEND
4821 - P) is the number of deleted characters. */
4822 coding
->produced
-= pend
- p
;
4823 coding
->produced_char
-= pend
- p
;
4826 else /* i.e. coding->eol_type == CODING_EOL_CR */
4828 unsigned char *p
= ptr
;
4830 for (; ptr
< pend
; ptr
++)
4834 else if (*ptr
== '\n'
4835 && coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
4837 for (; p
< ptr
; p
++)
4843 coding
->eol_type
= CODING_EOL_LF
;
4844 coding
->symbol
= saved_coding_symbol
;
4850 /* See "GENERAL NOTES about `decode_coding_XXX ()' functions". Before
4851 decoding, it may detect coding system and format of end-of-line if
4852 those are not yet decided. The source should be unibyte, the
4853 result is multibyte if CODING->dst_multibyte is nonzero, else
4857 decode_coding (coding
, source
, destination
, src_bytes
, dst_bytes
)
4858 struct coding_system
*coding
;
4859 const unsigned char *source
;
4860 unsigned char *destination
;
4861 int src_bytes
, dst_bytes
;
4865 if (coding
->type
== coding_type_undecided
)
4866 detect_coding (coding
, source
, src_bytes
);
4868 if (coding
->eol_type
== CODING_EOL_UNDECIDED
4869 && coding
->type
!= coding_type_ccl
)
4871 detect_eol (coding
, source
, src_bytes
);
4872 /* We had better recover the original eol format if we
4873 encounter an inconsistent eol format while decoding. */
4874 coding
->mode
|= CODING_MODE_INHIBIT_INCONSISTENT_EOL
;
4877 coding
->produced
= coding
->produced_char
= 0;
4878 coding
->consumed
= coding
->consumed_char
= 0;
4880 coding
->result
= CODING_FINISH_NORMAL
;
4882 switch (coding
->type
)
4884 case coding_type_sjis
:
4885 decode_coding_sjis_big5 (coding
, source
, destination
,
4886 src_bytes
, dst_bytes
, 1);
4889 case coding_type_iso2022
:
4890 decode_coding_iso2022 (coding
, source
, destination
,
4891 src_bytes
, dst_bytes
);
4894 case coding_type_big5
:
4895 decode_coding_sjis_big5 (coding
, source
, destination
,
4896 src_bytes
, dst_bytes
, 0);
4899 case coding_type_emacs_mule
:
4900 decode_coding_emacs_mule (coding
, source
, destination
,
4901 src_bytes
, dst_bytes
);
4904 case coding_type_ccl
:
4905 if (coding
->spec
.ccl
.cr_carryover
)
4907 /* Put the CR which was not processed by the previous call
4908 of decode_eol_post_ccl in DESTINATION. It will be
4909 decoded together with the following LF by the call to
4910 decode_eol_post_ccl below. */
4911 *destination
= '\r';
4913 coding
->produced_char
++;
4915 extra
= coding
->spec
.ccl
.cr_carryover
;
4917 ccl_coding_driver (coding
, source
, destination
+ extra
,
4918 src_bytes
, dst_bytes
, 0);
4919 if (coding
->eol_type
!= CODING_EOL_LF
)
4921 coding
->produced
+= extra
;
4922 coding
->produced_char
+= extra
;
4923 decode_eol_post_ccl (coding
, destination
, coding
->produced
);
4928 decode_eol (coding
, source
, destination
, src_bytes
, dst_bytes
);
4931 if (coding
->result
== CODING_FINISH_INSUFFICIENT_SRC
4932 && coding
->mode
& CODING_MODE_LAST_BLOCK
4933 && coding
->consumed
== src_bytes
)
4934 coding
->result
= CODING_FINISH_NORMAL
;
4936 if (coding
->mode
& CODING_MODE_LAST_BLOCK
4937 && coding
->result
== CODING_FINISH_INSUFFICIENT_SRC
)
4939 const unsigned char *src
= source
+ coding
->consumed
;
4940 unsigned char *dst
= destination
+ coding
->produced
;
4942 src_bytes
-= coding
->consumed
;
4944 if (COMPOSING_P (coding
))
4945 DECODE_COMPOSITION_END ('1');
4949 dst
+= CHAR_STRING (c
, dst
);
4950 coding
->produced_char
++;
4952 coding
->consumed
= coding
->consumed_char
= src
- source
;
4953 coding
->produced
= dst
- destination
;
4954 coding
->result
= CODING_FINISH_NORMAL
;
4957 if (!coding
->dst_multibyte
)
4959 coding
->produced
= str_as_unibyte (destination
, coding
->produced
);
4960 coding
->produced_char
= coding
->produced
;
4963 return coding
->result
;
4966 /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". The
4967 multibyteness of the source is CODING->src_multibyte, the
4968 multibyteness of the result is always unibyte. */
4971 encode_coding (coding
, source
, destination
, src_bytes
, dst_bytes
)
4972 struct coding_system
*coding
;
4973 const unsigned char *source
;
4974 unsigned char *destination
;
4975 int src_bytes
, dst_bytes
;
4977 coding
->produced
= coding
->produced_char
= 0;
4978 coding
->consumed
= coding
->consumed_char
= 0;
4980 coding
->result
= CODING_FINISH_NORMAL
;
4982 switch (coding
->type
)
4984 case coding_type_sjis
:
4985 encode_coding_sjis_big5 (coding
, source
, destination
,
4986 src_bytes
, dst_bytes
, 1);
4989 case coding_type_iso2022
:
4990 encode_coding_iso2022 (coding
, source
, destination
,
4991 src_bytes
, dst_bytes
);
4994 case coding_type_big5
:
4995 encode_coding_sjis_big5 (coding
, source
, destination
,
4996 src_bytes
, dst_bytes
, 0);
4999 case coding_type_emacs_mule
:
5000 encode_coding_emacs_mule (coding
, source
, destination
,
5001 src_bytes
, dst_bytes
);
5004 case coding_type_ccl
:
5005 ccl_coding_driver (coding
, source
, destination
,
5006 src_bytes
, dst_bytes
, 1);
5010 encode_eol (coding
, source
, destination
, src_bytes
, dst_bytes
);
5013 if (coding
->mode
& CODING_MODE_LAST_BLOCK
5014 && coding
->result
== CODING_FINISH_INSUFFICIENT_SRC
)
5016 const unsigned char *src
= source
+ coding
->consumed
;
5017 unsigned char *dst
= destination
+ coding
->produced
;
5019 if (coding
->type
== coding_type_iso2022
)
5020 ENCODE_RESET_PLANE_AND_REGISTER
;
5021 if (COMPOSING_P (coding
))
5022 *dst
++ = ISO_CODE_ESC
, *dst
++ = '1';
5023 if (coding
->consumed
< src_bytes
)
5025 int len
= src_bytes
- coding
->consumed
;
5027 BCOPY_SHORT (src
, dst
, len
);
5028 if (coding
->src_multibyte
)
5029 len
= str_as_unibyte (dst
, len
);
5031 coding
->consumed
= src_bytes
;
5033 coding
->produced
= coding
->produced_char
= dst
- destination
;
5034 coding
->result
= CODING_FINISH_NORMAL
;
5037 if (coding
->result
== CODING_FINISH_INSUFFICIENT_SRC
5038 && coding
->consumed
== src_bytes
)
5039 coding
->result
= CODING_FINISH_NORMAL
;
5041 return coding
->result
;
5044 /* Scan text in the region between *BEG and *END (byte positions),
5045 skip characters which we don't have to decode by coding system
5046 CODING at the head and tail, then set *BEG and *END to the region
5047 of the text we actually have to convert. The caller should move
5048 the gap out of the region in advance if the region is from a
5051 If STR is not NULL, *BEG and *END are indices into STR. */
5054 shrink_decoding_region (beg
, end
, coding
, str
)
5056 struct coding_system
*coding
;
5059 unsigned char *begp_orig
, *begp
, *endp_orig
, *endp
, c
;
5061 Lisp_Object translation_table
;
5063 if (coding
->type
== coding_type_ccl
5064 || coding
->type
== coding_type_undecided
5065 || coding
->eol_type
!= CODING_EOL_LF
5066 || !NILP (coding
->post_read_conversion
)
5067 || coding
->composing
!= COMPOSITION_DISABLED
)
5069 /* We can't skip any data. */
5072 if (coding
->type
== coding_type_no_conversion
5073 || coding
->type
== coding_type_raw_text
5074 || coding
->type
== coding_type_emacs_mule
)
5076 /* We need no conversion, but don't have to skip any data here.
5077 Decoding routine handles them effectively anyway. */
5081 translation_table
= coding
->translation_table_for_decode
;
5082 if (NILP (translation_table
) && !NILP (Venable_character_translation
))
5083 translation_table
= Vstandard_translation_table_for_decode
;
5084 if (CHAR_TABLE_P (translation_table
))
5087 for (i
= 0; i
< 128; i
++)
5088 if (!NILP (CHAR_TABLE_REF (translation_table
, i
)))
5091 /* Some ASCII character should be translated. We give up
5096 if (coding
->heading_ascii
>= 0)
5097 /* Detection routine has already found how much we can skip at the
5099 *beg
+= coding
->heading_ascii
;
5103 begp_orig
= begp
= str
+ *beg
;
5104 endp_orig
= endp
= str
+ *end
;
5108 begp_orig
= begp
= BYTE_POS_ADDR (*beg
);
5109 endp_orig
= endp
= begp
+ *end
- *beg
;
5112 eol_conversion
= (coding
->eol_type
== CODING_EOL_CR
5113 || coding
->eol_type
== CODING_EOL_CRLF
);
5115 switch (coding
->type
)
5117 case coding_type_sjis
:
5118 case coding_type_big5
:
5119 /* We can skip all ASCII characters at the head. */
5120 if (coding
->heading_ascii
< 0)
5123 while (begp
< endp
&& *begp
< 0x80 && *begp
!= '\r') begp
++;
5125 while (begp
< endp
&& *begp
< 0x80) begp
++;
5127 /* We can skip all ASCII characters at the tail except for the
5128 second byte of SJIS or BIG5 code. */
5130 while (begp
< endp
&& endp
[-1] < 0x80 && endp
[-1] != '\r') endp
--;
5132 while (begp
< endp
&& endp
[-1] < 0x80) endp
--;
5133 /* Do not consider LF as ascii if preceded by CR, since that
5134 confuses eol decoding. */
5135 if (begp
< endp
&& endp
< endp_orig
&& endp
[-1] == '\r' && endp
[0] == '\n')
5137 if (begp
< endp
&& endp
< endp_orig
&& endp
[-1] >= 0x80)
5141 case coding_type_iso2022
:
5142 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding
, 0) != CHARSET_ASCII
)
5143 /* We can't skip any data. */
5145 if (coding
->heading_ascii
< 0)
5147 /* We can skip all ASCII characters at the head except for a
5148 few control codes. */
5149 while (begp
< endp
&& (c
= *begp
) < 0x80
5150 && c
!= ISO_CODE_CR
&& c
!= ISO_CODE_SO
5151 && c
!= ISO_CODE_SI
&& c
!= ISO_CODE_ESC
5152 && (!eol_conversion
|| c
!= ISO_CODE_LF
))
5155 switch (coding
->category_idx
)
5157 case CODING_CATEGORY_IDX_ISO_8_1
:
5158 case CODING_CATEGORY_IDX_ISO_8_2
:
5159 /* We can skip all ASCII characters at the tail. */
5161 while (begp
< endp
&& (c
= endp
[-1]) < 0x80 && c
!= '\r') endp
--;
5163 while (begp
< endp
&& endp
[-1] < 0x80) endp
--;
5164 /* Do not consider LF as ascii if preceded by CR, since that
5165 confuses eol decoding. */
5166 if (begp
< endp
&& endp
< endp_orig
&& endp
[-1] == '\r' && endp
[0] == '\n')
5170 case CODING_CATEGORY_IDX_ISO_7
:
5171 case CODING_CATEGORY_IDX_ISO_7_TIGHT
:
5173 /* We can skip all characters at the tail except for 8-bit
5174 codes and ESC and the following 2-byte at the tail. */
5175 unsigned char *eight_bit
= NULL
;
5179 && (c
= endp
[-1]) != ISO_CODE_ESC
&& c
!= '\r')
5181 if (!eight_bit
&& c
& 0x80) eight_bit
= endp
;
5186 && (c
= endp
[-1]) != ISO_CODE_ESC
)
5188 if (!eight_bit
&& c
& 0x80) eight_bit
= endp
;
5191 /* Do not consider LF as ascii if preceded by CR, since that
5192 confuses eol decoding. */
5193 if (begp
< endp
&& endp
< endp_orig
5194 && endp
[-1] == '\r' && endp
[0] == '\n')
5196 if (begp
< endp
&& endp
[-1] == ISO_CODE_ESC
)
5198 if (endp
+ 1 < endp_orig
&& end
[0] == '(' && end
[1] == 'B')
5199 /* This is an ASCII designation sequence. We can
5200 surely skip the tail. But, if we have
5201 encountered an 8-bit code, skip only the codes
5203 endp
= eight_bit
? eight_bit
: endp
+ 2;
5205 /* Hmmm, we can't skip the tail. */
5217 *beg
+= begp
- begp_orig
;
5218 *end
+= endp
- endp_orig
;
5222 /* Like shrink_decoding_region but for encoding. */
5225 shrink_encoding_region (beg
, end
, coding
, str
)
5227 struct coding_system
*coding
;
5230 unsigned char *begp_orig
, *begp
, *endp_orig
, *endp
;
5232 Lisp_Object translation_table
;
5234 if (coding
->type
== coding_type_ccl
5235 || coding
->eol_type
== CODING_EOL_CRLF
5236 || coding
->eol_type
== CODING_EOL_CR
5237 || (coding
->cmp_data
&& coding
->cmp_data
->used
> 0))
5239 /* We can't skip any data. */
5242 if (coding
->type
== coding_type_no_conversion
5243 || coding
->type
== coding_type_raw_text
5244 || coding
->type
== coding_type_emacs_mule
5245 || coding
->type
== coding_type_undecided
)
5247 /* We need no conversion, but don't have to skip any data here.
5248 Encoding routine handles them effectively anyway. */
5252 translation_table
= coding
->translation_table_for_encode
;
5253 if (NILP (translation_table
) && !NILP (Venable_character_translation
))
5254 translation_table
= Vstandard_translation_table_for_encode
;
5255 if (CHAR_TABLE_P (translation_table
))
5258 for (i
= 0; i
< 128; i
++)
5259 if (!NILP (CHAR_TABLE_REF (translation_table
, i
)))
5262 /* Some ASCII character should be translated. We give up
5269 begp_orig
= begp
= str
+ *beg
;
5270 endp_orig
= endp
= str
+ *end
;
5274 begp_orig
= begp
= BYTE_POS_ADDR (*beg
);
5275 endp_orig
= endp
= begp
+ *end
- *beg
;
5278 eol_conversion
= (coding
->eol_type
== CODING_EOL_CR
5279 || coding
->eol_type
== CODING_EOL_CRLF
);
5281 /* Here, we don't have to check coding->pre_write_conversion because
5282 the caller is expected to have handled it already. */
5283 switch (coding
->type
)
5285 case coding_type_iso2022
:
5286 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding
, 0) != CHARSET_ASCII
)
5287 /* We can't skip any data. */
5289 if (coding
->flags
& CODING_FLAG_ISO_DESIGNATE_AT_BOL
)
5291 unsigned char *bol
= begp
;
5292 while (begp
< endp
&& *begp
< 0x80)
5295 if (begp
[-1] == '\n')
5299 goto label_skip_tail
;
5303 case coding_type_sjis
:
5304 case coding_type_big5
:
5305 /* We can skip all ASCII characters at the head and tail. */
5307 while (begp
< endp
&& *begp
< 0x80 && *begp
!= '\n') begp
++;
5309 while (begp
< endp
&& *begp
< 0x80) begp
++;
5312 while (begp
< endp
&& endp
[-1] < 0x80 && endp
[-1] != '\n') endp
--;
5314 while (begp
< endp
&& *(endp
- 1) < 0x80) endp
--;
5321 *beg
+= begp
- begp_orig
;
5322 *end
+= endp
- endp_orig
;
5326 /* As shrinking conversion region requires some overhead, we don't try
5327 shrinking if the length of conversion region is less than this
5329 static int shrink_conversion_region_threshhold
= 1024;
5331 #define SHRINK_CONVERSION_REGION(beg, end, coding, str, encodep) \
5333 if (*(end) - *(beg) > shrink_conversion_region_threshhold) \
5335 if (encodep) shrink_encoding_region (beg, end, coding, str); \
5336 else shrink_decoding_region (beg, end, coding, str); \
5341 code_convert_region_unwind (arg
)
5344 inhibit_pre_post_conversion
= 0;
5345 Vlast_coding_system_used
= arg
;
5349 /* Store information about all compositions in the range FROM and TO
5350 of OBJ in memory blocks pointed by CODING->cmp_data. OBJ is a
5351 buffer or a string, defaults to the current buffer. */
5354 coding_save_composition (coding
, from
, to
, obj
)
5355 struct coding_system
*coding
;
5362 if (coding
->composing
== COMPOSITION_DISABLED
)
5364 if (!coding
->cmp_data
)
5365 coding_allocate_composition_data (coding
, from
);
5366 if (!find_composition (from
, to
, &start
, &end
, &prop
, obj
)
5370 && (!find_composition (end
, to
, &start
, &end
, &prop
, obj
)
5373 coding
->composing
= COMPOSITION_NO
;
5376 if (COMPOSITION_VALID_P (start
, end
, prop
))
5378 enum composition_method method
= COMPOSITION_METHOD (prop
);
5379 if (coding
->cmp_data
->used
+ COMPOSITION_DATA_MAX_BUNCH_LENGTH
5380 >= COMPOSITION_DATA_SIZE
)
5381 coding_allocate_composition_data (coding
, from
);
5382 /* For relative composition, we remember start and end
5383 positions, for the other compositions, we also remember
5385 CODING_ADD_COMPOSITION_START (coding
, start
- from
, method
);
5386 if (method
!= COMPOSITION_RELATIVE
)
5388 /* We must store a*/
5389 Lisp_Object val
, ch
;
5391 val
= COMPOSITION_COMPONENTS (prop
);
5395 ch
= XCAR (val
), val
= XCDR (val
);
5396 CODING_ADD_COMPOSITION_COMPONENT (coding
, XINT (ch
));
5398 else if (VECTORP (val
) || STRINGP (val
))
5400 int len
= (VECTORP (val
)
5401 ? XVECTOR (val
)->size
: SCHARS (val
));
5403 for (i
= 0; i
< len
; i
++)
5406 ? Faref (val
, make_number (i
))
5407 : XVECTOR (val
)->contents
[i
]);
5408 CODING_ADD_COMPOSITION_COMPONENT (coding
, XINT (ch
));
5411 else /* INTEGERP (val) */
5412 CODING_ADD_COMPOSITION_COMPONENT (coding
, XINT (val
));
5414 CODING_ADD_COMPOSITION_END (coding
, end
- from
);
5419 && find_composition (start
, to
, &start
, &end
, &prop
, obj
)
5422 /* Make coding->cmp_data point to the first memory block. */
5423 while (coding
->cmp_data
->prev
)
5424 coding
->cmp_data
= coding
->cmp_data
->prev
;
5425 coding
->cmp_data_start
= 0;
5428 /* Reflect the saved information about compositions to OBJ.
5429 CODING->cmp_data points to a memory block for the information. OBJ
5430 is a buffer or a string, defaults to the current buffer. */
5433 coding_restore_composition (coding
, obj
)
5434 struct coding_system
*coding
;
5437 struct composition_data
*cmp_data
= coding
->cmp_data
;
5442 while (cmp_data
->prev
)
5443 cmp_data
= cmp_data
->prev
;
5449 for (i
= 0; i
< cmp_data
->used
&& cmp_data
->data
[i
] > 0;
5450 i
+= cmp_data
->data
[i
])
5452 int *data
= cmp_data
->data
+ i
;
5453 enum composition_method method
= (enum composition_method
) data
[3];
5454 Lisp_Object components
;
5456 if (data
[0] < 0 || i
+ data
[0] > cmp_data
->used
)
5457 /* Invalid composition data. */
5460 if (method
== COMPOSITION_RELATIVE
)
5464 int len
= data
[0] - 4, j
;
5465 Lisp_Object args
[MAX_COMPOSITION_COMPONENTS
* 2 - 1];
5467 if (method
== COMPOSITION_WITH_RULE_ALTCHARS
5471 /* Invalid composition data. */
5473 for (j
= 0; j
< len
; j
++)
5474 args
[j
] = make_number (data
[4 + j
]);
5475 components
= (method
== COMPOSITION_WITH_ALTCHARS
5476 ? Fstring (len
, args
)
5477 : Fvector (len
, args
));
5479 compose_text (data
[1], data
[2], components
, Qnil
, obj
);
5481 cmp_data
= cmp_data
->next
;
5485 /* Decode (if ENCODEP is zero) or encode (if ENCODEP is nonzero) the
5486 text from FROM to TO (byte positions are FROM_BYTE and TO_BYTE) by
5487 coding system CODING, and return the status code of code conversion
5488 (currently, this value has no meaning).
5490 How many characters (and bytes) are converted to how many
5491 characters (and bytes) are recorded in members of the structure
5494 If REPLACE is nonzero, we do various things as if the original text
5495 is deleted and a new text is inserted. See the comments in
5496 replace_range (insdel.c) to know what we are doing.
5498 If REPLACE is zero, it is assumed that the source text is unibyte.
5499 Otherwise, it is assumed that the source text is multibyte. */
5502 code_convert_region (from
, from_byte
, to
, to_byte
, coding
, encodep
, replace
)
5503 int from
, from_byte
, to
, to_byte
, encodep
, replace
;
5504 struct coding_system
*coding
;
5506 int len
= to
- from
, len_byte
= to_byte
- from_byte
;
5507 int nchars_del
= 0, nbytes_del
= 0;
5508 int require
, inserted
, inserted_byte
;
5509 int head_skip
, tail_skip
, total_skip
= 0;
5510 Lisp_Object saved_coding_symbol
;
5512 unsigned char *src
, *dst
;
5513 Lisp_Object deletion
;
5514 int orig_point
= PT
, orig_len
= len
;
5516 int multibyte_p
= !NILP (current_buffer
->enable_multibyte_characters
);
5519 saved_coding_symbol
= coding
->symbol
;
5521 if (from
< PT
&& PT
< to
)
5523 TEMP_SET_PT_BOTH (from
, from_byte
);
5529 int saved_from
= from
;
5530 int saved_inhibit_modification_hooks
;
5532 prepare_to_modify_buffer (from
, to
, &from
);
5533 if (saved_from
!= from
)
5536 from_byte
= CHAR_TO_BYTE (from
), to_byte
= CHAR_TO_BYTE (to
);
5537 len_byte
= to_byte
- from_byte
;
5540 /* The code conversion routine can not preserve text properties
5541 for now. So, we must remove all text properties in the
5542 region. Here, we must suppress all modification hooks. */
5543 saved_inhibit_modification_hooks
= inhibit_modification_hooks
;
5544 inhibit_modification_hooks
= 1;
5545 Fset_text_properties (make_number (from
), make_number (to
), Qnil
, Qnil
);
5546 inhibit_modification_hooks
= saved_inhibit_modification_hooks
;
5549 if (! encodep
&& CODING_REQUIRE_DETECTION (coding
))
5551 /* We must detect encoding of text and eol format. */
5553 if (from
< GPT
&& to
> GPT
)
5554 move_gap_both (from
, from_byte
);
5555 if (coding
->type
== coding_type_undecided
)
5557 detect_coding (coding
, BYTE_POS_ADDR (from_byte
), len_byte
);
5558 if (coding
->type
== coding_type_undecided
)
5560 /* It seems that the text contains only ASCII, but we
5561 should not leave it undecided because the deeper
5562 decoding routine (decode_coding) tries to detect the
5563 encodings again in vain. */
5564 coding
->type
= coding_type_emacs_mule
;
5565 coding
->category_idx
= CODING_CATEGORY_IDX_EMACS_MULE
;
5566 /* As emacs-mule decoder will handle composition, we
5567 need this setting to allocate coding->cmp_data
5569 coding
->composing
= COMPOSITION_NO
;
5572 if (coding
->eol_type
== CODING_EOL_UNDECIDED
5573 && coding
->type
!= coding_type_ccl
)
5575 detect_eol (coding
, BYTE_POS_ADDR (from_byte
), len_byte
);
5576 if (coding
->eol_type
== CODING_EOL_UNDECIDED
)
5577 coding
->eol_type
= CODING_EOL_LF
;
5578 /* We had better recover the original eol format if we
5579 encounter an inconsistent eol format while decoding. */
5580 coding
->mode
|= CODING_MODE_INHIBIT_INCONSISTENT_EOL
;
5584 /* Now we convert the text. */
5586 /* For encoding, we must process pre-write-conversion in advance. */
5587 if (! inhibit_pre_post_conversion
5589 && SYMBOLP (coding
->pre_write_conversion
)
5590 && ! NILP (Ffboundp (coding
->pre_write_conversion
)))
5592 /* The function in pre-write-conversion may put a new text in a
5594 struct buffer
*prev
= current_buffer
;
5597 record_unwind_protect (code_convert_region_unwind
,
5598 Vlast_coding_system_used
);
5599 /* We should not call any more pre-write/post-read-conversion
5600 functions while this pre-write-conversion is running. */
5601 inhibit_pre_post_conversion
= 1;
5602 call2 (coding
->pre_write_conversion
,
5603 make_number (from
), make_number (to
));
5604 inhibit_pre_post_conversion
= 0;
5605 /* Discard the unwind protect. */
5608 if (current_buffer
!= prev
)
5611 new = Fcurrent_buffer ();
5612 set_buffer_internal_1 (prev
);
5613 del_range_2 (from
, from_byte
, to
, to_byte
, 0);
5614 TEMP_SET_PT_BOTH (from
, from_byte
);
5615 insert_from_buffer (XBUFFER (new), 1, len
, 0);
5617 if (orig_point
>= to
)
5618 orig_point
+= len
- orig_len
;
5619 else if (orig_point
> from
)
5623 from_byte
= CHAR_TO_BYTE (from
);
5624 to_byte
= CHAR_TO_BYTE (to
);
5625 len_byte
= to_byte
- from_byte
;
5626 TEMP_SET_PT_BOTH (from
, from_byte
);
5632 if (! EQ (current_buffer
->undo_list
, Qt
))
5633 deletion
= make_buffer_string_both (from
, from_byte
, to
, to_byte
, 1);
5636 nchars_del
= to
- from
;
5637 nbytes_del
= to_byte
- from_byte
;
5641 if (coding
->composing
!= COMPOSITION_DISABLED
)
5644 coding_save_composition (coding
, from
, to
, Fcurrent_buffer ());
5646 coding_allocate_composition_data (coding
, from
);
5649 /* Try to skip the heading and tailing ASCIIs. */
5650 if (coding
->type
!= coding_type_ccl
)
5652 int from_byte_orig
= from_byte
, to_byte_orig
= to_byte
;
5654 if (from
< GPT
&& GPT
< to
)
5655 move_gap_both (from
, from_byte
);
5656 SHRINK_CONVERSION_REGION (&from_byte
, &to_byte
, coding
, NULL
, encodep
);
5657 if (from_byte
== to_byte
5658 && (encodep
|| NILP (coding
->post_read_conversion
))
5659 && ! CODING_REQUIRE_FLUSHING (coding
))
5661 coding
->produced
= len_byte
;
5662 coding
->produced_char
= len
;
5664 /* We must record and adjust for this new text now. */
5665 adjust_after_insert (from
, from_byte_orig
, to
, to_byte_orig
, len
);
5669 head_skip
= from_byte
- from_byte_orig
;
5670 tail_skip
= to_byte_orig
- to_byte
;
5671 total_skip
= head_skip
+ tail_skip
;
5674 len
-= total_skip
; len_byte
-= total_skip
;
5677 /* For conversion, we must put the gap before the text in addition to
5678 making the gap larger for efficient decoding. The required gap
5679 size starts from 2000 which is the magic number used in make_gap.
5680 But, after one batch of conversion, it will be incremented if we
5681 find that it is not enough . */
5684 if (GAP_SIZE
< require
)
5685 make_gap (require
- GAP_SIZE
);
5686 move_gap_both (from
, from_byte
);
5688 inserted
= inserted_byte
= 0;
5690 GAP_SIZE
+= len_byte
;
5693 ZV_BYTE
-= len_byte
;
5696 if (GPT
- BEG
< BEG_UNCHANGED
)
5697 BEG_UNCHANGED
= GPT
- BEG
;
5698 if (Z
- GPT
< END_UNCHANGED
)
5699 END_UNCHANGED
= Z
- GPT
;
5701 if (!encodep
&& coding
->src_multibyte
)
5703 /* Decoding routines expects that the source text is unibyte.
5704 We must convert 8-bit characters of multibyte form to
5706 int len_byte_orig
= len_byte
;
5707 len_byte
= str_as_unibyte (GAP_END_ADDR
- len_byte
, len_byte
);
5708 if (len_byte
< len_byte_orig
)
5709 safe_bcopy (GAP_END_ADDR
- len_byte_orig
, GAP_END_ADDR
- len_byte
,
5711 coding
->src_multibyte
= 0;
5718 /* The buffer memory is now:
5719 +--------+converted-text+---------+-------original-text-------+---+
5720 |<-from->|<--inserted-->|---------|<--------len_byte--------->|---|
5721 |<---------------------- GAP ----------------------->| */
5722 src
= GAP_END_ADDR
- len_byte
;
5723 dst
= GPT_ADDR
+ inserted_byte
;
5726 result
= encode_coding (coding
, src
, dst
, len_byte
, 0);
5729 if (coding
->composing
!= COMPOSITION_DISABLED
)
5730 coding
->cmp_data
->char_offset
= from
+ inserted
;
5731 result
= decode_coding (coding
, src
, dst
, len_byte
, 0);
5734 /* The buffer memory is now:
5735 +--------+-------converted-text----+--+------original-text----+---+
5736 |<-from->|<-inserted->|<-produced->|--|<-(len_byte-consumed)->|---|
5737 |<---------------------- GAP ----------------------->| */
5739 inserted
+= coding
->produced_char
;
5740 inserted_byte
+= coding
->produced
;
5741 len_byte
-= coding
->consumed
;
5743 if (result
== CODING_FINISH_INSUFFICIENT_CMP
)
5745 coding_allocate_composition_data (coding
, from
+ inserted
);
5749 src
+= coding
->consumed
;
5750 dst
+= coding
->produced
;
5752 if (result
== CODING_FINISH_NORMAL
)
5757 if (! encodep
&& result
== CODING_FINISH_INCONSISTENT_EOL
)
5759 unsigned char *pend
= dst
, *p
= pend
- inserted_byte
;
5760 Lisp_Object eol_type
;
5762 /* Encode LFs back to the original eol format (CR or CRLF). */
5763 if (coding
->eol_type
== CODING_EOL_CR
)
5765 while (p
< pend
) if (*p
++ == '\n') p
[-1] = '\r';
5771 while (p
< pend
) if (*p
++ == '\n') count
++;
5772 if (src
- dst
< count
)
5774 /* We don't have sufficient room for encoding LFs
5775 back to CRLF. We must record converted and
5776 not-yet-converted text back to the buffer
5777 content, enlarge the gap, then record them out of
5778 the buffer contents again. */
5779 int add
= len_byte
+ inserted_byte
;
5782 ZV
+= add
; Z
+= add
; ZV_BYTE
+= add
; Z_BYTE
+= add
;
5783 GPT
+= inserted_byte
; GPT_BYTE
+= inserted_byte
;
5784 make_gap (count
- GAP_SIZE
);
5786 ZV
-= add
; Z
-= add
; ZV_BYTE
-= add
; Z_BYTE
-= add
;
5787 GPT
-= inserted_byte
; GPT_BYTE
-= inserted_byte
;
5788 /* Don't forget to update SRC, DST, and PEND. */
5789 src
= GAP_END_ADDR
- len_byte
;
5790 dst
= GPT_ADDR
+ inserted_byte
;
5794 inserted_byte
+= count
;
5795 coding
->produced
+= count
;
5796 p
= dst
= pend
+ count
;
5800 if (*p
== '\n') count
--, *--p
= '\r';
5804 /* Suppress eol-format conversion in the further conversion. */
5805 coding
->eol_type
= CODING_EOL_LF
;
5807 /* Set the coding system symbol to that for Unix-like EOL. */
5808 eol_type
= Fget (saved_coding_symbol
, Qeol_type
);
5809 if (VECTORP (eol_type
)
5810 && XVECTOR (eol_type
)->size
== 3
5811 && SYMBOLP (XVECTOR (eol_type
)->contents
[CODING_EOL_LF
]))
5812 coding
->symbol
= XVECTOR (eol_type
)->contents
[CODING_EOL_LF
];
5814 coding
->symbol
= saved_coding_symbol
;
5820 if (coding
->type
!= coding_type_ccl
5821 || coding
->mode
& CODING_MODE_LAST_BLOCK
)
5823 coding
->mode
|= CODING_MODE_LAST_BLOCK
;
5826 if (result
== CODING_FINISH_INSUFFICIENT_SRC
)
5828 /* The source text ends in invalid codes. Let's just
5829 make them valid buffer contents, and finish conversion. */
5832 unsigned char *start
= dst
;
5834 inserted
+= len_byte
;
5838 dst
+= CHAR_STRING (c
, dst
);
5841 inserted_byte
+= dst
- start
;
5845 inserted
+= len_byte
;
5846 inserted_byte
+= len_byte
;
5852 if (result
== CODING_FINISH_INTERRUPT
)
5854 /* The conversion procedure was interrupted by a user. */
5857 /* Now RESULT == CODING_FINISH_INSUFFICIENT_DST */
5858 if (coding
->consumed
< 1)
5860 /* It's quite strange to require more memory without
5861 consuming any bytes. Perhaps CCL program bug. */
5866 /* We have just done the first batch of conversion which was
5867 stopped because of insufficient gap. Let's reconsider the
5868 required gap size (i.e. SRT - DST) now.
5870 We have converted ORIG bytes (== coding->consumed) into
5871 NEW bytes (coding->produced). To convert the remaining
5872 LEN bytes, we may need REQUIRE bytes of gap, where:
5873 REQUIRE + LEN_BYTE = LEN_BYTE * (NEW / ORIG)
5874 REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG
5875 Here, we are sure that NEW >= ORIG. */
5878 if (coding
->produced
<= coding
->consumed
)
5880 /* This happens because of CCL-based coding system with
5886 ratio
= (coding
->produced
- coding
->consumed
) / coding
->consumed
;
5887 require
= len_byte
* ratio
;
5891 if ((src
- dst
) < (require
+ 2000))
5893 /* See the comment above the previous call of make_gap. */
5894 int add
= len_byte
+ inserted_byte
;
5897 ZV
+= add
; Z
+= add
; ZV_BYTE
+= add
; Z_BYTE
+= add
;
5898 GPT
+= inserted_byte
; GPT_BYTE
+= inserted_byte
;
5899 make_gap (require
+ 2000);
5901 ZV
-= add
; Z
-= add
; ZV_BYTE
-= add
; Z_BYTE
-= add
;
5902 GPT
-= inserted_byte
; GPT_BYTE
-= inserted_byte
;
5905 if (src
- dst
> 0) *dst
= 0; /* Put an anchor. */
5907 if (encodep
&& coding
->dst_multibyte
)
5909 /* The output is unibyte. We must convert 8-bit characters to
5911 if (inserted_byte
* 2 > GAP_SIZE
)
5913 GAP_SIZE
-= inserted_byte
;
5914 ZV
+= inserted_byte
; Z
+= inserted_byte
;
5915 ZV_BYTE
+= inserted_byte
; Z_BYTE
+= inserted_byte
;
5916 GPT
+= inserted_byte
; GPT_BYTE
+= inserted_byte
;
5917 make_gap (inserted_byte
- GAP_SIZE
);
5918 GAP_SIZE
+= inserted_byte
;
5919 ZV
-= inserted_byte
; Z
-= inserted_byte
;
5920 ZV_BYTE
-= inserted_byte
; Z_BYTE
-= inserted_byte
;
5921 GPT
-= inserted_byte
; GPT_BYTE
-= inserted_byte
;
5923 inserted_byte
= str_to_multibyte (GPT_ADDR
, GAP_SIZE
, inserted_byte
);
5926 /* If we shrank the conversion area, adjust it now. */
5930 safe_bcopy (GAP_END_ADDR
, GPT_ADDR
+ inserted_byte
, tail_skip
);
5931 inserted
+= total_skip
; inserted_byte
+= total_skip
;
5932 GAP_SIZE
+= total_skip
;
5933 GPT
-= head_skip
; GPT_BYTE
-= head_skip
;
5934 ZV
-= total_skip
; ZV_BYTE
-= total_skip
;
5935 Z
-= total_skip
; Z_BYTE
-= total_skip
;
5936 from
-= head_skip
; from_byte
-= head_skip
;
5937 to
+= tail_skip
; to_byte
+= tail_skip
;
5941 if (! EQ (current_buffer
->undo_list
, Qt
))
5942 adjust_after_replace (from
, from_byte
, deletion
, inserted
, inserted_byte
);
5944 adjust_after_replace_noundo (from
, from_byte
, nchars_del
, nbytes_del
,
5945 inserted
, inserted_byte
);
5946 inserted
= Z
- prev_Z
;
5948 if (!encodep
&& coding
->cmp_data
&& coding
->cmp_data
->used
)
5949 coding_restore_composition (coding
, Fcurrent_buffer ());
5950 coding_free_composition_data (coding
);
5952 if (! inhibit_pre_post_conversion
5953 && ! encodep
&& ! NILP (coding
->post_read_conversion
))
5956 Lisp_Object saved_coding_system
;
5959 TEMP_SET_PT_BOTH (from
, from_byte
);
5961 record_unwind_protect (code_convert_region_unwind
,
5962 Vlast_coding_system_used
);
5963 saved_coding_system
= Vlast_coding_system_used
;
5964 Vlast_coding_system_used
= coding
->symbol
;
5965 /* We should not call any more pre-write/post-read-conversion
5966 functions while this post-read-conversion is running. */
5967 inhibit_pre_post_conversion
= 1;
5968 val
= call1 (coding
->post_read_conversion
, make_number (inserted
));
5969 inhibit_pre_post_conversion
= 0;
5970 coding
->symbol
= Vlast_coding_system_used
;
5971 Vlast_coding_system_used
= saved_coding_system
;
5972 /* Discard the unwind protect. */
5975 inserted
+= Z
- prev_Z
;
5978 if (orig_point
>= from
)
5980 if (orig_point
>= from
+ orig_len
)
5981 orig_point
+= inserted
- orig_len
;
5984 TEMP_SET_PT (orig_point
);
5989 signal_after_change (from
, to
- from
, inserted
);
5990 update_compositions (from
, from
+ inserted
, CHECK_BORDER
);
5994 coding
->consumed
= to_byte
- from_byte
;
5995 coding
->consumed_char
= to
- from
;
5996 coding
->produced
= inserted_byte
;
5997 coding
->produced_char
= inserted
;
6004 run_pre_post_conversion_on_str (str
, coding
, encodep
)
6006 struct coding_system
*coding
;
6009 int count
= SPECPDL_INDEX ();
6010 struct gcpro gcpro1
, gcpro2
;
6011 int multibyte
= STRING_MULTIBYTE (str
);
6014 Lisp_Object old_deactivate_mark
;
6016 record_unwind_protect (Fset_buffer
, Fcurrent_buffer ());
6017 record_unwind_protect (code_convert_region_unwind
,
6018 Vlast_coding_system_used
);
6019 /* It is not crucial to specbind this. */
6020 old_deactivate_mark
= Vdeactivate_mark
;
6021 GCPRO2 (str
, old_deactivate_mark
);
6023 buffer
= Fget_buffer_create (build_string (" *code-converting-work*"));
6024 buf
= XBUFFER (buffer
);
6026 delete_all_overlays (buf
);
6027 buf
->directory
= current_buffer
->directory
;
6028 buf
->read_only
= Qnil
;
6029 buf
->filename
= Qnil
;
6030 buf
->undo_list
= Qt
;
6031 eassert (buf
->overlays_before
== NULL
);
6032 eassert (buf
->overlays_after
== NULL
);
6034 set_buffer_internal (buf
);
6035 /* We must insert the contents of STR as is without
6036 unibyte<->multibyte conversion. For that, we adjust the
6037 multibyteness of the working buffer to that of STR. */
6039 buf
->enable_multibyte_characters
= multibyte
? Qt
: Qnil
;
6041 insert_from_string (str
, 0, 0,
6042 SCHARS (str
), SBYTES (str
), 0);
6044 inhibit_pre_post_conversion
= 1;
6046 call2 (coding
->pre_write_conversion
, make_number (BEG
), make_number (Z
));
6049 Vlast_coding_system_used
= coding
->symbol
;
6050 TEMP_SET_PT_BOTH (BEG
, BEG_BYTE
);
6051 call1 (coding
->post_read_conversion
, make_number (Z
- BEG
));
6052 coding
->symbol
= Vlast_coding_system_used
;
6054 inhibit_pre_post_conversion
= 0;
6055 Vdeactivate_mark
= old_deactivate_mark
;
6056 str
= make_buffer_string (BEG
, Z
, 1);
6057 return unbind_to (count
, str
);
6061 decode_coding_string (str
, coding
, nocopy
)
6063 struct coding_system
*coding
;
6067 struct conversion_buffer buf
;
6069 Lisp_Object saved_coding_symbol
;
6071 int require_decoding
;
6072 int shrinked_bytes
= 0;
6074 int consumed
, consumed_char
, produced
, produced_char
;
6077 to_byte
= SBYTES (str
);
6079 saved_coding_symbol
= coding
->symbol
;
6080 coding
->src_multibyte
= STRING_MULTIBYTE (str
);
6081 coding
->dst_multibyte
= 1;
6082 if (CODING_REQUIRE_DETECTION (coding
))
6084 /* See the comments in code_convert_region. */
6085 if (coding
->type
== coding_type_undecided
)
6087 detect_coding (coding
, SDATA (str
), to_byte
);
6088 if (coding
->type
== coding_type_undecided
)
6090 coding
->type
= coding_type_emacs_mule
;
6091 coding
->category_idx
= CODING_CATEGORY_IDX_EMACS_MULE
;
6092 /* As emacs-mule decoder will handle composition, we
6093 need this setting to allocate coding->cmp_data
6095 coding
->composing
= COMPOSITION_NO
;
6098 if (coding
->eol_type
== CODING_EOL_UNDECIDED
6099 && coding
->type
!= coding_type_ccl
)
6101 saved_coding_symbol
= coding
->symbol
;
6102 detect_eol (coding
, SDATA (str
), to_byte
);
6103 if (coding
->eol_type
== CODING_EOL_UNDECIDED
)
6104 coding
->eol_type
= CODING_EOL_LF
;
6105 /* We had better recover the original eol format if we
6106 encounter an inconsistent eol format while decoding. */
6107 coding
->mode
|= CODING_MODE_INHIBIT_INCONSISTENT_EOL
;
6111 if (coding
->type
== coding_type_no_conversion
6112 || coding
->type
== coding_type_raw_text
)
6113 coding
->dst_multibyte
= 0;
6115 require_decoding
= CODING_REQUIRE_DECODING (coding
);
6117 if (STRING_MULTIBYTE (str
))
6119 /* Decoding routines expect the source text to be unibyte. */
6120 str
= Fstring_as_unibyte (str
);
6121 to_byte
= SBYTES (str
);
6123 coding
->src_multibyte
= 0;
6126 /* Try to skip the heading and tailing ASCIIs. */
6127 if (require_decoding
&& coding
->type
!= coding_type_ccl
)
6129 SHRINK_CONVERSION_REGION (&from
, &to_byte
, coding
, SDATA (str
),
6131 if (from
== to_byte
)
6132 require_decoding
= 0;
6133 shrinked_bytes
= from
+ (SBYTES (str
) - to_byte
);
6136 if (!require_decoding
6137 && !(SYMBOLP (coding
->post_read_conversion
)
6138 && !NILP (Ffboundp (coding
->post_read_conversion
))))
6140 coding
->consumed
= SBYTES (str
);
6141 coding
->consumed_char
= SCHARS (str
);
6142 if (coding
->dst_multibyte
)
6144 str
= Fstring_as_multibyte (str
);
6147 coding
->produced
= SBYTES (str
);
6148 coding
->produced_char
= SCHARS (str
);
6149 return (nocopy
? str
: Fcopy_sequence (str
));
6152 if (coding
->composing
!= COMPOSITION_DISABLED
)
6153 coding_allocate_composition_data (coding
, from
);
6154 len
= decoding_buffer_size (coding
, to_byte
- from
);
6155 allocate_conversion_buffer (buf
, len
);
6157 consumed
= consumed_char
= produced
= produced_char
= 0;
6160 result
= decode_coding (coding
, SDATA (str
) + from
+ consumed
,
6161 buf
.data
+ produced
, to_byte
- from
- consumed
,
6162 buf
.size
- produced
);
6163 consumed
+= coding
->consumed
;
6164 consumed_char
+= coding
->consumed_char
;
6165 produced
+= coding
->produced
;
6166 produced_char
+= coding
->produced_char
;
6167 if (result
== CODING_FINISH_NORMAL
6168 || (result
== CODING_FINISH_INSUFFICIENT_SRC
6169 && coding
->consumed
== 0))
6171 if (result
== CODING_FINISH_INSUFFICIENT_CMP
)
6172 coding_allocate_composition_data (coding
, from
+ produced_char
);
6173 else if (result
== CODING_FINISH_INSUFFICIENT_DST
)
6174 extend_conversion_buffer (&buf
);
6175 else if (result
== CODING_FINISH_INCONSISTENT_EOL
)
6177 Lisp_Object eol_type
;
6179 /* Recover the original EOL format. */
6180 if (coding
->eol_type
== CODING_EOL_CR
)
6183 for (p
= buf
.data
; p
< buf
.data
+ produced
; p
++)
6184 if (*p
== '\n') *p
= '\r';
6186 else if (coding
->eol_type
== CODING_EOL_CRLF
)
6189 unsigned char *p0
, *p1
;
6190 for (p0
= buf
.data
, p1
= p0
+ produced
; p0
< p1
; p0
++)
6191 if (*p0
== '\n') num_eol
++;
6192 if (produced
+ num_eol
>= buf
.size
)
6193 extend_conversion_buffer (&buf
);
6194 for (p0
= buf
.data
+ produced
, p1
= p0
+ num_eol
; p0
> buf
.data
;)
6197 if (*p0
== '\n') *--p1
= '\r';
6199 produced
+= num_eol
;
6200 produced_char
+= num_eol
;
6202 /* Suppress eol-format conversion in the further conversion. */
6203 coding
->eol_type
= CODING_EOL_LF
;
6205 /* Set the coding system symbol to that for Unix-like EOL. */
6206 eol_type
= Fget (saved_coding_symbol
, Qeol_type
);
6207 if (VECTORP (eol_type
)
6208 && XVECTOR (eol_type
)->size
== 3
6209 && SYMBOLP (XVECTOR (eol_type
)->contents
[CODING_EOL_LF
]))
6210 coding
->symbol
= XVECTOR (eol_type
)->contents
[CODING_EOL_LF
];
6212 coding
->symbol
= saved_coding_symbol
;
6218 coding
->consumed
= consumed
;
6219 coding
->consumed_char
= consumed_char
;
6220 coding
->produced
= produced
;
6221 coding
->produced_char
= produced_char
;
6223 if (coding
->dst_multibyte
)
6224 newstr
= make_uninit_multibyte_string (produced_char
+ shrinked_bytes
,
6225 produced
+ shrinked_bytes
);
6227 newstr
= make_uninit_string (produced
+ shrinked_bytes
);
6229 STRING_COPYIN (newstr
, 0, SDATA (str
), from
);
6230 STRING_COPYIN (newstr
, from
, buf
.data
, produced
);
6231 if (shrinked_bytes
> from
)
6232 STRING_COPYIN (newstr
, from
+ produced
,
6233 SDATA (str
) + to_byte
,
6234 shrinked_bytes
- from
);
6235 free_conversion_buffer (&buf
);
6237 coding
->consumed
+= shrinked_bytes
;
6238 coding
->consumed_char
+= shrinked_bytes
;
6239 coding
->produced
+= shrinked_bytes
;
6240 coding
->produced_char
+= shrinked_bytes
;
6242 if (coding
->cmp_data
&& coding
->cmp_data
->used
)
6243 coding_restore_composition (coding
, newstr
);
6244 coding_free_composition_data (coding
);
6246 if (SYMBOLP (coding
->post_read_conversion
)
6247 && !NILP (Ffboundp (coding
->post_read_conversion
)))
6248 newstr
= run_pre_post_conversion_on_str (newstr
, coding
, 0);
6254 encode_coding_string (str
, coding
, nocopy
)
6256 struct coding_system
*coding
;
6260 struct conversion_buffer buf
;
6261 int from
, to
, to_byte
;
6263 int shrinked_bytes
= 0;
6265 int consumed
, consumed_char
, produced
, produced_char
;
6267 if (SYMBOLP (coding
->pre_write_conversion
)
6268 && !NILP (Ffboundp (coding
->pre_write_conversion
)))
6269 str
= run_pre_post_conversion_on_str (str
, coding
, 1);
6273 to_byte
= SBYTES (str
);
6275 /* Encoding routines determine the multibyteness of the source text
6276 by coding->src_multibyte. */
6277 coding
->src_multibyte
= STRING_MULTIBYTE (str
);
6278 coding
->dst_multibyte
= 0;
6279 if (! CODING_REQUIRE_ENCODING (coding
))
6281 coding
->consumed
= SBYTES (str
);
6282 coding
->consumed_char
= SCHARS (str
);
6283 if (STRING_MULTIBYTE (str
))
6285 str
= Fstring_as_unibyte (str
);
6288 coding
->produced
= SBYTES (str
);
6289 coding
->produced_char
= SCHARS (str
);
6290 return (nocopy
? str
: Fcopy_sequence (str
));
6293 if (coding
->composing
!= COMPOSITION_DISABLED
)
6294 coding_save_composition (coding
, from
, to
, str
);
6296 /* Try to skip the heading and tailing ASCIIs. */
6297 if (coding
->type
!= coding_type_ccl
)
6299 SHRINK_CONVERSION_REGION (&from
, &to_byte
, coding
, SDATA (str
),
6301 if (from
== to_byte
)
6302 return (nocopy
? str
: Fcopy_sequence (str
));
6303 shrinked_bytes
= from
+ (SBYTES (str
) - to_byte
);
6306 len
= encoding_buffer_size (coding
, to_byte
- from
);
6307 allocate_conversion_buffer (buf
, len
);
6309 consumed
= consumed_char
= produced
= produced_char
= 0;
6312 result
= encode_coding (coding
, SDATA (str
) + from
+ consumed
,
6313 buf
.data
+ produced
, to_byte
- from
- consumed
,
6314 buf
.size
- produced
);
6315 consumed
+= coding
->consumed
;
6316 consumed_char
+= coding
->consumed_char
;
6317 produced
+= coding
->produced
;
6318 produced_char
+= coding
->produced_char
;
6319 if (result
== CODING_FINISH_NORMAL
6320 || result
== CODING_FINISH_INTERRUPT
6321 || (result
== CODING_FINISH_INSUFFICIENT_SRC
6322 && coding
->consumed
== 0))
6324 /* Now result should be CODING_FINISH_INSUFFICIENT_DST. */
6325 extend_conversion_buffer (&buf
);
6328 coding
->consumed
= consumed
;
6329 coding
->consumed_char
= consumed_char
;
6330 coding
->produced
= produced
;
6331 coding
->produced_char
= produced_char
;
6333 newstr
= make_uninit_string (produced
+ shrinked_bytes
);
6335 STRING_COPYIN (newstr
, 0, SDATA (str
), from
);
6336 STRING_COPYIN (newstr
, from
, buf
.data
, produced
);
6337 if (shrinked_bytes
> from
)
6338 STRING_COPYIN (newstr
, from
+ produced
,
6339 SDATA (str
) + to_byte
,
6340 shrinked_bytes
- from
);
6342 free_conversion_buffer (&buf
);
6343 coding_free_composition_data (coding
);
6350 /*** 8. Emacs Lisp library functions ***/
6352 DEFUN ("coding-system-p", Fcoding_system_p
, Scoding_system_p
, 1, 1, 0,
6353 doc
: /* Return t if OBJECT is nil or a coding-system.
6354 See the documentation of `make-coding-system' for information
6355 about coding-system objects. */)
6363 if (! NILP (Fget (obj
, Qcoding_system_define_form
)))
6365 /* Get coding-spec vector for OBJ. */
6366 obj
= Fget (obj
, Qcoding_system
);
6367 return ((VECTORP (obj
) && XVECTOR (obj
)->size
== 5)
6371 DEFUN ("read-non-nil-coding-system", Fread_non_nil_coding_system
,
6372 Sread_non_nil_coding_system
, 1, 1, 0,
6373 doc
: /* Read a coding system from the minibuffer, prompting with string PROMPT. */)
6380 val
= Fcompleting_read (prompt
, Vcoding_system_alist
, Qnil
,
6381 Qt
, Qnil
, Qcoding_system_history
, Qnil
, Qnil
);
6383 while (SCHARS (val
) == 0);
6384 return (Fintern (val
, Qnil
));
6387 DEFUN ("read-coding-system", Fread_coding_system
, Sread_coding_system
, 1, 2, 0,
6388 doc
: /* Read a coding system from the minibuffer, prompting with string PROMPT.
6389 If the user enters null input, return second argument DEFAULT-CODING-SYSTEM. */)
6390 (prompt
, default_coding_system
)
6391 Lisp_Object prompt
, default_coding_system
;
6394 if (SYMBOLP (default_coding_system
))
6395 default_coding_system
= SYMBOL_NAME (default_coding_system
);
6396 val
= Fcompleting_read (prompt
, Vcoding_system_alist
, Qnil
,
6397 Qt
, Qnil
, Qcoding_system_history
,
6398 default_coding_system
, Qnil
);
6399 return (SCHARS (val
) == 0 ? Qnil
: Fintern (val
, Qnil
));
6402 DEFUN ("check-coding-system", Fcheck_coding_system
, Scheck_coding_system
,
6404 doc
: /* Check validity of CODING-SYSTEM.
6405 If valid, return CODING-SYSTEM, else signal a `coding-system-error' error.
6406 It is valid if it is nil or a symbol with a non-nil `coding-system' property.
6407 The value of this property should be a vector of length 5. */)
6409 Lisp_Object coding_system
;
6411 Lisp_Object define_form
;
6413 define_form
= Fget (coding_system
, Qcoding_system_define_form
);
6414 if (! NILP (define_form
))
6416 Fput (coding_system
, Qcoding_system_define_form
, Qnil
);
6417 safe_eval (define_form
);
6419 if (!NILP (Fcoding_system_p (coding_system
)))
6420 return coding_system
;
6422 Fsignal (Qcoding_system_error
, Fcons (coding_system
, Qnil
));
6426 detect_coding_system (src
, src_bytes
, highest
, multibytep
)
6427 const unsigned char *src
;
6428 int src_bytes
, highest
;
6431 int coding_mask
, eol_type
;
6432 Lisp_Object val
, tmp
;
6435 coding_mask
= detect_coding_mask (src
, src_bytes
, NULL
, &dummy
, multibytep
);
6436 eol_type
= detect_eol_type (src
, src_bytes
, &dummy
);
6437 if (eol_type
== CODING_EOL_INCONSISTENT
)
6438 eol_type
= CODING_EOL_UNDECIDED
;
6443 if (eol_type
!= CODING_EOL_UNDECIDED
)
6446 val2
= Fget (Qundecided
, Qeol_type
);
6448 val
= XVECTOR (val2
)->contents
[eol_type
];
6450 return (highest
? val
: Fcons (val
, Qnil
));
6453 /* At first, gather possible coding systems in VAL. */
6455 for (tmp
= Vcoding_category_list
; CONSP (tmp
); tmp
= XCDR (tmp
))
6457 Lisp_Object category_val
, category_index
;
6459 category_index
= Fget (XCAR (tmp
), Qcoding_category_index
);
6460 category_val
= Fsymbol_value (XCAR (tmp
));
6461 if (!NILP (category_val
)
6462 && NATNUMP (category_index
)
6463 && (coding_mask
& (1 << XFASTINT (category_index
))))
6465 val
= Fcons (category_val
, val
);
6471 val
= Fnreverse (val
);
6473 /* Then, replace the elements with subsidiary coding systems. */
6474 for (tmp
= val
; CONSP (tmp
); tmp
= XCDR (tmp
))
6476 if (eol_type
!= CODING_EOL_UNDECIDED
6477 && eol_type
!= CODING_EOL_INCONSISTENT
)
6480 eol
= Fget (XCAR (tmp
), Qeol_type
);
6482 XSETCAR (tmp
, XVECTOR (eol
)->contents
[eol_type
]);
6485 return (highest
? XCAR (val
) : val
);
6488 DEFUN ("detect-coding-region", Fdetect_coding_region
, Sdetect_coding_region
,
6490 doc
: /* Detect how the byte sequence in the region is encoded.
6491 Return a list of possible coding systems used on decoding a byte
6492 sequence containing the bytes in the region between START and END when
6493 the coding system `undecided' is specified. The list is ordered by
6494 priority decided in the current language environment.
6496 If only ASCII characters are found, it returns a list of single element
6497 `undecided' or its subsidiary coding system according to a detected
6500 If optional argument HIGHEST is non-nil, return the coding system of
6501 highest priority. */)
6502 (start
, end
, highest
)
6503 Lisp_Object start
, end
, highest
;
6506 int from_byte
, to_byte
;
6507 int include_anchor_byte
= 0;
6509 CHECK_NUMBER_COERCE_MARKER (start
);
6510 CHECK_NUMBER_COERCE_MARKER (end
);
6512 validate_region (&start
, &end
);
6513 from
= XINT (start
), to
= XINT (end
);
6514 from_byte
= CHAR_TO_BYTE (from
);
6515 to_byte
= CHAR_TO_BYTE (to
);
6517 if (from
< GPT
&& to
>= GPT
)
6518 move_gap_both (to
, to_byte
);
6519 /* If we an anchor byte `\0' follows the region, we include it in
6520 the detecting source. Then code detectors can handle the tailing
6521 byte sequence more accurately.
6523 Fix me: This is not a perfect solution. It is better that we
6524 add one more argument, say LAST_BLOCK, to all detect_coding_XXX.
6526 if (to
== Z
|| (to
== GPT
&& GAP_SIZE
> 0))
6527 include_anchor_byte
= 1;
6528 return detect_coding_system (BYTE_POS_ADDR (from_byte
),
6529 to_byte
- from_byte
+ include_anchor_byte
,
6531 !NILP (current_buffer
6532 ->enable_multibyte_characters
));
6535 DEFUN ("detect-coding-string", Fdetect_coding_string
, Sdetect_coding_string
,
6537 doc
: /* Detect how the byte sequence in STRING is encoded.
6538 Return a list of possible coding systems used on decoding a byte
6539 sequence containing the bytes in STRING when the coding system
6540 `undecided' is specified. The list is ordered by priority decided in
6541 the current language environment.
6543 If only ASCII characters are found, it returns a list of single element
6544 `undecided' or its subsidiary coding system according to a detected
6547 If optional argument HIGHEST is non-nil, return the coding system of
6548 highest priority. */)
6550 Lisp_Object string
, highest
;
6552 CHECK_STRING (string
);
6554 return detect_coding_system (SDATA (string
),
6555 /* "+ 1" is to include the anchor byte
6556 `\0'. With this, code detectors can
6557 handle the tailing bytes more
6559 SBYTES (string
) + 1,
6561 STRING_MULTIBYTE (string
));
6564 /* Subroutine for Fsafe_coding_systems_region_internal.
6566 Return a list of coding systems that safely encode the multibyte
6567 text between P and PEND. SAFE_CODINGS, if non-nil, is an alist of
6568 possible coding systems. If it is nil, it means that we have not
6569 yet found any coding systems.
6571 WORK_TABLE a char-table of which element is set to t once the
6572 element is looked up.
6574 If a non-ASCII single byte char is found, set
6575 *single_byte_char_found to 1. */
6578 find_safe_codings (p
, pend
, safe_codings
, work_table
, single_byte_char_found
)
6579 unsigned char *p
, *pend
;
6580 Lisp_Object safe_codings
, work_table
;
6581 int *single_byte_char_found
;
6584 Lisp_Object val
, ch
;
6585 Lisp_Object prev
, tail
;
6587 if (NILP (safe_codings
))
6588 goto done_safe_codings
;
6591 c
= STRING_CHAR_AND_LENGTH (p
, pend
- p
, len
);
6593 if (ASCII_BYTE_P (c
))
6594 /* We can ignore ASCII characters here. */
6596 if (SINGLE_BYTE_CHAR_P (c
))
6597 *single_byte_char_found
= 1;
6598 /* Check the safe coding systems for C. */
6599 ch
= make_number (c
);
6600 val
= Faref (work_table
, ch
);
6602 /* This element was already checked. Ignore it. */
6604 /* Remember that we checked this element. */
6605 Faset (work_table
, ch
, Qt
);
6607 for (prev
= tail
= safe_codings
; CONSP (tail
); tail
= XCDR (tail
))
6609 Lisp_Object elt
, translation_table
, hash_table
, accept_latin_extra
;
6613 if (CONSP (XCDR (elt
)))
6615 /* This entry has this format now:
6616 ( CODING SAFE-CHARS TRANSLATION-TABLE HASH-TABLE
6617 ACCEPT-LATIN-EXTRA ) */
6619 encodable
= ! NILP (Faref (XCAR (val
), ch
));
6623 translation_table
= XCAR (val
);
6624 hash_table
= XCAR (XCDR (val
));
6625 accept_latin_extra
= XCAR (XCDR (XCDR (val
)));
6630 /* This entry has this format now: ( CODING . SAFE-CHARS) */
6631 encodable
= ! NILP (Faref (XCDR (elt
), ch
));
6634 /* Transform the format to:
6635 ( CODING SAFE-CHARS TRANSLATION-TABLE HASH-TABLE
6636 ACCEPT-LATIN-EXTRA ) */
6637 val
= Fget (XCAR (elt
), Qcoding_system
);
6639 = Fplist_get (AREF (val
, 3),
6640 Qtranslation_table_for_encode
);
6641 if (SYMBOLP (translation_table
))
6642 translation_table
= Fget (translation_table
,
6643 Qtranslation_table
);
6645 = (CHAR_TABLE_P (translation_table
)
6646 ? XCHAR_TABLE (translation_table
)->extras
[1]
6649 = ((EQ (AREF (val
, 0), make_number (2))
6650 && VECTORP (AREF (val
, 4)))
6651 ? AREF (AREF (val
, 4), 16)
6653 XSETCAR (tail
, list5 (XCAR (elt
), XCDR (elt
),
6654 translation_table
, hash_table
,
6655 accept_latin_extra
));
6660 && ((CHAR_TABLE_P (translation_table
)
6661 && ! NILP (Faref (translation_table
, ch
)))
6662 || (HASH_TABLE_P (hash_table
)
6663 && ! NILP (Fgethash (ch
, hash_table
, Qnil
)))
6664 || (SINGLE_BYTE_CHAR_P (c
)
6665 && ! NILP (accept_latin_extra
)
6666 && VECTORP (Vlatin_extra_code_table
)
6667 && ! NILP (AREF (Vlatin_extra_code_table
, c
)))))
6673 /* Exclude this coding system from SAFE_CODINGS. */
6674 if (EQ (tail
, safe_codings
))
6676 safe_codings
= XCDR (safe_codings
);
6677 if (NILP (safe_codings
))
6678 goto done_safe_codings
;
6681 XSETCDR (prev
, XCDR (tail
));
6687 /* If the above loop was terminated before P reaches PEND, it means
6688 SAFE_CODINGS was set to nil. If we have not yet found an
6689 non-ASCII single-byte char, check it now. */
6690 if (! *single_byte_char_found
)
6693 c
= STRING_CHAR_AND_LENGTH (p
, pend
- p
, len
);
6695 if (! ASCII_BYTE_P (c
)
6696 && SINGLE_BYTE_CHAR_P (c
))
6698 *single_byte_char_found
= 1;
6702 return safe_codings
;
6705 DEFUN ("find-coding-systems-region-internal",
6706 Ffind_coding_systems_region_internal
,
6707 Sfind_coding_systems_region_internal
, 2, 2, 0,
6708 doc
: /* Internal use only. */)
6710 Lisp_Object start
, end
;
6712 Lisp_Object work_table
, safe_codings
;
6713 int non_ascii_p
= 0;
6714 int single_byte_char_found
= 0;
6715 const unsigned char *p1
, *p1end
, *p2
, *p2end
, *p
;
6717 if (STRINGP (start
))
6719 if (!STRING_MULTIBYTE (start
))
6721 p1
= SDATA (start
), p1end
= p1
+ SBYTES (start
);
6723 if (SCHARS (start
) != SBYTES (start
))
6730 CHECK_NUMBER_COERCE_MARKER (start
);
6731 CHECK_NUMBER_COERCE_MARKER (end
);
6732 if (XINT (start
) < BEG
|| XINT (end
) > Z
|| XINT (start
) > XINT (end
))
6733 args_out_of_range (start
, end
);
6734 if (NILP (current_buffer
->enable_multibyte_characters
))
6736 from
= CHAR_TO_BYTE (XINT (start
));
6737 to
= CHAR_TO_BYTE (XINT (end
));
6738 stop
= from
< GPT_BYTE
&& GPT_BYTE
< to
? GPT_BYTE
: to
;
6739 p1
= BYTE_POS_ADDR (from
), p1end
= p1
+ (stop
- from
);
6743 p2
= BYTE_POS_ADDR (stop
), p2end
= p2
+ (to
- stop
);
6744 if (XINT (end
) - XINT (start
) != to
- from
)
6750 /* We are sure that the text contains no multibyte character.
6751 Check if it contains eight-bit-graphic. */
6753 for (p
= p1
; p
< p1end
&& ASCII_BYTE_P (*p
); p
++);
6756 for (p
= p2
; p
< p2end
&& ASCII_BYTE_P (*p
); p
++);
6762 /* The text contains non-ASCII characters. */
6764 work_table
= Fmake_char_table (Qchar_coding_system
, Qnil
);
6765 safe_codings
= Fcopy_sequence (XCDR (Vcoding_system_safe_chars
));
6767 safe_codings
= find_safe_codings (p1
, p1end
, safe_codings
, work_table
,
6768 &single_byte_char_found
);
6770 safe_codings
= find_safe_codings (p2
, p2end
, safe_codings
, work_table
,
6771 &single_byte_char_found
);
6772 if (EQ (safe_codings
, XCDR (Vcoding_system_safe_chars
)))
6776 /* Turn safe_codings to a list of coding systems... */
6779 if (single_byte_char_found
)
6780 /* ... and append these for eight-bit chars. */
6781 val
= Fcons (Qraw_text
,
6782 Fcons (Qemacs_mule
, Fcons (Qno_conversion
, Qnil
)));
6784 /* ... and append generic coding systems. */
6785 val
= Fcopy_sequence (XCAR (Vcoding_system_safe_chars
));
6787 for (; CONSP (safe_codings
); safe_codings
= XCDR (safe_codings
))
6788 val
= Fcons (XCAR (XCAR (safe_codings
)), val
);
6792 return safe_codings
;
6796 /* Search from position POS for such characters that are unencodable
6797 accoding to SAFE_CHARS, and return a list of their positions. P
6798 points where in the memory the character at POS exists. Limit the
6799 search at PEND or when Nth unencodable characters are found.
6801 If SAFE_CHARS is a char table, an element for an unencodable
6804 If SAFE_CHARS is nil, all non-ASCII characters are unencodable.
6806 Otherwise, SAFE_CHARS is t, and only eight-bit-contrl and
6807 eight-bit-graphic characters are unencodable. */
6810 unencodable_char_position (safe_chars
, pos
, p
, pend
, n
)
6811 Lisp_Object safe_chars
;
6813 unsigned char *p
, *pend
;
6816 Lisp_Object pos_list
;
6822 int c
= STRING_CHAR_AND_LENGTH (p
, MAX_MULTIBYTE_LENGTH
, len
);
6825 && (CHAR_TABLE_P (safe_chars
)
6826 ? NILP (CHAR_TABLE_REF (safe_chars
, c
))
6827 : (NILP (safe_chars
) || c
< 256)))
6829 pos_list
= Fcons (make_number (pos
), pos_list
);
6836 return Fnreverse (pos_list
);
6840 DEFUN ("unencodable-char-position", Funencodable_char_position
,
6841 Sunencodable_char_position
, 3, 5, 0,
6843 Return position of first un-encodable character in a region.
6844 START and END specfiy the region and CODING-SYSTEM specifies the
6845 encoding to check. Return nil if CODING-SYSTEM does encode the region.
6847 If optional 4th argument COUNT is non-nil, it specifies at most how
6848 many un-encodable characters to search. In this case, the value is a
6851 If optional 5th argument STRING is non-nil, it is a string to search
6852 for un-encodable characters. In that case, START and END are indexes
6854 (start
, end
, coding_system
, count
, string
)
6855 Lisp_Object start
, end
, coding_system
, count
, string
;
6858 Lisp_Object safe_chars
;
6859 struct coding_system coding
;
6860 Lisp_Object positions
;
6862 unsigned char *p
, *pend
;
6866 validate_region (&start
, &end
);
6867 from
= XINT (start
);
6869 if (NILP (current_buffer
->enable_multibyte_characters
))
6871 p
= CHAR_POS_ADDR (from
);
6875 pend
= CHAR_POS_ADDR (to
);
6879 CHECK_STRING (string
);
6880 CHECK_NATNUM (start
);
6882 from
= XINT (start
);
6885 || to
> SCHARS (string
))
6886 args_out_of_range_3 (string
, start
, end
);
6887 if (! STRING_MULTIBYTE (string
))
6889 p
= SDATA (string
) + string_char_to_byte (string
, from
);
6890 pend
= SDATA (string
) + string_char_to_byte (string
, to
);
6893 setup_coding_system (Fcheck_coding_system (coding_system
), &coding
);
6899 CHECK_NATNUM (count
);
6903 if (coding
.type
== coding_type_no_conversion
6904 || coding
.type
== coding_type_raw_text
)
6907 if (coding
.type
== coding_type_undecided
)
6910 safe_chars
= coding_safe_chars (coding_system
);
6912 if (STRINGP (string
)
6913 || from
>= GPT
|| to
<= GPT
)
6914 positions
= unencodable_char_position (safe_chars
, from
, p
, pend
, n
);
6917 Lisp_Object args
[2];
6919 args
[0] = unencodable_char_position (safe_chars
, from
, p
, GPT_ADDR
, n
);
6920 n
-= XINT (Flength (args
[0]));
6922 positions
= args
[0];
6925 args
[1] = unencodable_char_position (safe_chars
, GPT
, GAP_END_ADDR
,
6927 positions
= Fappend (2, args
);
6931 return (NILP (count
) ? Fcar (positions
) : positions
);
6936 code_convert_region1 (start
, end
, coding_system
, encodep
)
6937 Lisp_Object start
, end
, coding_system
;
6940 struct coding_system coding
;
6943 CHECK_NUMBER_COERCE_MARKER (start
);
6944 CHECK_NUMBER_COERCE_MARKER (end
);
6945 CHECK_SYMBOL (coding_system
);
6947 validate_region (&start
, &end
);
6948 from
= XFASTINT (start
);
6949 to
= XFASTINT (end
);
6951 if (NILP (coding_system
))
6952 return make_number (to
- from
);
6954 if (setup_coding_system (Fcheck_coding_system (coding_system
), &coding
) < 0)
6955 error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system
)));
6957 coding
.mode
|= CODING_MODE_LAST_BLOCK
;
6958 coding
.src_multibyte
= coding
.dst_multibyte
6959 = !NILP (current_buffer
->enable_multibyte_characters
);
6960 code_convert_region (from
, CHAR_TO_BYTE (from
), to
, CHAR_TO_BYTE (to
),
6961 &coding
, encodep
, 1);
6962 Vlast_coding_system_used
= coding
.symbol
;
6963 return make_number (coding
.produced_char
);
6966 DEFUN ("decode-coding-region", Fdecode_coding_region
, Sdecode_coding_region
,
6967 3, 3, "r\nzCoding system: ",
6968 doc
: /* Decode the current region from the specified coding system.
6969 When called from a program, takes three arguments:
6970 START, END, and CODING-SYSTEM. START and END are buffer positions.
6971 This function sets `last-coding-system-used' to the precise coding system
6972 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
6973 not fully specified.)
6974 It returns the length of the decoded text. */)
6975 (start
, end
, coding_system
)
6976 Lisp_Object start
, end
, coding_system
;
6978 return code_convert_region1 (start
, end
, coding_system
, 0);
6981 DEFUN ("encode-coding-region", Fencode_coding_region
, Sencode_coding_region
,
6982 3, 3, "r\nzCoding system: ",
6983 doc
: /* Encode the current region into the specified coding system.
6984 When called from a program, takes three arguments:
6985 START, END, and CODING-SYSTEM. START and END are buffer positions.
6986 This function sets `last-coding-system-used' to the precise coding system
6987 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
6988 not fully specified.)
6989 It returns the length of the encoded text. */)
6990 (start
, end
, coding_system
)
6991 Lisp_Object start
, end
, coding_system
;
6993 return code_convert_region1 (start
, end
, coding_system
, 1);
6997 code_convert_string1 (string
, coding_system
, nocopy
, encodep
)
6998 Lisp_Object string
, coding_system
, nocopy
;
7001 struct coding_system coding
;
7003 CHECK_STRING (string
);
7004 CHECK_SYMBOL (coding_system
);
7006 if (NILP (coding_system
))
7007 return (NILP (nocopy
) ? Fcopy_sequence (string
) : string
);
7009 if (setup_coding_system (Fcheck_coding_system (coding_system
), &coding
) < 0)
7010 error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system
)));
7012 coding
.mode
|= CODING_MODE_LAST_BLOCK
;
7014 ? encode_coding_string (string
, &coding
, !NILP (nocopy
))
7015 : decode_coding_string (string
, &coding
, !NILP (nocopy
)));
7016 Vlast_coding_system_used
= coding
.symbol
;
7021 DEFUN ("decode-coding-string", Fdecode_coding_string
, Sdecode_coding_string
,
7023 doc
: /* Decode STRING which is encoded in CODING-SYSTEM, and return the result.
7024 Optional arg NOCOPY non-nil means it is OK to return STRING itself
7025 if the decoding operation is trivial.
7026 This function sets `last-coding-system-used' to the precise coding system
7027 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
7028 not fully specified.) */)
7029 (string
, coding_system
, nocopy
)
7030 Lisp_Object string
, coding_system
, nocopy
;
7032 return code_convert_string1 (string
, coding_system
, nocopy
, 0);
7035 DEFUN ("encode-coding-string", Fencode_coding_string
, Sencode_coding_string
,
7037 doc
: /* Encode STRING to CODING-SYSTEM, and return the result.
7038 Optional arg NOCOPY non-nil means it is OK to return STRING itself
7039 if the encoding operation is trivial.
7040 This function sets `last-coding-system-used' to the precise coding system
7041 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
7042 not fully specified.) */)
7043 (string
, coding_system
, nocopy
)
7044 Lisp_Object string
, coding_system
, nocopy
;
7046 return code_convert_string1 (string
, coding_system
, nocopy
, 1);
7049 /* Encode or decode STRING according to CODING_SYSTEM.
7050 Do not set Vlast_coding_system_used.
7052 This function is called only from macros DECODE_FILE and
7053 ENCODE_FILE, thus we ignore character composition. */
7056 code_convert_string_norecord (string
, coding_system
, encodep
)
7057 Lisp_Object string
, coding_system
;
7060 struct coding_system coding
;
7062 CHECK_STRING (string
);
7063 CHECK_SYMBOL (coding_system
);
7065 if (NILP (coding_system
))
7068 if (setup_coding_system (Fcheck_coding_system (coding_system
), &coding
) < 0)
7069 error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system
)));
7071 coding
.composing
= COMPOSITION_DISABLED
;
7072 coding
.mode
|= CODING_MODE_LAST_BLOCK
;
7074 ? encode_coding_string (string
, &coding
, 1)
7075 : decode_coding_string (string
, &coding
, 1));
7078 DEFUN ("decode-sjis-char", Fdecode_sjis_char
, Sdecode_sjis_char
, 1, 1, 0,
7079 doc
: /* Decode a Japanese character which has CODE in shift_jis encoding.
7080 Return the corresponding character. */)
7084 unsigned char c1
, c2
, s1
, s2
;
7087 CHECK_NUMBER (code
);
7088 s1
= (XFASTINT (code
)) >> 8, s2
= (XFASTINT (code
)) & 0xFF;
7092 XSETFASTINT (val
, s2
);
7093 else if (s2
>= 0xA0 || s2
<= 0xDF)
7094 XSETFASTINT (val
, MAKE_CHAR (charset_katakana_jisx0201
, s2
, 0));
7096 error ("Invalid Shift JIS code: %x", XFASTINT (code
));
7100 if ((s1
< 0x80 || (s1
> 0x9F && s1
< 0xE0) || s1
> 0xEF)
7101 || (s2
< 0x40 || s2
== 0x7F || s2
> 0xFC))
7102 error ("Invalid Shift JIS code: %x", XFASTINT (code
));
7103 DECODE_SJIS (s1
, s2
, c1
, c2
);
7104 XSETFASTINT (val
, MAKE_CHAR (charset_jisx0208
, c1
, c2
));
7109 DEFUN ("encode-sjis-char", Fencode_sjis_char
, Sencode_sjis_char
, 1, 1, 0,
7110 doc
: /* Encode a Japanese character CHAR to shift_jis encoding.
7111 Return the corresponding code in SJIS. */)
7115 int charset
, c1
, c2
, s1
, s2
;
7119 SPLIT_CHAR (XFASTINT (ch
), charset
, c1
, c2
);
7120 if (charset
== CHARSET_ASCII
)
7124 else if (charset
== charset_jisx0208
7125 && c1
> 0x20 && c1
< 0x7F && c2
> 0x20 && c2
< 0x7F)
7127 ENCODE_SJIS (c1
, c2
, s1
, s2
);
7128 XSETFASTINT (val
, (s1
<< 8) | s2
);
7130 else if (charset
== charset_katakana_jisx0201
7131 && c1
> 0x20 && c2
< 0xE0)
7133 XSETFASTINT (val
, c1
| 0x80);
7136 error ("Can't encode to shift_jis: %d", XFASTINT (ch
));
7140 DEFUN ("decode-big5-char", Fdecode_big5_char
, Sdecode_big5_char
, 1, 1, 0,
7141 doc
: /* Decode a Big5 character which has CODE in BIG5 coding system.
7142 Return the corresponding character. */)
7147 unsigned char b1
, b2
, c1
, c2
;
7150 CHECK_NUMBER (code
);
7151 b1
= (XFASTINT (code
)) >> 8, b2
= (XFASTINT (code
)) & 0xFF;
7155 error ("Invalid BIG5 code: %x", XFASTINT (code
));
7160 if ((b1
< 0xA1 || b1
> 0xFE)
7161 || (b2
< 0x40 || (b2
> 0x7E && b2
< 0xA1) || b2
> 0xFE))
7162 error ("Invalid BIG5 code: %x", XFASTINT (code
));
7163 DECODE_BIG5 (b1
, b2
, charset
, c1
, c2
);
7164 XSETFASTINT (val
, MAKE_CHAR (charset
, c1
, c2
));
7169 DEFUN ("encode-big5-char", Fencode_big5_char
, Sencode_big5_char
, 1, 1, 0,
7170 doc
: /* Encode the Big5 character CHAR to BIG5 coding system.
7171 Return the corresponding character code in Big5. */)
7175 int charset
, c1
, c2
, b1
, b2
;
7179 SPLIT_CHAR (XFASTINT (ch
), charset
, c1
, c2
);
7180 if (charset
== CHARSET_ASCII
)
7184 else if ((charset
== charset_big5_1
7185 && (XFASTINT (ch
) >= 0x250a1 && XFASTINT (ch
) <= 0x271ec))
7186 || (charset
== charset_big5_2
7187 && XFASTINT (ch
) >= 0x290a1 && XFASTINT (ch
) <= 0x2bdb2))
7189 ENCODE_BIG5 (charset
, c1
, c2
, b1
, b2
);
7190 XSETFASTINT (val
, (b1
<< 8) | b2
);
7193 error ("Can't encode to Big5: %d", XFASTINT (ch
));
7197 DEFUN ("set-terminal-coding-system-internal", Fset_terminal_coding_system_internal
,
7198 Sset_terminal_coding_system_internal
, 1, 1, 0,
7199 doc
: /* Internal use only. */)
7201 Lisp_Object coding_system
;
7203 struct coding_system
*terminal_coding
= FRAME_TERMINAL_CODING (SELECTED_FRAME ());
7204 CHECK_SYMBOL (coding_system
);
7205 setup_coding_system (Fcheck_coding_system (coding_system
), terminal_coding
);
7206 /* We had better not send unsafe characters to terminal. */
7207 terminal_coding
->mode
|= CODING_MODE_INHIBIT_UNENCODABLE_CHAR
;
7208 /* Character composition should be disabled. */
7209 terminal_coding
->composing
= COMPOSITION_DISABLED
;
7210 /* Error notification should be suppressed. */
7211 terminal_coding
->suppress_error
= 1;
7212 terminal_coding
->src_multibyte
= 1;
7213 terminal_coding
->dst_multibyte
= 0;
7217 DEFUN ("set-safe-terminal-coding-system-internal", Fset_safe_terminal_coding_system_internal
,
7218 Sset_safe_terminal_coding_system_internal
, 1, 1, 0,
7219 doc
: /* Internal use only. */)
7221 Lisp_Object coding_system
;
7223 CHECK_SYMBOL (coding_system
);
7224 setup_coding_system (Fcheck_coding_system (coding_system
),
7225 &safe_terminal_coding
);
7226 /* Character composition should be disabled. */
7227 safe_terminal_coding
.composing
= COMPOSITION_DISABLED
;
7228 /* Error notification should be suppressed. */
7229 safe_terminal_coding
.suppress_error
= 1;
7230 safe_terminal_coding
.src_multibyte
= 1;
7231 safe_terminal_coding
.dst_multibyte
= 0;
7235 DEFUN ("terminal-coding-system", Fterminal_coding_system
,
7236 Sterminal_coding_system
, 0, 0, 0,
7237 doc
: /* Return coding system specified for terminal output. */)
7240 return FRAME_TERMINAL_CODING (SELECTED_FRAME ())->symbol
;
7243 DEFUN ("set-keyboard-coding-system-internal", Fset_keyboard_coding_system_internal
,
7244 Sset_keyboard_coding_system_internal
, 1, 1, 0,
7245 doc
: /* Internal use only. */)
7247 Lisp_Object coding_system
;
7249 CHECK_SYMBOL (coding_system
);
7250 setup_coding_system (Fcheck_coding_system (coding_system
),
7251 FRAME_KEYBOARD_CODING (SELECTED_FRAME ()));
7252 /* Character composition should be disabled. */
7253 FRAME_KEYBOARD_CODING (SELECTED_FRAME ())->composing
= COMPOSITION_DISABLED
;
7257 DEFUN ("keyboard-coding-system", Fkeyboard_coding_system
,
7258 Skeyboard_coding_system
, 0, 0, 0,
7259 doc
: /* Return coding system specified for decoding keyboard input. */)
7262 return FRAME_KEYBOARD_CODING (SELECTED_FRAME ())->symbol
;
7266 DEFUN ("find-operation-coding-system", Ffind_operation_coding_system
,
7267 Sfind_operation_coding_system
, 1, MANY
, 0,
7268 doc
: /* Choose a coding system for an operation based on the target name.
7269 The value names a pair of coding systems: (DECODING-SYSTEM . ENCODING-SYSTEM).
7270 DECODING-SYSTEM is the coding system to use for decoding
7271 \(in case OPERATION does decoding), and ENCODING-SYSTEM is the coding system
7272 for encoding (in case OPERATION does encoding).
7274 The first argument OPERATION specifies an I/O primitive:
7275 For file I/O, `insert-file-contents' or `write-region'.
7276 For process I/O, `call-process', `call-process-region', or `start-process'.
7277 For network I/O, `open-network-stream'.
7279 The remaining arguments should be the same arguments that were passed
7280 to the primitive. Depending on which primitive, one of those arguments
7281 is selected as the TARGET. For example, if OPERATION does file I/O,
7282 whichever argument specifies the file name is TARGET.
7284 TARGET has a meaning which depends on OPERATION:
7285 For file I/O, TARGET is a file name.
7286 For process I/O, TARGET is a process name.
7287 For network I/O, TARGET is a service name or a port number
7289 This function looks up what specified for TARGET in,
7290 `file-coding-system-alist', `process-coding-system-alist',
7291 or `network-coding-system-alist' depending on OPERATION.
7292 They may specify a coding system, a cons of coding systems,
7293 or a function symbol to call.
7294 In the last case, we call the function with one argument,
7295 which is a list of all the arguments given to this function.
7297 usage: (find-operation-coding-system OPERATION ARGUMENTS ...) */)
7302 Lisp_Object operation
, target_idx
, target
, val
;
7303 register Lisp_Object chain
;
7306 error ("Too few arguments");
7307 operation
= args
[0];
7308 if (!SYMBOLP (operation
)
7309 || !INTEGERP (target_idx
= Fget (operation
, Qtarget_idx
)))
7310 error ("Invalid first argument");
7311 if (nargs
< 1 + XINT (target_idx
))
7312 error ("Too few arguments for operation: %s",
7313 SDATA (SYMBOL_NAME (operation
)));
7314 /* For write-region, if the 6th argument (i.e. VISIT, the 5th
7315 argument to write-region) is string, it must be treated as a
7316 target file name. */
7317 if (EQ (operation
, Qwrite_region
)
7319 && STRINGP (args
[5]))
7320 target_idx
= make_number (4);
7321 target
= args
[XINT (target_idx
) + 1];
7322 if (!(STRINGP (target
)
7323 || (EQ (operation
, Qopen_network_stream
) && INTEGERP (target
))))
7324 error ("Invalid argument %d", XINT (target_idx
) + 1);
7326 chain
= ((EQ (operation
, Qinsert_file_contents
)
7327 || EQ (operation
, Qwrite_region
))
7328 ? Vfile_coding_system_alist
7329 : (EQ (operation
, Qopen_network_stream
)
7330 ? Vnetwork_coding_system_alist
7331 : Vprocess_coding_system_alist
));
7335 for (; CONSP (chain
); chain
= XCDR (chain
))
7341 && ((STRINGP (target
)
7342 && STRINGP (XCAR (elt
))
7343 && fast_string_match (XCAR (elt
), target
) >= 0)
7344 || (INTEGERP (target
) && EQ (target
, XCAR (elt
)))))
7347 /* Here, if VAL is both a valid coding system and a valid
7348 function symbol, we return VAL as a coding system. */
7351 if (! SYMBOLP (val
))
7353 if (! NILP (Fcoding_system_p (val
)))
7354 return Fcons (val
, val
);
7355 if (! NILP (Ffboundp (val
)))
7357 val
= call1 (val
, Flist (nargs
, args
));
7360 if (SYMBOLP (val
) && ! NILP (Fcoding_system_p (val
)))
7361 return Fcons (val
, val
);
7369 DEFUN ("update-coding-systems-internal", Fupdate_coding_systems_internal
,
7370 Supdate_coding_systems_internal
, 0, 0, 0,
7371 doc
: /* Update internal database for ISO2022 and CCL based coding systems.
7372 When values of any coding categories are changed, you must
7373 call this function. */)
7378 for (i
= CODING_CATEGORY_IDX_EMACS_MULE
; i
< CODING_CATEGORY_IDX_MAX
; i
++)
7382 val
= SYMBOL_VALUE (XVECTOR (Vcoding_category_table
)->contents
[i
]);
7385 if (! coding_system_table
[i
])
7386 coding_system_table
[i
] = ((struct coding_system
*)
7387 xmalloc (sizeof (struct coding_system
)));
7388 setup_coding_system (val
, coding_system_table
[i
]);
7390 else if (coding_system_table
[i
])
7392 xfree (coding_system_table
[i
]);
7393 coding_system_table
[i
] = NULL
;
7400 DEFUN ("set-coding-priority-internal", Fset_coding_priority_internal
,
7401 Sset_coding_priority_internal
, 0, 0, 0,
7402 doc
: /* Update internal database for the current value of `coding-category-list'.
7403 This function is internal use only. */)
7409 val
= Vcoding_category_list
;
7411 while (CONSP (val
) && i
< CODING_CATEGORY_IDX_MAX
)
7413 if (! SYMBOLP (XCAR (val
)))
7415 idx
= XFASTINT (Fget (XCAR (val
), Qcoding_category_index
));
7416 if (idx
>= CODING_CATEGORY_IDX_MAX
)
7418 coding_priorities
[i
++] = (1 << idx
);
7421 /* If coding-category-list is valid and contains all coding
7422 categories, `i' should be CODING_CATEGORY_IDX_MAX now. If not,
7423 the following code saves Emacs from crashing. */
7424 while (i
< CODING_CATEGORY_IDX_MAX
)
7425 coding_priorities
[i
++] = CODING_CATEGORY_MASK_RAW_TEXT
;
7430 DEFUN ("define-coding-system-internal", Fdefine_coding_system_internal
,
7431 Sdefine_coding_system_internal
, 1, 1, 0,
7432 doc
: /* Register CODING-SYSTEM as a base coding system.
7433 This function is internal use only. */)
7435 Lisp_Object coding_system
;
7437 Lisp_Object safe_chars
, slot
;
7439 if (NILP (Fcheck_coding_system (coding_system
)))
7440 Fsignal (Qcoding_system_error
, Fcons (coding_system
, Qnil
));
7441 safe_chars
= coding_safe_chars (coding_system
);
7442 if (! EQ (safe_chars
, Qt
) && ! CHAR_TABLE_P (safe_chars
))
7443 error ("No valid safe-chars property for %s",
7444 SDATA (SYMBOL_NAME (coding_system
)));
7445 if (EQ (safe_chars
, Qt
))
7447 if (NILP (Fmemq (coding_system
, XCAR (Vcoding_system_safe_chars
))))
7448 XSETCAR (Vcoding_system_safe_chars
,
7449 Fcons (coding_system
, XCAR (Vcoding_system_safe_chars
)));
7453 slot
= Fassq (coding_system
, XCDR (Vcoding_system_safe_chars
));
7455 XSETCDR (Vcoding_system_safe_chars
,
7456 nconc2 (XCDR (Vcoding_system_safe_chars
),
7457 Fcons (Fcons (coding_system
, safe_chars
), Qnil
)));
7459 XSETCDR (slot
, safe_chars
);
7467 /*** 9. Post-amble ***/
7474 /* Emacs' internal format specific initialize routine. */
7475 for (i
= 0; i
<= 0x20; i
++)
7476 emacs_code_class
[i
] = EMACS_control_code
;
7477 emacs_code_class
[0x0A] = EMACS_linefeed_code
;
7478 emacs_code_class
[0x0D] = EMACS_carriage_return_code
;
7479 for (i
= 0x21 ; i
< 0x7F; i
++)
7480 emacs_code_class
[i
] = EMACS_ascii_code
;
7481 emacs_code_class
[0x7F] = EMACS_control_code
;
7482 for (i
= 0x80; i
< 0xFF; i
++)
7483 emacs_code_class
[i
] = EMACS_invalid_code
;
7484 emacs_code_class
[LEADING_CODE_PRIVATE_11
] = EMACS_leading_code_3
;
7485 emacs_code_class
[LEADING_CODE_PRIVATE_12
] = EMACS_leading_code_3
;
7486 emacs_code_class
[LEADING_CODE_PRIVATE_21
] = EMACS_leading_code_4
;
7487 emacs_code_class
[LEADING_CODE_PRIVATE_22
] = EMACS_leading_code_4
;
7489 /* ISO2022 specific initialize routine. */
7490 for (i
= 0; i
< 0x20; i
++)
7491 iso_code_class
[i
] = ISO_control_0
;
7492 for (i
= 0x21; i
< 0x7F; i
++)
7493 iso_code_class
[i
] = ISO_graphic_plane_0
;
7494 for (i
= 0x80; i
< 0xA0; i
++)
7495 iso_code_class
[i
] = ISO_control_1
;
7496 for (i
= 0xA1; i
< 0xFF; i
++)
7497 iso_code_class
[i
] = ISO_graphic_plane_1
;
7498 iso_code_class
[0x20] = iso_code_class
[0x7F] = ISO_0x20_or_0x7F
;
7499 iso_code_class
[0xA0] = iso_code_class
[0xFF] = ISO_0xA0_or_0xFF
;
7500 iso_code_class
[ISO_CODE_CR
] = ISO_carriage_return
;
7501 iso_code_class
[ISO_CODE_SO
] = ISO_shift_out
;
7502 iso_code_class
[ISO_CODE_SI
] = ISO_shift_in
;
7503 iso_code_class
[ISO_CODE_SS2_7
] = ISO_single_shift_2_7
;
7504 iso_code_class
[ISO_CODE_ESC
] = ISO_escape
;
7505 iso_code_class
[ISO_CODE_SS2
] = ISO_single_shift_2
;
7506 iso_code_class
[ISO_CODE_SS3
] = ISO_single_shift_3
;
7507 iso_code_class
[ISO_CODE_CSI
] = ISO_control_sequence_introducer
;
7509 setup_coding_system (Qnil
, &safe_terminal_coding
);
7510 setup_coding_system (Qnil
, &default_buffer_file_coding
);
7512 bzero (coding_system_table
, sizeof coding_system_table
);
7514 bzero (ascii_skip_code
, sizeof ascii_skip_code
);
7515 for (i
= 0; i
< 128; i
++)
7516 ascii_skip_code
[i
] = 1;
7518 #if defined (MSDOS) || defined (WINDOWSNT)
7519 system_eol_type
= CODING_EOL_CRLF
;
7521 system_eol_type
= CODING_EOL_LF
;
7524 inhibit_pre_post_conversion
= 0;
7532 Qtarget_idx
= intern ("target-idx");
7533 staticpro (&Qtarget_idx
);
7535 Qcoding_system_history
= intern ("coding-system-history");
7536 staticpro (&Qcoding_system_history
);
7537 Fset (Qcoding_system_history
, Qnil
);
7539 /* Target FILENAME is the first argument. */
7540 Fput (Qinsert_file_contents
, Qtarget_idx
, make_number (0));
7541 /* Target FILENAME is the third argument. */
7542 Fput (Qwrite_region
, Qtarget_idx
, make_number (2));
7544 Qcall_process
= intern ("call-process");
7545 staticpro (&Qcall_process
);
7546 /* Target PROGRAM is the first argument. */
7547 Fput (Qcall_process
, Qtarget_idx
, make_number (0));
7549 Qcall_process_region
= intern ("call-process-region");
7550 staticpro (&Qcall_process_region
);
7551 /* Target PROGRAM is the third argument. */
7552 Fput (Qcall_process_region
, Qtarget_idx
, make_number (2));
7554 Qstart_process
= intern ("start-process");
7555 staticpro (&Qstart_process
);
7556 /* Target PROGRAM is the third argument. */
7557 Fput (Qstart_process
, Qtarget_idx
, make_number (2));
7559 Qopen_network_stream
= intern ("open-network-stream");
7560 staticpro (&Qopen_network_stream
);
7561 /* Target SERVICE is the fourth argument. */
7562 Fput (Qopen_network_stream
, Qtarget_idx
, make_number (3));
7564 Qcoding_system
= intern ("coding-system");
7565 staticpro (&Qcoding_system
);
7567 Qeol_type
= intern ("eol-type");
7568 staticpro (&Qeol_type
);
7570 Qbuffer_file_coding_system
= intern ("buffer-file-coding-system");
7571 staticpro (&Qbuffer_file_coding_system
);
7573 Qpost_read_conversion
= intern ("post-read-conversion");
7574 staticpro (&Qpost_read_conversion
);
7576 Qpre_write_conversion
= intern ("pre-write-conversion");
7577 staticpro (&Qpre_write_conversion
);
7579 Qno_conversion
= intern ("no-conversion");
7580 staticpro (&Qno_conversion
);
7582 Qundecided
= intern ("undecided");
7583 staticpro (&Qundecided
);
7585 Qcoding_system_p
= intern ("coding-system-p");
7586 staticpro (&Qcoding_system_p
);
7588 Qcoding_system_error
= intern ("coding-system-error");
7589 staticpro (&Qcoding_system_error
);
7591 Fput (Qcoding_system_error
, Qerror_conditions
,
7592 Fcons (Qcoding_system_error
, Fcons (Qerror
, Qnil
)));
7593 Fput (Qcoding_system_error
, Qerror_message
,
7594 build_string ("Invalid coding system"));
7596 Qcoding_category
= intern ("coding-category");
7597 staticpro (&Qcoding_category
);
7598 Qcoding_category_index
= intern ("coding-category-index");
7599 staticpro (&Qcoding_category_index
);
7601 Vcoding_category_table
7602 = Fmake_vector (make_number (CODING_CATEGORY_IDX_MAX
), Qnil
);
7603 staticpro (&Vcoding_category_table
);
7606 for (i
= 0; i
< CODING_CATEGORY_IDX_MAX
; i
++)
7608 XVECTOR (Vcoding_category_table
)->contents
[i
]
7609 = intern (coding_category_name
[i
]);
7610 Fput (XVECTOR (Vcoding_category_table
)->contents
[i
],
7611 Qcoding_category_index
, make_number (i
));
7615 Vcoding_system_safe_chars
= Fcons (Qnil
, Qnil
);
7616 staticpro (&Vcoding_system_safe_chars
);
7618 Qtranslation_table
= intern ("translation-table");
7619 staticpro (&Qtranslation_table
);
7620 Fput (Qtranslation_table
, Qchar_table_extra_slots
, make_number (2));
7622 Qtranslation_table_id
= intern ("translation-table-id");
7623 staticpro (&Qtranslation_table_id
);
7625 Qtranslation_table_for_decode
= intern ("translation-table-for-decode");
7626 staticpro (&Qtranslation_table_for_decode
);
7628 Qtranslation_table_for_encode
= intern ("translation-table-for-encode");
7629 staticpro (&Qtranslation_table_for_encode
);
7631 Qsafe_chars
= intern ("safe-chars");
7632 staticpro (&Qsafe_chars
);
7634 Qchar_coding_system
= intern ("char-coding-system");
7635 staticpro (&Qchar_coding_system
);
7637 /* Intern this now in case it isn't already done.
7638 Setting this variable twice is harmless.
7639 But don't staticpro it here--that is done in alloc.c. */
7640 Qchar_table_extra_slots
= intern ("char-table-extra-slots");
7641 Fput (Qsafe_chars
, Qchar_table_extra_slots
, make_number (0));
7642 Fput (Qchar_coding_system
, Qchar_table_extra_slots
, make_number (0));
7644 Qvalid_codes
= intern ("valid-codes");
7645 staticpro (&Qvalid_codes
);
7647 Qemacs_mule
= intern ("emacs-mule");
7648 staticpro (&Qemacs_mule
);
7650 Qraw_text
= intern ("raw-text");
7651 staticpro (&Qraw_text
);
7653 Qutf_8
= intern ("utf-8");
7654 staticpro (&Qutf_8
);
7656 Qcoding_system_define_form
= intern ("coding-system-define-form");
7657 staticpro (&Qcoding_system_define_form
);
7659 defsubr (&Scoding_system_p
);
7660 defsubr (&Sread_coding_system
);
7661 defsubr (&Sread_non_nil_coding_system
);
7662 defsubr (&Scheck_coding_system
);
7663 defsubr (&Sdetect_coding_region
);
7664 defsubr (&Sdetect_coding_string
);
7665 defsubr (&Sfind_coding_systems_region_internal
);
7666 defsubr (&Sunencodable_char_position
);
7667 defsubr (&Sdecode_coding_region
);
7668 defsubr (&Sencode_coding_region
);
7669 defsubr (&Sdecode_coding_string
);
7670 defsubr (&Sencode_coding_string
);
7671 defsubr (&Sdecode_sjis_char
);
7672 defsubr (&Sencode_sjis_char
);
7673 defsubr (&Sdecode_big5_char
);
7674 defsubr (&Sencode_big5_char
);
7675 defsubr (&Sset_terminal_coding_system_internal
);
7676 defsubr (&Sset_safe_terminal_coding_system_internal
);
7677 defsubr (&Sterminal_coding_system
);
7678 defsubr (&Sset_keyboard_coding_system_internal
);
7679 defsubr (&Skeyboard_coding_system
);
7680 defsubr (&Sfind_operation_coding_system
);
7681 defsubr (&Supdate_coding_systems_internal
);
7682 defsubr (&Sset_coding_priority_internal
);
7683 defsubr (&Sdefine_coding_system_internal
);
7685 DEFVAR_LISP ("coding-system-list", &Vcoding_system_list
,
7686 doc
: /* List of coding systems.
7688 Do not alter the value of this variable manually. This variable should be
7689 updated by the functions `make-coding-system' and
7690 `define-coding-system-alias'. */);
7691 Vcoding_system_list
= Qnil
;
7693 DEFVAR_LISP ("coding-system-alist", &Vcoding_system_alist
,
7694 doc
: /* Alist of coding system names.
7695 Each element is one element list of coding system name.
7696 This variable is given to `completing-read' as TABLE argument.
7698 Do not alter the value of this variable manually. This variable should be
7699 updated by the functions `make-coding-system' and
7700 `define-coding-system-alias'. */);
7701 Vcoding_system_alist
= Qnil
;
7703 DEFVAR_LISP ("coding-category-list", &Vcoding_category_list
,
7704 doc
: /* List of coding-categories (symbols) ordered by priority.
7706 On detecting a coding system, Emacs tries code detection algorithms
7707 associated with each coding-category one by one in this order. When
7708 one algorithm agrees with a byte sequence of source text, the coding
7709 system bound to the corresponding coding-category is selected. */);
7713 Vcoding_category_list
= Qnil
;
7714 for (i
= CODING_CATEGORY_IDX_MAX
- 1; i
>= 0; i
--)
7715 Vcoding_category_list
7716 = Fcons (XVECTOR (Vcoding_category_table
)->contents
[i
],
7717 Vcoding_category_list
);
7720 DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read
,
7721 doc
: /* Specify the coding system for read operations.
7722 It is useful to bind this variable with `let', but do not set it globally.
7723 If the value is a coding system, it is used for decoding on read operation.
7724 If not, an appropriate element is used from one of the coding system alists:
7725 There are three such tables, `file-coding-system-alist',
7726 `process-coding-system-alist', and `network-coding-system-alist'. */);
7727 Vcoding_system_for_read
= Qnil
;
7729 DEFVAR_LISP ("coding-system-for-write", &Vcoding_system_for_write
,
7730 doc
: /* Specify the coding system for write operations.
7731 Programs bind this variable with `let', but you should not set it globally.
7732 If the value is a coding system, it is used for encoding of output,
7733 when writing it to a file and when sending it to a file or subprocess.
7735 If this does not specify a coding system, an appropriate element
7736 is used from one of the coding system alists:
7737 There are three such tables, `file-coding-system-alist',
7738 `process-coding-system-alist', and `network-coding-system-alist'.
7739 For output to files, if the above procedure does not specify a coding system,
7740 the value of `buffer-file-coding-system' is used. */);
7741 Vcoding_system_for_write
= Qnil
;
7743 DEFVAR_LISP ("last-coding-system-used", &Vlast_coding_system_used
,
7744 doc
: /* Coding system used in the latest file or process I/O.
7745 Also set by `encode-coding-region', `decode-coding-region',
7746 `encode-coding-string' and `decode-coding-string'. */);
7747 Vlast_coding_system_used
= Qnil
;
7749 DEFVAR_BOOL ("inhibit-eol-conversion", &inhibit_eol_conversion
,
7750 doc
: /* *Non-nil means always inhibit code conversion of end-of-line format.
7751 See info node `Coding Systems' and info node `Text and Binary' concerning
7752 such conversion. */);
7753 inhibit_eol_conversion
= 0;
7755 DEFVAR_BOOL ("inherit-process-coding-system", &inherit_process_coding_system
,
7756 doc
: /* Non-nil means process buffer inherits coding system of process output.
7757 Bind it to t if the process output is to be treated as if it were a file
7758 read from some filesystem. */);
7759 inherit_process_coding_system
= 0;
7761 DEFVAR_LISP ("file-coding-system-alist", &Vfile_coding_system_alist
,
7762 doc
: /* Alist to decide a coding system to use for a file I/O operation.
7763 The format is ((PATTERN . VAL) ...),
7764 where PATTERN is a regular expression matching a file name,
7765 VAL is a coding system, a cons of coding systems, or a function symbol.
7766 If VAL is a coding system, it is used for both decoding and encoding
7768 If VAL is a cons of coding systems, the car part is used for decoding,
7769 and the cdr part is used for encoding.
7770 If VAL is a function symbol, the function must return a coding system
7771 or a cons of coding systems which are used as above. The function gets
7772 the arguments with which `find-operation-coding-system' was called.
7774 See also the function `find-operation-coding-system'
7775 and the variable `auto-coding-alist'. */);
7776 Vfile_coding_system_alist
= Qnil
;
7778 DEFVAR_LISP ("process-coding-system-alist", &Vprocess_coding_system_alist
,
7779 doc
: /* Alist to decide a coding system to use for a process I/O operation.
7780 The format is ((PATTERN . VAL) ...),
7781 where PATTERN is a regular expression matching a program name,
7782 VAL is a coding system, a cons of coding systems, or a function symbol.
7783 If VAL is a coding system, it is used for both decoding what received
7784 from the program and encoding what sent to the program.
7785 If VAL is a cons of coding systems, the car part is used for decoding,
7786 and the cdr part is used for encoding.
7787 If VAL is a function symbol, the function must return a coding system
7788 or a cons of coding systems which are used as above.
7790 See also the function `find-operation-coding-system'. */);
7791 Vprocess_coding_system_alist
= Qnil
;
7793 DEFVAR_LISP ("network-coding-system-alist", &Vnetwork_coding_system_alist
,
7794 doc
: /* Alist to decide a coding system to use for a network I/O operation.
7795 The format is ((PATTERN . VAL) ...),
7796 where PATTERN is a regular expression matching a network service name
7797 or is a port number to connect to,
7798 VAL is a coding system, a cons of coding systems, or a function symbol.
7799 If VAL is a coding system, it is used for both decoding what received
7800 from the network stream and encoding what sent to the network stream.
7801 If VAL is a cons of coding systems, the car part is used for decoding,
7802 and the cdr part is used for encoding.
7803 If VAL is a function symbol, the function must return a coding system
7804 or a cons of coding systems which are used as above.
7806 See also the function `find-operation-coding-system'. */);
7807 Vnetwork_coding_system_alist
= Qnil
;
7809 DEFVAR_LISP ("locale-coding-system", &Vlocale_coding_system
,
7810 doc
: /* Coding system to use with system messages.
7811 Also used for decoding keyboard input on X Window system. */);
7812 Vlocale_coding_system
= Qnil
;
7814 /* The eol mnemonics are reset in startup.el system-dependently. */
7815 DEFVAR_LISP ("eol-mnemonic-unix", &eol_mnemonic_unix
,
7816 doc
: /* *String displayed in mode line for UNIX-like (LF) end-of-line format. */);
7817 eol_mnemonic_unix
= build_string (":");
7819 DEFVAR_LISP ("eol-mnemonic-dos", &eol_mnemonic_dos
,
7820 doc
: /* *String displayed in mode line for DOS-like (CRLF) end-of-line format. */);
7821 eol_mnemonic_dos
= build_string ("\\");
7823 DEFVAR_LISP ("eol-mnemonic-mac", &eol_mnemonic_mac
,
7824 doc
: /* *String displayed in mode line for MAC-like (CR) end-of-line format. */);
7825 eol_mnemonic_mac
= build_string ("/");
7827 DEFVAR_LISP ("eol-mnemonic-undecided", &eol_mnemonic_undecided
,
7828 doc
: /* *String displayed in mode line when end-of-line format is not yet determined. */);
7829 eol_mnemonic_undecided
= build_string (":");
7831 DEFVAR_LISP ("enable-character-translation", &Venable_character_translation
,
7832 doc
: /* *Non-nil enables character translation while encoding and decoding. */);
7833 Venable_character_translation
= Qt
;
7835 DEFVAR_LISP ("standard-translation-table-for-decode",
7836 &Vstandard_translation_table_for_decode
,
7837 doc
: /* Table for translating characters while decoding. */);
7838 Vstandard_translation_table_for_decode
= Qnil
;
7840 DEFVAR_LISP ("standard-translation-table-for-encode",
7841 &Vstandard_translation_table_for_encode
,
7842 doc
: /* Table for translating characters while encoding. */);
7843 Vstandard_translation_table_for_encode
= Qnil
;
7845 DEFVAR_LISP ("charset-revision-table", &Vcharset_revision_alist
,
7846 doc
: /* Alist of charsets vs revision numbers.
7847 While encoding, if a charset (car part of an element) is found,
7848 designate it with the escape sequence identifying revision (cdr part of the element). */);
7849 Vcharset_revision_alist
= Qnil
;
7851 DEFVAR_LISP ("default-process-coding-system",
7852 &Vdefault_process_coding_system
,
7853 doc
: /* Cons of coding systems used for process I/O by default.
7854 The car part is used for decoding a process output,
7855 the cdr part is used for encoding a text to be sent to a process. */);
7856 Vdefault_process_coding_system
= Qnil
;
7858 DEFVAR_LISP ("latin-extra-code-table", &Vlatin_extra_code_table
,
7859 doc
: /* Table of extra Latin codes in the range 128..159 (inclusive).
7860 This is a vector of length 256.
7861 If Nth element is non-nil, the existence of code N in a file
7862 \(or output of subprocess) doesn't prevent it to be detected as
7863 a coding system of ISO 2022 variant which has a flag
7864 `accept-latin-extra-code' t (e.g. iso-latin-1) on reading a file
7865 or reading output of a subprocess.
7866 Only 128th through 159th elements has a meaning. */);
7867 Vlatin_extra_code_table
= Fmake_vector (make_number (256), Qnil
);
7869 DEFVAR_LISP ("select-safe-coding-system-function",
7870 &Vselect_safe_coding_system_function
,
7871 doc
: /* Function to call to select safe coding system for encoding a text.
7873 If set, this function is called to force a user to select a proper
7874 coding system which can encode the text in the case that a default
7875 coding system used in each operation can't encode the text.
7877 The default value is `select-safe-coding-system' (which see). */);
7878 Vselect_safe_coding_system_function
= Qnil
;
7880 DEFVAR_BOOL ("coding-system-require-warning",
7881 &coding_system_require_warning
,
7882 doc
: /* Internal use only.
7883 If non-nil, on writing a file, `select-safe-coding-system-function' is
7884 called even if `coding-system-for-write' is non-nil. The command
7885 `universal-coding-system-argument' binds this variable to t temporarily. */);
7886 coding_system_require_warning
= 0;
7889 DEFVAR_BOOL ("inhibit-iso-escape-detection",
7890 &inhibit_iso_escape_detection
,
7891 doc
: /* If non-nil, Emacs ignores ISO2022's escape sequence on code detection.
7893 By default, on reading a file, Emacs tries to detect how the text is
7894 encoded. This code detection is sensitive to escape sequences. If
7895 the sequence is valid as ISO2022, the code is determined as one of
7896 the ISO2022 encodings, and the file is decoded by the corresponding
7897 coding system (e.g. `iso-2022-7bit').
7899 However, there may be a case that you want to read escape sequences in
7900 a file as is. In such a case, you can set this variable to non-nil.
7901 Then, as the code detection ignores any escape sequences, no file is
7902 detected as encoded in some ISO2022 encoding. The result is that all
7903 escape sequences become visible in a buffer.
7905 The default value is nil, and it is strongly recommended not to change
7906 it. That is because many Emacs Lisp source files that contain
7907 non-ASCII characters are encoded by the coding system `iso-2022-7bit'
7908 in Emacs's distribution, and they won't be decoded correctly on
7909 reading if you suppress escape sequence detection.
7911 The other way to read escape sequences in a file without decoding is
7912 to explicitly specify some coding system that doesn't use ISO2022's
7913 escape sequence (e.g `latin-1') on reading by \\[universal-coding-system-argument]. */);
7914 inhibit_iso_escape_detection
= 0;
7916 DEFVAR_LISP ("translation-table-for-input", &Vtranslation_table_for_input
,
7917 doc
: /* Char table for translating self-inserting characters.
7918 This is applied to the result of input methods, not their input. See also
7919 `keyboard-translate-table'. */);
7920 Vtranslation_table_for_input
= Qnil
;
7924 emacs_strerror (error_number
)
7929 synchronize_system_messages_locale ();
7930 str
= strerror (error_number
);
7932 if (! NILP (Vlocale_coding_system
))
7934 Lisp_Object dec
= code_convert_string_norecord (build_string (str
),
7935 Vlocale_coding_system
,
7937 str
= (char *) SDATA (dec
);
7945 /* arch-tag: 3a3a2b01-5ff6-4071-9afe-f5b808d9229d
7946 (do not change this comment) */