/* Coding system handler (conversion, detection, etc.).
   Copyright (C) 1995, 1997, 1998, 2002 Electrotechnical Laboratory, JAPAN.
   Licensed to the Free Software Foundation.
   Copyright (C) 2001,2002 Free Software Foundation, Inc.

This file is part of GNU Emacs.

GNU Emacs is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

GNU Emacs is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GNU Emacs; see the file COPYING.  If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.  */
/*** TABLE OF CONTENTS ***

  0. General comments
  1. Preamble
  2. Emacs' internal format (emacs-mule) handlers
  3. ISO2022 handlers
  4. Shift-JIS and BIG5 handlers
  6. End-of-line handlers
  7. C library functions
  8. Emacs Lisp library functions

*/

/*** 0. General comments ***/
/*** GENERAL NOTE on CODING SYSTEMS ***

  A coding system is an encoding mechanism for one or more character
  sets.  Here's a list of coding systems which Emacs can handle.  When
  we say "decode", we mean converting some other coding system to
  Emacs' internal format (emacs-mule), and when we say "encode", we
  mean converting emacs-mule to some other coding system.

  0. Emacs' internal format (emacs-mule)

  Emacs itself holds multilingual characters in buffers and strings
  in a special format.  Details are described in section 2.
  1. ISO2022

  The most famous coding system for multiple character sets.  X's
  Compound Text, various EUCs (Extended Unix Code), and coding
  systems used in Internet communication such as ISO-2022-JP are
  all variants of ISO2022.  Details are described in section 3.
  2. SJIS (or Shift-JIS or MS-Kanji-Code)

  A coding system to encode the character sets ASCII, JISX0201, and
  JISX0208.  Widely used on PCs in Japan.  Details are described in
  section 4.
  3. BIG5

  A coding system to encode the character sets ASCII and Big5.  Widely
  used for Chinese (mainly in Taiwan and Hong Kong).  Details are
  described in section 4.  In this file, when we write "BIG5"
  (all uppercase), we mean the coding system, and when we write
  "Big5" (capitalized), we mean the character set.
  4. Raw text

  A coding system for text containing arbitrary 8-bit codes.  Emacs
  does no code conversion on such text except for end-of-line format.
  5. Other

  If a user wants to read/write text encoded in a coding system not
  listed above, they can supply a decoder and an encoder for it as CCL
  (Code Conversion Language) programs.  Emacs executes the CCL program
  while reading/writing.
  Emacs represents a coding system by a Lisp symbol that has a property
  `coding-system'.  But before the coding system is actually used, the
  information about it is copied into a structure of type `struct
  coding_system' for rapid processing.  See section 6 for more details.

*/
/*** GENERAL NOTES on END-OF-LINE FORMAT ***

  How end-of-line of text is encoded depends on the operating system.
  For instance, Unix's format is just one byte of `line-feed' code,
  whereas DOS's format is a two-byte sequence of `carriage-return' and
  `line-feed' codes.  MacOS's format is usually one byte of
  `carriage-return'.
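
  A minimal sketch of the encoding direction just described, assuming
  newlines are held internally as a single LF: each LF is rewritten as
  CR LF (DOS), as CR (Mac), or left alone (Unix).  The enum sketch_eol
  and the function sketch_encode_eol are hypothetical names for this
  illustration, not Emacs's CODING_EOL_XXX machinery.

```c
#include <stddef.h>

// Hypothetical EOL selectors; they mirror the idea of CODING_EOL_LF,
// CODING_EOL_CRLF and CODING_EOL_CR but are not Emacs's definitions.
enum sketch_eol { SKETCH_EOL_LF, SKETCH_EOL_CRLF, SKETCH_EOL_CR };

// Copy N bytes from SRC to DST, rewriting each LF for the requested
// end-of-line format.  DST needs room for 2 * N bytes in the worst case
// (CRLF with every byte an LF).  Returns the number of bytes written.
size_t sketch_encode_eol (const unsigned char *src, size_t n,
                          enum sketch_eol eol, unsigned char *dst)
{
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (src[i] == '\n' && eol == SKETCH_EOL_CRLF)
        {
          dst[out++] = '\r';   // DOS: carriage-return, then line-feed
          dst[out++] = '\n';
        }
      else if (src[i] == '\n' && eol == SKETCH_EOL_CR)
        dst[out++] = '\r';     // Mac: carriage-return only
      else
        dst[out++] = src[i];   // Unix LF and every other byte unchanged
    }
  return out;
}
```

  Decoding is the reverse mapping (CR LF or CR back to LF).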
  Since text character encoding and end-of-line encoding are
  independent, any coding system described above can have any
  end-of-line format.  So each coding system carries its own
  end-of-line format.  See section 6 for more details.

*/
/*** GENERAL NOTES on `detect_coding_XXX ()' functions ***

  These functions check if the text between SRC and SRC_END is encoded
  in the coding system category XXX.  Each returns an integer value in
  which the appropriate flag bits for the category XXX are set.  The
  flag bits are defined in macros CODING_CATEGORY_MASK_XXX.  Below is
  the template for these functions.  If MULTIBYTEP is nonzero, 8-bit
  codes in the range 0x80..0x9F are in multibyte form.  */
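
/* To illustrate the contract only, here is a toy detector of the same
   shape: it scans SRC..SRC_END and returns the mask bits of every
   category the bytes are still consistent with.  The masks
   SKETCH_MASK_ASCII_ONLY and SKETCH_MASK_EIGHT_BIT and the function
   sketch_detect are hypothetical; real detectors return
   CODING_CATEGORY_MASK_XXX bits and check far more.

```c
// Hypothetical category bits for this sketch only; real detectors use
// the CODING_CATEGORY_MASK_XXX macros.
#define SKETCH_MASK_ASCII_ONLY (1 << 0)
#define SKETCH_MASK_EIGHT_BIT  (1 << 1)

// Scan the text between SRC and SRC_END and return the bits of every
// category the text is still consistent with (0 would mean none).
int sketch_detect (const unsigned char *src, const unsigned char *src_end)
{
  int mask = SKETCH_MASK_ASCII_ONLY | SKETCH_MASK_EIGHT_BIT;

  for (; src < src_end; src++)
    if (*src >= 0x80)
      mask &= ~SKETCH_MASK_ASCII_ONLY;   // an 8-bit byte rules out pure ASCII
  return mask;
}
```

   Returning a mask rather than a yes/no answer lets several categories
   survive at once; a choice among them can then be made by priority,
   as the coding_priorities table declared later in this file suggests.  */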
detect_coding_XXX (src, src_end, multibytep)
     unsigned char *src, *src_end;
     int multibytep;
/*** GENERAL NOTES on `decode_coding_XXX ()' functions ***

  These functions decode SRC_BYTES bytes of unibyte text at SOURCE,
  encoded in CODING, into Emacs' internal format.  The resulting
  multibyte text goes to the area pointed to by DESTINATION, whose
  length must not exceed DST_BYTES.

  These functions store information about the original and decoded
  texts in the members `produced', `produced_char', `consumed', and
  `consumed_char' of the structure *CODING.  They also set the member
  `result' to one of CODING_FINISH_XXX indicating how the decoding
  finished.
  DST_BYTES zero means that the source area and destination area
  overlap, so decoded text can be produced only until it reaches the
  head of the not-yet-decoded source text.
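
  The overlap rule can be sketched with a decoder whose output never
  grows, so it can run in place.  sketch_decode_crlf is a hypothetical
  example (it only collapses CR LF into LF); the point is its write
  limit, `dst_bytes ? dst_end : src', the same test the EMIT_XXX
  macros in this file use.

```c
#include <stddef.h>

// Hypothetical in-place-capable decoder: collapses CR LF into LF.  When
// DST_BYTES is zero, DESTINATION must not lie after SOURCE in the same
// buffer (typically it equals SOURCE).  Returns the bytes produced and
// stores the bytes consumed in *CONSUMED.
size_t sketch_decode_crlf (unsigned char *source, size_t src_bytes,
                           unsigned char *destination, size_t dst_bytes,
                           size_t *consumed)
{
  unsigned char *src = source;
  unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;

  while (src < src_end)
    {
      unsigned char *src_base = src;   // start of this character
      unsigned char c = *src++;

      if (c == '\r' && src < src_end && *src == '\n')
        c = *src++;                    // CR LF becomes a single LF
      // Same limit as the EMIT_XXX macros: the end of the destination
      // area, or, when the areas overlap, the not-yet-decoded source.
      if (dst >= (dst_bytes ? dst_end : src))
        {
          src = src_base;              // leave this character unconsumed
          break;
        }
      *dst++ = c;
    }
  *consumed = (size_t) (src - source);
  return (size_t) (dst - destination);
}
```

  With DESTINATION equal to SOURCE and DST_BYTES zero, every write
  lands on a byte that has already been read, so no unread input is
  overwritten.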
  Below is a template for these functions.  */
decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
/*** GENERAL NOTES on `encode_coding_XXX ()' functions ***

  These functions encode SRC_BYTES bytes of text at SOURCE from Emacs'
  internal multibyte format to CODING.  The resulting unibyte text
  goes to the area pointed to by DESTINATION, whose length must not
  exceed DST_BYTES.

  These functions store information about the original and encoded
  texts in the members `produced', `produced_char', `consumed', and
  `consumed_char' of the structure *CODING.  They also set the member
  `result' to one of CODING_FINISH_XXX indicating how the encoding
  finished.
  DST_BYTES zero means that the source area and destination area
  overlap, so encoded text can be produced only until it reaches the
  head of the not-yet-encoded source text.

  Below is a template for these functions.  */
encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
/*** COMMONLY USED MACROS ***/

/* The following two macros, ONE_MORE_BYTE and TWO_MORE_BYTES, safely
   get one and two bytes, respectively, from the source text.  If there
   are not enough bytes in the source, they jump to `label_end_of_loop'.
   The caller should set the variables `coding', `src' and `src_end' to
   appropriate pointers in advance.  These macros are called from the
   decoding routines `decode_coding_XXX', so they assume that the
   source text is unibyte.  */
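
/* A self-contained sketch of the ONE_MORE_BYTE pattern: the macro
   either fetches the next byte or records an insufficient-source
   result and jumps to `label_end_of_loop', where the routine reports
   how much input it consumed.  struct sketch_coding, the
   SKETCH_FINISH_XXX values and sketch_decode_be16 (which reads
   big-endian 16-bit units) are hypothetical stand-ins for struct
   coding_system, CODING_FINISH_XXX and a real decoder.

```c
// Hypothetical stand-ins for struct coding_system and CODING_FINISH_XXX.
enum sketch_result { SKETCH_FINISH_NORMAL, SKETCH_FINISH_INSUFFICIENT_SRC };
struct sketch_coding { enum sketch_result result; };

// Same shape as ONE_MORE_BYTE: fetch a byte or leave through the label.
#define SKETCH_ONE_MORE_BYTE(c1)                          \
  do {                                                    \
    if (src >= src_end)                                   \
      {                                                   \
        coding->result = SKETCH_FINISH_INSUFFICIENT_SRC;  \
        goto label_end_of_loop;                           \
      }                                                   \
    c1 = *src++;                                          \
  } while (0)

// Decode big-endian 16-bit units from SOURCE into DST.  Returns the
// number of source bytes consumed; an odd trailing byte is not consumed.
int sketch_decode_be16 (struct sketch_coding *coding,
                        const unsigned char *source, int src_bytes,
                        unsigned int *dst)
{
  const unsigned char *src = source;
  const unsigned char *src_end = source + src_bytes;
  const unsigned char *src_base = src;
  int c1, c2;

  coding->result = SKETCH_FINISH_NORMAL;
  while (src < src_end)
    {
      src_base = src;                 // start of the unit being decoded
      SKETCH_ONE_MORE_BYTE (c1);
      SKETCH_ONE_MORE_BYTE (c2);      // may jump out on a lone last byte
      *dst++ = ((unsigned int) c1 << 8) | (unsigned int) c2;
    }
  src_base = src;                     // loop ended with everything used
 label_end_of_loop:
  return (int) (src_base - source);
}
```

   Recording SRC_BASE at the top of each iteration is what lets an
   incomplete trailing unit stay unconsumed, so a caller can hand it
   back in together with the next block of input.  */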
#define ONE_MORE_BYTE(c1)                                       \
  do {                                                          \
    if (src >= src_end)                                         \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    c1 = *src++;                                                \
  } while (0)

#define TWO_MORE_BYTES(c1, c2)                                  \
  do {                                                          \
    if (src + 1 >= src_end)                                     \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    c1 = *src++;                                                \
    c2 = *src++;                                                \
  } while (0)
/* Like ONE_MORE_BYTE, but 8-bit bytes of data at SRC are in multibyte
   form if MULTIBYTEP is nonzero.  */

#define ONE_MORE_BYTE_CHECK_MULTIBYTE(c1, multibytep)           \
  do {                                                          \
    if (src >= src_end)                                         \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    c1 = *src++;                                                \
    if (multibytep && c1 == LEADING_CODE_8_BIT_CONTROL)         \
      c1 = *src++ - 0x20;                                       \
  } while (0)
/* Set C to the next character at the source text pointed to by `src'.
   If there are not enough characters in the source, jump to
   `label_end_of_loop'.  The caller should set the variables `coding',
   `src', `src_end', and `translation_table' to appropriate values in
   advance.  This macro is used in the encoding routines
   `encode_coding_XXX', so it assumes that the source text is in
   multibyte form except for 8-bit characters.  8-bit characters are
   in multibyte form if coding->src_multibyte is nonzero; otherwise
   they are represented by a single byte.  */

#define ONE_MORE_CHAR(c)                                        \
  do {                                                          \
    int len = src_end - src;                                    \
    int bytes;                                                  \
    if (len <= 0)                                               \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    if (coding->src_multibyte                                   \
        || UNIBYTE_STR_AS_MULTIBYTE_P (src, len, bytes))        \
      c = STRING_CHAR_AND_LENGTH (src, len, bytes);             \
    else                                                        \
      c = *src, bytes = 1;                                      \
    if (!NILP (translation_table))                              \
      c = translate_char (translation_table, c, -1, 0, 0);      \
    src += bytes;                                               \
  } while (0)
/* Write the multibyte form of character C to `dst'.  Jump to
   `label_end_of_loop' if there's not enough space at `dst'.

   If we are now in the middle of a composition sequence, the decoded
   character may be ALTCHAR (for the current composition).  In that
   case, the character goes to coding->cmp_data->data instead of
   `dst'.

   This macro is used in decoding routines.  */

#define EMIT_CHAR(c)                                            \
  do {                                                          \
    if (! COMPOSING_P (coding)                                  \
        || coding->composing == COMPOSITION_RELATIVE            \
        || coding->composing == COMPOSITION_WITH_RULE)          \
      {                                                         \
        int bytes = CHAR_BYTES (c);                             \
        if ((dst + bytes) > (dst_bytes ? dst_end : src))        \
          {                                                     \
            coding->result = CODING_FINISH_INSUFFICIENT_DST;    \
            goto label_end_of_loop;                             \
          }                                                     \
        dst += CHAR_STRING (c, dst);                            \
        coding->produced_char++;                                \
      }                                                         \
                                                                \
    if (COMPOSING_P (coding)                                    \
        && coding->composing != COMPOSITION_RELATIVE)           \
      {                                                         \
        CODING_ADD_COMPOSITION_COMPONENT (coding, c);           \
        coding->composition_rule_follows                        \
          = coding->composing != COMPOSITION_WITH_ALTCHARS;     \
      }                                                         \
  } while (0)
#define EMIT_ONE_BYTE(c)                                        \
  do {                                                          \
    if (dst >= (dst_bytes ? dst_end : src))                     \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    *dst++ = c;                                                 \
  } while (0)

#define EMIT_TWO_BYTES(c1, c2)                                  \
  do {                                                          \
    if (dst + 2 > (dst_bytes ? dst_end : src))                  \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    *dst++ = c1, *dst++ = c2;                                   \
  } while (0)

#define EMIT_BYTES(from, to)                                    \
  do {                                                          \
    if (dst + (to - from) > (dst_bytes ? dst_end : src))        \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    while (from < to)                                           \
      *dst++ = *from++;                                         \
  } while (0)
/*** 1. Preamble ***/

#include "composite.h"

#else /* not emacs */

#endif /* not emacs */
Lisp_Object Qcoding_system, Qeol_type;
Lisp_Object Qbuffer_file_coding_system;
Lisp_Object Qpost_read_conversion, Qpre_write_conversion;
Lisp_Object Qno_conversion, Qundecided;
Lisp_Object Qcoding_system_history;
Lisp_Object Qsafe_chars;
Lisp_Object Qvalid_codes;

extern Lisp_Object Qinsert_file_contents, Qwrite_region;
Lisp_Object Qcall_process, Qcall_process_region, Qprocess_argument;
Lisp_Object Qstart_process, Qopen_network_stream;
Lisp_Object Qtarget_idx;

Lisp_Object Vselect_safe_coding_system_function;

int coding_system_require_warning;
/* Mnemonic string for each format of end-of-line.  */
Lisp_Object eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac;
/* Mnemonic string to indicate that the format of end-of-line is not
   yet decided.  */
Lisp_Object eol_mnemonic_undecided;

/* Format of end-of-line decided by the system.  This is CODING_EOL_LF
   on Unix, CODING_EOL_CRLF on DOS/Windows, and CODING_EOL_CR on Mac.  */
/* Information about which coding system is safe for which chars.
   The value has the form (GENERIC-LIST . NON-GENERIC-ALIST).

   GENERIC-LIST is a list of generic coding systems which can encode
   any characters.

   NON-GENERIC-ALIST is an alist of non-generic coding systems, each
   paired with the char table that contains its safe chars.  */
Lisp_Object Vcoding_system_safe_chars;
Lisp_Object Vcoding_system_list, Vcoding_system_alist;

Lisp_Object Qcoding_system_p, Qcoding_system_error;

/* The coding systems emacs-mule and raw-text convert only the
   end-of-line format.  */
Lisp_Object Qemacs_mule, Qraw_text;
/* Coding systems are passed between Emacs Lisp programs and C internal
   routines via the following three variables.  */
/* Coding system for reading files and receiving data from processes.  */
Lisp_Object Vcoding_system_for_read;
/* Coding system for writing files and sending data to processes.  */
Lisp_Object Vcoding_system_for_write;
/* Coding system actually used in the latest I/O.  */
Lisp_Object Vlast_coding_system_used;

/* A vector of length 256 which contains information about special
   Latin codes (especially for dealing with Microsoft codes).  */
Lisp_Object Vlatin_extra_code_table;

/* Flag to inhibit code conversion of end-of-line format.  */
int inhibit_eol_conversion;

/* Flag to inhibit ISO2022 escape sequence detection.  */
int inhibit_iso_escape_detection;

/* Flag to make buffer-file-coding-system inherit from process-coding.  */
int inherit_process_coding_system;

/* Coding system to be used to encode text for terminal display.  */
struct coding_system terminal_coding;

/* Coding system to be used to encode text for terminal display when
   the terminal coding system is nil.  */
struct coding_system safe_terminal_coding;

/* Coding system of what is sent from the terminal keyboard.  */
struct coding_system keyboard_coding;

/* Default coding system to be used to write a file.  */
struct coding_system default_buffer_file_coding;

Lisp_Object Vfile_coding_system_alist;
Lisp_Object Vprocess_coding_system_alist;
Lisp_Object Vnetwork_coding_system_alist;

Lisp_Object Vlocale_coding_system;
Lisp_Object Qcoding_category, Qcoding_category_index;

/* List of symbols `coding-category-xxx' ordered by priority.  */
Lisp_Object Vcoding_category_list;

/* Table of coding categories (Lisp symbols).  */
Lisp_Object Vcoding_category_table;

/* Table of symbol names for each coding category.  */
char *coding_category_name[CODING_CATEGORY_IDX_MAX] = {
  "coding-category-emacs-mule",
  "coding-category-sjis",
  "coding-category-iso-7",
  "coding-category-iso-7-tight",
  "coding-category-iso-8-1",
  "coding-category-iso-8-2",
  "coding-category-iso-7-else",
  "coding-category-iso-8-else",
  "coding-category-ccl",
  "coding-category-big5",
  "coding-category-utf-8",
  "coding-category-utf-16-be",
  "coding-category-utf-16-le",
  "coding-category-raw-text",
  "coding-category-binary"
};
/* Table of pointers to coding systems corresponding to each coding
   category.  */
struct coding_system *coding_system_table[CODING_CATEGORY_IDX_MAX];
/* Table of coding category masks.  The Nth element is the mask of the
   coding category whose priority is Nth.  */

int coding_priorities[CODING_CATEGORY_IDX_MAX];
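
/* A sketch of how such a priority table can be used, assuming a
   detection result is a mask with one bit per category: walk the table
   from highest to lowest priority and take the first category whose
   bit is set.  SKETCH_NCATEGORIES and sketch_pick_category are
   hypothetical; the real table has CODING_CATEGORY_IDX_MAX entries.

```c
// Hypothetical size; the real table has CODING_CATEGORY_IDX_MAX entries.
#define SKETCH_NCATEGORIES 3

// PRIORITIES[N] is the mask of the category ranked Nth (0 = highest).
// Return the mask of the best-ranked category present in DETECTED, or 0.
int sketch_pick_category (int detected,
                          const int priorities[SKETCH_NCATEGORIES])
{
  for (int n = 0; n < SKETCH_NCATEGORIES; n++)
    if (detected & priorities[n])
      return priorities[n];
  return 0;
}
```

   For example, with priorities {bit 2, bit 0, bit 1}, a detection mask
   containing bits 0 and 1 selects bit 0.  */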
/* Flag to tell if we look up the translation table on character code
   conversion.  */
Lisp_Object Venable_character_translation;
/* Standard translation table to look up on decoding (reading).  */
Lisp_Object Vstandard_translation_table_for_decode;
/* Standard translation table to look up on encoding (writing).  */
Lisp_Object Vstandard_translation_table_for_encode;

Lisp_Object Qtranslation_table;
Lisp_Object Qtranslation_table_id;
Lisp_Object Qtranslation_table_for_decode;
Lisp_Object Qtranslation_table_for_encode;

/* Alist of charsets vs revision number.  */
Lisp_Object Vcharset_revision_alist;

/* Default coding systems used for process I/O.  */
Lisp_Object Vdefault_process_coding_system;

/* Char table for translating Quail and self-inserting input.  */
Lisp_Object Vtranslation_table_for_input;
/* Global flag to tell that we can't call post-read-conversion and
   pre-write-conversion functions.  Usually the value is zero, but it
   is set to 1 temporarily while such functions are running.  This is
   to avoid infinite recursion.  */
static int inhibit_pre_post_conversion;

/* Char-table containing safe coding systems of each character.  */
Lisp_Object Vchar_coding_system_table;
Lisp_Object Qchar_coding_system;
/* Return `safe-chars' property of CODING_SYSTEM (symbol).  Don't check
   validity of CODING_SYSTEM.  */

Lisp_Object
coding_safe_chars (coding_system)
     Lisp_Object coding_system;
{
  Lisp_Object coding_spec, plist, safe_chars;

  coding_spec = Fget (coding_system, Qcoding_system);
  plist = XVECTOR (coding_spec)->contents[3];
  safe_chars = Fplist_get (plist, Qsafe_chars);
  return (CHAR_TABLE_P (safe_chars) ? safe_chars : Qt);
}

#define CODING_SAFE_CHAR_P(safe_chars, c) \
  (EQ (safe_chars, Qt) || !NILP (CHAR_TABLE_REF (safe_chars, c)))
/*** 2. Emacs internal format (emacs-mule) handlers ***/

/* Emacs' internal format for representation of multiple character
   sets is a kind of multi-byte encoding, i.e. characters are
   represented by variable-length sequences of one-byte codes.

   ASCII characters and control characters (e.g. `tab', `newline') are
   represented by one-byte sequences which are their ASCII codes, in
   the range 0x00 through 0x7F.

   8-bit characters in the range 0x80..0x9F are represented by
   two-byte sequences of LEADING_CODE_8_BIT_CONTROL and (their 8-bit
   code + 0x20).

   8-bit characters in the range 0xA0..0xFF are represented by
   one-byte sequences which are their 8-bit code.
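
   The two rules above for 8-bit characters can be sketched as a pair
   of conversions.  The value 0x9E for LEADING_CODE_8_BIT_CONTROL is an
   assumption made for this sketch (check charset.h), and
   sketch_8bit_to_multibyte and sketch_multibyte_to_8bit are
   hypothetical helpers; the second mirrors what
   ONE_MORE_BYTE_CHECK_MULTIBYTE does with a leading code.

```c
// Assumed value for this sketch; verify against charset.h.
#define SKETCH_LEADING_CODE_8_BIT_CONTROL 0x9E

// Store the internal form of raw byte B (0x80..0xFF) in OUT and return
// its length in bytes.
int sketch_8bit_to_multibyte (unsigned char b, unsigned char out[2])
{
  if (b < 0xA0)
    {
      // 0x80..0x9F: leading code, then the byte plus 0x20 (0xA0..0xBF).
      out[0] = SKETCH_LEADING_CODE_8_BIT_CONTROL;
      out[1] = (unsigned char) (b + 0x20);
      return 2;
    }
  out[0] = b;                 // 0xA0..0xFF stand for themselves
  return 1;
}

// Read one raw byte back from the internal form at *P and advance *P,
// as ONE_MORE_BYTE_CHECK_MULTIBYTE does when MULTIBYTEP is nonzero.
unsigned char sketch_multibyte_to_8bit (const unsigned char **p)
{
  unsigned char c = *(*p)++;
  if (c == SKETCH_LEADING_CODE_8_BIT_CONTROL)
    c = (unsigned char) (*(*p)++ - 0x20);
  return c;
}
```

   Adding 0x20 moves 0x80..0x9F into 0xA0..0xBF, so the second byte can
   never be mistaken for a leading code.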
   The other characters are represented by a sequence of a `base
   leading-code', an optional `extended leading-code', and one or two
   `position-code's.  The length of the sequence is determined by the
   base leading-code.  Leading-code takes the range 0x81 through 0x9D,
   whereas extended leading-code and position-code take the range 0xA0
   through 0xFF.  See `charset.h' for more details about leading-code
   and position-code.

   --- CODE RANGE of Emacs' internal format ---
   character set        range
   -------------        -----
   ascii                0x00..0x7F
   eight-bit-control    LEADING_CODE_8_BIT_CONTROL + 0xA0..0xBF
   eight-bit-graphic    0xA0..0xFF
   ELSE                 0x81..0x9D + [0xA0..0xFF]+
   ---------------------------------------------
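
   A rough classifier driven only by the table above, as a sketch.  It
   assumes LEADING_CODE_8_BIT_CONTROL is 0x9E, and for the ELSE row it
   simply counts the following 0xA0..0xFF bytes, whereas Emacs derives
   the length from the leading code itself (see BYTES_BY_CHAR_HEAD).
   The enum sketch_class and sketch_classify are hypothetical.

```c
// Classes from the CODE RANGE table; the names are local to this sketch.
enum sketch_class { SK_ASCII, SK_EIGHT_BIT_CONTROL, SK_EIGHT_BIT_GRAPHIC,
                    SK_MULTIBYTE, SK_INVALID };

#define SKETCH_LEADING_CODE_8_BIT_CONTROL 0x9E   // assumed, see charset.h

// Classify the sequence starting at S (LEN > 0 bytes available) and
// store its length in *NBYTES.
enum sketch_class sketch_classify (const unsigned char *s, int len,
                                   int *nbytes)
{
  unsigned char b0 = s[0];

  *nbytes = 1;
  if (b0 <= 0x7F)
    return SK_ASCII;
  if (b0 >= 0xA0)
    return SK_EIGHT_BIT_GRAPHIC;
  if (b0 == SKETCH_LEADING_CODE_8_BIT_CONTROL)
    {
      if (len < 2 || s[1] < 0xA0 || s[1] > 0xBF)
        return SK_INVALID;
      *nbytes = 2;
      return SK_EIGHT_BIT_CONTROL;
    }
  if (b0 >= 0x81 && b0 <= 0x9D)
    {
      // Rough: count every following byte in 0xA0..0xFF.
      while (*nbytes < len && s[*nbytes] >= 0xA0)
        (*nbytes)++;
      return *nbytes > 1 ? SK_MULTIBYTE : SK_INVALID;
    }
  return SK_INVALID;   // 0x80 and 0x9F start no sequence in this table
}
```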
   As this is the internal character representation, the format is
   usually not used externally (e.g. in a file or in data sent to a
   process).  But it is possible to have text in this format
   externally (e.g. text encoded with the coding system `emacs-mule').

   In that case, a sequence of one-byte codes has a slightly different
   form.

   Firstly, all characters in eight-bit-control are represented by
   one-byte sequences which are their 8-bit code.

   Next, character composition data are represented by a byte
   sequence of the form: 0x80 METHOD BYTES CHARS COMPONENT ...,
   where,
        METHOD is 0xF0 plus one of the composition methods (enum
        composition_method),

        BYTES is 0xA0 plus the byte length of the composition data,

        CHARS is 0xA0 plus the number of characters composed by the
        data,

        COMPONENTs are characters in multibyte form, or composition
        rules each encoded as two ASCII bytes.
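
   A sketch of building the four header bytes of this format.  It
   assumes, as ENCODE_COMPOSITION_EMACS_MULE in this file appears to
   do, that BYTES counts the whole sequence from the 0x80 byte onward;
   sketch_composition_header and its parameters are hypothetical, and
   METHOD is taken as a plain integer rather than enum
   composition_method.

```c
// Write the four header bytes of an emacs-mule composition sequence to
// OUT.  COMPONENT_BYTES is the length of the COMPONENT part; BYTES is
// assumed to cover the whole sequence including these four bytes.
// Returns 0, or -1 if a field cannot be represented in one byte.
int sketch_composition_header (int method, int nchars, int component_bytes,
                               unsigned char out[4])
{
  int total = 4 + component_bytes;

  if (method < 0 || method > 0x0F
      || total > 0xFF - 0xA0 || nchars < 0 || nchars > 0xFF - 0xA0)
    return -1;
  out[0] = 0x80;                              // start of composition
  out[1] = (unsigned char) (0xF0 + method);   // METHOD
  out[2] = (unsigned char) (0xA0 + total);    // BYTES
  out[3] = (unsigned char) (0xA0 + nchars);   // CHARS
  return 0;
}
```

   Because BYTES and CHARS are offsets from 0xA0 within one byte, each
   can express at most 0x5F; the sketch rejects larger values instead
   of letting them wrap.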
   In addition, for backward compatibility, the following formats are
   also recognized as composition data on decoding.

   0x80 MSEQ ...
   0x80 0xFF MSEQ RULE MSEQ RULE ... MSEQ

   Here,
        MSEQ is a multibyte form but in this special format:
          ASCII: 0xA0 ASCII_CODE+0x80,
          other: LEADING_CODE+0x20 FOLLOWING-BYTE ...,
        RULE is a one-byte code in the range 0xA0..0xF0 that
          represents a composition rule.
  */
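
/* A sketch of splitting an old-style RULE byte, following
   DECODE_EMACS_MULE_COMPOSITION_RULE in this file: the byte is taken
   to be 0xA0 plus a value 0..80 (matching the 0xA0..0xF0 range above
   and the `c >= 81' check in that macro), and the value splits into
   the two reference points as value / 9 and value % 9.
   sketch_decode_old_rule is a hypothetical name, and the sketch stops
   before COMPOSITION_ENCODE_RULE.

```c
// Split an old-style RULE byte into its two reference points.
// Returns 0 on success, or -1 when RULE is outside 0xA0..0xF0.
int sketch_decode_old_rule (unsigned char rule, int *gref, int *nref)
{
  int c = rule - 0xA0;        // assumed: RULE is 0xA0 plus 0..80
  if (c < 0 || c >= 81)
    return -1;
  *gref = c / 9;              // same split as in
  *nref = c % 9;              // DECODE_EMACS_MULE_COMPOSITION_RULE
  return 0;
}
```

   The byte 0xB4, for instance, gives the value 20, that is GREF 2 and
   NREF 2.  */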
enum emacs_code_class_type emacs_code_class[256];

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in Emacs' internal format.  If it is,
   return CODING_CATEGORY_MASK_EMACS_MULE, else return 0.  */

detect_coding_emacs_mule (src, src_end, multibytep)
     unsigned char *src, *src_end;
  /* Dummy for ONE_MORE_BYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
              ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
          if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO)
      else if (c >= 0x80 && c < 0xA0)
            /* Old leading code for a composite character.  */
              unsigned char *src_base = src - 1;
              if (!UNIBYTE_STR_AS_MULTIBYTE_P (src_base, src_end - src_base,
              src = src_base + bytes;
  return CODING_CATEGORY_MASK_EMACS_MULE;
/* Record the starting position START and METHOD of one composition.  */

#define CODING_ADD_COMPOSITION_START(coding, start, method)     \
  do {                                                          \
    struct composition_data *cmp_data = coding->cmp_data;       \
    int *data = cmp_data->data + cmp_data->used;                \
    coding->cmp_data_start = cmp_data->used;                    \
    data[1] = cmp_data->char_offset + start;                    \
    data[3] = (int) method;                                     \
    cmp_data->used += 4;                                        \
  } while (0)

/* Record the ending position END of the current composition.  */

#define CODING_ADD_COMPOSITION_END(coding, end)                 \
  do {                                                          \
    struct composition_data *cmp_data = coding->cmp_data;       \
    int *data = cmp_data->data + coding->cmp_data_start;        \
    data[0] = cmp_data->used - coding->cmp_data_start;          \
    data[2] = cmp_data->char_offset + end;                      \
  } while (0)

/* Record one COMPONENT (alternate character or composition rule).  */

#define CODING_ADD_COMPOSITION_COMPONENT(coding, component)     \
  do {                                                          \
    coding->cmp_data->data[coding->cmp_data->used++] = component; \
    if (coding->cmp_data->used - coding->cmp_data_start         \
        == COMPOSITION_DATA_MAX_BUNCH_LENGTH)                   \
      {                                                         \
        CODING_ADD_COMPOSITION_END (coding, coding->produced_char); \
        coding->composing = COMPOSITION_NO;                     \
      }                                                         \
  } while (0)
/* Get one byte from the data pointed to by SRC and increment SRC.  If
   SRC is not less than SRC_END, return -1 without incrementing SRC.  */

#define SAFE_ONE_MORE_BYTE() (src >= src_end ? -1 : *src++)
/* Decode a character represented as a component of a composition
   sequence of Emacs 20 style at SRC.  Set C to that character, store
   its multibyte form sequence at P, and set P to the end of that
   sequence.  If no valid character is found, set C to -1.  */

#define DECODE_EMACS_MULE_COMPOSITION_CHAR(c, p)                \
    c = SAFE_ONE_MORE_BYTE ();                                  \
    if (CHAR_HEAD_P (c))                                        \
    else if (c == 0xA0)                                         \
        c = SAFE_ONE_MORE_BYTE ();                              \
    else if (BASE_LEADING_CODE_P (c - 0x20))                    \
        unsigned char *p0 = p;                                  \
        bytes = BYTES_BY_CHAR_HEAD (c);                         \
            c = SAFE_ONE_MORE_BYTE ();                          \
        if (UNIBYTE_STR_AS_MULTIBYTE_P (p0, p - p0, bytes))     \
          c = STRING_CHAR (p0, bytes);
/* Decode a composition rule represented as a component of a
   composition sequence of Emacs 20 style at SRC.  Set C to the rule.
   If no valid rule is found, set C to -1.  */

#define DECODE_EMACS_MULE_COMPOSITION_RULE(c)                   \
    c = SAFE_ONE_MORE_BYTE ();                                  \
    if (c < 0 || c >= 81)                                       \
        gref = c / 9, nref = c % 9;                             \
        c = COMPOSITION_ENCODE_RULE (gref, nref);
/* Decode a composition sequence encoded by `emacs-mule' at the source
   pointed to by SRC.  SRC_END is the end of the source.  Store
   information about the composition in CODING->cmp_data.

   For backward compatibility, also decode a composition sequence of
   Emacs 20 style.  In that case, the composition sequence contains
   characters that should be extracted into a buffer or string.  Store
   those characters at *DESTINATION in multibyte form.

   If we encounter an invalid byte sequence, return 0.
   If we encounter an insufficient source or destination, or
   insufficient space in CODING->cmp_data, return 1.
   Otherwise, return the number of bytes consumed from the source.  */

decode_composition_emacs_mule (coding, src, src_end,
                               destination, dst_end, dst_bytes)
     struct coding_system *coding;
     unsigned char *src, *src_end, **destination, *dst_end;
  unsigned char *dst = *destination;
  int method, data_len, nchars;
  unsigned char *src_base = src++;
  /* Store components of composition.  */
  int component[COMPOSITION_DATA_MAX_BUNCH_LENGTH];

  /* Store multibyte form of characters to be composed.  This is for
     Emacs 20 style composition sequence.  */
  unsigned char buf[MAX_COMPOSITION_COMPONENTS * MAX_MULTIBYTE_LENGTH];
  unsigned char *bufp = buf;
  int c, i, gref, nref;
  if (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH
      >= COMPOSITION_DATA_SIZE)
      coding->result = CODING_FINISH_INSUFFICIENT_CMP;
  if (c - 0xF0 >= COMPOSITION_RELATIVE
           && c - 0xF0 <= COMPOSITION_WITH_RULE_ALTCHARS)
      with_rule = (method == COMPOSITION_WITH_RULE
                   || method == COMPOSITION_WITH_RULE_ALTCHARS);
          || src_base + data_len > src_end)
      for (ncomponent = 0; src < src_base + data_len; ncomponent++)
          /* If it is longer than this, it can't be valid.  */
          if (ncomponent >= COMPOSITION_DATA_MAX_BUNCH_LENGTH)
          if (ncomponent % 2 && with_rule)
              ONE_MORE_BYTE (gref);
              ONE_MORE_BYTE (nref);
              c = COMPOSITION_ENCODE_RULE (gref, nref);
              if (UNIBYTE_STR_AS_MULTIBYTE_P (src, src_end - src, bytes))
                c = STRING_CHAR (src, bytes);
          component[ncomponent] = c;
      /* This may be an old Emacs 20 style format.  See the comment in
         section 2 of this file.  */
      while (src < src_end && !CHAR_HEAD_P (*src)) src++;
          && !(coding->mode & CODING_MODE_LAST_BLOCK))
        goto label_end_of_loop;
          method = COMPOSITION_RELATIVE;
          for (ncomponent = 0; ncomponent < MAX_COMPOSITION_COMPONENTS;)
              DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
              component[ncomponent++] = c;
          method = COMPOSITION_WITH_RULE;
          DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
               ncomponent < MAX_COMPOSITION_COMPONENTS * 2 - 1;)
              DECODE_EMACS_MULE_COMPOSITION_RULE (c);
              component[ncomponent++] = c;
              DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
              component[ncomponent++] = c;
      nchars = (ncomponent + 1) / 2;
  if (buf == bufp || dst + (bufp - buf) <= (dst_bytes ? dst_end : src))
      CODING_ADD_COMPOSITION_START (coding, coding->produced_char, method);
      for (i = 0; i < ncomponent; i++)
        CODING_ADD_COMPOSITION_COMPONENT (coding, component[i]);
      CODING_ADD_COMPOSITION_END (coding, coding->produced_char + nchars);
          unsigned char *p = buf;
          EMIT_BYTES (p, bufp);
          *destination += bufp - buf;
          coding->produced_char += nchars;
      return (src - src_base);
/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".  */

decode_coding_emacs_mule (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
  unsigned char *src = source;
  unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source code, or
     when there's not enough destination area to produce a
     character.  */
  unsigned char *src_base;

  coding->produced_char = 0;
  while ((src_base = src) < src_end)
      unsigned char tmp[MAX_MULTIBYTE_LENGTH], *p;
          if (coding->eol_type == CODING_EOL_CR)
          else if (coding->eol_type == CODING_EOL_CRLF)
          coding->produced_char++;
      else if (*src == '\n')
          if ((coding->eol_type == CODING_EOL_CR
               || coding->eol_type == CODING_EOL_CRLF)
              && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
              coding->result = CODING_FINISH_INCONSISTENT_EOL;
              goto label_end_of_loop;
          coding->produced_char++;
      else if (*src == 0x80 && coding->cmp_data)
          /* Start of composition data.  */
          int consumed = decode_composition_emacs_mule (coding, src, src_end,
            goto label_end_of_loop;
          else if (consumed > 0)
              bytes = CHAR_STRING (*src, tmp);
      else if (UNIBYTE_STR_AS_MULTIBYTE_P (src, src_end - src, bytes))
          bytes = CHAR_STRING (*src, tmp);
      if (dst + bytes >= (dst_bytes ? dst_end : src))
          coding->result = CODING_FINISH_INSUFFICIENT_DST;
      while (bytes--) *dst++ = *p++;
      coding->produced_char++;
  coding->consumed = coding->consumed_char = src_base - source;
  coding->produced = dst - destination;
/* Encode composition data stored at DATA into a special byte sequence
   starting with 0x80.  Update CODING->cmp_data_start and maybe
   CODING->cmp_data for the next call.  */

#define ENCODE_COMPOSITION_EMACS_MULE(coding, data)             \
    unsigned char buf[1024], *p0 = buf, *p;                     \
    int len = data[0];                                          \
    buf[1] = 0xF0 + data[3];    /* METHOD */                    \
    buf[3] = 0xA0 + (data[2] - data[1]); /* COMPOSED-CHARS */   \
    if (data[3] == COMPOSITION_WITH_RULE                        \
        || data[3] == COMPOSITION_WITH_RULE_ALTCHARS)           \
        p += CHAR_STRING (data[4], p);                          \
        for (i = 5; i < len; i += 2)                            \
            COMPOSITION_DECODE_RULE (data[i], gref, nref);      \
            *p++ = 0x20 + gref;                                 \
            *p++ = 0x20 + nref;                                 \
            p += CHAR_STRING (data[i + 1], p);                  \
        for (i = 4; i < len; i++)                               \
          p += CHAR_STRING (data[i], p);                        \
    buf[2] = 0xA0 + (p - buf);  /* COMPONENTS-BYTES */          \
    if (dst + (p - buf) + 4 > (dst_bytes ? dst_end : src))      \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
    coding->cmp_data_start += data[0];                          \
    if (coding->cmp_data_start == coding->cmp_data->used        \
        && coding->cmp_data->next)                              \
        coding->cmp_data = coding->cmp_data->next;              \
        coding->cmp_data_start = 0;
static void encode_eol P_ ((struct coding_system *, const unsigned char *,
                            unsigned char *, int, int));

encode_coding_emacs_mule (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
  unsigned char *src = source;
  unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  unsigned char *src_base;
  Lisp_Object translation_table;

  translation_table = Qnil;

  /* Optimization for the case that there's no composition.  */
  if (!coding->cmp_data || coding->cmp_data->used == 0)
      encode_eol (coding, source, destination, src_bytes, dst_bytes);
  char_offset = coding->cmp_data->char_offset;
  data = coding->cmp_data->data + coding->cmp_data_start;
      /* If SRC starts a composition, encode the information about the
         composition in advance.  */
      if (coding->cmp_data_start < coding->cmp_data->used
          && char_offset + coding->consumed_char == data[1])
          ENCODE_COMPOSITION_EMACS_MULE (coding, data);
          char_offset = coding->cmp_data->char_offset;
          data = coding->cmp_data->data + coding->cmp_data_start;
      if (c == '\n' && (coding->eol_type == CODING_EOL_CRLF
                        || coding->eol_type == CODING_EOL_CR))
          if (coding->eol_type == CODING_EOL_CRLF)
            EMIT_TWO_BYTES ('\r', c);
            EMIT_ONE_BYTE ('\r');
      else if (SINGLE_BYTE_CHAR_P (c))
        EMIT_BYTES (src_base, src);
      coding->consumed_char++;
  coding->consumed = src_base - source;
  coding->produced = coding->produced_char = dst - destination;
/*** 3. ISO2022 handlers ***/

/* The following note describes the coding system ISO2022 briefly.
   Since the intention of this note is to help understand the
   functions in this file, some parts are NOT ACCURATE or are OVERLY
   SIMPLIFIED.  For thorough understanding, please refer to the
   original document of ISO2022.  This is equivalent to the standard
   ECMA-35, obtainable from <URL:http://www.ecma.ch/> (*).

   ISO2022 provides many mechanisms to encode several character sets
   in 7-bit and 8-bit environments.  For 7-bit environments, all text
   is encoded using bytes less than 128.  This may make the encoded
   text a little bit longer, but the text passes more easily through
   several types of gateway, some of which strip off the MSB (Most
   Significant Bit).
   There are two kinds of character sets: control character sets and
   graphic character sets.  The former contain control characters such
   as `newline' and `escape' to provide control functions (control
   functions are also provided by escape sequences).  The latter
   contain graphic characters such as 'A' and '-'.  Emacs recognizes
   two control character sets and many graphic character sets.
   Graphic character sets are classified into one of the following
   four classes, according to the number of bytes (DIMENSION) and
   number of characters in one dimension (CHARS) of the set:
   - DIMENSION1_CHARS94
   - DIMENSION1_CHARS96
   - DIMENSION2_CHARS94
   - DIMENSION2_CHARS96
1180 In addition, each character set is assigned an identification tag,
1181 unique for each set, called the "final character" (denoted as <F>
1182 hereafter). The <F> of each character set is decided by ECMA(*)
1183 when it is registered in ISO. The code range of <F> is 0x30..0x7F
1184 (0x30..0x3F are for private use only).
1186 Note (*): ECMA = European Computer Manufacturers Association
1188 Here are examples of graphic character sets [NAME(<F>)]:
1189 o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ...
1190 o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ...
1191 o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ...
1192 o DIMENSION2_CHARS96 -- none for the moment
1194 A code area (1 byte=8 bits) is divided into 4 areas, C0, GL, C1, and GR.
1195 C0 [0x00..0x1F] -- control character plane 0
1196 GL [0x20..0x7F] -- graphic character plane 0
1197 C1 [0x80..0x9F] -- control character plane 1
1198 GR [0xA0..0xFF] -- graphic character plane 1
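   As a rough illustration only, the area a single byte falls into
   can be computed as in the following minimal C sketch.  The names
   `iso_area' and `iso_byte_area' are hypothetical and are not part
   of Emacs.

```c
#include <assert.h>

// The four code areas of one byte, as listed above.
enum iso_area { AREA_C0, AREA_GL, AREA_C1, AREA_GR };

// Classify byte B into the code area it belongs to.
enum iso_area
iso_byte_area (unsigned char b)
{
  if (b <= 0x1F)
    return AREA_C0;             // 0x00..0x1F: control character plane 0
  if (b <= 0x7F)
    return AREA_GL;             // 0x20..0x7F: graphic character plane 0
  if (b <= 0x9F)
    return AREA_C1;             // 0x80..0x9F: control character plane 1
  return AREA_GR;               // 0xA0..0xFF: graphic character plane 1
}
```
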
   A control character set is directly designated and invoked to C0 or
   C1 by an escape sequence.  The most common case is that:
     - ISO646's control character set is designated/invoked to C0, and
     - ISO6429's control character set is designated/invoked to C1,
   and usually these designations/invocations are omitted in encoded
   text.  In a 7-bit environment, only C0 can be used, and a control
   character for C1 is encoded by an appropriate escape sequence to
   fit into the environment.  All control characters for C1 are
   defined to have corresponding escape sequences.

   A graphic character set is first designated to one of four
   graphic registers (G0 through G3), then these graphic registers are
   invoked to GL or GR.  These designations and invocations can be
   done independently.  The most common case is that G0 is invoked to
   GL, G1 is invoked to GR, and ASCII is designated to G0.  Usually
   these invocations and designations are omitted in encoded text.
   In a 7-bit environment, only GL can be used.

   When a graphic character set of CHARS94 is invoked to GL, codes
   0x20 and 0x7F of the GL area work as control characters SPACE and
   DEL respectively, and codes 0xA0 and 0xFF of the GR area should not
   be used.

   There are two ways of invocation: locking-shift and single-shift.
   With locking-shift, the invocation lasts until the next different
   invocation, whereas with single-shift, the invocation affects the
   following character only and doesn't affect the locking-shift
   state.  Invocations are done by the following control characters or
   escape sequences:

   ----------------------------------------------------------------------
   abbrev  function                  cntrl escape seq  description
   ----------------------------------------------------------------------
   SI/LS0  (shift-in)                0x0F  none        invoke G0 into GL
   SO/LS1  (shift-out)               0x0E  none        invoke G1 into GL
   LS2     (locking-shift-2)         none  ESC 'n'     invoke G2 into GL
   LS3     (locking-shift-3)         none  ESC 'o'     invoke G3 into GL
   LS1R    (locking-shift-1 right)   none  ESC '~'     invoke G1 into GR (*)
   LS2R    (locking-shift-2 right)   none  ESC '}'     invoke G2 into GR (*)
   LS3R    (locking-shift-3 right)   none  ESC '|'     invoke G3 into GR (*)
   SS2     (single-shift-2)          0x8E  ESC 'N'     invoke G2 for one char
   SS3     (single-shift-3)          0x8F  ESC 'O'     invoke G3 for one char
   ----------------------------------------------------------------------
   (*) These are not used by any known coding system.

   Control characters for these functions are defined by macros
   ISO_CODE_XXX in `coding.h'.

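   A minimal sketch of how the invocation functions in the table
   change a decoder's state, written for illustration only.  The
   struct, constants and functions below are hypothetical, not the
   ones Emacs uses (the decoder in this file records locking shifts
   with CODING_SPEC_ISO_INVOCATION).  LS1R/LS2R/LS3R, which no known
   coding system uses, are left out.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical state: the register (0..3) invoked into GL by locking
// shift, and the register (2 or 3) pending for a single shift, or 0
// when no single shift is pending.
struct gl_state { int gl; int single; };

enum { CODE_SO = 0x0E, CODE_SI = 0x0F, CODE_ESC = 0x1B,
       CODE_SS2 = 0x8E, CODE_SS3 = 0x8F };

// Apply the invocation function at P (N > 0 bytes available) to ST.
// Return the number of bytes consumed, or 0 if P does not start with
// one of the functions handled here.
size_t
apply_invocation (struct gl_state *st, const unsigned char *p, size_t n)
{
  assert (n > 0);
  switch (p[0])
    {
    case CODE_SI:  st->gl = 0; return 1;        // SI/LS0
    case CODE_SO:  st->gl = 1; return 1;        // SO/LS1
    case CODE_SS2: st->single = 2; return 1;    // SS2, control-character form
    case CODE_SS3: st->single = 3; return 1;    // SS3, control-character form
    case CODE_ESC:
      if (n < 2)
        return 0;
      switch (p[1])
        {
        case 'n': st->gl = 2; return 2;         // LS2
        case 'o': st->gl = 3; return 2;         // LS3
        case 'N': st->single = 2; return 2;     // SS2, escape-sequence form
        case 'O': st->single = 3; return 2;     // SS3, escape-sequence form
        }
      return 0;
    }
  return 0;
}

// Register to use for the next graphic character: a pending single
// shift applies once, then the locking-shift state takes over again.
int
register_for_next_char (struct gl_state *st)
{
  int reg = st->single ? st->single : st->gl;
  st->single = 0;
  return reg;
}
```

   For example, after SO the following characters come from G1, but an
   intervening ESC 'N' makes exactly one character come from G2.
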
   Designations are done by the following escape sequences:
   ----------------------------------------------------------------------
   escape sequence     description
   ----------------------------------------------------------------------
   ESC '(' <F>         designate DIMENSION1_CHARS94<F> to G0
   ESC ')' <F>         designate DIMENSION1_CHARS94<F> to G1
   ESC '*' <F>         designate DIMENSION1_CHARS94<F> to G2
   ESC '+' <F>         designate DIMENSION1_CHARS94<F> to G3
   ESC ',' <F>         designate DIMENSION1_CHARS96<F> to G0 (*)
   ESC '-' <F>         designate DIMENSION1_CHARS96<F> to G1
   ESC '.' <F>         designate DIMENSION1_CHARS96<F> to G2
   ESC '/' <F>         designate DIMENSION1_CHARS96<F> to G3
   ESC '$' '(' <F>     designate DIMENSION2_CHARS94<F> to G0 (**)
   ESC '$' ')' <F>     designate DIMENSION2_CHARS94<F> to G1
   ESC '$' '*' <F>     designate DIMENSION2_CHARS94<F> to G2
   ESC '$' '+' <F>     designate DIMENSION2_CHARS94<F> to G3
   ESC '$' ',' <F>     designate DIMENSION2_CHARS96<F> to G0 (*)
   ESC '$' '-' <F>     designate DIMENSION2_CHARS96<F> to G1
   ESC '$' '.' <F>     designate DIMENSION2_CHARS96<F> to G2
   ESC '$' '/' <F>     designate DIMENSION2_CHARS96<F> to G3
   ----------------------------------------------------------------------

   In this list, "DIMENSION1_CHARS94<F>" means a graphic character set
   of dimension 1, chars 94, and final character <F>, etc.

   Note (*): Although these designations are not allowed in ISO2022,
   Emacs accepts them when decoding, and produces them when encoding
   CHARS96 character sets in a coding system characterized as a 7-bit
   environment that uses neither locking-shift nor single-shift.

   Note (**): If <F> is '@', 'A', or 'B', the intermediate character
   '(' can be omitted.  We refer to this as "short-form" hereafter.

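   The table and the short-form rule can be condensed into the
   following C sketch, which writes the designation sequence for a
   given register, DIMENSION, CHARS and <F>.  The function name and
   its parameters are hypothetical; in this file the real work is done
   by the ENCODE_DESIGNATION macro, which additionally handles charset
   revision numbers (ESC '&').

```c
#include <assert.h>
#include <stddef.h>

// Write into BUF the escape sequence designating a graphic character
// set (DIMENSION 1 or 2, CHARS 94 or 96, final character FINAL) to
// register REG (0..3), following the table above.  SHORT_FORM says
// whether the short form may be used.  Return the number of bytes.
size_t
make_designation (unsigned char *buf, int dimension, int chars,
                  int reg, unsigned char final, int short_form)
{
  static const char inter94[] = "()*+";   // intermediates for G0..G3
  static const char inter96[] = ",-./";
  size_t n = 0;

  assert (reg >= 0 && reg < 4);
  assert (dimension == 1 || dimension == 2);
  assert (chars == 94 || chars == 96);

  buf[n++] = 0x1B;                        // ESC
  if (dimension == 2)
    {
      buf[n++] = '$';
      // Short form: '(' (i.e. register G0) may be omitted when <F> is
      // '@', 'A', or 'B'.
      if (short_form && chars == 94 && reg == 0
          && final >= '@' && final <= 'B')
        {
          buf[n++] = final;
          return n;
        }
    }
  buf[n++] = chars == 94 ? inter94[reg] : inter96[reg];
  buf[n++] = final;
  return n;
}
```

   For instance, designating JISX0208 (<F> = 'B') to G0 yields
   ESC '$' 'B' in short form and ESC '$' '(' 'B' otherwise.
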
   Now you may notice that there are many ways of encoding the same
   multilingual text in ISO2022.  Actually, there exist many coding
   systems such as Compound Text (used in X11's inter-client
   communication), ISO-2022-JP (used on the Japanese Internet),
   ISO-2022-KR (used on the Korean Internet), and EUC (Extended UNIX
   Code, used on Asian localized platforms), and all of these are
   variants of ISO2022.

   In addition to the above, Emacs handles two more kinds of escape
   sequences: ISO6429's direction specification and Emacs' private
   sequence for specifying character composition.

   ISO6429's direction specification takes the following form:
        o CSI ']'      -- end of the current direction
        o CSI '0' ']'  -- end of the current direction
        o CSI '1' ']'  -- start of left-to-right text
        o CSI '2' ']'  -- start of right-to-left text
   The control character CSI (0x9B: control sequence introducer) is
   replaced by the escape sequence ESC '[' in a 7-bit environment.

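   A hypothetical helper (not part of Emacs) that writes the digit
   forms of the direction specification shows that the 7-bit and 8-bit
   forms differ only in how CSI is written.  The bare CSI ']' form is
   not produced by this sketch.

```c
#include <assert.h>
#include <stddef.h>

// Write the ISO6429 direction specification for DIR into BUF.  DIR is
// '0' (end of current direction), '1' (start of left-to-right text) or
// '2' (start of right-to-left text).  In a 7-bit environment CSI (0x9B)
// is written as ESC '['.  Return the number of bytes written.
size_t
make_direction (unsigned char *buf, int seven_bit, char dir)
{
  size_t n = 0;

  assert (dir == '0' || dir == '1' || dir == '2');
  if (seven_bit)
    {
      buf[n++] = 0x1B;          // ESC
      buf[n++] = '[';
    }
  else
    buf[n++] = 0x9B;            // CSI
  buf[n++] = dir;
  buf[n++] = ']';
  return n;
}
```
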
   Character composition specification takes the following form:
        o ESC '0' -- start relative composition
        o ESC '1' -- end composition
        o ESC '2' -- start rule-base composition (*)
        o ESC '3' -- start relative composition with alternate chars (**)
        o ESC '4' -- start rule-base composition with alternate chars (**)
   Since these are not standard escape sequences of any ISO standard,
   the use of them with these meanings is restricted to Emacs only.

   (*) This form is used only in Emacs 20.5 and older versions,
   but the newer versions can safely decode it.
   (**) This form is used only in Emacs 21.1 and newer versions,
   and the older versions can't decode it.

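   The mapping from the byte after ESC to a composition method can be
   sketched as follows.  The enum is a local stand-in for
   `enum composition_method' with hypothetical names and values; the
   DECODE_COMPOSITION_START macro in this file makes the same choice.

```c
#include <assert.h>

// Hypothetical stand-in for the composition methods named below.
enum cmp_method { CMP_NONE, CMP_RELATIVE, CMP_WITH_RULE,
                  CMP_WITH_ALTCHARS, CMP_WITH_RULE_ALTCHARS };

// Map the byte C following ESC to the composition method it starts.
// ESC '1' ends a composition, so it (like any other byte) maps to
// CMP_NONE here.
enum cmp_method
composition_start_method (unsigned char c)
{
  switch (c)
    {
    case '0': return CMP_RELATIVE;
    case '2': return CMP_WITH_RULE;           // produced by Emacs 20.5 and older
    case '3': return CMP_WITH_ALTCHARS;       // Emacs 21.1 and newer
    case '4': return CMP_WITH_RULE_ALTCHARS;  // Emacs 21.1 and newer
    default:  return CMP_NONE;
    }
}
```
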
   Here's a list of example usages of these composition escape
   sequences (categorized by `enum composition_method').

   COMPOSITION_RELATIVE:
        ESC 0 CHAR [ CHAR ] ESC 1
   COMPOSITION_WITH_RULE:
        ESC 2 CHAR [ RULE CHAR ] ESC 1
   COMPOSITION_WITH_ALTCHARS:
        ESC 3 ALTCHAR [ ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1
   COMPOSITION_WITH_RULE_ALTCHARS:
        ESC 4 ALTCHAR [ RULE ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1 */
enum iso_code_class_type iso_code_class[256];

#define CHARSET_OK(idx, charset, c) \
  (coding_system_table[idx] \
   && (charset == CHARSET_ASCII \
       || (safe_chars = coding_safe_chars (coding_system_table[idx]->symbol), \
           CODING_SAFE_CHAR_P (safe_chars, c))) \
   && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding_system_table[idx], \
       != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))

#define SHIFT_OUT_OK(idx) \
  (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0)

#define COMPOSITION_OK(idx) \
  (coding_system_table[idx]->composing != COMPOSITION_DISABLED)
/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in ISO2022.  If it is, return an
   integer in which the appropriate ones of the following flag bits
   are set:
        CODING_CATEGORY_MASK_ISO_7
        CODING_CATEGORY_MASK_ISO_7_TIGHT
        CODING_CATEGORY_MASK_ISO_8_1
        CODING_CATEGORY_MASK_ISO_8_2
        CODING_CATEGORY_MASK_ISO_7_ELSE
        CODING_CATEGORY_MASK_ISO_8_ELSE
   If a code which should never appear in ISO2022 is found,
detect_coding_iso2022 (src, src_end, multibytep)
     unsigned char *src, *src_end;
  int mask = CODING_CATEGORY_MASK_ISO;
  int reg[4], shift_out = 0, single_shifting = 0;
  /* Dummy for ONE_MORE_BYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;
  Lisp_Object safe_chars;
  reg[0] = CHARSET_ASCII, reg[1] = reg[2] = reg[3] = -1;
  while (mask && src < src_end)
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
          if (inhibit_iso_escape_detection)
          single_shifting = 0;
          ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
          if (c >= '(' && c <= '/')
              /* Designation sequence for a charset of dimension 1.  */
              ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
              if (c1 < ' ' || c1 >= 0x80
                  || (charset = iso_charset_table[0][c >= ','][c1]) < 0)
                /* Invalid designation sequence.  Just ignore.  */
              reg[(c - '(') % 4] = charset;
              /* Designation sequence for a charset of dimension 2.  */
              ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
              if (c >= '@' && c <= 'B')
                /* Designation for JISX0208.1978, GB2312, or JISX0208.  */
                reg[0] = charset = iso_charset_table[1][0][c];
              else if (c >= '(' && c <= '/')
                  ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
                  if (c1 < ' ' || c1 >= 0x80
                      || (charset = iso_charset_table[1][c >= ','][c1]) < 0)
                    /* Invalid designation sequence.  Just ignore.  */
                  reg[(c - '(') % 4] = charset;
                /* Invalid designation sequence.  Just ignore.  */
          else if (c == 'N' || c == 'O')
              /* ESC <Fe> for SS2 or SS3.  */
              mask &= CODING_CATEGORY_MASK_ISO_7_ELSE;
          else if (c >= '0' && c <= '4')
              /* ESC <Fp> for start/end composition.  */
              if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7))
                mask_found |= CODING_CATEGORY_MASK_ISO_7;
                mask &= ~CODING_CATEGORY_MASK_ISO_7;
              if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT))
                mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT;
                mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT;
              if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_1))
                mask_found |= CODING_CATEGORY_MASK_ISO_8_1;
                mask &= ~CODING_CATEGORY_MASK_ISO_8_1;
              if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_2))
                mask_found |= CODING_CATEGORY_MASK_ISO_8_2;
                mask &= ~CODING_CATEGORY_MASK_ISO_8_2;
              if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7_ELSE))
                mask_found |= CODING_CATEGORY_MASK_ISO_7_ELSE;
                mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE;
              if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_ELSE))
                mask_found |= CODING_CATEGORY_MASK_ISO_8_ELSE;
                mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE;
            /* Invalid escape sequence.  Just ignore.  */
          /* We found a valid designation sequence for CHARSET.  */
          mask &= ~CODING_CATEGORY_MASK_ISO_8BIT;
          c = MAKE_CHAR (charset, 0, 0);
          if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7, charset, c))
            mask_found |= CODING_CATEGORY_MASK_ISO_7;
            mask &= ~CODING_CATEGORY_MASK_ISO_7;
          if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT, charset, c))
            mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT;
            mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT;
          if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_ELSE, charset, c))
            mask_found |= CODING_CATEGORY_MASK_ISO_7_ELSE;
            mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE;
          if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_8_ELSE, charset, c))
            mask_found |= CODING_CATEGORY_MASK_ISO_8_ELSE;
            mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE;
          if (inhibit_iso_escape_detection)
          single_shifting = 0;
                  || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE)
                  || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE)))
              /* Locking shift out.  */
              mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
              mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
          if (inhibit_iso_escape_detection)
          single_shifting = 0;
              /* Locking shift in.  */
              mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
              mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
          single_shifting = 0;
            int newmask = CODING_CATEGORY_MASK_ISO_8_ELSE;
            if (inhibit_iso_escape_detection)
            if (c != ISO_CODE_CSI)
                if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
                    & CODING_FLAG_ISO_SINGLE_SHIFT)
                  newmask |= CODING_CATEGORY_MASK_ISO_8_1;
                if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
                    & CODING_FLAG_ISO_SINGLE_SHIFT)
                  newmask |= CODING_CATEGORY_MASK_ISO_8_2;
                single_shifting = 1;
            if (VECTORP (Vlatin_extra_code_table)
                && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
                if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
                    & CODING_FLAG_ISO_LATIN_EXTRA)
                  newmask |= CODING_CATEGORY_MASK_ISO_8_1;
                if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
                    & CODING_FLAG_ISO_LATIN_EXTRA)
                  newmask |= CODING_CATEGORY_MASK_ISO_8_2;
            mask_found |= newmask;
          single_shifting = 0;
              single_shifting = 0;
              if (VECTORP (Vlatin_extra_code_table)
                  && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
                  if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
                      & CODING_FLAG_ISO_LATIN_EXTRA)
                    newmask |= CODING_CATEGORY_MASK_ISO_8_1;
                  if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
                      & CODING_FLAG_ISO_LATIN_EXTRA)
                    newmask |= CODING_CATEGORY_MASK_ISO_8_2;
                  mask_found |= newmask;
              mask &= ~(CODING_CATEGORY_MASK_ISO_7BIT
                        | CODING_CATEGORY_MASK_ISO_7_ELSE);
              mask_found |= CODING_CATEGORY_MASK_ISO_8_1;
              /* Check the length of succeeding codes of the range
                 0xA0..0xFF.  If the byte length is odd, we exclude
                 CODING_CATEGORY_MASK_ISO_8_2.  We can check this only
                 when we are not single shifting.  */
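              /* Illustration only: a simplified, self-contained sketch of
                 the parity heuristic described above.  The name
                 `gr_run_allows_2byte' is hypothetical and not part of
                 Emacs.  It measures one run of bytes in 0xA0..0xFF and
                 reports whether the run could consist of whole 2-byte
                 characters.  A run that reaches the end of the buffer is
                 treated as inconclusive, since the rest of its last
                 character may simply not have been read yet.

```c
#include <assert.h>
#include <stddef.h>

// Return nonzero if the run of GR bytes (0xA0..0xFF) starting at P
// could be a sequence of 2-byte characters.  An odd-length run counts
// against that only if it ends before END.
int
gr_run_allows_2byte (const unsigned char *p, const unsigned char *end)
{
  size_t i = 0;

  while (p + i < end && p[i] >= 0xA0)
    i++;
  return !((i & 1) && p + i < end);
}
```

              */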
              if (!single_shifting
                  && mask & CODING_CATEGORY_MASK_ISO_8_2)
                  while (src < src_end)
                      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
                  if (i & 1 && src < src_end)
                    mask &= ~CODING_CATEGORY_MASK_ISO_8_2;
                    mask_found |= CODING_CATEGORY_MASK_ISO_8_2;
                  /* This means that we have read one extra byte.  */
  return (mask & mask_found);
/* Decode a character whose charset is CHARSET, whose 1st position
   code is C1 and whose 2nd position code is C2, and return the decoded
   character code.  If the variable `translation_table' is non-nil,
   return the translated code.  */

#define DECODE_ISO_CHARACTER(charset, c1, c2) \
  (NILP (translation_table) \
   ? MAKE_CHAR (charset, c1, c2) \
   : translate_char (translation_table, -1, charset, c1, c2))
/* Set designation state into CODING.  */
#define DECODE_DESIGNATION(reg, dimension, chars, final_char) \
    if (final_char < '0' || final_char >= 128) \
      goto label_invalid_code; \
    charset = ISO_CHARSET_TABLE (make_number (dimension), \
                                 make_number (chars), \
                                 make_number (final_char)); \
    c = MAKE_CHAR (charset, 0, 0); \
        && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) == reg \
            || CODING_SAFE_CHAR_P (safe_chars, c))) \
        if (coding->spec.iso2022.last_invalid_designation_register == 0 \
            && charset == CHARSET_ASCII) \
            /* We should insert this designation sequence as is so \
               that it is surely written back to a file.  */ \
            coding->spec.iso2022.last_invalid_designation_register = -1; \
            goto label_invalid_code; \
        coding->spec.iso2022.last_invalid_designation_register = -1; \
        if ((coding->mode & CODING_MODE_DIRECTION) \
            && CHARSET_REVERSE_CHARSET (charset) >= 0) \
          charset = CHARSET_REVERSE_CHARSET (charset); \
        CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
        coding->spec.iso2022.last_invalid_designation_register = reg; \
        goto label_invalid_code; \
/* Allocate a memory block for storing information about compositions.
   The block is chained to the already allocated blocks.  */

coding_allocate_composition_data (coding, char_offset)
     struct coding_system *coding;
  struct composition_data *cmp_data
    = (struct composition_data *) xmalloc (sizeof *cmp_data);
  cmp_data->char_offset = char_offset;
  cmp_data->prev = coding->cmp_data;
  cmp_data->next = NULL;
  if (coding->cmp_data)
    coding->cmp_data->next = cmp_data;
  coding->cmp_data = cmp_data;
  coding->cmp_data_start = 0;
/* Handle composition start sequence ESC 0, ESC 2, ESC 3, or ESC 4.
   ESC 0 : relative composition : ESC 0 CHAR ... ESC 1
   ESC 2 : rulebase composition : ESC 2 CHAR RULE CHAR RULE ... CHAR ESC 1
   ESC 3 : altchar composition :  ESC 3 ALT ... ESC 0 CHAR ... ESC 1
   ESC 4 : alt&rule composition : ESC 4 ALT RULE .. ALT ESC 0 CHAR ... ESC 1
#define DECODE_COMPOSITION_START(c1) \
    if (coding->composing == COMPOSITION_DISABLED) \
        *dst++ = ISO_CODE_ESC; \
        *dst++ = c1 & 0x7f; \
        coding->produced_char += 2; \
    else if (!COMPOSING_P (coding)) \
        /* This is surely the start of a composition.  We must be sure \
           that coding->cmp_data has enough space to store the \
           information about the composition.  If not, terminate the \
           current decoding loop, allocate one more memory block for \
           coding->cmp_data in the caller, then start the decoding \
           loop again.  We can't allocate memory here directly because \
           it may cause buffer/string relocation.  */ \
        if (!coding->cmp_data \
            || (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH \
                >= COMPOSITION_DATA_SIZE)) \
            coding->result = CODING_FINISH_INSUFFICIENT_CMP; \
            goto label_end_of_loop; \
        coding->composing = (c1 == '0' ? COMPOSITION_RELATIVE \
                             : c1 == '2' ? COMPOSITION_WITH_RULE \
                             : c1 == '3' ? COMPOSITION_WITH_ALTCHARS \
                             : COMPOSITION_WITH_RULE_ALTCHARS); \
        CODING_ADD_COMPOSITION_START (coding, coding->produced_char, \
                                      coding->composing); \
        coding->composition_rule_follows = 0; \
        /* We are already handling a composition.  If the method is \
           the following two, the codes following the current escape \
           sequence are actual characters stored in a buffer.  */ \
        if (coding->composing == COMPOSITION_WITH_ALTCHARS \
            || coding->composing == COMPOSITION_WITH_RULE_ALTCHARS) \
            coding->composing = COMPOSITION_RELATIVE; \
            coding->composition_rule_follows = 0; \

/* Handle composition end sequence ESC 1.  */
#define DECODE_COMPOSITION_END(c1) \
    if (! COMPOSING_P (coding)) \
        *dst++ = ISO_CODE_ESC; \
        coding->produced_char += 2; \
        CODING_ADD_COMPOSITION_END (coding, coding->produced_char); \
        coding->composing = COMPOSITION_NO; \
/* Decode a composition rule from the byte C1 (and maybe one more byte
   from SRC) and store one encoded composition rule in
   coding->cmp_data.  */

#define DECODE_COMPOSITION_RULE(c1) \
    if (c1 < 81)                /* old format (before ver.21) */ \
        int gref = (c1) / 9; \
        int nref = (c1) % 9; \
        if (gref == 4) gref = 10; \
        if (nref == 4) nref = 10; \
        rule = COMPOSITION_ENCODE_RULE (gref, nref); \
    else if (c1 < 93)           /* new format (after ver.21) */ \
        ONE_MORE_BYTE (c2); \
        rule = COMPOSITION_ENCODE_RULE (c1 - 81, c2 - 32); \
    CODING_ADD_COMPOSITION_COMPONENT (coding, rule); \
    coding->composition_rule_follows = 0; \
/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".  */

decode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
  unsigned char *src = source;
  unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  /* Charsets invoked to graphic plane 0 and 1 respectively.  */
  int charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
  int charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source code
     (within macro ONE_MORE_BYTE), or when there's not enough
     destination area to produce a character (within macro
  unsigned char *src_base;
  Lisp_Object translation_table;
  Lisp_Object safe_chars;
  safe_chars = coding_safe_chars (coding->symbol);
  if (NILP (Venable_character_translation))
    translation_table = Qnil;
      translation_table = coding->translation_table_for_decode;
      if (NILP (translation_table))
        translation_table = Vstandard_translation_table_for_decode;
  coding->result = CODING_FINISH_NORMAL;
      /* We produce no character or one character.  */
      switch (iso_code_class[c1])
        case ISO_0x20_or_0x7F:
          if (COMPOSING_P (coding) && coding->composition_rule_follows)
              DECODE_COMPOSITION_RULE (c1);
          if (charset0 < 0 || CHARSET_CHARS (charset0) == 94)
              /* This is SPACE or DEL.  */
              charset = CHARSET_ASCII;
          /* This is a graphic character; fall through.  */
        case ISO_graphic_plane_0:
          if (COMPOSING_P (coding) && coding->composition_rule_follows)
              DECODE_COMPOSITION_RULE (c1);
        case ISO_0xA0_or_0xFF:
          if (charset1 < 0 || CHARSET_CHARS (charset1) == 94
              || coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
            goto label_invalid_code;
          /* This is a graphic character; fall through.  */
        case ISO_graphic_plane_1:
            goto label_invalid_code;
          if (COMPOSING_P (coding))
            DECODE_COMPOSITION_END ('1');
          /* All ISO2022 control characters in this class have the
             same representation in Emacs internal format.  */
              && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
              && (coding->eol_type == CODING_EOL_CR
                  || coding->eol_type == CODING_EOL_CRLF))
              coding->result = CODING_FINISH_INCONSISTENT_EOL;
              goto label_end_of_loop;
          charset = CHARSET_ASCII;
          if (COMPOSING_P (coding))
            DECODE_COMPOSITION_END ('1');
          goto label_invalid_code;
        case ISO_carriage_return:
          if (COMPOSING_P (coding))
            DECODE_COMPOSITION_END ('1');
          if (coding->eol_type == CODING_EOL_CR)
          else if (coding->eol_type == CODING_EOL_CRLF)
              if (c1 != ISO_CODE_LF)
          charset = CHARSET_ASCII;
          if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
              || CODING_SPEC_ISO_DESIGNATION (coding, 1) < 0)
            goto label_invalid_code;
          CODING_SPEC_ISO_INVOCATION (coding, 0) = 1;
          charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
          if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
            goto label_invalid_code;
          CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
          charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
        case ISO_single_shift_2_7:
        case ISO_single_shift_2:
          if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
            goto label_invalid_code;
          /* SS2 is handled as an escape sequence of ESC 'N'.  */
          goto label_escape_sequence;
        case ISO_single_shift_3:
          if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
            goto label_invalid_code;
          /* SS3 is handled as an escape sequence of ESC 'O'.  */
          goto label_escape_sequence;
        case ISO_control_sequence_introducer:
          /* CSI is handled as an escape sequence of ESC '[' ...  */
          goto label_escape_sequence;
        label_escape_sequence:
          /* Escape sequences handled by Emacs are invocation,
             designation, direction specification, and character
             composition specification.  */
            case '&':   /* revision of following character set */
              if (!(c1 >= '@' && c1 <= '~'))
                goto label_invalid_code;
              if (c1 != ISO_CODE_ESC)
                goto label_invalid_code;
              goto label_escape_sequence;
            case '$':   /* designation of 2-byte character set */
              if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
                goto label_invalid_code;
              if (c1 >= '@' && c1 <= 'B')
                {       /* designation of JISX0208.1978, GB2312.1980,
                  DECODE_DESIGNATION (0, 2, 94, c1);
              else if (c1 >= 0x28 && c1 <= 0x2B)
                {       /* designation of DIMENSION2_CHARS94 character set */
                  DECODE_DESIGNATION (c1 - 0x28, 2, 94, c2);
              else if (c1 >= 0x2C && c1 <= 0x2F)
                {       /* designation of DIMENSION2_CHARS96 character set */
                  DECODE_DESIGNATION (c1 - 0x2C, 2, 96, c2);
                goto label_invalid_code;
              /* We must update these variables now.  */
              charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
              charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
            case 'n':   /* invocation of locking-shift-2 */
              if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
                  || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
                goto label_invalid_code;
              CODING_SPEC_ISO_INVOCATION (coding, 0) = 2;
              charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
            case 'o':   /* invocation of locking-shift-3 */
              if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
                  || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
                goto label_invalid_code;
              CODING_SPEC_ISO_INVOCATION (coding, 0) = 3;
              charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
            case 'N':   /* invocation of single-shift-2 */
              if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
                  || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
                goto label_invalid_code;
              charset = CODING_SPEC_ISO_DESIGNATION (coding, 2);
              if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0))
                goto label_invalid_code;
            case 'O':   /* invocation of single-shift-3 */
              if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
                  || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
                goto label_invalid_code;
              charset = CODING_SPEC_ISO_DESIGNATION (coding, 3);
              if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0))
                goto label_invalid_code;
            case '0': case '2': case '3': case '4': /* start composition */
              DECODE_COMPOSITION_START (c1);
            case '1':   /* end composition */
              DECODE_COMPOSITION_END (c1);
            case '[':   /* specification of direction */
              if (coding->flags & CODING_FLAG_ISO_NO_DIRECTION)
                goto label_invalid_code;
              /* For the moment, nested direction is not supported.
                 So, `coding->mode & CODING_MODE_DIRECTION' zero means
                 left-to-right, and nonzero means right-to-left.  */
                case ']':       /* end of the current direction */
                  coding->mode &= ~CODING_MODE_DIRECTION;
                case '0':       /* end of the current direction */
                case '1':       /* start of left-to-right direction */
                    coding->mode &= ~CODING_MODE_DIRECTION;
                    goto label_invalid_code;
                case '2':       /* start of right-to-left direction */
                    coding->mode |= CODING_MODE_DIRECTION;
                    goto label_invalid_code;
                  goto label_invalid_code;
              if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
                goto label_invalid_code;
              if (c1 >= 0x28 && c1 <= 0x2B)
                {       /* designation of DIMENSION1_CHARS94 character set */
                  DECODE_DESIGNATION (c1 - 0x28, 1, 94, c2);
              else if (c1 >= 0x2C && c1 <= 0x2F)
                {       /* designation of DIMENSION1_CHARS96 character set */
                  DECODE_DESIGNATION (c1 - 0x2C, 1, 96, c2);
                goto label_invalid_code;
              /* We must update these variables now.  */
              charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
              charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
      /* Now we know CHARSET and 1st position code C1 of a character.
         Produce a multibyte sequence for that character while getting
         2nd position code C2 if necessary.  */
      if (CHARSET_DIMENSION (charset) == 2)
          if (c1 < 0x80 ? c2 < 0x20 || c2 >= 0x80 : c2 < 0xA0)
            /* C2 is not in a valid range.  */
            goto label_invalid_code;
      c = DECODE_ISO_CHARACTER (charset, c1, c2);
      if (COMPOSING_P (coding))
        DECODE_COMPOSITION_END ('1');
  coding->consumed = coding->consumed_char = src_base - source;
  coding->produced = dst - destination;
/* ISO2022 encoding stuff.  */
   It is not enough to say just "ISO2022" on encoding; we have to
   specify more details.  In Emacs, each ISO2022 coding system
   variant has the following specifications:
        1. Initial designation to G0 through G3.
        2. Allows short-form designation?
        3. ASCII should be designated to G0 before control characters?
        4. ASCII should be designated to G0 at end of line?
        5. 7-bit environment or 8-bit environment?
        6. Use locking-shift?
        7. Use single-shift?
   And the following two are only for Japanese:
        8. Use ASCII in place of JISX0201-1976-Roman?
        9. Use JISX0208-1983 in place of JISX0208-1978?
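
   As an illustration only, the sketch below packs a few of these
   specifications into made-up flag bits and uses two of them (5 and 7)
   to emit one DIMENSION1 character through single-shift-2, in the way
   ENCODE_SINGLE_SHIFT_2 followed by ENCODE_ISO_CHARACTER_DIMENSION1
   does further down.  All names are hypothetical; the real bits are
   the CODING_FLAG_ISO_XXX macros in `coding.h'.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical flag bits for specifications 2 to 7 above.
enum
{
  SPEC_SHORT_FORM    = 1 << 0,   // 2. short-form designation allowed
  SPEC_RESET_AT_CNTL = 1 << 1,   // 3. ASCII to G0 before control chars
  SPEC_RESET_AT_EOL  = 1 << 2,   // 4. ASCII to G0 at end of line
  SPEC_SEVEN_BITS    = 1 << 3,   // 5. 7-bit environment
  SPEC_LOCKING_SHIFT = 1 << 4,   // 6. use locking-shift
  SPEC_SINGLE_SHIFT  = 1 << 5    // 7. use single-shift
};

struct iso_spec
{
  int initial[4];                // 1. initial designation (-1: none)
  unsigned flags;
};

// Emit a DIMENSION1 character with position code C1 through
// single-shift-2.  The 7-bit form writes ESC 'N' and clears the MSB;
// the 8-bit form writes SS2 (0x8E) and sets it.
size_t
emit_ss2_char (const struct iso_spec *spec, unsigned char c1,
               unsigned char *buf)
{
  size_t n = 0;

  assert (spec->flags & SPEC_SINGLE_SHIFT);
  if (spec->flags & SPEC_SEVEN_BITS)
    {
      buf[n++] = 0x1B;           // ESC
      buf[n++] = 'N';
      buf[n++] = c1 & 0x7F;
    }
  else
    {
      buf[n++] = 0x8E;           // SS2
      buf[n++] = c1 | 0x80;
    }
  return n;
}
```
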
   These specifications are encoded in `coding->flags' as flag bits
   defined by macros CODING_FLAG_ISO_XXX.  See `coding.h' for more
/* Produce codes (escape sequence) for designating CHARSET to graphic
   register REG at DST, and increment DST.  If <final-char> of CHARSET is
   '@', 'A', or 'B' and the coding system CODING allows, produce a
   short-form designation sequence.  */
#define ENCODE_DESIGNATION(charset, reg, coding) \
    unsigned char final_char = CHARSET_ISO_FINAL_CHAR (charset); \
    char *intermediate_char_94 = "()*+"; \
    char *intermediate_char_96 = ",-./"; \
    int revision = CODING_SPEC_ISO_REVISION_NUMBER (coding, charset); \
    if (revision < 255) \
        *dst++ = ISO_CODE_ESC; \
        *dst++ = '@' + revision; \
    *dst++ = ISO_CODE_ESC; \
    if (CHARSET_DIMENSION (charset) == 1) \
        if (CHARSET_CHARS (charset) == 94) \
          *dst++ = (unsigned char) (intermediate_char_94[reg]); \
          *dst++ = (unsigned char) (intermediate_char_96[reg]); \
        if (CHARSET_CHARS (charset) == 94) \
            if (! (coding->flags & CODING_FLAG_ISO_SHORT_FORM) \
                || final_char < '@' || final_char > 'B') \
              *dst++ = (unsigned char) (intermediate_char_94[reg]); \
          *dst++ = (unsigned char) (intermediate_char_96[reg]); \
    *dst++ = final_char; \
    CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \

/* The following two macros produce codes (control character or escape
   sequence) for ISO2022 single-shift functions (single-shift-2 and

#define ENCODE_SINGLE_SHIFT_2 \
    if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
      *dst++ = ISO_CODE_ESC, *dst++ = 'N'; \
      *dst++ = ISO_CODE_SS2; \
    CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \

#define ENCODE_SINGLE_SHIFT_3 \
    if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
      *dst++ = ISO_CODE_ESC, *dst++ = 'O'; \
      *dst++ = ISO_CODE_SS3; \
    CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \

/* The following four macros produce codes (control character or
   escape sequence) for ISO2022 locking-shift functions (shift-in,
   shift-out, locking-shift-2, and locking-shift-3).  */

#define ENCODE_SHIFT_IN \
    *dst++ = ISO_CODE_SI; \
    CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; \

#define ENCODE_SHIFT_OUT \
    *dst++ = ISO_CODE_SO; \
    CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; \

#define ENCODE_LOCKING_SHIFT_2 \
    *dst++ = ISO_CODE_ESC, *dst++ = 'n'; \
    CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; \

#define ENCODE_LOCKING_SHIFT_3 \
    *dst++ = ISO_CODE_ESC, *dst++ = 'o'; \
    CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; \
2207 /* Produce codes for a DIMENSION1 character whose character set is
2208 CHARSET and whose position-code is C1. Designation and invocation
2209 sequences are also produced in advance if necessary. */
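/* Sketch, not Emacs code: once a charset is invoked, the DIMENSION1
   and DIMENSION2 macros write each position code with its high bit
   cleared (graphic plane 0, GL, or any 7-bit output) or set (graphic
   plane 1, GR).  The helpers below, with hypothetical names, show the
   effect: JIS X 0208 position codes 0x30 0x21 come out unchanged
   through GL, as in ISO-2022-JP, and as 0xB0 0xA1 through GR, as in
   EUC-JP.  */

```c
#include <stddef.h>

/* Hypothetical helper: one position code written through graphic
   plane PLANE (0 = GL, 1 = GR).  */
static unsigned char
sketch_position_byte (unsigned char pos, int plane)
{
  return plane == 0 ? (unsigned char) (pos & 0x7F)
                    : (unsigned char) (pos | 0x80);
}

/* Hypothetical helper: a DIMENSION2 character (C1, C2) written through
   PLANE.  Returns the number of bytes written to DST.  */
static size_t
sketch_encode_dim2 (unsigned char c1, unsigned char c2, int plane,
                    unsigned char *dst)
{
  dst[0] = sketch_position_byte (c1, plane);
  dst[1] = sketch_position_byte (c2, plane);
  return 2;
}
```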
#define ENCODE_ISO_CHARACTER_DIMENSION1(charset, c1) \
    if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
        if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
          *dst++ = c1 & 0x7F; \
          *dst++ = c1 | 0x80; \
        CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
    else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
        *dst++ = c1 & 0x7F; \
    else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
        *dst++ = c1 | 0x80; \
        /* Since CHARSET has not yet been invoked to any graphic plane, \
           we must invoke it, first designating it to some graphic \
           register if necessary.  Then repeat the loop to actually produce the \
        dst = encode_invocation_designation (charset, coding, dst); \
/* Produce codes for a DIMENSION2 character whose character set is
   CHARSET and whose position-codes are C1 and C2.  Designation and
   invocation codes are also produced in advance if necessary.  */
#define ENCODE_ISO_CHARACTER_DIMENSION2(charset, c1, c2) \
    if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
        if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
          *dst++ = c1 & 0x7F, *dst++ = c2 & 0x7F; \
          *dst++ = c1 | 0x80, *dst++ = c2 | 0x80; \
        CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
    else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
        *dst++ = c1 & 0x7F, *dst++ = c2 & 0x7F; \
    else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
        *dst++ = c1 | 0x80, *dst++ = c2 | 0x80; \
        /* Since CHARSET has not yet been invoked to any graphic plane, \
           we must invoke it, first designating it to some graphic \
           register if necessary.  Then repeat the loop to actually produce the \
        dst = encode_invocation_designation (charset, coding, dst); \
#define ENCODE_ISO_CHARACTER(c) \
    int charset, c1, c2; \
    SPLIT_CHAR (c, charset, c1, c2); \
    if (CHARSET_DEFINED_P (charset)) \
        if (CHARSET_DIMENSION (charset) == 1) \
            if (charset == CHARSET_ASCII \
                && coding->flags & CODING_FLAG_ISO_USE_ROMAN) \
              charset = charset_latin_jisx0201; \
            ENCODE_ISO_CHARACTER_DIMENSION1 (charset, c1); \
            if (charset == charset_jisx0208 \
                && coding->flags & CODING_FLAG_ISO_USE_OLDJIS) \
              charset = charset_jisx0208_1978; \
            ENCODE_ISO_CHARACTER_DIMENSION2 (charset, c1, c2); \
/* Instead of encoding character C, produce one or two `?'s.  */
#define ENCODE_UNSAFE_CHARACTER(c) \
    ENCODE_ISO_CHARACTER (CODING_INHIBIT_CHARACTER_SUBSTITUTION); \
    if (CHARSET_WIDTH (CHAR_CHARSET (c)) > 1) \
      ENCODE_ISO_CHARACTER (CODING_INHIBIT_CHARACTER_SUBSTITUTION); \
/* Produce, at the place pointed to by DST, the designation and
   invocation codes needed to use CHARSET.  The element `spec.iso2022'
   of *CODING is updated.
encode_invocation_designation (charset, coding, dst)
     struct coding_system *coding;
  int reg;                      /* graphic register number */
  /* First, check the current designations.  */
  for (reg = 0; reg < 4; reg++)
    if (charset == CODING_SPEC_ISO_DESIGNATION (coding, reg))
      /* CHARSET is not yet designated to any graphic register.  */
      /* First, check the requested designation.  */
      reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
      if (reg == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)
        /* Since CHARSET requests no special designation, designate it
           to graphic register 0.  */
      ENCODE_DESIGNATION (charset, reg, coding);
  if (CODING_SPEC_ISO_INVOCATION (coding, 0) != reg
      && CODING_SPEC_ISO_INVOCATION (coding, 1) != reg)
      /* Since the graphic register REG has not been invoked to any
         graphic plane, invoke it to graphic plane 0.  */
        case 0:                 /* graphic register 0 */
        case 1:                 /* graphic register 1 */
        case 2:                 /* graphic register 2 */
          if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
            ENCODE_SINGLE_SHIFT_2;
            ENCODE_LOCKING_SHIFT_2;
        case 3:                 /* graphic register 3 */
          if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
            ENCODE_SINGLE_SHIFT_3;
            ENCODE_LOCKING_SHIFT_3;
/* Produce 2-byte codes for encoded composition rule RULE.  */
#define ENCODE_COMPOSITION_RULE(rule) \
    COMPOSITION_DECODE_RULE (rule, gref, nref); \
    *dst++ = 32 + 81 + gref; \
    *dst++ = 32 + nref; \
/* Produce codes for indicating the start of a composition sequence
   (ESC 0, ESC 3, or ESC 4).  DATA points to an array of integers
   which specify information about the composition.  See the comment
   in coding.h for the format of DATA.  */
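/* Sketch, not Emacs code: the second byte of the composition start
   sequence and the two bytes of a composition rule.  The enum and
   function names are hypothetical stand-ins for COMPOSITION_RELATIVE,
   COMPOSITION_WITH_ALTCHARS and COMPOSITION_WITH_RULE_ALTCHARS.  That
   ESC 3 belongs to the alternate-characters method and ESC 4 to the
   rule-and-alternate-characters method is an assumption read from the
   order in the comment above, since part of ENCODE_COMPOSITION_START
   is not shown.  The rule bytes follow ENCODE_COMPOSITION_RULE, taking
   GREF and NREF as already decoded.  */

```c
/* Hypothetical composition methods.  */
enum sketch_composition
{
  SK_RELATIVE,
  SK_WITH_ALTCHARS,
  SK_WITH_RULE_ALTCHARS
};

/* Byte that follows ESC at the start of a composition (assumed).  */
static unsigned char
sketch_composition_start_char (enum sketch_composition method)
{
  switch (method)
    {
    case SK_RELATIVE:      return '0';
    case SK_WITH_ALTCHARS: return '3';
    default:               return '4';
    }
}

/* Two bytes for a rule with global reference point GREF and new
   reference point NREF: 32 + 81 + GREF, then 32 + NREF.  */
static void
sketch_encode_rule (int gref, int nref, unsigned char dst[2])
{
  dst[0] = (unsigned char) (32 + 81 + gref);
  dst[1] = (unsigned char) (32 + nref);
}
```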
#define ENCODE_COMPOSITION_START(coding, data) \
    coding->composing = data[3]; \
    *dst++ = ISO_CODE_ESC; \
    if (coding->composing == COMPOSITION_RELATIVE) \
      *dst++ = (coding->composing == COMPOSITION_WITH_ALTCHARS \
    coding->cmp_data_index = coding->cmp_data_start + 4; \
    coding->composition_rule_follows = 0; \
/* Produce codes for indicating the end of the current composition.  */
#define ENCODE_COMPOSITION_END(coding, data) \
    *dst++ = ISO_CODE_ESC; \
    coding->cmp_data_start += data[0]; \
    coding->composing = COMPOSITION_NO; \
    if (coding->cmp_data_start == coding->cmp_data->used \
        && coding->cmp_data->next) \
        coding->cmp_data = coding->cmp_data->next; \
        coding->cmp_data_start = 0; \
/* Produce composition start sequence ESC 0.  Here, this sequence
   doesn't mean the start of a new composition but means that we have
   just produced components (alternate chars and composition rules) of
   the composition and the actual text follows in SRC.  */
#define ENCODE_COMPOSITION_FAKE_START(coding) \
    *dst++ = ISO_CODE_ESC; \
    coding->composing = COMPOSITION_RELATIVE; \
/* The following three macros produce codes for indicating direction
#define ENCODE_CONTROL_SEQUENCE_INTRODUCER \
    if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
      *dst++ = ISO_CODE_ESC, *dst++ = '['; \
      *dst++ = ISO_CODE_CSI; \
#define ENCODE_DIRECTION_R2L \
  do { ENCODE_CONTROL_SEQUENCE_INTRODUCER; *dst++ = '2'; *dst++ = ']'; } while (0)
#define ENCODE_DIRECTION_L2R \
  do { ENCODE_CONTROL_SEQUENCE_INTRODUCER; *dst++ = '0'; *dst++ = ']'; } while (0)
/* Produce codes for designation and invocation to reset the graphic
   planes and registers to their initial state.  */
#define ENCODE_RESET_PLANE_AND_REGISTER \
    if (CODING_SPEC_ISO_INVOCATION (coding, 0) != 0) \
    for (reg = 0; reg < 4; reg++) \
      if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg) >= 0 \
          && (CODING_SPEC_ISO_DESIGNATION (coding, reg) \
              != CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg))) \
        ENCODE_DESIGNATION \
          (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg), reg, coding); \
/* Produce, at the place pointed to by DST, designation sequences for
   the charsets used in the line starting at SRC, and return the
   updated DST.

   If the current block ends before any end-of-line, we may fail to
   find all the necessary designations.  */
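/* Sketch, not Emacs code: the designation sequence that
   ENCODE_DESIGNATION writes is ESC, then `$' for a DIMENSION2 charset,
   then an intermediate byte chosen by register and set size ("()*+"
   for 94-character sets, ",-./" for 96-character sets), then the
   charset's final byte.  The function below (hypothetical name) builds
   it without the optional revision prefix.  It uses the short forms
   ESC $ @, ESC $ A and ESC $ B only for register 0, where ISO 2022
   defines them; that register check is an assumption of this sketch,
   because the corresponding part of ENCODE_DESIGNATION is not shown.  */

```c
#include <stddef.h>

/* DIMENSION is 1 or 2, CHARS is 94 or 96, REG is 0..3, FINAL is the
   charset's ISO final byte, and SHORT_FORM plays the role of
   CODING_FLAG_ISO_SHORT_FORM.  Returns the number of bytes written.  */
static size_t
sketch_encode_designation (int dimension, int chars, unsigned char final,
                           int reg, int short_form, unsigned char *dst)
{
  static const char intermediate_94[] = "()*+";
  static const char intermediate_96[] = ",-./";
  size_t n = 0;

  dst[n++] = 0x1B;                      /* ESC */
  if (dimension == 2)
    dst[n++] = '$';
  if (chars == 94)
    {
      /* ESC $ @, ESC $ A and ESC $ B designate to G0 without an
         intermediate byte; every other case needs one.  */
      if (dimension == 1 || ! short_form || reg != 0
          || final < '@' || final > 'B')
        dst[n++] = (unsigned char) intermediate_94[reg];
    }
  else
    dst[n++] = (unsigned char) intermediate_96[reg];
  dst[n++] = final;
  return n;
}
```

For example, ASCII to G0 gives ESC ( B, and JIS X 0208 to G0 gives ESC $ B in the short form or ESC $ ( B otherwise.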
static unsigned char *
encode_designation_at_bol (coding, translation_table, src, src_end, dst)
     struct coding_system *coding;
     Lisp_Object translation_table;
     unsigned char *src, *src_end, *dst;
  int charset, c, found = 0, reg;
  /* Table of charsets to be designated to each graphic register.  */
  for (reg = 0; reg < 4; reg++)
      charset = CHAR_CHARSET (c);
      reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
      if (reg != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION && r[reg] < 0)
  for (reg = 0; reg < 4; reg++)
        && CODING_SPEC_ISO_DESIGNATION (coding, reg) != r[reg])
      ENCODE_DESIGNATION (r[reg], reg, coding);

/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".  */
encode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
  unsigned char *src = source;
  unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  /* Since each iteration of the loop produces at most 20 bytes, we
     subtract 19 from DST_END so that overflow needs checking only at the
  unsigned char *adjusted_dst_end = dst_end - 19;
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source text to
     analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
     there's not enough destination area to produce encoded codes
     (within macro EMIT_BYTES).  */
  unsigned char *src_base;
  Lisp_Object translation_table;
  Lisp_Object safe_chars;

  safe_chars = coding_safe_chars (coding->symbol);

  if (NILP (Venable_character_translation))
    translation_table = Qnil;
      translation_table = coding->translation_table_for_encode;
      if (NILP (translation_table))
        translation_table = Vstandard_translation_table_for_encode;

  coding->consumed_char = 0;
      if (dst >= (dst_bytes ? adjusted_dst_end : (src - 19)))
          coding->result = CODING_FINISH_INSUFFICIENT_DST;
      if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL
          && CODING_SPEC_ISO_BOL (coding))
          /* Produce any necessary designation sequences now.  */
          dst = encode_designation_at_bol (coding, translation_table,
          CODING_SPEC_ISO_BOL (coding) = 0;
      /* Check composition start and end.  */
      if (coding->composing != COMPOSITION_DISABLED
          && coding->cmp_data_start < coding->cmp_data->used)
          struct composition_data *cmp_data = coding->cmp_data;
          int *data = cmp_data->data + coding->cmp_data_start;
          int this_pos = cmp_data->char_offset + coding->consumed_char;

          if (coding->composing == COMPOSITION_RELATIVE)
              if (this_pos == data[2])
                  ENCODE_COMPOSITION_END (coding, data);
                  cmp_data = coding->cmp_data;
                  data = cmp_data->data + coding->cmp_data_start;
          else if (COMPOSING_P (coding))
              /* COMPOSITION_WITH_ALTCHARS or COMPOSITION_WITH_RULE_ALTCHARS  */
              if (coding->cmp_data_index == coding->cmp_data_start + data[0])
                /* We have consumed components of the composition.
                   What follows in SRC is the composition's base
                  ENCODE_COMPOSITION_FAKE_START (coding);
                  int c = cmp_data->data[coding->cmp_data_index++];
                  if (coding->composition_rule_follows)
                      ENCODE_COMPOSITION_RULE (c);
                      coding->composition_rule_follows = 0;
                      if (coding->flags & CODING_FLAG_ISO_SAFE
                          && ! CODING_SAFE_CHAR_P (safe_chars, c))
                        ENCODE_UNSAFE_CHARACTER (c);
                        ENCODE_ISO_CHARACTER (c);
                      if (coding->composing == COMPOSITION_WITH_RULE_ALTCHARS)
                        coding->composition_rule_follows = 1;
          if (!COMPOSING_P (coding))
              if (this_pos == data[1])
                  ENCODE_COMPOSITION_START (coding, data);
      /* Now encode the character C.  */
      if (c < 0x20 || c == 0x7F)
            if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
                if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
                  ENCODE_RESET_PLANE_AND_REGISTER;
            /* Fall through to treat '\r' as '\n'.  */
            if (coding->flags & CODING_FLAG_ISO_RESET_AT_EOL)
              ENCODE_RESET_PLANE_AND_REGISTER;
            if (coding->flags & CODING_FLAG_ISO_INIT_AT_BOL)
              bcopy (coding->spec.iso2022.initial_designation,
                     coding->spec.iso2022.current_designation,
                     sizeof coding->spec.iso2022.initial_designation);
            if (coding->eol_type == CODING_EOL_LF
                || coding->eol_type == CODING_EOL_UNDECIDED)
              *dst++ = ISO_CODE_LF;
            else if (coding->eol_type == CODING_EOL_CRLF)
              *dst++ = ISO_CODE_CR, *dst++ = ISO_CODE_LF;
              *dst++ = ISO_CODE_CR;
            CODING_SPEC_ISO_BOL (coding) = 1;
            if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
              ENCODE_RESET_PLANE_AND_REGISTER;
      else if (ASCII_BYTE_P (c))
        ENCODE_ISO_CHARACTER (c);
      else if (SINGLE_BYTE_CHAR_P (c))
      else if (coding->flags & CODING_FLAG_ISO_SAFE
               && ! CODING_SAFE_CHAR_P (safe_chars, c))
        ENCODE_UNSAFE_CHARACTER (c);
        ENCODE_ISO_CHARACTER (c);

      coding->consumed_char++;

  coding->consumed = src_base - source;
  coding->produced = coding->produced_char = dst - destination;
/*** 4. SJIS and BIG5 handlers ***/

/* Although SJIS and BIG5 are not ISO coding systems, they are widely
   used, so for the moment Emacs supports them directly in C code.  In
   the future, they may be supported only by CCL.  */

/* SJIS is a coding system encoding three character sets: ASCII, the
   right half of JISX0201-Kana, and JISX0208.  An ASCII character is
   encoded as is.  A character of charset katakana-jisx0201 is encoded
   as "position-code + 0x80".  A character of charset japanese-jisx0208
   is encoded in two bytes: its two position-codes are divided and
   shifted so that they fit in the ranges below.

   --- CODE RANGE of SJIS ---
   (character set)      (range)
   ASCII                0x00 .. 0x7F
   KATAKANA-JISX0201    0xA1 .. 0xDF
   JISX0208 (1st byte)  0x81 .. 0x9F and 0xE0 .. 0xEF
            (2nd byte)  0x40 .. 0x7E and 0x80 .. 0xFC
   -------------------------------

/* BIG5 is a coding system encoding two character sets: ASCII and
   Big5.  An ASCII character is encoded as is.  Big5 is a two-byte
   character set and is encoded in two bytes.

   --- CODE RANGE of BIG5 ---
   (character set)      (range)
   ASCII                0x00 .. 0x7F
   Big5 (1st byte)      0xA1 .. 0xFE
        (2nd byte)      0x40 .. 0x7E and 0xA1 .. 0xFE
   --------------------------

   Since the number of characters in Big5 is larger than the maximum
   number of characters in an Emacs charset (96x96), it can't be
   handled as one charset.  So, in Emacs, Big5 is divided into two
   charsets: `chinese-big5-1' and `chinese-big5-2'.  Both are
   DIMENSION2 and CHARS94.  The former contains frequently used
   characters and the latter contains less frequently used
   characters.  */

/* Macros to decode or encode a Big5 character in the BIG5 coding
   system.  B1 and B2 are the 1st and 2nd position-codes of Big5 in the
   BIG5 coding system.  C1 and C2 are the 1st and 2nd position-codes of
   Emacs' internal format.  CHARSET is `charset_big5_1' or
   `charset_big5_2'.  */
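/* Sketch, not Emacs code: the arithmetic of DECODE_BIG5 and ENCODE_BIG5
   below, copied into functions so that it can be checked on its own.
   SK_BIG5_1 and SK_BIG5_2 are hypothetical stand-ins for
   charset_big5_1 and charset_big5_2.  Each Big5 row holds 157
   characters and each Emacs row 94.  The split at lead byte 0xC9 comes
   from the subtraction in DECODE_BIG5.  The test that chooses between
   the two charsets is assumed to be
   `temp < (0xC9 - 0xA1) * BIG5_SAME_ROW', since that line is not
   shown.  */

```c
/* Hypothetical charset identifiers.  */
enum { SK_BIG5_1 = 1, SK_BIG5_2 = 2 };
#define SK_BIG5_SAME_ROW (0xFF - 0xA1 + 0x7F - 0x40)   /* 157 */

/* Big5 bytes (B1, B2) -> Emacs charset and position codes.  */
static void
sketch_decode_big5 (int b1, int b2, int *charset, int *c1, int *c2)
{
  int temp = (b1 - 0xA1) * SK_BIG5_SAME_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62);

  if (temp < (0xC9 - 0xA1) * SK_BIG5_SAME_ROW)
    *charset = SK_BIG5_1;
  else
    {
      *charset = SK_BIG5_2;
      temp -= (0xC9 - 0xA1) * SK_BIG5_SAME_ROW;
    }
  *c1 = temp / (0xFF - 0xA1) + 0x21;    /* 94 characters per row */
  *c2 = temp % (0xFF - 0xA1) + 0x21;
}

/* Emacs charset and position codes -> Big5 bytes (B1, B2).  */
static void
sketch_encode_big5 (int charset, int c1, int c2, int *b1, int *b2)
{
  int temp = (c1 - 0x21) * (0xFF - 0xA1) + (c2 - 0x21);

  if (charset == SK_BIG5_2)
    temp += SK_BIG5_SAME_ROW * (0xC9 - 0xA1);
  *b1 = temp / SK_BIG5_SAME_ROW + 0xA1;
  *b2 = temp % SK_BIG5_SAME_ROW;
  *b2 += *b2 < 0x3F ? 0x40 : 0x62;      /* skip 0x7F .. 0xA0 */
}
```

For example, Big5 0xA4 0x40 maps to `chinese-big5-1' position codes 0x26 0x22, and 0xC9 0x40 is the first character of `chinese-big5-2' (0x21 0x21).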
/* Number of Big5 characters that share the same 1st byte (157).  */
#define BIG5_SAME_ROW (0xFF - 0xA1 + 0x7F - 0x40)

#define DECODE_BIG5(b1, b2, charset, c1, c2) \
      = (b1 - 0xA1) * BIG5_SAME_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62); \
        charset = charset_big5_1; \
        charset = charset_big5_2; \
        temp -= (0xC9 - 0xA1) * BIG5_SAME_ROW; \
    c1 = temp / (0xFF - 0xA1) + 0x21; \
    c2 = temp % (0xFF - 0xA1) + 0x21; \
#define ENCODE_BIG5(charset, c1, c2, b1, b2) \
    unsigned int temp = (c1 - 0x21) * (0xFF - 0xA1) + (c2 - 0x21); \
    if (charset == charset_big5_2) \
      temp += BIG5_SAME_ROW * (0xC9 - 0xA1); \
    b1 = temp / BIG5_SAME_ROW + 0xA1; \
    b2 = temp % BIG5_SAME_ROW; \
    b2 += b2 < 0x3F ? 0x40 : 0x62; \
/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in SJIS.  If it is, return
   CODING_CATEGORY_MASK_SJIS, else return 0.  */
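/* Sketch, not Emacs code: the byte tests of detect_coding_sjis below,
   applied to a plain buffer (the function name is hypothetical, and the
   multibyte handling of ONE_MORE_BYTE_CHECK_MULTIBYTE is left out).
   It accepts the text unless it finds a byte sequence that SJIS cannot
   produce.  Running out of input right after a lead byte is treated as
   acceptable, since more text may follow.  */

```c
#include <stddef.h>

/* Return 1 if BUF[0..LEN) contains nothing invalid for SJIS, else 0.  */
static int
sketch_maybe_sjis (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = buf[i++];

      if (c < 0x80)
        continue;                       /* ASCII */
      if (c == 0x80 || c == 0xA0 || c > 0xEF)
        return 0;                       /* never valid in SJIS */
      if (c <= 0x9F || c >= 0xE0)       /* JISX0208 1st byte */
        {
          if (i == len)
            break;
          c = buf[i++];
          if (c < 0x40 || c == 0x7F || c > 0xFC)
            return 0;
        }
      /* Otherwise C is 0xA1..0xDF, a one-byte JISX0201 katakana.  */
    }
  return 1;
}
```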
detect_coding_sjis (src, src_end, multibytep)
     unsigned char *src, *src_end;
  /* Dummy for ONE_MORE_BYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
      if (c == 0x80 || c == 0xA0 || c > 0xEF)
      if (c <= 0x9F || c >= 0xE0)
          ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
          if (c < 0x40 || c == 0x7F || c > 0xFC)
  return CODING_CATEGORY_MASK_SJIS;

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in BIG5.  If it is, return
   CODING_CATEGORY_MASK_BIG5, else return 0.  */
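/* Sketch, not Emacs code: the BIG5 test applied to a plain buffer,
   using the ranges from the CODE RANGE table above (1st byte
   0xA1..0xFE, 2nd byte 0x40..0x7E or 0xA1..0xFE).  The function name is
   hypothetical.  As in the SJIS sketch style, a lead byte at the very
   end of the buffer is accepted.  */

```c
#include <stddef.h>

/* Return 1 if BUF[0..LEN) contains nothing invalid for BIG5, else 0.  */
static int
sketch_maybe_big5 (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = buf[i++];

      if (c < 0x80)
        continue;                       /* ASCII */
      if (c < 0xA1 || c > 0xFE)
        return 0;                       /* not a Big5 1st byte */
      if (i == len)
        break;
      c = buf[i++];
      if (c < 0x40 || (c > 0x7E && c < 0xA1) || c > 0xFE)
        return 0;                       /* not a Big5 2nd byte */
    }
  return 1;
}
```

Typical SJIS text, whose lead bytes start at 0x81, is rejected by the first range check.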
detect_coding_big5 (src, src_end, multibytep)
     unsigned char *src, *src_end;
  /* Dummy for ONE_MORE_BYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
      if (c < 0xA1 || c > 0xFE)
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
      if (c < 0x40 || (c > 0x7E && c < 0xA1) || c > 0xFE)
  return CODING_CATEGORY_MASK_BIG5;

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in UTF-8.  If it is, return
   CODING_CATEGORY_MASK_UTF_8, else return 0.  */
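/* Sketch, not Emacs code: the check made by detect_coding_utf_8 below,
   applied to a plain buffer (hypothetical function name).  Each leading
   byte announces how many continuation bytes follow, and each of those
   must look like 10xxxxxx.  The obsolete 5- and 6-octet forms are still
   accepted, matching the UTF_8_*_P macros below.  This is only a
   plausibility test, so overlong forms and encoded surrogates pass.  */

```c
#include <stddef.h>

/* Return 1 if BUF[0..LEN) contains nothing invalid for UTF-8, else 0.  */
static int
sketch_maybe_utf_8 (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = buf[i++];
      int extra;

      if (c < 0x80)
        continue;
      else if ((c & 0xE0) == 0xC0) extra = 1;
      else if ((c & 0xF0) == 0xE0) extra = 2;
      else if ((c & 0xF8) == 0xF0) extra = 3;
      else if ((c & 0xFC) == 0xF8) extra = 4;
      else if ((c & 0xFE) == 0xFC) extra = 5;
      else
        return 0;               /* stray continuation byte, 0xFE or 0xFF */

      for (; extra > 0; extra--)
        {
          if (i == len)
            return 1;           /* truncated: nothing invalid seen yet */
          if ((buf[i++] & 0xC0) != 0x80)
            return 0;
        }
    }
  return 1;
}
```

Latin-1 text usually fails quickly: the byte 0xE9 in "h\xE9llo" announces two continuation bytes, but `l' is not one.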
#define UTF_8_1_OCTET_P(c)         ((c) < 0x80)
#define UTF_8_EXTRA_OCTET_P(c)     (((c) & 0xC0) == 0x80)
#define UTF_8_2_OCTET_LEADING_P(c) (((c) & 0xE0) == 0xC0)
#define UTF_8_3_OCTET_LEADING_P(c) (((c) & 0xF0) == 0xE0)
#define UTF_8_4_OCTET_LEADING_P(c) (((c) & 0xF8) == 0xF0)
#define UTF_8_5_OCTET_LEADING_P(c) (((c) & 0xFC) == 0xF8)
#define UTF_8_6_OCTET_LEADING_P(c) (((c) & 0xFE) == 0xFC)

detect_coding_utf_8 (src, src_end, multibytep)
     unsigned char *src, *src_end;
  int seq_maybe_bytes;
  /* Dummy for ONE_MORE_BYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
      if (UTF_8_1_OCTET_P (c))
      else if (UTF_8_2_OCTET_LEADING_P (c))
        seq_maybe_bytes = 1;
      else if (UTF_8_3_OCTET_LEADING_P (c))
        seq_maybe_bytes = 2;
      else if (UTF_8_4_OCTET_LEADING_P (c))
        seq_maybe_bytes = 3;
      else if (UTF_8_5_OCTET_LEADING_P (c))
        seq_maybe_bytes = 4;
      else if (UTF_8_6_OCTET_LEADING_P (c))
        seq_maybe_bytes = 5;
          ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
          if (!UTF_8_EXTRA_OCTET_P (c))
      while (seq_maybe_bytes > 0);
  return CODING_CATEGORY_MASK_UTF_8;

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check whether a text is encoded in UTF-16 by looking at its
   byte-order mark: FE FF means Big Endian and FF FE means Little
   Endian.  If it is, return
   CODING_CATEGORY_MASK_UTF_16_BE or CODING_CATEGORY_MASK_UTF_16_LE,
#define UTF_16_INVALID_P(val) \
  (((val) == 0xFFFE) \
   || ((val) == 0xFFFF))

#define UTF_16_HIGH_SURROGATE_P(val) \
  (((val) & 0xFC00) == 0xD800)

#define UTF_16_LOW_SURROGATE_P(val) \
  (((val) & 0xFC00) == 0xDC00)
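/* Sketch, not Emacs code: surrogate handling and byte-order-mark
   detection for UTF-16, with hypothetical names.  Masking with 0xFC00
   keeps the top six bits, which are 110110 for a high surrogate
   (0xD800..0xDBFF) and 110111 for a low one (0xDC00..0xDFFF).  A mask
   of 0xD800 alone would also accept values such as 0xDC00 or 0xF800 as
   high surrogates.  A valid pair combines into a code point at or above
   0x10000.  */

```c
#include <stdint.h>

#define SK_HIGH_SURROGATE_P(u) (((u) & 0xFC00) == 0xD800)
#define SK_LOW_SURROGATE_P(u)  (((u) & 0xFC00) == 0xDC00)

/* Combine HI and LO into one code point, or return -1 if they are not
   a high surrogate followed by a low surrogate.  */
static int32_t
sketch_combine_surrogates (uint16_t hi, uint16_t lo)
{
  if (! SK_HIGH_SURROGATE_P (hi) || ! SK_LOW_SURROGATE_P (lo))
    return -1;
  return 0x10000 + (((int32_t) (hi - 0xD800) << 10) | (lo - 0xDC00));
}

/* Classify the first two bytes: 1 for FE FF (big endian),
   2 for FF FE (little endian), 0 if there is no byte-order mark.  */
static int
sketch_utf_16_bom (unsigned char b0, unsigned char b1)
{
  if (b0 == 0xFE && b1 == 0xFF)
    return 1;
  if (b0 == 0xFF && b1 == 0xFE)
    return 2;
  return 0;
}
```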

detect_coding_utf_16 (src, src_end, multibytep)
     unsigned char *src, *src_end;
  unsigned char c1, c2;
  /* Dummy for ONE_MORE_BYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;

  ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
  ONE_MORE_BYTE_CHECK_MULTIBYTE (c2, multibytep);

  if ((c1 == 0xFF) && (c2 == 0xFE))
    return CODING_CATEGORY_MASK_UTF_16_LE;
  else if ((c1 == 0xFE) && (c2 == 0xFF))
    return CODING_CATEGORY_MASK_UTF_16_BE;

/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".
   If SJIS_P is 1, decode SJIS text, else decode BIG5 text.  */
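/* Sketch, not Emacs code: the standard SJIS to JIS X 0208 arithmetic.
   decode_coding_sjis_big5 below delegates this step to DECODE_SJIS,
   which is defined outside this section.  This sketch is assumed, not
   verified, to compute the same result.  Each SJIS lead byte covers two
   JIS rows.  A 2nd byte of 0x9F or above selects the even row, and a
   lower one the odd row, with 0x7F skipped.  The function name is
   hypothetical.  */

```c
/* S1 in 0x81..0x9F or 0xE0..0xEF, S2 in 0x40..0x7E or 0x80..0xFC.
   Stores the JIS X 0208 position codes in *J1 and *J2.  */
static void
sketch_decode_sjis (int s1, int s2, int *j1, int *j2)
{
  int row_pair = s1 - (s1 >= 0xE0 ? 0xB0 : 0x70);

  if (s2 >= 0x9F)
    {
      *j1 = row_pair * 2;                       /* even JIS row */
      *j2 = s2 - 0x7E;
    }
  else
    {
      *j1 = row_pair * 2 - 1;                   /* odd JIS row */
      *j2 = s2 - (s2 >= 0x80 ? 0x20 : 0x1F);    /* 0x7F is unused */
    }
}
```

For example, SJIS 0x82 0xA0 (HIRAGANA LETTER A) becomes JIS 0x24 0x22.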
decode_coding_sjis_big5 (coding, source, destination,
                         src_bytes, dst_bytes, sjis_p)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
  unsigned char *src = source;
  unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source code
     (within macro ONE_MORE_BYTE), or when there's not enough
     destination area to produce a character (within macro
  unsigned char *src_base;
  Lisp_Object translation_table;

  if (NILP (Venable_character_translation))
    translation_table = Qnil;
      translation_table = coding->translation_table_for_decode;
      if (NILP (translation_table))
        translation_table = Vstandard_translation_table_for_decode;

  coding->produced_char = 0;
      int c, charset, c1, c2;
      charset = CHARSET_ASCII;
              if (coding->eol_type == CODING_EOL_CRLF)
                  /* To process C2 again, step SRC back by one.  */
              else if (coding->eol_type == CODING_EOL_CR)
                   && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
                   && (coding->eol_type == CODING_EOL_CR
                       || coding->eol_type == CODING_EOL_CRLF))
              coding->result = CODING_FINISH_INCONSISTENT_EOL;
              goto label_end_of_loop;
              if (c1 == 0x80 || c1 == 0xA0 || c1 > 0xEF)
                goto label_invalid_code;
              if (c1 <= 0x9F || c1 >= 0xE0)
                  /* SJIS -> JISX0208  */
                  if (c2 < 0x40 || c2 == 0x7F || c2 > 0xFC)
                    goto label_invalid_code;
                  DECODE_SJIS (c1, c2, c1, c2);
                  charset = charset_jisx0208;
                /* SJIS -> JISX0201-Kana  */
                charset = charset_katakana_jisx0201;
              if (c1 < 0xA1 || c1 > 0xFE)
                goto label_invalid_code;
              if (c2 < 0x40 || (c2 > 0x7E && c2 < 0xA1) || c2 > 0xFE)
                goto label_invalid_code;
              DECODE_BIG5 (c1, c2, charset, c1, c2);
      c = DECODE_ISO_CHARACTER (charset, c1, c2);

  coding->consumed = coding->consumed_char = src_base - source;
  coding->produced = dst - destination;

/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".
   This function can encode charsets `ascii', `katakana-jisx0201',
   `japanese-jisx0208', `chinese-big5-1', and `chinese-big5-2'.  We
   are sure that all these charsets are registered as official charsets
   (i.e. they do not have extended leading-codes).  Characters of other
   charsets are produced without any encoding.  If SJIS_P is 1, encode
   SJIS text, else encode BIG5 text.  */
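/* Sketch, not Emacs code: the standard JIS X 0208 to SJIS arithmetic.
   encode_coding_sjis_big5 below delegates this step to ENCODE_SJIS,
   which is defined outside this section.  This sketch is assumed, not
   verified, to compute the same result.  Two JIS rows share one SJIS
   lead byte.  Odd rows use 2nd bytes 0x40..0x9E, skipping 0x7F, and
   even rows use 0x9F..0xFC, which yields exactly the ranges in the
   CODE RANGE table of SJIS.  The function name is hypothetical.  */

```c
/* J1 and J2 are JIS X 0208 position codes in 0x21..0x7E.
   Stores the two SJIS bytes in *S1 and *S2.  */
static void
sketch_encode_sjis (int j1, int j2, int *s1, int *s2)
{
  *s1 = ((j1 + 1) >> 1) + (j1 <= 0x5E ? 0x70 : 0xB0);
  if (j1 & 1)
    *s2 = j2 + (j2 >= 0x60 ? 0x20 : 0x1F);     /* skip 0x7F */
  else
    *s2 = j2 + 0x7E;
}
```

For example, JIS 0x24 0x22 (HIRAGANA LETTER A) becomes SJIS 0x82 0xA0, and row 0x5F is the first to use the 0xE0.. lead bytes.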
encode_coding_sjis_big5 (coding, source, destination,
                         src_bytes, dst_bytes, sjis_p)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
  unsigned char *src = source;
  unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source text to
     analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
     there's not enough destination area to produce encoded codes
     (within macro EMIT_BYTES).  */
  unsigned char *src_base;
  Lisp_Object translation_table;

  if (NILP (Venable_character_translation))
    translation_table = Qnil;
      translation_table = coding->translation_table_for_encode;
      if (NILP (translation_table))
        translation_table = Vstandard_translation_table_for_encode;
      int c, charset, c1, c2;
      /* Now encode the character C.  */
      if (SINGLE_BYTE_CHAR_P (c))
          if (!(coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
                  if (coding->eol_type == CODING_EOL_CRLF)
                      EMIT_TWO_BYTES ('\r', c);
                  else if (coding->eol_type == CODING_EOL_CR)
          SPLIT_CHAR (c, charset, c1, c2);
              if (charset == charset_jisx0208
                  || charset == charset_jisx0208_1978)
                  ENCODE_SJIS (c1, c2, c1, c2);
                  EMIT_TWO_BYTES (c1, c2);
              else if (charset == charset_katakana_jisx0201)
                EMIT_ONE_BYTE (c1 | 0x80);
              else if (charset == charset_latin_jisx0201)
                /* There's no way other than producing the internal
                EMIT_BYTES (src_base, src);
              if (charset == charset_big5_1 || charset == charset_big5_2)
                  ENCODE_BIG5 (charset, c1, c2, c1, c2);
                  EMIT_TWO_BYTES (c1, c2);
                /* There's no way other than producing the internal
                EMIT_BYTES (src_base, src);
      coding->consumed_char++;

  coding->consumed = src_base - source;
  coding->produced = coding->produced_char = dst - destination;

/*** 5. CCL handlers ***/

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in a coding system whose encoder and
   decoder are written as CCL programs.  If it is, return
   CODING_CATEGORY_MASK_CCL, else return 0.  */
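/* Sketch, not Emacs code: how a table of valid byte codes can decide
   detection.  detect_coding_ccl below reads such a table from
   `spec.ccl.valid_codes'.  Its shape here, 256 entries indexed by byte
   value with nonzero meaning "may appear in this coding system", is an
   assumption.  So is the rule that one invalid byte rejects the text,
   because the loop body is not shown.  The function name is
   hypothetical.  */

```c
#include <stddef.h>

/* Return 1 if every byte of BUF[0..LEN) is marked valid in VALID.  */
static int
sketch_maybe_ccl (const unsigned char valid[256],
                  const unsigned char *buf, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (! valid[buf[i]])
      return 0;
  return 1;
}
```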
detect_coding_ccl (src, src_end, multibytep)
     unsigned char *src, *src_end;
  unsigned char *valid;
  /* Dummy for ONE_MORE_BYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;

  /* No coding system is assigned to coding-category-ccl.  */
  if (!coding_system_table[CODING_CATEGORY_IDX_CCL])
  valid = coding_system_table[CODING_CATEGORY_IDX_CCL]->spec.ccl.valid_codes;
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
  return CODING_CATEGORY_MASK_CCL;

/*** 6. End-of-line handlers ***/

/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".  */
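/* Sketch, not Emacs code: the core of CRLF decoding as performed for
   CODING_EOL_CRLF in decode_eol below.  Each CR LF pair becomes a
   single LF, and every other byte, including a lone CR, is copied.  The
   function name is hypothetical.  The sketch does not reproduce the
   inconsistent-EOL handling (CODING_MODE_INHIBIT_INCONSISTENT_EOL).  A
   CR at the very end of the buffer is simply copied rather than held
   back for more input.  */

```c
#include <stddef.h>

/* Decode SRC[0..LEN) into DST (at most LEN bytes); return the number
   of bytes written.  */
static size_t
sketch_decode_crlf (const unsigned char *src, size_t len, unsigned char *dst)
{
  size_t i, n = 0;

  for (i = 0; i < len; i++)
    {
      if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
        continue;               /* drop the CR; the LF is copied next */
      dst[n++] = src[i];
    }
  return n;
}
```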
decode_eol (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
  unsigned char *src = source;
  unsigned char *dst = destination;
  unsigned char *src_end = src + src_bytes;
  unsigned char *dst_end = dst + dst_bytes;
  Lisp_Object translation_table;
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source code
     (within macro ONE_MORE_BYTE), or when there's not enough
     destination area to produce a character (within macro
  unsigned char *src_base;

  translation_table = Qnil;
  switch (coding->eol_type)
    case CODING_EOL_CRLF:
                  && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL))
                  coding->result = CODING_FINISH_INCONSISTENT_EOL;
                  goto label_end_of_loop;
              if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
                  coding->result = CODING_FINISH_INCONSISTENT_EOL;
                  goto label_end_of_loop;
    default:                    /* no need for EOL handling */

  coding->consumed = coding->consumed_char = src_base - source;
  coding->produced = dst - destination;

/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".
   Encode the end-of-line format according to `coding->eol_type'.  It
   also converts 8-bit characters in multibyte form to unibyte if
   CODING->src_multibyte is nonzero.  If `coding->mode &
   CODING_MODE_SELECTIVE_DISPLAY' is nonzero, the code '\r' in the
   source text also means end-of-line.  */
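/* Sketch, not Emacs code: the end-of-line mapping of encode_eol below,
   applied to a plain unibyte buffer.  An LF, or a CR when selective
   display is on, becomes CR LF, CR, or LF according to the EOL type.
   All other bytes are copied.  SK_EOL_* stand in for CODING_EOL_LF,
   CODING_EOL_CRLF and CODING_EOL_CR, and the function name is
   hypothetical.  The multibyte-to-unibyte step and the buffer-size
   checks are left out.  */

```c
#include <stddef.h>

enum sketch_eol { SK_EOL_LF, SK_EOL_CRLF, SK_EOL_CR };

/* Encode SRC[0..LEN) into DST, which must have room for 2 * LEN bytes.
   Returns the number of bytes written.  */
static size_t
sketch_encode_eol (const unsigned char *src, size_t len,
                   enum sketch_eol eol, int selective_display,
                   unsigned char *dst)
{
  size_t i, n = 0;

  for (i = 0; i < len; i++)
    {
      unsigned char c = src[i];

      if (c == '\n' || (c == '\r' && selective_display))
        {
          if (eol == SK_EOL_CRLF)
            {
              dst[n++] = '\r';
              dst[n++] = '\n';
            }
          else
            dst[n++] = eol == SK_EOL_CR ? '\r' : '\n';
        }
      else
        dst[n++] = c;
    }
  return n;
}
```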
encode_eol (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
  const unsigned char *src = source;
  unsigned char *dst = destination;
  const unsigned char *src_end = src + src_bytes;
  unsigned char *dst_end = dst + dst_bytes;
  Lisp_Object translation_table;
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source text to
     analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
     there's not enough destination area to produce encoded codes
     (within macro EMIT_BYTES).  */
  const unsigned char *src_base;
  int selective_display = coding->mode & CODING_MODE_SELECTIVE_DISPLAY;

  translation_table = Qnil;
  if (coding->src_multibyte
      && *(src_end - 1) == LEADING_CODE_8_BIT_CONTROL)
      coding->result = CODING_FINISH_INSUFFICIENT_SRC;
  if (coding->eol_type == CODING_EOL_CRLF)
      while (src < src_end)
          else if (c == '\n' || (c == '\r' && selective_display))
            EMIT_TWO_BYTES ('\r', '\n');
      if (!dst_bytes || src_bytes <= dst_bytes)
          safe_bcopy (src, dst, src_bytes);
          if (coding->src_multibyte
              && *(src + dst_bytes - 1) == LEADING_CODE_8_BIT_CONTROL)
          safe_bcopy (src, dst, dst_bytes);
          src_base = src + dst_bytes;
          dst = destination + dst_bytes;
          coding->result = CODING_FINISH_INSUFFICIENT_DST;
      if (coding->eol_type == CODING_EOL_CR)
          for (tmp = destination; tmp < dst; tmp++)
            if (*tmp == '\n') *tmp = '\r';
      else if (selective_display)
          for (tmp = destination; tmp < dst; tmp++)
            if (*tmp == '\r') *tmp = '\n';
  if (coding->src_multibyte)
    dst = destination + str_as_unibyte (destination, dst - destination);

  coding->consumed = src_base - source;
  coding->produced = dst - destination;
  coding->produced_char = coding->produced;

/*** 7. C library functions ***/

/* In Emacs Lisp, a coding system is represented by a Lisp symbol which
   has a property `coding-system'.  The value of this property is a
   vector of length 5 (called the coding-vector).  Among the elements of
   this vector, the first (element[0]) and the fifth (element[4])
   carry important information for decoding/encoding.  Before
   decoding/encoding, this information should be set in fields of a
   structure of type `coding_system'.

   The value of the property `coding-system' can be a symbol of another
   subsidiary coding-system.  In that case, Emacs gets coding-vector

   `element[0]' contains information to be set in `coding->type'.  The
   values and their meanings are as follows:

        0 -- coding_type_emacs_mule
        1 -- coding_type_sjis
        2 -- coding_type_iso2022
        3 -- coding_type_big5
        4 -- coding_type_ccl encoder/decoder written in CCL
        nil -- coding_type_no_conversion
        t -- coding_type_undecided (automatic conversion on decoding,
             no-conversion on encoding)

   `element[4]' contains information to be set in `coding->flags' and
   `coding->spec'.  The meaning varies by `coding->type'.

   If `coding->type' is `coding_type_iso2022', element[4] is a vector
   of length 32 (of which the first 13 sub-elements are used now).
   Meanings of these sub-elements are:

   sub-element[N] where N is 0 through 3: to be set in `coding->spec.iso2022'
        If the value is the integer ID of a valid charset, the charset
        is assumed to be designated to graphic register N initially.

        If the value is negative, its absolute value is a charset which
        reserves graphic register N, which means that the charset is
        not designated initially but should be designated to graphic
        register N just before encoding a character in that charset.

        If the value is nil, graphic register N is never used on

   sub-element[N] where N is 4 through 11: to be set in `coding->flags'
        Each value takes t or nil.  See the section ISO2022 of
        `coding.h' for more information.

   If `coding->type' is `coding_type_big5', element[4] is t to denote
   BIG5-ETen or nil to denote BIG5-HKU.

   If `coding->type' takes any other value, element[4] is ignored.

   Emacs Lisp's coding systems also carry information about the format
   of end-of-line in the value of the property `eol-type'.  If the
   value is an integer, 0 means CODING_EOL_LF, 1 means CODING_EOL_CRLF,
   and 2 means CODING_EOL_CR.  If it is not an integer, it should be a
   vector of subsidiary coding systems whose property `eol-type' has
   one of the above values.
3419 /* Extract information for decoding/encoding from CODING_SYSTEM_SYMBOL
3420 and set it in CODING. If CODING_SYSTEM_SYMBOL is invalid, CODING
3421 is setup so that no conversion is necessary and return -1, else
setup_coding_system (coding_system, coding)
     Lisp_Object coding_system;
     struct coding_system *coding;
  Lisp_Object coding_spec, coding_type, eol_type, plist;

  /* First, zero-clear all members.  */
  bzero (coding, sizeof (struct coding_system));

  /* Initialize some fields required for all kinds of coding systems.  */
  coding->symbol = coding_system;
  coding->heading_ascii = -1;
  coding->post_read_conversion = coding->pre_write_conversion = Qnil;
  coding->composing = COMPOSITION_DISABLED;
  coding->cmp_data = NULL;

  if (NILP (coding_system))
    goto label_invalid_coding_system;

  coding_spec = Fget (coding_system, Qcoding_system);

  if (!VECTORP (coding_spec)
      || XVECTOR (coding_spec)->size != 5
      || !CONSP (XVECTOR (coding_spec)->contents[3]))
    goto label_invalid_coding_system;

  eol_type = inhibit_eol_conversion ? Qnil : Fget (coding_system, Qeol_type);
  if (VECTORP (eol_type))
      coding->eol_type = CODING_EOL_UNDECIDED;
      coding->common_flags = CODING_REQUIRE_DETECTION_MASK;
  else if (XFASTINT (eol_type) == 1)
      coding->eol_type = CODING_EOL_CRLF;
      coding->common_flags
        = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
  else if (XFASTINT (eol_type) == 2)
      coding->eol_type = CODING_EOL_CR;
      coding->common_flags
        = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
    coding->eol_type = CODING_EOL_LF;

  coding_type = XVECTOR (coding_spec)->contents[0];
  /* Try a shortcut.  */
  if (SYMBOLP (coding_type))
      if (EQ (coding_type, Qt))
          coding->type = coding_type_undecided;
          coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
        coding->type = coding_type_no_conversion;
      /* Initialize this member.  Anything other than
         CODING_CATEGORY_IDX_UTF_16_BE and
         CODING_CATEGORY_IDX_UTF_16_LE is OK, because those two get
         special treatment in detect_eol.  */
      coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;
  /* Get values of coding system properties:
     `post-read-conversion', `pre-write-conversion',
     `translation-table-for-decode', `translation-table-for-encode'.  */
  plist = XVECTOR (coding_spec)->contents[3];
  /* Pre and post conversion functions should be disabled if
     inhibit_pre_post_conversion is nonzero.  This is the case when a
     code conversion function is called while those functions are
     running.  */
  if (! inhibit_pre_post_conversion)
      coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion);
      coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion);
  val = Fplist_get (plist, Qtranslation_table_for_decode);
    val = Fget (val, Qtranslation_table_for_decode);
  coding->translation_table_for_decode = CHAR_TABLE_P (val) ? val : Qnil;
  val = Fplist_get (plist, Qtranslation_table_for_encode);
    val = Fget (val, Qtranslation_table_for_encode);
  coding->translation_table_for_encode = CHAR_TABLE_P (val) ? val : Qnil;
  val = Fplist_get (plist, Qcoding_category);
      val = Fget (val, Qcoding_category_index);
        coding->category_idx = XINT (val);
        goto label_invalid_coding_system;
    goto label_invalid_coding_system;

  /* If the coding system has a non-nil `composition' property, enable
     composition handling.  */
  val = Fplist_get (plist, Qcomposition);
    coding->composing = COMPOSITION_NO;
  switch (XFASTINT (coding_type))
      coding->type = coding_type_emacs_mule;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
      if (!NILP (coding->post_read_conversion))
        coding->common_flags |= CODING_REQUIRE_DECODING_MASK;
      if (!NILP (coding->pre_write_conversion))
        coding->common_flags |= CODING_REQUIRE_ENCODING_MASK;
      coding->type = coding_type_sjis;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
      coding->type = coding_type_iso2022;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
        Lisp_Object val, temp;
        int i, charset, reg_bits = 0;

        val = XVECTOR (coding_spec)->contents[4];

        if (!VECTORP (val) || XVECTOR (val)->size != 32)
          goto label_invalid_coding_system;

        flags = XVECTOR (val)->contents;
          = ((NILP (flags[4]) ? 0 : CODING_FLAG_ISO_SHORT_FORM)
             | (NILP (flags[5]) ? 0 : CODING_FLAG_ISO_RESET_AT_EOL)
             | (NILP (flags[6]) ? 0 : CODING_FLAG_ISO_RESET_AT_CNTL)
             | (NILP (flags[7]) ? 0 : CODING_FLAG_ISO_SEVEN_BITS)
             | (NILP (flags[8]) ? 0 : CODING_FLAG_ISO_LOCKING_SHIFT)
             | (NILP (flags[9]) ? 0 : CODING_FLAG_ISO_SINGLE_SHIFT)
             | (NILP (flags[10]) ? 0 : CODING_FLAG_ISO_USE_ROMAN)
             | (NILP (flags[11]) ? 0 : CODING_FLAG_ISO_USE_OLDJIS)
             | (NILP (flags[12]) ? 0 : CODING_FLAG_ISO_NO_DIRECTION)
             | (NILP (flags[13]) ? 0 : CODING_FLAG_ISO_INIT_AT_BOL)
             | (NILP (flags[14]) ? 0 : CODING_FLAG_ISO_DESIGNATE_AT_BOL)
             | (NILP (flags[15]) ? 0 : CODING_FLAG_ISO_SAFE)
             | (NILP (flags[16]) ? 0 : CODING_FLAG_ISO_LATIN_EXTRA)
        /* Invoke graphic register 0 to plane 0.  */
        CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
        /* Invoke graphic register 1 to plane 1 if we can use full 8-bit.  */
        CODING_SPEC_ISO_INVOCATION (coding, 1)
          = (coding->flags & CODING_FLAG_ISO_SEVEN_BITS ? -1 : 1);
        /* Not single shifting at first.  */
        CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0;
        /* Beginning of buffer should also be regarded as bol.  */
        CODING_SPEC_ISO_BOL (coding) = 1;

        for (charset = 0; charset <= MAX_CHARSET; charset++)
          CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = 255;
        val = Vcharset_revision_alist;
            charset = get_charset_id (Fcar_safe (XCAR (val)));
                && (temp = Fcdr_safe (XCAR (val)), INTEGERP (temp))
                && (i = XINT (temp), (i >= 0 && (i + '@') < 128)))
              CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = i;
        /* Check FLAGS[REG] (REG = 0, 1, 2, 3) and decide designations.
           FLAGS[REG] can be one of the following:
                integer CHARSET: CHARSET occupies register REG,
                t: designate nothing to REG initially, but REG can be
                  used by any charset,
                list of integer, nil, or t: designate the first
                  element (if an integer) to REG initially; the
                  remaining elements (if integers) are designated to REG
                  on request; if an element is t, REG can be used by
                  any charset,
                nil: REG is never used.  */
        for (charset = 0; charset <= MAX_CHARSET; charset++)
          CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
            = CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION;
        for (i = 0; i < 4; i++)
            if ((INTEGERP (flags[i])
                 && (charset = XINT (flags[i]), CHARSET_VALID_P (charset)))
                || (charset = get_charset_id (flags[i])) >= 0)
                CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
                CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
            else if (EQ (flags[i], Qt))
                CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
                coding->flags |= CODING_FLAG_ISO_DESIGNATION;
            else if (CONSP (flags[i]))
                coding->flags |= CODING_FLAG_ISO_DESIGNATION;
                if ((INTEGERP (XCAR (tail))
                     && (charset = XINT (XCAR (tail)),
                         CHARSET_VALID_P (charset)))
                    || (charset = get_charset_id (XCAR (tail))) >= 0)
                    CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
                    CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
                  CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
                while (CONSP (tail))
                    if ((INTEGERP (XCAR (tail))
                         && (charset = XINT (XCAR (tail)),
                             CHARSET_VALID_P (charset)))
                        || (charset = get_charset_id (XCAR (tail))) >= 0)
                      CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                    else if (EQ (XCAR (tail), Qt))
                CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
            CODING_SPEC_ISO_DESIGNATION (coding, i)
              = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i);
        if (reg_bits && ! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
            /* REG 1 can be used only by locking shift in 7-bit env.  */
            if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
            if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
              /* Without any shifting, only REG 0 and 1 can be used.  */
        for (charset = 0; charset <= MAX_CHARSET; charset++)
            if (CHARSET_DEFINED_P (charset)
                && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                    == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
                /* There exist some default graphic registers to be used
                   by CHARSET.  */
                  /* We should avoid designating a CHARS96 charset to
                     REG 0 whenever possible.  */
                  if (CHARSET_CHARS (charset) == 96)
                    CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                      ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0)));
                    CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                      ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3)));
        coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
        coding->spec.iso2022.last_invalid_designation_register = -1;
      coding->type = coding_type_big5;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
        = (NILP (XVECTOR (coding_spec)->contents[4])
           ? CODING_FLAG_BIG5_HKU
           : CODING_FLAG_BIG5_ETEN);
      coding->type = coding_type_ccl;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
        val = XVECTOR (coding_spec)->contents[4];
            || setup_ccl_program (&(coding->spec.ccl.decoder),
            || setup_ccl_program (&(coding->spec.ccl.encoder),
          goto label_invalid_coding_system;
        bzero (coding->spec.ccl.valid_codes, 256);
        val = Fplist_get (plist, Qvalid_codes);
            for (; CONSP (val); val = XCDR (val))
                    && XINT (this) >= 0 && XINT (this) < 256)
                  coding->spec.ccl.valid_codes[XINT (this)] = 1;
                else if (CONSP (this)
                         && INTEGERP (XCAR (this))
                         && INTEGERP (XCDR (this)))
                    int start = XINT (XCAR (this));
                    int end = XINT (XCDR (this));
                    if (start >= 0 && start <= end && end < 256)
                      while (start <= end)
                        coding->spec.ccl.valid_codes[start++] = 1;
        coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
        coding->spec.ccl.cr_carryover = 0;
        coding->spec.ccl.eight_bit_carryover[0] = 0;
      coding->type = coding_type_raw_text;
      goto label_invalid_coding_system;

 label_invalid_coding_system:
  coding->type = coding_type_no_conversion;
  coding->category_idx = CODING_CATEGORY_IDX_BINARY;
  coding->common_flags = 0;
  coding->eol_type = CODING_EOL_LF;
  coding->pre_write_conversion = coding->post_read_conversion = Qnil;
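
/* Illustrative sketch (hypothetical helper, not called anywhere in
   this file): how the default graphic register for a charset with no
   requested designation appears to be chosen from REG_BITS in the
   ISO2022 case of setup_coding_system.  It assumes bit N of REG_BITS is
   set when graphic register N may be designated on demand, and that the
   tests missing from the two conditional chains above are
   `reg_bits & 2' (CHARS96 case) and `reg_bits & 1' (CHARS94 case).
   CHARS is 94 or 96, as returned by CHARSET_CHARS.  */

static int
sketch_default_register (int reg_bits, int chars)
{
  if (chars == 96)
    /* Keep 96-char charsets out of REG 0 whenever another register is
       available: try REG 1, 2, 3, and only then REG 0.  */
    return (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0)));
  /* 94-char charsets: try REG 0, 1, 2, and otherwise use REG 3.  */
  return (reg_bits & 1 ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3)));
}

/* For example, with REG_BITS == 0x6 a CHARS96 charset gets REG 1,
   while with REG_BITS == 0x1 it falls back to REG 0.  */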
/* Free memory blocks allocated for storing composition information.  */
coding_free_composition_data (coding)
     struct coding_system *coding;
  struct composition_data *cmp_data = coding->cmp_data, *next;
  /* Memory blocks are chained.  First rewind to the first block, then
     free the blocks one by one.  */
  while (cmp_data->prev)
    cmp_data = cmp_data->prev;
      next = cmp_data->next;
  coding->cmp_data = NULL;

/* Set the `char_offset' member of all memory blocks pointed to by
   coding->cmp_data to POS.  */
coding_adjust_composition_offset (coding, pos)
     struct coding_system *coding;
  struct composition_data *cmp_data;
  for (cmp_data = coding->cmp_data; cmp_data; cmp_data = cmp_data->next)
    cmp_data->char_offset = pos;
/* Set up raw-text or one of its subsidiaries in the structure
   coding_system CODING according to the value of eol_type already set
   up in CODING.  CODING should have been set up for some coding system
   in advance.  */
setup_raw_text_coding_system (coding)
     struct coding_system *coding;
  if (coding->type != coding_type_raw_text)
      coding->symbol = Qraw_text;
      coding->type = coding_type_raw_text;
      if (coding->eol_type != CODING_EOL_UNDECIDED)
          Lisp_Object subsidiaries;
          subsidiaries = Fget (Qraw_text, Qeol_type);

          if (VECTORP (subsidiaries)
              && XVECTOR (subsidiaries)->size == 3)
              = XVECTOR (subsidiaries)->contents[coding->eol_type];
      setup_coding_system (coding->symbol, coding);
/* Emacs has a mechanism to automatically detect a coding system if it
   is one of Emacs' internal format, ISO2022, SJIS, or BIG5.  However,
   some coding systems cannot be distinguished accurately because they
   use the same range of codes.  So coding systems are first grouped
   into the following categories:

   o coding-category-emacs-mule

	The category for a coding system which has the same code range
	as Emacs' internal format.  Assigned the coding-system (Lisp
	symbol) `emacs-mule' by default.

   o coding-category-sjis

	The category for a coding system which has the same code range
	as SJIS.  Assigned the coding-system (Lisp
	symbol) `japanese-shift-jis' by default.

   o coding-category-iso-7

	The category for a coding system which has the same code range
	as ISO2022 in a 7-bit environment.  This doesn't use any
	locking-shift or single-shift functions.  This can encode/decode
	all charsets.  Assigned the coding-system (Lisp symbol)
	`iso-2022-7bit' by default.

   o coding-category-iso-7-tight

	Same as coding-category-iso-7 except that this can
	encode/decode only the specified charsets.

   o coding-category-iso-8-1

	The category for a coding system which has the same code range
	as ISO2022 in an 8-bit environment, with graphic plane 1 used
	only for DIMENSION1 charsets.  This doesn't use any
	locking-shift or single-shift functions.  Assigned the
	coding-system (Lisp symbol) `iso-latin-1' by default.

   o coding-category-iso-8-2

	The category for a coding system which has the same code range
	as ISO2022 in an 8-bit environment, with graphic plane 1 used
	only for DIMENSION2 charsets.  This doesn't use any
	locking-shift or single-shift functions.  Assigned the
	coding-system (Lisp symbol) `japanese-iso-8bit' by default.

   o coding-category-iso-7-else

	The category for a coding system which has the same code range
	as ISO2022 in a 7-bit environment but uses locking-shift or
	single-shift functions.  Assigned the coding-system (Lisp
	symbol) `iso-2022-7bit-lock' by default.

   o coding-category-iso-8-else

	The category for a coding system which has the same code range
	as ISO2022 in an 8-bit environment but uses locking-shift or
	single-shift functions.  Assigned the coding-system (Lisp
	symbol) `iso-2022-8bit-ss2' by default.

   o coding-category-big5

	The category for a coding system which has the same code range
	as BIG5.  Assigned the coding-system (Lisp symbol)
	`cn-big5' by default.

   o coding-category-utf-8

	The category for a coding system which has the same code range
	as UTF-8 (cf. RFC2279).  Assigned the coding-system (Lisp
	symbol) `utf-8' by default.

   o coding-category-utf-16-be

	The category for a coding system in which the text has a
	big-endian Unicode signature (cf. the Unicode Standard) at the
	head.  Assigned the coding-system (Lisp symbol)
	`utf-16-be' by default.

   o coding-category-utf-16-le

	The category for a coding system in which the text has a
	little-endian Unicode signature (cf. the Unicode Standard) at
	the head.  Assigned the coding-system (Lisp symbol)
	`utf-16-le' by default.

   o coding-category-ccl

	The category for a coding system whose encoder/decoder is
	written in CCL programs.  The default value is nil, i.e., no
	coding system is assigned.

   o coding-category-binary

	The category for a coding system not categorized in any of the
	above.  Assigned the coding-system (Lisp symbol)
	`no-conversion' by default.

   Each of them is a Lisp symbol whose value is an actual
   `coding-system' (also a Lisp symbol) assigned by the user.  What
   Emacs actually does is detect the category of the coding system,
   then use the `coding-system' assigned to that category.  If Emacs
   can't decide on a single possible category, it selects the category
   with the highest priority.  The priorities of categories are also
   specified by the user, in the Lisp variable `coding-category-list'.

*/
int ascii_skip_code[256];
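
/* Illustrative sketch (hypothetical helper, not called anywhere in
   this file): with a priority table, detection reports only the single
   highest-priority category whose bit is set in the detected mask, and
   falls back to raw-text otherwise.  PRIORITIES is assumed to hold one
   single-bit category mask per entry, highest priority first, like
   `coding_priorities'; FALLBACK stands in for
   CODING_CATEGORY_MASK_RAW_TEXT.  */

static unsigned int
sketch_pick_category (unsigned int mask, const unsigned int *priorities,
                      int n_priorities, unsigned int fallback)
{
  int i;

  for (i = 0; i < n_priorities; i++)
    if (mask & priorities[i])
      return priorities[i];     /* The first hit has the highest priority.  */
  return fallback;
}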
/* Detect how a text of length SRC_BYTES pointed to by SOURCE is encoded.
   If it detects possible coding systems, return an integer in which
   the appropriate flag bits are set.  Flag bits are defined by macros
   CODING_CATEGORY_MASK_XXX in `coding.h'.  If PRIORITIES is non-NULL,
   it should point to the table `coding_priorities'.  In that case,
   only the flag bit for the coding system of the highest priority is
   set in the returned value.  If MULTIBYTEP is nonzero, 8-bit codes in
   the range 0x80..0x9F are in multibyte form.

   The number of ASCII characters at the head is returned in *SKIP.  */
detect_coding_mask (source, src_bytes, priorities, skip, multibytep)
     unsigned char *source;
     int src_bytes, *priorities, *skip;
  register unsigned char c;
  unsigned char *src = source, *src_end = source + src_bytes;
  unsigned int mask, utf16_examined_p, iso2022_examined_p;
  /* First, skip all ASCII characters and control characters except
     for the three ISO2022-specific control characters.  */
  ascii_skip_code[ISO_CODE_SO] = 0;
  ascii_skip_code[ISO_CODE_SI] = 0;
  ascii_skip_code[ISO_CODE_ESC] = 0;

 label_loop_detect_coding:
  while (src < src_end && ascii_skip_code[*src]) src++;
  *skip = src - source;

    /* We found nothing other than ASCII.  There's nothing to do.  */

  /* The text seems to be encoded in some multilingual coding system.
     Now, try to find in which coding system the text is encoded.  */
      /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */
      /* C is an ISO2022 specific control code of C0.  */
      mask = detect_coding_iso2022 (src, src_end, multibytep);
          /* No valid ISO2022 code follows C.  Try again.  */
          if (c == ISO_CODE_ESC)
            ascii_skip_code[ISO_CODE_ESC] = 1;
            ascii_skip_code[ISO_CODE_SO] = ascii_skip_code[ISO_CODE_SI] = 1;
          goto label_loop_detect_coding;
          for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
              if (mask & priorities[i])
                return priorities[i];
          return CODING_CATEGORY_MASK_RAW_TEXT;
      if (multibytep && c == LEADING_CODE_8_BIT_CONTROL)
          /* C is the first byte of SJIS character code,
             or a leading-code of Emacs' internal format (emacs-mule),
             or the first byte of UTF-16.  */
          try = (CODING_CATEGORY_MASK_SJIS
                 | CODING_CATEGORY_MASK_EMACS_MULE
                 | CODING_CATEGORY_MASK_UTF_16_BE
                 | CODING_CATEGORY_MASK_UTF_16_LE);

          /* Or, if C is a special latin extra code,
             or is an ISO2022 specific control code of C1 (SS2 or SS3),
             or is an ISO2022 control-sequence-introducer (CSI),
             we should also consider the possibility of ISO2022 codings.  */
          if ((VECTORP (Vlatin_extra_code_table)
               && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
              || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3)
              || (c == ISO_CODE_CSI
                      || ((*src == '0' || *src == '1' || *src == '2')
                          && src + 1 < src_end
                          && src[1] == ']')))))
            try |= (CODING_CATEGORY_MASK_ISO_8_ELSE
                    | CODING_CATEGORY_MASK_ISO_8BIT);
          /* C is a character of ISO2022 in graphic plane right,
             or a SJIS's 1-byte character code (i.e. JISX0201),
             or the first byte of BIG5's 2-byte code,
             or the first byte of UTF-8/16.  */
          try = (CODING_CATEGORY_MASK_ISO_8_ELSE
                 | CODING_CATEGORY_MASK_ISO_8BIT
                 | CODING_CATEGORY_MASK_SJIS
                 | CODING_CATEGORY_MASK_BIG5
                 | CODING_CATEGORY_MASK_UTF_8
                 | CODING_CATEGORY_MASK_UTF_16_BE
                 | CODING_CATEGORY_MASK_UTF_16_LE);

          /* Or, we may have to consider the possibility of CCL.  */
          if (coding_system_table[CODING_CATEGORY_IDX_CCL]
              && (coding_system_table[CODING_CATEGORY_IDX_CCL]
                  ->spec.ccl.valid_codes)[c])
            try |= CODING_CATEGORY_MASK_CCL;

      utf16_examined_p = iso2022_examined_p = 0;
          for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
              if (!iso2022_examined_p
                  && (priorities[i] & try & CODING_CATEGORY_MASK_ISO))
                  mask |= detect_coding_iso2022 (src, src_end, multibytep);
                  iso2022_examined_p = 1;
              else if (priorities[i] & try & CODING_CATEGORY_MASK_SJIS)
                mask |= detect_coding_sjis (src, src_end, multibytep);
              else if (priorities[i] & try & CODING_CATEGORY_MASK_UTF_8)
                mask |= detect_coding_utf_8 (src, src_end, multibytep);
              else if (!utf16_examined_p
                       && (priorities[i] & try &
                           CODING_CATEGORY_MASK_UTF_16_BE_LE))
                  mask |= detect_coding_utf_16 (src, src_end, multibytep);
                  utf16_examined_p = 1;
              else if (priorities[i] & try & CODING_CATEGORY_MASK_BIG5)
                mask |= detect_coding_big5 (src, src_end, multibytep);
              else if (priorities[i] & try & CODING_CATEGORY_MASK_EMACS_MULE)
                mask |= detect_coding_emacs_mule (src, src_end, multibytep);
              else if (priorities[i] & try & CODING_CATEGORY_MASK_CCL)
                mask |= detect_coding_ccl (src, src_end, multibytep);
              else if (priorities[i] & CODING_CATEGORY_MASK_RAW_TEXT)
                mask |= CODING_CATEGORY_MASK_RAW_TEXT;
              else if (priorities[i] & CODING_CATEGORY_MASK_BINARY)
                mask |= CODING_CATEGORY_MASK_BINARY;
              if (mask & priorities[i])
                return priorities[i];
          return CODING_CATEGORY_MASK_RAW_TEXT;
      if (try & CODING_CATEGORY_MASK_ISO)
        mask |= detect_coding_iso2022 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_SJIS)
        mask |= detect_coding_sjis (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_BIG5)
        mask |= detect_coding_big5 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_UTF_8)
        mask |= detect_coding_utf_8 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_UTF_16_BE_LE)
        mask |= detect_coding_utf_16 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_EMACS_MULE)
        mask |= detect_coding_emacs_mule (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_CCL)
        mask |= detect_coding_ccl (src, src_end, multibytep);
  return (mask | CODING_CATEGORY_MASK_RAW_TEXT | CODING_CATEGORY_MASK_BINARY);
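
/* Illustrative sketch (hypothetical helper, not called anywhere in
   this file): the ASCII-skipping step at the top of detect_coding_mask.
   Because the initialization of ascii_skip_code is not shown here, this
   assumes that every 7-bit code is skippable except the three ISO2022
   control codes SO (0x0E), SI (0x0F) and ESC (0x1B).  The return value
   corresponds to *SKIP.  */

#include <stddef.h>

static size_t
sketch_ascii_prefix_length (const unsigned char *src, size_t len)
{
  unsigned char skippable[256];
  size_t i;

  /* Mark 0x00..0x7F as skippable, then clear the ISO2022 controls.  */
  for (i = 0; i < 256; i++)
    skippable[i] = (i < 0x80);
  skippable[0x0E] = skippable[0x0F] = skippable[0x1B] = 0;

  i = 0;
  while (i < len && skippable[src[i]])
    i++;
  return i;
}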
/* Detect how a text of length SRC_BYTES pointed to by SRC is encoded.
   The information of the detected coding system is set in CODING.  */
detect_coding (coding, src, src_bytes)
     struct coding_system *coding;
     const unsigned char *src;
  val = Vcoding_category_list;
  mask = detect_coding_mask (src, src_bytes, coding_priorities, &skip,
                             coding->src_multibyte);
  coding->heading_ascii = skip;
      /* We found a single coding system of the highest priority in MASK.  */
      while (mask && ! (mask & 1)) mask >>= 1, idx++;
        idx = CODING_CATEGORY_IDX_RAW_TEXT;
      val = SYMBOL_VALUE (XVECTOR (Vcoding_category_table)->contents[idx]);
      if (coding->eol_type != CODING_EOL_UNDECIDED)
          tmp = Fget (val, Qeol_type);
            val = XVECTOR (tmp)->contents[coding->eol_type];
      /* Set up this new coding system while preserving some slots.  */
        int src_multibyte = coding->src_multibyte;
        int dst_multibyte = coding->dst_multibyte;

        setup_coding_system (val, coding);
        coding->src_multibyte = src_multibyte;
        coding->dst_multibyte = dst_multibyte;
        coding->heading_ascii = skip;
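
/* Illustrative sketch (hypothetical helper, not called anywhere in
   this file): turning the single-bit mask returned by
   detect_coding_mask into a category index, as the loop
   `while (mask && ! (mask & 1)) mask >>= 1, idx++;' above does.  It
   assumes IDX starts at 0 and that an empty mask means "use raw-text",
   whose index is passed in as RAW_TEXT_IDX.  */

static int
sketch_mask_to_index (unsigned int mask, int raw_text_idx)
{
  int idx = 0;

  if (! mask)
    return raw_text_idx;
  /* Shift until the single set bit reaches bit 0, counting the shifts.  */
  while (! (mask & 1))
    mask >>= 1, idx++;
  return idx;
}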
/* Detect how end-of-line of a text of length SRC_BYTES pointed to by
   SOURCE is encoded.  Return one of CODING_EOL_LF, CODING_EOL_CRLF,
   CODING_EOL_CR, and CODING_EOL_UNDECIDED, or CODING_EOL_INCONSISTENT
   if different end-of-line formats are found.

   The number of non-EOL characters at the head is returned in *SKIP.  */
#define MAX_EOL_CHECK_COUNT 3
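
/* Illustrative sketch (hypothetical helper, not called anywhere in
   this file): a standalone version of the scan that detect_eol_type
   performs, examining at most MAX_COUNT line ends (compare
   MAX_EOL_CHECK_COUNT).  The SKETCH_EOL_* values are hypothetical
   stand-ins for the CODING_EOL_* constants in coding.h, and *SKIP
   handling is omitted.  */

#include <stddef.h>

enum sketch_eol
{
  SKETCH_EOL_UNDECIDED, SKETCH_EOL_LF, SKETCH_EOL_CRLF, SKETCH_EOL_CR,
  SKETCH_EOL_INCONSISTENT
};

static enum sketch_eol
sketch_detect_eol (const unsigned char *src, size_t len, int max_count)
{
  const unsigned char *end = src + len;
  enum sketch_eol eol = SKETCH_EOL_UNDECIDED, this_eol;
  int total = 0;

  while (src < end && total < max_count)
    {
      unsigned char c = *src++;

      if (c != '\n' && c != '\r')
        continue;
      total++;
      if (c == '\n')
        this_eol = SKETCH_EOL_LF;
      else if (src >= end || *src != '\n')
        this_eol = SKETCH_EOL_CR;       /* A CR not followed by LF.  */
      else
        this_eol = SKETCH_EOL_CRLF, src++;  /* Consume the LF of CRLF.  */

      if (eol == SKETCH_EOL_UNDECIDED)
        eol = this_eol;                 /* The first end-of-line seen.  */
      else if (eol != this_eol)
        return SKETCH_EOL_INCONSISTENT;
    }
  return eol;
}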
detect_eol_type (source, src_bytes, skip)
     unsigned char *source;
     int src_bytes, *skip;
  unsigned char *src = source, *src_end = src + src_bytes;
  int total = 0;                /* How many end-of-lines are found so far.  */
  int eol_type = CODING_EOL_UNDECIDED;
  while (src < src_end && total < MAX_EOL_CHECK_COUNT)
      if (c == '\n' || c == '\r')
          *skip = src - 1 - source;
            this_eol_type = CODING_EOL_LF;
          else if (src >= src_end || *src != '\n')
            this_eol_type = CODING_EOL_CR;
            this_eol_type = CODING_EOL_CRLF, src++;

          if (eol_type == CODING_EOL_UNDECIDED)
            /* This is the first end-of-line.  */
            eol_type = this_eol_type;
          else if (eol_type != this_eol_type)
              /* The found type is different from what was found before.  */
              eol_type = CODING_EOL_INCONSISTENT;
    *skip = src_end - source;
/* Like detect_eol_type, but detect the EOL type in 2-octet
   big-endian/little-endian format for the coding systems utf-16-be and
   utf-16-le.  */
detect_eol_type_in_2_octet_form (source, src_bytes, skip, big_endian_p)
     unsigned char *source;
     int src_bytes, *skip, big_endian_p;
  unsigned char *src = source, *src_end = src + src_bytes;
  unsigned int c1, c2;
  int total = 0;                /* How many end-of-lines are found so far.  */
  int eol_type = CODING_EOL_UNDECIDED;
  while ((src + 1) < src_end && total < MAX_EOL_CHECK_COUNT)
      c1 = (src[msb] << 8) | (src[lsb]);
      if (c1 == '\n' || c1 == '\r')
          *skip = src - 2 - source;
            this_eol_type = CODING_EOL_LF;
              if ((src + 1) >= src_end)
                  this_eol_type = CODING_EOL_CR;
                  c2 = (src[msb] << 8) | (src[lsb]);
                    this_eol_type = CODING_EOL_CRLF, src += 2;
                    this_eol_type = CODING_EOL_CR;
          if (eol_type == CODING_EOL_UNDECIDED)
            /* This is the first end-of-line.  */
            eol_type = this_eol_type;
          else if (eol_type != this_eol_type)
              /* The found type is different from what was found before.  */
              eol_type = CODING_EOL_INCONSISTENT;
    *skip = src_end - source;
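
/* Illustrative sketch (hypothetical helper, not called anywhere in
   this file): the 2-octet variant of the EOL scan.  Each code unit is
   assembled from bytes MSB and LSB, assuming MSB = 0, LSB = 1 for
   big-endian input and the reverse for little-endian input, which is
   how the msb/lsb indices above appear to be used.  The SKETCH16_*
   values are hypothetical stand-ins for CODING_EOL_*; *SKIP handling is
   omitted.  */

#include <stddef.h>

enum sketch16_eol
{
  SKETCH16_UNDECIDED, SKETCH16_LF, SKETCH16_CRLF, SKETCH16_CR,
  SKETCH16_INCONSISTENT
};

static enum sketch16_eol
sketch_detect_eol_utf16 (const unsigned char *src, size_t len,
                         int big_endian_p, int max_count)
{
  const unsigned char *end = src + len;
  int msb = big_endian_p ? 0 : 1, lsb = 1 - msb;
  enum sketch16_eol eol = SKETCH16_UNDECIDED, this_eol;
  int total = 0;

  while (src + 1 < end && total < max_count)
    {
      unsigned int c = (src[msb] << 8) | src[lsb];

      src += 2;
      if (c != '\n' && c != '\r')
        continue;
      total++;
      if (c == '\n')
        this_eol = SKETCH16_LF;
      else if (src + 1 < end && ((src[msb] << 8) | src[lsb]) == '\n')
        this_eol = SKETCH16_CRLF, src += 2;  /* Consume the LF unit too.  */
      else
        this_eol = SKETCH16_CR;

      if (eol == SKETCH16_UNDECIDED)
        eol = this_eol;
      else if (eol != this_eol)
        return SKETCH16_INCONSISTENT;
    }
  return eol;
}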
/* Detect how end-of-line of a text of length SRC_BYTES pointed to by SRC
   is encoded.  If it detects an appropriate end-of-line format, it sets
   the information in *CODING.  */
detect_eol (coding, src, src_bytes)
     struct coding_system *coding;
     const unsigned char *src;
  switch (coding->category_idx)
    case CODING_CATEGORY_IDX_UTF_16_BE:
      eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 1);
    case CODING_CATEGORY_IDX_UTF_16_LE:
      eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 0);
      eol_type = detect_eol_type (src, src_bytes, &skip);
  if (coding->heading_ascii > skip)
    coding->heading_ascii = skip;
    skip = coding->heading_ascii;
  if (eol_type == CODING_EOL_UNDECIDED)
  if (eol_type == CODING_EOL_INCONSISTENT)
      /* This code is suppressed until we find a better way to
         distinguish raw text files from binary files.  */
      /* If we have already detected that the coding is raw-text, the
         coding should actually be no-conversion.  */
      if (coding->type == coding_type_raw_text)
          setup_coding_system (Qno_conversion, coding);
      /* Else, let's decode only text code anyway.  */
      eol_type = CODING_EOL_LF;
  val = Fget (coding->symbol, Qeol_type);
  if (VECTORP (val) && XVECTOR (val)->size == 3)
      int src_multibyte = coding->src_multibyte;
      int dst_multibyte = coding->dst_multibyte;
      struct composition_data *cmp_data = coding->cmp_data;

      setup_coding_system (XVECTOR (val)->contents[eol_type], coding);
      coding->src_multibyte = src_multibyte;
      coding->dst_multibyte = dst_multibyte;
      coding->heading_ascii = skip;
      coding->cmp_data = cmp_data;
#define CONVERSION_BUFFER_EXTRA_ROOM 256

#define DECODING_BUFFER_MAG(coding)                     \
  (coding->type == coding_type_iso2022                  \
   : (coding->type == coding_type_ccl                   \
      ? coding->spec.ccl.decoder.buf_magnification      \
/* Return the maximum size (in bytes) of a buffer large enough for
   decoding SRC_BYTES of text encoded in CODING.  */
decoding_buffer_size (coding, src_bytes)
     struct coding_system *coding;
  return (src_bytes * DECODING_BUFFER_MAG (coding)
          + CONVERSION_BUFFER_EXTRA_ROOM);
/* Return the maximum size (in bytes) of a buffer large enough for
   encoding SRC_BYTES of text to CODING.  */
encoding_buffer_size (coding, src_bytes)
     struct coding_system *coding;
  if (coding->type == coding_type_ccl)
    magnification = coding->spec.ccl.encoder.buf_magnification;
  else if (CODING_REQUIRE_ENCODING (coding))
  return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM);
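
/* Illustrative sketch (hypothetical helper, not called anywhere in
   this file): both buffer-size functions above compute
   SRC_BYTES * MAGNIFICATION + CONVERSION_BUFFER_EXTRA_ROOM, where the
   magnification is a per-coding-system worst-case expansion factor (for
   CCL, the program's buf_magnification).  The constant 256 mirrors the
   #define above.  */

static int
sketch_conversion_buffer_size (int src_bytes, int magnification)
{
  const int extra_room = 256;   /* Same value as CONVERSION_BUFFER_EXTRA_ROOM.  */

  return src_bytes * magnification + extra_room;
}

/* For example, 1000 source bytes with a magnification of 3 call for a
   buffer of 1000 * 3 + 256 = 3256 bytes.  */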
/* Working buffer for code conversion.  */
struct conversion_buffer
  int size;                     /* size of data.  */
  int on_stack;                 /* 1 if allocated by alloca.  */
  unsigned char *data;
/* Don't use alloca for allocating memory space larger than this, lest
   we overflow the stack.  */
#define MAX_ALLOCA 16*1024
/* Allocate LEN bytes of memory for BUF (struct conversion_buffer).  */
#define allocate_conversion_buffer(buf, len)            \
    if (len < MAX_ALLOCA)                               \
        buf.data = (unsigned char *) alloca (len);      \
        buf.data = (unsigned char *) xmalloc (len);     \
/* Double the allocated memory for *BUF.  */
extend_conversion_buffer (buf)
     struct conversion_buffer *buf;
      unsigned char *save = buf->data;
      buf->data = (unsigned char *) xmalloc (buf->size * 2);
      bcopy (save, buf->data, buf->size);
      buf->data = (unsigned char *) xrealloc (buf->data, buf->size * 2);
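
/* Illustrative sketch (hypothetical struct and helper, not used
   anywhere in this file): growing a conversion buffer by doubling, as
   extend_conversion_buffer appears to do.  A buffer that lives on the
   stack (from alloca) must not be passed to realloc, so its contents
   are copied into fresh heap memory; a heap buffer is simply
   reallocated.  Standard malloc/realloc/memcpy stand in for Emacs'
   xmalloc/xrealloc/bcopy.  Returns 0 on success, -1 on failure.  */

#include <stdlib.h>
#include <string.h>

struct sketch_buffer
{
  int size;                     /* Size of data in bytes.  */
  int on_stack;                 /* 1 if DATA is not heap-allocated.  */
  unsigned char *data;
};

static int
sketch_extend_buffer (struct sketch_buffer *buf)
{
  unsigned char *p;

  if (buf->on_stack)
    {
      /* Stack memory: allocate on the heap and copy the old bytes.  */
      p = malloc ((size_t) buf->size * 2);
      if (! p)
        return -1;
      memcpy (p, buf->data, buf->size);
      buf->on_stack = 0;
    }
  else
    {
      /* Heap memory: realloc keeps the existing contents.  */
      p = realloc (buf->data, (size_t) buf->size * 2);
      if (! p)
        return -1;
    }
  buf->data = p;
  buf->size *= 2;
  return 0;
}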
/* Free the allocated memory for BUF if it is not on stack.  */
free_conversion_buffer (buf)
     struct conversion_buffer *buf;
ccl_coding_driver (coding, source, destination, src_bytes, dst_bytes, encodep)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes, encodep;
  struct ccl_program *ccl
    = encodep ? &coding->spec.ccl.encoder : &coding->spec.ccl.decoder;
  unsigned char *dst = destination;

  ccl->suppress_error = coding->suppress_error;
  ccl->last_block = coding->mode & CODING_MODE_LAST_BLOCK;
      /* On encoding, the EOL format is converted within ccl_driver.  For
         that, set up the proper information in the structure CCL.  */
      ccl->eol_type = coding->eol_type;
      if (ccl->eol_type == CODING_EOL_UNDECIDED)
        ccl->eol_type = CODING_EOL_LF;
      ccl->cr_consumed = coding->spec.ccl.cr_carryover;
  ccl->multibyte = coding->src_multibyte;
  if (coding->spec.ccl.eight_bit_carryover[0] != 0)
      /* Move carryover bytes to DESTINATION.  */
      unsigned char *p = coding->spec.ccl.eight_bit_carryover;
      coding->spec.ccl.eight_bit_carryover[0] = 0;
      dst_bytes -= dst - destination;

  coding->produced = (ccl_driver (ccl, source, dst, src_bytes, dst_bytes,
                                  &(coding->consumed))
                      + dst - destination);

      coding->produced_char = coding->produced;
      coding->spec.ccl.cr_carryover = ccl->cr_consumed;
  else if (!ccl->eight_bit_control)
      /* The produced bytes form a valid multibyte sequence.  */
      coding->produced_char
        = multibyte_chars_in_text (destination, coding->produced);
      coding->spec.ccl.eight_bit_carryover[0] = 0;
      /* On decoding, the destination should always be multibyte.  But
         the CCL program might have generated an invalid multibyte
         sequence.  Here we make such a sequence valid as multibyte.  */
        = dst_bytes ? dst_bytes : source + coding->consumed - destination;

      if ((coding->consumed < src_bytes
           || !ccl->last_block)
          && coding->produced >= 1
          && destination[coding->produced - 1] >= 0x80)
          /* We should not convert the trailing 8-bit codes to
             multibyte form even if they don't form a valid multibyte
             sequence.  They may form a valid sequence in the next
             call.  */
4531 if (destination
[coding
->produced
- 1] < 0xA0)
4533 else if (coding
->produced
>= 2)
4535 if (destination
[coding
->produced
- 2] >= 0x80)
4537 if (destination
[coding
->produced
- 2] < 0xA0)
4539 else if (coding
->produced
>= 3
4540 && destination
[coding
->produced
- 3] >= 0x80
4541 && destination
[coding
->produced
- 3] < 0xA0)
4547 BCOPY_SHORT (destination
+ coding
->produced
- carryover
,
4548 coding
->spec
.ccl
.eight_bit_carryover
,
4550 coding
->spec
.ccl
.eight_bit_carryover
[carryover
] = 0;
4551 coding
->produced
-= carryover
;
4554 coding
->produced
= str_as_multibyte (destination
, bytes
,
4556 &(coding
->produced_char
));
4559 switch (ccl
->status
)
4561 case CCL_STAT_SUSPEND_BY_SRC
:
4562 coding
->result
= CODING_FINISH_INSUFFICIENT_SRC
;
4564 case CCL_STAT_SUSPEND_BY_DST
:
4565 coding
->result
= CODING_FINISH_INSUFFICIENT_DST
;
4568 case CCL_STAT_INVALID_CMD
:
4569 coding
->result
= CODING_FINISH_INTERRUPT
;
4572 coding
->result
= CODING_FINISH_NORMAL
;
4575 return coding
->result
;
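/* Illustrative sketch, not part of this file.  The helper name is
   hypothetical.  In ccl_coding_driver above, the tail check keeps
   trailing 8-bit bytes out of str_as_multibyte because the next block
   may complete them into a valid sequence.  This copy is missing the
   statements that set CARRYOVER.  The counts returned here (1, 2 or 3
   bytes) are therefore an assumption, inferred from the surrounding
   tests.  Those tests appear to treat bytes 0x80..0x9F as leading
   codes.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: number of trailing bytes of BUF (LEN bytes)
// to hold back because they may start a multibyte sequence that the
// next block completes.
static int
sketch_tail_carryover (const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  if (len < 1 || buf[len - 1] < 0x80)
    return 0;                   // Last byte is ASCII: nothing to hold.
  if (buf[len - 1] < 0xA0)
    return 1;                   // Last byte is a leading code.
  if (len >= 2 && buf[len - 2] >= 0x80)
    {
      if (buf[len - 2] < 0xA0)
        return 2;               // Leading code + 1 following byte.
      if (len >= 3 && buf[len - 3] >= 0x80 && buf[len - 3] < 0xA0)
        return 3;               // Leading code + 2 following bytes.
    }
  return 0;                     // No incomplete sequence detected.
}
```

   A caller would apply this only while more input may follow, that
   is, under the same (consumed < src_bytes || !last_block) test used
   above.  */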

/* Decode EOL format of the text at PTR of BYTES length destructively
   according to CODING->eol_type.  This is called after the CCL
   program produced a decoded text at PTR.  If we do CRLF->LF
   conversion, update CODING->produced and CODING->produced_char.  */

decode_eol_post_ccl (coding, ptr, bytes)
     struct coding_system *coding;
  Lisp_Object val, saved_coding_symbol;
  unsigned char *pend = ptr + bytes;

  /* Remember the current coding system symbol.  We set it back when
     an inconsistent EOL is found so that `last-coding-system-used' is
     set to the coding system that doesn't specify EOL conversion.  */
  saved_coding_symbol = coding->symbol;

  coding->spec.ccl.cr_carryover = 0;
  if (coding->eol_type == CODING_EOL_UNDECIDED)
      /* Here, to avoid calling setup_coding_system, we call
         detect_eol_type directly.  */
      coding->eol_type = detect_eol_type (ptr, bytes, &dummy);
      if (coding->eol_type == CODING_EOL_INCONSISTENT)
        coding->eol_type = CODING_EOL_LF;
      if (coding->eol_type != CODING_EOL_UNDECIDED)
          val = Fget (coding->symbol, Qeol_type);
          if (VECTORP (val) && XVECTOR (val)->size == 3)
            coding->symbol = XVECTOR (val)->contents[coding->eol_type];
      coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;

  if (coding->eol_type == CODING_EOL_LF
      || coding->eol_type == CODING_EOL_UNDECIDED)
      /* We have nothing to do.  */
  else if (coding->eol_type == CODING_EOL_CRLF)
      unsigned char *pstart = ptr, *p = ptr;

      if (! (coding->mode & CODING_MODE_LAST_BLOCK)
          && *(pend - 1) == '\r')
          /* If the last character is CR, we can't handle it here
             because LF will be in the not-yet-decoded source text.
             Record that the CR is not yet processed.  */
          coding->spec.ccl.cr_carryover = 1;
          coding->produced_char--;
              if (ptr + 1 < pend && *(ptr + 1) == '\n')
                  if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
                    goto undo_eol_conversion;
              else if (*ptr == '\n'
                       && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
                goto undo_eol_conversion;

        undo_eol_conversion:
          /* We have encountered an inconsistent EOL format at PTR.
             Convert all LFs before PTR back to CRLFs.  */
          for (p--, ptr--; p >= pstart; p--)
              *ptr-- = '\n', *ptr-- = '\r';
          /* If carryover is recorded, cancel it because we don't
             convert CRLF anymore.  */
          if (coding->spec.ccl.cr_carryover)
              coding->spec.ccl.cr_carryover = 0;
              coding->produced_char++;
          coding->eol_type = CODING_EOL_LF;
          coding->symbol = saved_coding_symbol;

      /* As each two-byte sequence CRLF was converted to LF, (PEND
         - P) is the number of deleted characters.  */
      coding->produced -= pend - p;
      coding->produced_char -= pend - p;
  else /* i.e. coding->eol_type == CODING_EOL_CR */
      unsigned char *p = ptr;

      for (; ptr < pend; ptr++)
          else if (*ptr == '\n'
                   && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
              for (; p < ptr; p++)
              coding->eol_type = CODING_EOL_LF;
              coding->symbol = saved_coding_symbol;
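
/* Illustrative sketch, not part of this file.  It is a simplified
   model of the CRLF branch of decode_eol_post_ccl and uses a
   hypothetical name.  It leaves out the inconsistent-EOL undo path and
   the CODING bookkeeping.  What it keeps is the core idea: CR LF pairs
   collapse to LF in place, and a trailing CR in a non-final block is
   carried over.  decode_coding does this with spec.ccl.cr_carryover by
   putting the CR back at the start of the next block's DESTINATION.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical, simplified model of the CRLF branch: convert each
// CR LF pair in BUF (LEN bytes) to LF in place and return the new
// length.  If LAST_BLOCK is zero and the text ends in CR, that CR is
// dropped and *CR_CARRYOVER is set to 1 so the caller can put it in
// front of the next block, where the matching LF may appear.
static size_t
sketch_crlf_to_lf (unsigned char *buf, size_t len, int last_block,
                   int *cr_carryover)
{
  size_t in = 0, out = 0;

  assert (cr_carryover != NULL);
  *cr_carryover = 0;
  if (!last_block && len > 0 && buf[len - 1] == '\r')
    {
      *cr_carryover = 1;
      len--;                    // Hold the trailing CR back.
    }
  while (in < len)
    {
      if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
        in++;                   // Drop the CR; the LF is copied below.
      buf[out++] = buf[in++];
    }
  return out;
}
```

   In this sketch, a CR that is not followed by LF is copied
   unchanged.  */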

/* See "GENERAL NOTES about `decode_coding_XXX ()' functions".  Before
   decoding, it may detect the coding system and the end-of-line format
   if they are not yet decided.  The source should be unibyte, the
   result is multibyte if CODING->dst_multibyte is nonzero, else

decode_coding (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
  if (coding->type == coding_type_undecided)
    detect_coding (coding, source, src_bytes);

  if (coding->eol_type == CODING_EOL_UNDECIDED
      && coding->type != coding_type_ccl)
      detect_eol (coding, source, src_bytes);
      /* We had better recover the original eol format if we
         encounter an inconsistent eol format while decoding.  */
      coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;

  coding->produced = coding->produced_char = 0;
  coding->consumed = coding->consumed_char = 0;
  coding->result = CODING_FINISH_NORMAL;

  switch (coding->type)
    case coding_type_sjis:
      decode_coding_sjis_big5 (coding, source, destination,
                               src_bytes, dst_bytes, 1);
    case coding_type_iso2022:
      decode_coding_iso2022 (coding, source, destination,
                             src_bytes, dst_bytes);
    case coding_type_big5:
      decode_coding_sjis_big5 (coding, source, destination,
                               src_bytes, dst_bytes, 0);
    case coding_type_emacs_mule:
      decode_coding_emacs_mule (coding, source, destination,
                                src_bytes, dst_bytes);
    case coding_type_ccl:
      if (coding->spec.ccl.cr_carryover)
          /* Put the CR which was not processed by the previous call
             of decode_eol_post_ccl in DESTINATION.  It will be
             decoded together with the following LF by the call to
             decode_eol_post_ccl below.  */
          *destination = '\r';
          coding->produced_char++;
      extra = coding->spec.ccl.cr_carryover;
      ccl_coding_driver (coding, source, destination + extra,
                         src_bytes, dst_bytes, 0);
      if (coding->eol_type != CODING_EOL_LF)
          coding->produced += extra;
          coding->produced_char += extra;
          decode_eol_post_ccl (coding, destination, coding->produced);
      decode_eol (coding, source, destination, src_bytes, dst_bytes);

  if (coding->result == CODING_FINISH_INSUFFICIENT_SRC
      && coding->mode & CODING_MODE_LAST_BLOCK
      && coding->consumed == src_bytes)
    coding->result = CODING_FINISH_NORMAL;

  if (coding->mode & CODING_MODE_LAST_BLOCK
      && coding->result == CODING_FINISH_INSUFFICIENT_SRC)
      const unsigned char *src = source + coding->consumed;
      unsigned char *dst = destination + coding->produced;

      src_bytes -= coding->consumed;
      if (COMPOSING_P (coding))
        DECODE_COMPOSITION_END ('1');
          dst += CHAR_STRING (c, dst);
          coding->produced_char++;
      coding->consumed = coding->consumed_char = src - source;
      coding->produced = dst - destination;
      coding->result = CODING_FINISH_NORMAL;

  if (!coding->dst_multibyte)
      coding->produced = str_as_unibyte (destination, coding->produced);
      coding->produced_char = coding->produced;

  return coding->result;

/* See "GENERAL NOTES about `encode_coding_XXX ()' functions".  The
   multibyteness of the source is CODING->src_multibyte, the
   multibyteness of the result is always unibyte.  */

encode_coding (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
  coding->produced = coding->produced_char = 0;
  coding->consumed = coding->consumed_char = 0;
  coding->result = CODING_FINISH_NORMAL;

  switch (coding->type)
    case coding_type_sjis:
      encode_coding_sjis_big5 (coding, source, destination,
                               src_bytes, dst_bytes, 1);
    case coding_type_iso2022:
      encode_coding_iso2022 (coding, source, destination,
                             src_bytes, dst_bytes);
    case coding_type_big5:
      encode_coding_sjis_big5 (coding, source, destination,
                               src_bytes, dst_bytes, 0);
    case coding_type_emacs_mule:
      encode_coding_emacs_mule (coding, source, destination,
                                src_bytes, dst_bytes);
    case coding_type_ccl:
      ccl_coding_driver (coding, source, destination,
                         src_bytes, dst_bytes, 1);
      encode_eol (coding, source, destination, src_bytes, dst_bytes);

  if (coding->mode & CODING_MODE_LAST_BLOCK
      && coding->result == CODING_FINISH_INSUFFICIENT_SRC)
      const unsigned char *src = source + coding->consumed;
      unsigned char *dst = destination + coding->produced;

      if (coding->type == coding_type_iso2022)
        ENCODE_RESET_PLANE_AND_REGISTER;
      if (COMPOSING_P (coding))
        *dst++ = ISO_CODE_ESC, *dst++ = '1';
      if (coding->consumed < src_bytes)
          int len = src_bytes - coding->consumed;

          BCOPY_SHORT (src, dst, len);
          if (coding->src_multibyte)
            len = str_as_unibyte (dst, len);
          coding->consumed = src_bytes;
      coding->produced = coding->produced_char = dst - destination;
      coding->result = CODING_FINISH_NORMAL;

  if (coding->result == CODING_FINISH_INSUFFICIENT_SRC
      && coding->consumed == src_bytes)
    coding->result = CODING_FINISH_NORMAL;

  return coding->result;

/* Scan text in the region between *BEG and *END (byte positions),
   skip characters which we don't have to decode by coding system
   CODING at the head and tail, then set *BEG and *END to the region
   of the text we actually have to convert.  The caller should move
   the gap out of the region in advance if the region is from a

   If STR is not NULL, *BEG and *END are indices into STR.  */

shrink_decoding_region (beg, end, coding, str)
     struct coding_system *coding;
  unsigned char *begp_orig, *begp, *endp_orig, *endp, c;
  Lisp_Object translation_table;

  if (coding->type == coding_type_ccl
      || coding->type == coding_type_undecided
      || coding->eol_type != CODING_EOL_LF
      || !NILP (coding->post_read_conversion)
      || coding->composing != COMPOSITION_DISABLED)
      /* We can't skip any data.  */

  if (coding->type == coding_type_no_conversion
      || coding->type == coding_type_raw_text
      || coding->type == coding_type_emacs_mule)
      /* We need no conversion, but don't have to skip any data here.
         The decoding routine handles them efficiently anyway.  */

  translation_table = coding->translation_table_for_decode;
  if (NILP (translation_table) && !NILP (Venable_character_translation))
    translation_table = Vstandard_translation_table_for_decode;
  if (CHAR_TABLE_P (translation_table))
      for (i = 0; i < 128; i++)
        if (!NILP (CHAR_TABLE_REF (translation_table, i)))
          /* Some ASCII character should be translated.  We give up

  if (coding->heading_ascii >= 0)
    /* Detection routine has already found how much we can skip at the
    *beg += coding->heading_ascii;

      begp_orig = begp = str + *beg;
      endp_orig = endp = str + *end;
      begp_orig = begp = BYTE_POS_ADDR (*beg);
      endp_orig = endp = begp + *end - *beg;

  eol_conversion = (coding->eol_type == CODING_EOL_CR
                    || coding->eol_type == CODING_EOL_CRLF);

  switch (coding->type)
    case coding_type_sjis:
    case coding_type_big5:
      /* We can skip all ASCII characters at the head.  */
      if (coding->heading_ascii < 0)
            while (begp < endp && *begp < 0x80 && *begp != '\r') begp++;
            while (begp < endp && *begp < 0x80) begp++;
      /* We can skip all ASCII characters at the tail except for the
         second byte of SJIS or BIG5 code.  */
        while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\r') endp--;
        while (begp < endp && endp[-1] < 0x80) endp--;
      /* Do not consider LF as ASCII if preceded by CR, since that
         confuses eol decoding.  */
      if (begp < endp && endp < endp_orig
          && endp[-1] == '\r' && endp[0] == '\n')
      if (begp < endp && endp < endp_orig && endp[-1] >= 0x80)

    case coding_type_iso2022:
      if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, 0) != CHARSET_ASCII)
        /* We can't skip any data.  */
      if (coding->heading_ascii < 0)
          /* We can skip all ASCII characters at the head except for a
             few control codes.  */
          while (begp < endp && (c = *begp) < 0x80
                 && c != ISO_CODE_CR && c != ISO_CODE_SO
                 && c != ISO_CODE_SI && c != ISO_CODE_ESC
                 && (!eol_conversion || c != ISO_CODE_LF))
      switch (coding->category_idx)
        case CODING_CATEGORY_IDX_ISO_8_1:
        case CODING_CATEGORY_IDX_ISO_8_2:
          /* We can skip all ASCII characters at the tail.  */
            while (begp < endp && (c = endp[-1]) < 0x80 && c != '\r') endp--;
            while (begp < endp && endp[-1] < 0x80) endp--;
          /* Do not consider LF as ASCII if preceded by CR, since that
             confuses eol decoding.  */
          if (begp < endp && endp < endp_orig
              && endp[-1] == '\r' && endp[0] == '\n')

        case CODING_CATEGORY_IDX_ISO_7:
        case CODING_CATEGORY_IDX_ISO_7_TIGHT:
            /* We can skip all characters at the tail except for 8-bit
               codes, and ESC together with the two bytes that follow it.  */
            unsigned char *eight_bit = NULL;
                   && (c = endp[-1]) != ISO_CODE_ESC && c != '\r')
                if (!eight_bit && c & 0x80) eight_bit = endp;
                   && (c = endp[-1]) != ISO_CODE_ESC)
                if (!eight_bit && c & 0x80) eight_bit = endp;
            /* Do not consider LF as ASCII if preceded by CR, since that
               confuses eol decoding.  */
            if (begp < endp && endp < endp_orig
                && endp[-1] == '\r' && endp[0] == '\n')
            if (begp < endp && endp[-1] == ISO_CODE_ESC)
                if (endp + 1 < endp_orig && endp[0] == '(' && endp[1] == 'B')
                  /* This is an ASCII designation sequence.  We can
                     surely skip the tail.  But, if we have
                     encountered an 8-bit code, skip only the codes
                  endp = eight_bit ? eight_bit : endp + 2;
                  /* Hmmm, we can't skip the tail.  */

  *beg += begp - begp_orig;
  *end += endp - endp_orig;
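
/* Illustrative sketch, not part of this file.  It models the
   SJIS/BIG5 case of shrink_decoding_region on byte offsets instead of
   buffer pointers, and the function name is hypothetical.  The last
   two tests in that case lack their statements in this copy.  The two
   `e++' adjustments below are an assumption that those statements
   advance ENDP by one byte.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical sketch of the SJIS/BIG5 case: narrow [*BEG, *END) of
// TEXT so that leading and trailing ASCII bytes, which decode to
// themselves, are left out.  With EOL_CONVERSION nonzero, CR bytes
// stay inside the region so that the EOL decoder still sees them.
static void
sketch_shrink_sjis_region (const unsigned char *text, size_t *beg,
                           size_t *end, int eol_conversion)
{
  size_t b = *beg, e = *end, e_orig = *end;

  assert (b <= e);
  // Head: skip ASCII bytes (but stop at CR if EOLs are converted).
  while (b < e && text[b] < 0x80 && !(eol_conversion && text[b] == '\r'))
    b++;
  // Tail: skip ASCII bytes the same way.
  while (b < e && text[e - 1] < 0x80
         && !(eol_conversion && text[e - 1] == '\r'))
    e--;
  // Keep an LF whose CR ends the region, so CR LF stays together.
  if (b < e && e < e_orig && text[e - 1] == '\r' && text[e] == '\n')
    e++;
  // A byte below 0x80 right after one >= 0x80 may be the second byte
  // of a two-byte SJIS/BIG5 code, so keep it too.
  if (b < e && e < e_orig && text[e - 1] >= 0x80)
    e++;
  *beg = b;
  *end = e;
}
```

   For the bytes "ab", 0x82, 0x60, "cd", only 0x82 0x60 remains to be
   decoded.  The byte 0x60 looks like ASCII but is kept as a possible
   second byte.  */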

/* Like shrink_decoding_region but for encoding.  */

shrink_encoding_region (beg, end, coding, str)
     struct coding_system *coding;
  unsigned char *begp_orig, *begp, *endp_orig, *endp;
  Lisp_Object translation_table;

  if (coding->type == coding_type_ccl
      || coding->eol_type == CODING_EOL_CRLF
      || coding->eol_type == CODING_EOL_CR
      || (coding->cmp_data && coding->cmp_data->used > 0))
    /* We can't skip any data.  */

  if (coding->type == coding_type_no_conversion
      || coding->type == coding_type_raw_text
      || coding->type == coding_type_emacs_mule
      || coding->type == coding_type_undecided)
      /* We need no conversion, but don't have to skip any data here.
         The encoding routine handles them efficiently anyway.  */

  translation_table = coding->translation_table_for_encode;
  if (NILP (translation_table) && !NILP (Venable_character_translation))
    translation_table = Vstandard_translation_table_for_encode;
  if (CHAR_TABLE_P (translation_table))
      for (i = 0; i < 128; i++)
        if (!NILP (CHAR_TABLE_REF (translation_table, i)))
          /* Some ASCII character should be translated.  We give up

      begp_orig = begp = str + *beg;
      endp_orig = endp = str + *end;
      begp_orig = begp = BYTE_POS_ADDR (*beg);
      endp_orig = endp = begp + *end - *beg;

  eol_conversion = (coding->eol_type == CODING_EOL_CR
                    || coding->eol_type == CODING_EOL_CRLF);

  /* Here, we don't have to check coding->pre_write_conversion because
     the caller is expected to have handled it already.  */
  switch (coding->type)
    case coding_type_iso2022:
      if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, 0) != CHARSET_ASCII)
        /* We can't skip any data.  */
      if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL)
          unsigned char *bol = begp;
          while (begp < endp && *begp < 0x80)
              if (begp[-1] == '\n')
          goto label_skip_tail;

    case coding_type_sjis:
    case coding_type_big5:
      /* We can skip all ASCII characters at the head and tail.  */
        while (begp < endp && *begp < 0x80 && *begp != '\n') begp++;
        while (begp < endp && *begp < 0x80) begp++;
        while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--;
        while (begp < endp && *(endp - 1) < 0x80) endp--;

  *beg += begp - begp_orig;
  *end += endp - endp_orig;
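
/* Illustrative sketch, not part of this file.  The function name is
   hypothetical.  With CODING_FLAG_ISO_DESIGNATE_AT_BOL set,
   designations are emitted at the beginning of each line.  Only whole
   ASCII lines at the head can therefore be skipped.  This sketch of
   the head scan assumes that the statements missing from the loop
   above record BOL just after each LF, and that BEGP is then reset
   to BOL.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical sketch: return the offset just past the last LF in
// the leading run of ASCII bytes of TEXT[BEG, END).  Only the bytes
// before that offset form complete ASCII lines and can be skipped.
static size_t
sketch_skip_ascii_lines (const unsigned char *text, size_t beg, size_t end)
{
  size_t p = beg, bol = beg;

  assert (beg <= end);
  while (p < end && text[p] < 0x80)
    {
      p++;
      if (text[p - 1] == '\n')
        bol = p;                // A new line starts here.
    }
  return bol;
}
```

   The tail is then handled by the code at label_skip_tail.  */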

/* As shrinking the conversion region requires some overhead, we don't
   try shrinking if the length of the conversion region is less than this
static int shrink_conversion_region_threshhold = 1024;

#define SHRINK_CONVERSION_REGION(beg, end, coding, str, encodep)       \
    if (*(end) - *(beg) > shrink_conversion_region_threshhold)       \
        if (encodep) shrink_encoding_region (beg, end, coding, str); \
        else shrink_decoding_region (beg, end, coding, str);         \

code_convert_region_unwind (dummy)
  inhibit_pre_post_conversion = 0;

/* Store information about all compositions in the range FROM and TO
   of OBJ in memory blocks pointed to by CODING->cmp_data.  OBJ is a
   buffer or a string, defaulting to the current buffer.  */

coding_save_composition (coding, from, to, obj)
     struct coding_system *coding;
  if (coding->composing == COMPOSITION_DISABLED)

  if (!coding->cmp_data)
    coding_allocate_composition_data (coding, from);
  if (!find_composition (from, to, &start, &end, &prop, obj)
      && (!find_composition (end, to, &start, &end, &prop, obj)
      coding->composing = COMPOSITION_NO;
      if (COMPOSITION_VALID_P (start, end, prop))
          enum composition_method method = COMPOSITION_METHOD (prop);
          if (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH
              >= COMPOSITION_DATA_SIZE)
            coding_allocate_composition_data (coding, from);
          /* For relative composition, we remember start and end
             positions, for the other compositions, we also remember
          CODING_ADD_COMPOSITION_START (coding, start - from, method);
          if (method != COMPOSITION_RELATIVE)
              /* We must store a*/
              Lisp_Object val, ch;

              val = COMPOSITION_COMPONENTS (prop);
                  ch = XCAR (val), val = XCDR (val);
                  CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (ch));
              else if (VECTORP (val) || STRINGP (val))
                  int len = (VECTORP (val)
                             ? XVECTOR (val)->size : SCHARS (val));
                  for (i = 0; i < len; i++)
                            ? Faref (val, make_number (i))
                            : XVECTOR (val)->contents[i]);
                      CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (ch));
              else /* INTEGERP (val) */
                CODING_ADD_COMPOSITION_COMPONENT (coding, XINT (val));
          CODING_ADD_COMPOSITION_END (coding, end - from);
         && find_composition (start, to, &start, &end, &prop, obj)

  /* Make coding->cmp_data point to the first memory block.  */
  while (coding->cmp_data->prev)
    coding->cmp_data = coding->cmp_data->prev;
  coding->cmp_data_start = 0;

/* Apply the saved information about compositions to OBJ.
   CODING->cmp_data points to a memory block for the information.  OBJ
   is a buffer or a string, defaulting to the current buffer.  */

coding_restore_composition (coding, obj)
     struct coding_system *coding;
  struct composition_data *cmp_data = coding->cmp_data;

  while (cmp_data->prev)
    cmp_data = cmp_data->prev;

      for (i = 0; i < cmp_data->used && cmp_data->data[i] > 0;
           i += cmp_data->data[i])
          int *data = cmp_data->data + i;
          enum composition_method method = (enum composition_method) data[3];
          Lisp_Object components;

          if (method == COMPOSITION_RELATIVE)
              int len = data[0] - 4, j;
              Lisp_Object args[MAX_COMPOSITION_COMPONENTS * 2 - 1];

              if (method == COMPOSITION_WITH_RULE_ALTCHARS
              for (j = 0; j < len; j++)
                args[j] = make_number (data[4 + j]);
              components = (method == COMPOSITION_WITH_ALTCHARS
                            ? Fstring (len, args) : Fvector (len, args));
          compose_text (data[1], data[2], components, Qnil, obj);
      cmp_data = cmp_data->next;
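
/* Illustrative sketch, not part of this file.  The function name is
   hypothetical.  It walks a packed record array the same way
   coding_restore_composition walks cmp_data->data.  The record layout
   in the comment is inferred from how that loop reads DATA: the length
   in data[0], positions in data[1] and data[2], the method in data[3],
   and components from data[4] on.

```c
#include <assert.h>

// Hypothetical sketch of the record walk.  Each record in DATA (USED
// ints) is read as
//   data[0]         record length in ints (0 ends the walk)
//   data[1], [2]    start and end passed to compose_text
//   data[3]         composition method
//   data[4] ...     component codes (data[0] - 4 of them)
// Returns the number of records; *NCOMPONENTS gets the total number
// of component codes.
static int
sketch_count_compositions (const int *data, int used, int *ncomponents)
{
  int i, count = 0;

  *ncomponents = 0;
  for (i = 0; i < used && data[i] > 0; i += data[i])
    {
      assert (data[i] >= 4);    // The header alone is 4 ints.
      count++;
      *ncomponents += data[i] - 4;
    }
  return count;
}
```

   A relative composition stores only the 4-int header, so it adds no
   components.  */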

/* Decode (if ENCODEP is zero) or encode (if ENCODEP is nonzero) the
   text from FROM to TO (byte positions are FROM_BYTE and TO_BYTE) by
   coding system CODING, and return the status code of code conversion
   (currently, this value has no meaning).

   How many characters (and bytes) are converted to how many
   characters (and bytes) are recorded in members of the structure

   If REPLACE is nonzero, we do various things as if the original text
   is deleted and a new text is inserted.  See the comments in
   replace_range (insdel.c) for details of what we are doing.

   If REPLACE is zero, it is assumed that the source text is unibyte.
   Otherwise, it is assumed that the source text is multibyte.  */

code_convert_region (from, from_byte, to, to_byte, coding, encodep, replace)
     int from, from_byte, to, to_byte, encodep, replace;
     struct coding_system *coding;
  int len = to - from, len_byte = to_byte - from_byte;
  int nchars_del = 0, nbytes_del = 0;
  int require, inserted, inserted_byte;
  int head_skip, tail_skip, total_skip = 0;
  Lisp_Object saved_coding_symbol;
  unsigned char *src, *dst;
  Lisp_Object deletion;
  int orig_point = PT, orig_len = len;
  int multibyte_p = !NILP (current_buffer->enable_multibyte_characters);

  saved_coding_symbol = coding->symbol;

  if (from < PT && PT < to)
      TEMP_SET_PT_BOTH (from, from_byte);

      int saved_from = from;
      int saved_inhibit_modification_hooks;

      prepare_to_modify_buffer (from, to, &from);
      if (saved_from != from)
          from_byte = CHAR_TO_BYTE (from), to_byte = CHAR_TO_BYTE (to);
          len_byte = to_byte - from_byte;

      /* The code conversion routine cannot preserve text properties
         for now.  So, we must remove all text properties in the
         region.  Here, we must suppress all modification hooks.  */
      saved_inhibit_modification_hooks = inhibit_modification_hooks;
      inhibit_modification_hooks = 1;
      Fset_text_properties (make_number (from), make_number (to), Qnil, Qnil);
      inhibit_modification_hooks = saved_inhibit_modification_hooks;

  if (! encodep && CODING_REQUIRE_DETECTION (coding))
      /* We must detect the encoding of the text and its EOL format.  */

      if (from < GPT && to > GPT)
        move_gap_both (from, from_byte);
      if (coding->type == coding_type_undecided)
          detect_coding (coding, BYTE_POS_ADDR (from_byte), len_byte);
          if (coding->type == coding_type_undecided)
              /* It seems that the text contains only ASCII, but we
                 should not leave it undecided because the deeper
                 decoding routine (decode_coding) tries to detect the
                 encodings again in vain.  */
              coding->type = coding_type_emacs_mule;
              coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;
              /* As the emacs-mule decoder will handle composition, we
                 need this setting to allocate coding->cmp_data
              coding->composing = COMPOSITION_NO;
      if (coding->eol_type == CODING_EOL_UNDECIDED
          && coding->type != coding_type_ccl)
          detect_eol (coding, BYTE_POS_ADDR (from_byte), len_byte);
          if (coding->eol_type == CODING_EOL_UNDECIDED)
            coding->eol_type = CODING_EOL_LF;
          /* We had better recover the original eol format if we
             encounter an inconsistent eol format while decoding.  */
          coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;

  /* Now we convert the text.  */

  /* For encoding, we must process pre-write-conversion in advance.  */
  if (! inhibit_pre_post_conversion
      && SYMBOLP (coding->pre_write_conversion)
      && ! NILP (Ffboundp (coding->pre_write_conversion)))
      /* The function in pre-write-conversion may put a new text in a
      struct buffer *prev = current_buffer;

      record_unwind_protect (code_convert_region_unwind, Qnil);
      /* We should not call any more pre-write/post-read-conversion
         functions while this pre-write-conversion is running.  */
      inhibit_pre_post_conversion = 1;
      call2 (coding->pre_write_conversion,
             make_number (from), make_number (to));
      inhibit_pre_post_conversion = 0;
      /* Discard the unwind protect.  */

      if (current_buffer != prev)
          new = Fcurrent_buffer ();
          set_buffer_internal_1 (prev);
          del_range_2 (from, from_byte, to, to_byte, 0);
          TEMP_SET_PT_BOTH (from, from_byte);
          insert_from_buffer (XBUFFER (new), 1, len, 0);
          if (orig_point >= to)
            orig_point += len - orig_len;
          else if (orig_point > from)
          from_byte = CHAR_TO_BYTE (from);
          to_byte = CHAR_TO_BYTE (to);
          len_byte = to_byte - from_byte;
          TEMP_SET_PT_BOTH (from, from_byte);

  if (! EQ (current_buffer->undo_list, Qt))
    deletion = make_buffer_string_both (from, from_byte, to, to_byte, 1);
      nchars_del = to - from;
      nbytes_del = to_byte - from_byte;

  if (coding->composing != COMPOSITION_DISABLED)
        coding_save_composition (coding, from, to, Fcurrent_buffer ());
        coding_allocate_composition_data (coding, from);

  /* Try to skip the leading and trailing ASCII characters.  */
  if (coding->type != coding_type_ccl)
      int from_byte_orig = from_byte, to_byte_orig = to_byte;

      if (from < GPT && GPT < to)
        move_gap_both (from, from_byte);
      SHRINK_CONVERSION_REGION (&from_byte, &to_byte, coding, NULL, encodep);
      if (from_byte == to_byte
          && (encodep || NILP (coding->post_read_conversion))
          && ! CODING_REQUIRE_FLUSHING (coding))
          coding->produced = len_byte;
          coding->produced_char = len;
            /* We must record and adjust for this new text now.  */
            adjust_after_insert (from, from_byte_orig, to, to_byte_orig, len);

      head_skip = from_byte - from_byte_orig;
      tail_skip = to_byte_orig - to_byte;
      total_skip = head_skip + tail_skip;
      len -= total_skip; len_byte -= total_skip;

  /* For conversion, we must put the gap before the text in addition to
     making the gap larger for efficient decoding.  The required gap
     size starts from 2000 which is the magic number used in make_gap.
     But, after one batch of conversion, it will be incremented if we
     find that it is not enough.  */

  if (GAP_SIZE < require)
    make_gap (require - GAP_SIZE);
  move_gap_both (from, from_byte);

  inserted = inserted_byte = 0;

  GAP_SIZE += len_byte;
  ZV_BYTE -= len_byte;

  if (GPT - BEG < BEG_UNCHANGED)
    BEG_UNCHANGED = GPT - BEG;
  if (Z - GPT < END_UNCHANGED)
    END_UNCHANGED = Z - GPT;

  if (!encodep && coding->src_multibyte)
      /* Decoding routines expect that the source text is unibyte.
         We must convert 8-bit characters of multibyte form to
      int len_byte_orig = len_byte;
      len_byte = str_as_unibyte (GAP_END_ADDR - len_byte, len_byte);
      if (len_byte < len_byte_orig)
        safe_bcopy (GAP_END_ADDR - len_byte_orig, GAP_END_ADDR - len_byte,
      coding->src_multibyte = 0;

      /* The buffer memory is now:
         +--------+converted-text+---------+-------original-text-------+---+
         |<-from->|<--inserted-->|---------|<--------len_byte--------->|---|
                  |<---------------------- GAP ----------------------->| */
      src = GAP_END_ADDR - len_byte;
      dst = GPT_ADDR + inserted_byte;

        result = encode_coding (coding, src, dst, len_byte, 0);
          if (coding->composing != COMPOSITION_DISABLED)
            coding->cmp_data->char_offset = from + inserted;
          result = decode_coding (coding, src, dst, len_byte, 0);

      /* The buffer memory is now:
         +--------+-------converted-text----+--+------original-text----+---+
         |<-from->|<-inserted->|<-produced->|--|<-(len_byte-consumed)->|---|
                  |<---------------------- GAP ----------------------->| */

      inserted += coding->produced_char;
      inserted_byte += coding->produced;
      len_byte -= coding->consumed;

      if (result == CODING_FINISH_INSUFFICIENT_CMP)
          coding_allocate_composition_data (coding, from + inserted);

      src += coding->consumed;
      dst += coding->produced;

      if (result == CODING_FINISH_NORMAL)

      if (! encodep && result == CODING_FINISH_INCONSISTENT_EOL)
          unsigned char *pend = dst, *p = pend - inserted_byte;
          Lisp_Object eol_type;

          /* Encode LFs back to the original eol format (CR or CRLF).  */
          if (coding->eol_type == CODING_EOL_CR)
              while (p < pend) if (*p++ == '\n') p[-1] = '\r';
              while (p < pend) if (*p++ == '\n') count++;
              if (src - dst < count)
                  /* We don't have sufficient room for encoding LFs
                     back to CRLF.  We must record converted and
                     not-yet-converted text back to the buffer
                     content, enlarge the gap, then record them out of
                     the buffer contents again.  */
                  int add = len_byte + inserted_byte;

                  ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
                  GPT += inserted_byte; GPT_BYTE += inserted_byte;
                  make_gap (count - GAP_SIZE);
                  ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
                  GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
                  /* Don't forget to update SRC, DST, and PEND.  */
                  src = GAP_END_ADDR - len_byte;
                  dst = GPT_ADDR + inserted_byte;
              inserted_byte += count;
              coding->produced += count;
              p = dst = pend + count;
                  if (*p == '\n') count--, *--p = '\r';
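          /* Illustrative sketch, not part of this file.  The function
             name is hypothetical.  It models the backward LF -> CRLF
             expansion above on a plain array.  It assumes that the
             loop header missing from this copy moves one byte with
             `*--p = *--pend' before the visible test.  Working from
             the end means no byte is overwritten before it is copied,
             provided the buffer has one spare byte per LF (the
             COUNT bytes checked against SRC - DST above).

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical sketch of the in-place LF -> CRLF expansion.  BUF
// holds LEN bytes and must have room for LEN plus one byte per LF.
// Bytes are moved from the end toward the front so that no byte is
// overwritten before it has been copied.  Returns the new length.
static size_t
sketch_lf_to_crlf (unsigned char *buf, size_t len)
{
  size_t nlf = 0, count, i;
  unsigned char *pend, *p;

  for (i = 0; i < len; i++)
    if (buf[i] == '\n')
      nlf++;
  pend = buf + len;             // End of the original text.
  p = pend + nlf;               // End of the expanded text.
  count = nlf;
  while (count > 0)
    {
      *--p = *--pend;           // Move one byte toward the end.
      if (*p == '\n')
        count--, *--p = '\r';   // Put CR in front of the LF.
    }
  assert (p == pend);           // Bytes before the first LF stay put.
  return len + nlf;
}
```

             The loop stops once the first LF is expanded.  Everything
             before it is already in its final position.  */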

          /* Suppress eol-format conversion in any further conversion.  */
          coding->eol_type = CODING_EOL_LF;

          /* Set the coding system symbol to that for Unix-like EOL.  */
          eol_type = Fget (saved_coding_symbol, Qeol_type);
          if (VECTORP (eol_type)
              && XVECTOR (eol_type)->size == 3
              && SYMBOLP (XVECTOR (eol_type)->contents[CODING_EOL_LF]))
            coding->symbol = XVECTOR (eol_type)->contents[CODING_EOL_LF];
            coding->symbol = saved_coding_symbol;

      if (coding->type != coding_type_ccl
          || coding->mode & CODING_MODE_LAST_BLOCK)
      coding->mode |= CODING_MODE_LAST_BLOCK;

      if (result == CODING_FINISH_INSUFFICIENT_SRC)
          /* The source text ends in invalid codes.  Let's just
             make them valid buffer contents, and finish conversion.  */
              unsigned char *start = dst;

              inserted += len_byte;
                  dst += CHAR_STRING (c, dst);
              inserted_byte += dst - start;
              inserted += len_byte;
              inserted_byte += len_byte;

      if (result == CODING_FINISH_INTERRUPT)
          /* The conversion procedure was interrupted by a user.  */
      /* Now RESULT == CODING_FINISH_INSUFFICIENT_DST  */
      if (coding->consumed < 1)
          /* It's quite strange to require more memory without
             consuming any bytes.  Perhaps a CCL program bug.  */

          /* We have just done the first batch of conversion which was
             stopped because of insufficient gap.  Let's reconsider the
             required gap size (i.e. SRC - DST) now.

             We have converted ORIG bytes (== coding->consumed) into
             NEW bytes (coding->produced).  To convert the remaining
             LEN_BYTE bytes, we may need REQUIRE bytes of gap, where:
                REQUIRE + LEN_BYTE = LEN_BYTE * (NEW / ORIG)
                REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG
             Here, we are sure that NEW >= ORIG.  */
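          /* Illustrative sketch, not part of this file.  The helper
             name is hypothetical.  It works the estimate above
             through with numbers.  Suppose 1000 bytes have become
             1500 bytes so far and 4000 bytes remain.  Then REQUIRE =
             4000 * 500 / 1000 = 2000, which agrees with 4000 + 2000 =
             4000 * 1.5.  The ratio is taken in floating point here,
             because dividing two ints truncates in C.

```c
#include <assert.h>

// Hypothetical helper for the estimate
//   REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG
// with ORIG = bytes consumed so far, NEW = bytes produced so far and
// LEN_BYTE = bytes still to convert.  The ratio is computed in
// floating point because int division truncates (500 / 1000 == 0).
static int
sketch_required_gap (int len_byte, int produced, int consumed)
{
  double ratio;

  assert (consumed > 0);
  if (produced <= consumed)
    return 0;                   // Output did not grow: assume no extra gap.
  ratio = (double) (produced - consumed) / consumed;
  return (int) (len_byte * ratio);
}
```

             The code below still adds a margin of 2000 bytes on top
             of REQUIRE before calling make_gap.  */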
          if (coding->produced <= coding->consumed)
              /* This happens because of CCL-based coding system with
              ratio = (coding->produced - coding->consumed) / coding->consumed;
              require = len_byte * ratio;
          if ((src - dst) < (require + 2000))
              /* See the comment above the previous call of make_gap.  */
              int add = len_byte + inserted_byte;

              ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
              GPT += inserted_byte; GPT_BYTE += inserted_byte;
              make_gap (require + 2000);
              ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
              GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
  if (src - dst > 0) *dst = 0; /* Put an anchor.  */

  if (encodep && coding->dst_multibyte)
      /* The output is unibyte.  We must convert 8-bit characters to
      if (inserted_byte * 2 > GAP_SIZE)
          GAP_SIZE -= inserted_byte;
          ZV += inserted_byte; Z += inserted_byte;
          ZV_BYTE += inserted_byte; Z_BYTE += inserted_byte;
          GPT += inserted_byte; GPT_BYTE += inserted_byte;
          make_gap (inserted_byte - GAP_SIZE);
          GAP_SIZE += inserted_byte;
          ZV -= inserted_byte; Z -= inserted_byte;
          ZV_BYTE -= inserted_byte; Z_BYTE -= inserted_byte;
          GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
      inserted_byte = str_to_multibyte (GPT_ADDR, GAP_SIZE, inserted_byte);

  /* If we shrank the conversion area, adjust it now.  */
      safe_bcopy (GAP_END_ADDR, GPT_ADDR + inserted_byte, tail_skip);
      inserted += total_skip; inserted_byte += total_skip;
      GAP_SIZE += total_skip;
      GPT -= head_skip; GPT_BYTE -= head_skip;
      ZV -= total_skip; ZV_BYTE -= total_skip;
      Z -= total_skip; Z_BYTE -= total_skip;
      from -= head_skip; from_byte -= head_skip;
      to += tail_skip; to_byte += tail_skip;

  if (! EQ (current_buffer->undo_list, Qt))
    adjust_after_replace (from, from_byte, deletion, inserted, inserted_byte);
    adjust_after_replace_noundo (from, from_byte, nchars_del, nbytes_del,
                                 inserted, inserted_byte);
  inserted = Z - prev_Z;

  if (!encodep && coding->cmp_data && coding->cmp_data->used)
    coding_restore_composition (coding, Fcurrent_buffer ());
  coding_free_composition_data (coding);

  if (! inhibit_pre_post_conversion
      && ! encodep && ! NILP (coding->post_read_conversion))
      TEMP_SET_PT_BOTH (from, from_byte);
      record_unwind_protect (code_convert_region_unwind, Qnil);
      /* We should not call any more pre-write/post-read-conversion
5816 functions while this post-read-conversion is running. */
5817 inhibit_pre_post_conversion
= 1;
5818 val
= call1 (coding
->post_read_conversion
, make_number (inserted
));
5819 inhibit_pre_post_conversion
= 0;
5820 /* Discard the unwind protect. */
5823 inserted
+= Z
- prev_Z
;
5826 if (orig_point
>= from
)
5828 if (orig_point
>= from
+ orig_len
)
5829 orig_point
+= inserted
- orig_len
;
5832 TEMP_SET_PT (orig_point
);
5837 signal_after_change (from
, to
- from
, inserted
);
5838 update_compositions (from
, from
+ inserted
, CHECK_BORDER
);
5842 coding
->consumed
= to_byte
- from_byte
;
5843 coding
->consumed_char
= to
- from
;
5844 coding
->produced
= inserted_byte
;
5845 coding
->produced_char
= inserted
;
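/* A minimal sketch of the gap estimate described in the REQUIRE comment
   above, assuming the remaining LEN_BYTE bytes expand at the same rate
   as the first batch (ORIG bytes became NEW bytes).  The helper names
   are hypothetical; the 2000-byte slack mirrors the constant used in
   the code above.  This sketch multiplies before dividing so that a
   fractional growth ratio such as 1.5 is not truncated by integer
   division.  */

```c
/* Hypothetical helper: extra gap bytes needed to convert LEN_BYTE more
   source bytes, given that ORIG source bytes produced NEW_BYTES so far.
   Returns 0 when no growth has been observed.  */
long
required_gap (long len_byte, long orig, long new_bytes)
{
  if (orig <= 0 || new_bytes <= orig)
    return 0;
  return len_byte * (new_bytes - orig) / orig;
}

/* Hypothetical helper: nonzero if a gap of GAP bytes (SRC - DST) is too
   small for REQUIRE more bytes plus the 2000-byte safety margin.  */
int
gap_too_small (long gap, long require)
{
  return gap < require + 2000;
}
```

/* For example, if 100 source bytes produced 150 bytes, converting 1000
   more bytes is estimated to need 1000 * 50 / 100 = 500 extra bytes of
   gap, so a gap of 2400 bytes would be enlarged.  */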
run_pre_post_conversion_on_str (str, coding, encodep)
struct coding_system *coding;
int count = SPECPDL_INDEX ();
struct gcpro gcpro1, gcpro2;
int multibyte = STRING_MULTIBYTE (str);
Lisp_Object old_deactivate_mark;
record_unwind_protect (Fset_buffer, Fcurrent_buffer ());
record_unwind_protect (code_convert_region_unwind, Qnil);
/* It is not crucial to specbind this.  */
old_deactivate_mark = Vdeactivate_mark;
GCPRO2 (str, old_deactivate_mark);
buffer = Fget_buffer_create (build_string (" *code-converting-work*"));
buf = XBUFFER (buffer);
buf->directory = current_buffer->directory;
buf->read_only = Qnil;
buf->filename = Qnil;
buf->undo_list = Qt;
buf->overlays_before = Qnil;
buf->overlays_after = Qnil;
set_buffer_internal (buf);
/* We must insert the contents of STR as is without
   unibyte<->multibyte conversion.  For that, we adjust the
   multibyteness of the working buffer to that of STR.  */
buf->enable_multibyte_characters = multibyte ? Qt : Qnil;
insert_from_string (str, 0, 0,
                    SCHARS (str), SBYTES (str), 0);
inhibit_pre_post_conversion = 1;
call2 (coding->pre_write_conversion, make_number (BEG), make_number (Z));
TEMP_SET_PT_BOTH (BEG, BEG_BYTE);
call1 (coding->post_read_conversion, make_number (Z - BEG));
inhibit_pre_post_conversion = 0;
Vdeactivate_mark = old_deactivate_mark;
str = make_buffer_string (BEG, Z, 1);
return unbind_to (count, str);
decode_coding_string (str, coding, nocopy)
struct coding_system *coding;
struct conversion_buffer buf;
Lisp_Object saved_coding_symbol;
int require_decoding;
int shrinked_bytes = 0;
int consumed, consumed_char, produced, produced_char;
to_byte = SBYTES (str);
saved_coding_symbol = coding->symbol;
coding->src_multibyte = STRING_MULTIBYTE (str);
coding->dst_multibyte = 1;
if (CODING_REQUIRE_DETECTION (coding))
/* See the comments in code_convert_region.  */
if (coding->type == coding_type_undecided)
detect_coding (coding, SDATA (str), to_byte);
if (coding->type == coding_type_undecided)
coding->type = coding_type_emacs_mule;
coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;
/* As the emacs-mule decoder will handle composition, we
   need this setting to allocate coding->cmp_data
coding->composing = COMPOSITION_NO;
if (coding->eol_type == CODING_EOL_UNDECIDED
    && coding->type != coding_type_ccl)
saved_coding_symbol = coding->symbol;
detect_eol (coding, SDATA (str), to_byte);
if (coding->eol_type == CODING_EOL_UNDECIDED)
coding->eol_type = CODING_EOL_LF;
/* We had better recover the original eol format if we
   encounter an inconsistent eol format while decoding.  */
coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
if (coding->type == coding_type_no_conversion
    || coding->type == coding_type_raw_text)
coding->dst_multibyte = 0;
require_decoding = CODING_REQUIRE_DECODING (coding);
if (STRING_MULTIBYTE (str))
/* Decoding routines expect the source text to be unibyte.  */
str = Fstring_as_unibyte (str);
to_byte = SBYTES (str);
coding->src_multibyte = 0;
/* Try to skip the leading and trailing ASCII bytes.  */
if (require_decoding && coding->type != coding_type_ccl)
SHRINK_CONVERSION_REGION (&from, &to_byte, coding, SDATA (str),
if (from == to_byte)
require_decoding = 0;
shrinked_bytes = from + (SBYTES (str) - to_byte);
if (!require_decoding)
coding->consumed = SBYTES (str);
coding->consumed_char = SCHARS (str);
if (coding->dst_multibyte)
str = Fstring_as_multibyte (str);
coding->produced = SBYTES (str);
coding->produced_char = SCHARS (str);
return (nocopy ? str : Fcopy_sequence (str));
if (coding->composing != COMPOSITION_DISABLED)
coding_allocate_composition_data (coding, from);
len = decoding_buffer_size (coding, to_byte - from);
allocate_conversion_buffer (buf, len);
consumed = consumed_char = produced = produced_char = 0;
result = decode_coding (coding, SDATA (str) + from + consumed,
                        buf.data + produced, to_byte - from - consumed,
                        buf.size - produced);
consumed += coding->consumed;
consumed_char += coding->consumed_char;
produced += coding->produced;
produced_char += coding->produced_char;
if (result == CODING_FINISH_NORMAL
    || (result == CODING_FINISH_INSUFFICIENT_SRC
        && coding->consumed == 0))
if (result == CODING_FINISH_INSUFFICIENT_CMP)
coding_allocate_composition_data (coding, from + produced_char);
else if (result == CODING_FINISH_INSUFFICIENT_DST)
extend_conversion_buffer (&buf);
else if (result == CODING_FINISH_INCONSISTENT_EOL)
Lisp_Object eol_type;
/* Recover the original EOL format.  */
if (coding->eol_type == CODING_EOL_CR)
for (p = buf.data; p < buf.data + produced; p++)
if (*p == '\n') *p = '\r';
else if (coding->eol_type == CODING_EOL_CRLF)
unsigned char *p0, *p1;
for (p0 = buf.data, p1 = p0 + produced; p0 < p1; p0++)
if (*p0 == '\n') num_eol++;
if (produced + num_eol >= buf.size)
extend_conversion_buffer (&buf);
for (p0 = buf.data + produced, p1 = p0 + num_eol; p0 > buf.data;)
if (*p0 == '\n') *--p1 = '\r';
produced += num_eol;
produced_char += num_eol;
/* Suppress eol-format conversion in any further conversion.  */
coding->eol_type = CODING_EOL_LF;
/* Set the coding system symbol to that for Unix-like EOL.  */
eol_type = Fget (saved_coding_symbol, Qeol_type);
if (VECTORP (eol_type)
    && XVECTOR (eol_type)->size == 3
    && SYMBOLP (XVECTOR (eol_type)->contents[CODING_EOL_LF]))
coding->symbol = XVECTOR (eol_type)->contents[CODING_EOL_LF];
coding->symbol = saved_coding_symbol;
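/* When decoding stops with CODING_FINISH_INCONSISTENT_EOL, the code
   above turns the LFs it produced back into the original EOL bytes:
   CR is restored in place, while CRLF needs NUM_EOL extra bytes and is
   rebuilt by copying backwards from the end of the buffer.  A
   standalone sketch of both cases on a plain byte array; the function
   names and the size-check convention are assumptions, not Emacs
   APIs.  */

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for CR files: turn every '\n' in
   BUF[0..PRODUCED) back into '\r'.  */
void
restore_cr (unsigned char *buf, size_t produced)
{
  unsigned char *p = buf, *end = buf + produced;

  while ((p = memchr (p, '\n', (size_t) (end - p))) != NULL)
    *p++ = '\r';
}

/* Hypothetical helper for CRLF files: expand every '\n' in
   BUF[0..PRODUCED) into "\r\n" in place.  SIZE is the capacity of BUF.
   Returns the new length, or (size_t) -1 if the result would not fit.  */
size_t
restore_crlf (unsigned char *buf, size_t produced, size_t size)
{
  size_t num_eol = 0, i;
  unsigned char *src, *dst;

  for (i = 0; i < produced; i++)
    if (buf[i] == '\n')
      num_eol++;
  if (produced + num_eol > size)
    return (size_t) -1;

  /* Copy from the end so that no byte is overwritten before it is read.  */
  src = buf + produced;
  dst = src + num_eol;
  while (src > buf)
    {
      *--dst = *--src;
      if (*src == '\n')
        *--dst = '\r';
    }
  return produced + num_eol;
}
```

/* Example: "a\nb\n" in a 16-byte buffer becomes "a\r\nb\r\n" with
   length 6.  */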
coding->consumed = consumed;
coding->consumed_char = consumed_char;
coding->produced = produced;
coding->produced_char = produced_char;
if (coding->dst_multibyte)
newstr = make_uninit_multibyte_string (produced_char + shrinked_bytes,
                                       produced + shrinked_bytes);
newstr = make_uninit_string (produced + shrinked_bytes);
STRING_COPYIN (newstr, 0, SDATA (str), from);
STRING_COPYIN (newstr, from, buf.data, produced);
if (shrinked_bytes > from)
STRING_COPYIN (newstr, from + produced,
               SDATA (str) + to_byte,
               shrinked_bytes - from);
free_conversion_buffer (&buf);
if (coding->cmp_data && coding->cmp_data->used)
coding_restore_composition (coding, newstr);
coding_free_composition_data (coding);
if (SYMBOLP (coding->post_read_conversion)
    && !NILP (Ffboundp (coding->post_read_conversion)))
newstr = run_pre_post_conversion_on_str (newstr, coding, 0);
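/* decode_coding_string converts only the span that remains after
   SHRINK_CONVERSION_REGION skips ASCII at both ends, then builds the
   result from the untouched head (FROM bytes), the converted middle
   (PRODUCED bytes), and the untouched tail, where SHRINKED_BYTES counts
   head plus tail.  A minimal sketch of that layout, assuming trimming
   by ASCII alone (the real macro also receives the coding system) and a
   toy Latin-1 to UTF-8 converter in place of decode_coding; all names
   are hypothetical.  */

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: narrow [*FROM, *TO) of S past leading and
   trailing ASCII bytes.  */
void
shrink_ascii (const unsigned char *s, size_t *from, size_t *to)
{
  while (*from < *to && s[*from] < 0x80)
    ++*from;
  while (*to > *from && s[*to - 1] < 0x80)
    --*to;
}

/* Toy converter standing in for decode_coding: Latin-1 to UTF-8.
   Returns the number of bytes written to DST.  */
static size_t
latin1_to_utf8 (const unsigned char *src, size_t n, unsigned char *dst)
{
  size_t i, k = 0;

  for (i = 0; i < n; i++)
    if (src[i] < 0x80)
      dst[k++] = src[i];
    else
      {
        dst[k++] = (unsigned char) (0xC0 | (src[i] >> 6));
        dst[k++] = (unsigned char) (0x80 | (src[i] & 0x3F));
      }
  return k;
}

/* Decode S[0..LEN) into OUT (room for at least 2 * LEN bytes),
   converting only the part left after trimming ASCII at both ends.
   Returns the length of the result.  */
size_t
decode_shrunk (const unsigned char *s, size_t len, unsigned char *out)
{
  size_t from = 0, to = len, shrinked_bytes, produced;

  shrink_ascii (s, &from, &to);
  shrinked_bytes = from + (len - to);   /* head + tail, copied as is */
  memcpy (out, s, from);                /* head */
  produced = latin1_to_utf8 (s + from, to - from, out + from);
  memcpy (out + from + produced, s + to, shrinked_bytes - from); /* tail */
  return produced + shrinked_bytes;
}
```

/* decode_shrunk on the 4 bytes "ab\xE9z" converts only the middle byte
   and returns the 5-byte result "ab\xC3\xA9z".  */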
encode_coding_string (str, coding, nocopy)
struct coding_system *coding;
struct conversion_buffer buf;
int from, to, to_byte;
int shrinked_bytes = 0;
int consumed, consumed_char, produced, produced_char;
if (SYMBOLP (coding->pre_write_conversion)
    && !NILP (Ffboundp (coding->pre_write_conversion)))
str = run_pre_post_conversion_on_str (str, coding, 1);
to_byte = SBYTES (str);
/* Encoding routines determine the multibyteness of the source text
   by coding->src_multibyte.  */
coding->src_multibyte = STRING_MULTIBYTE (str);
coding->dst_multibyte = 0;
if (! CODING_REQUIRE_ENCODING (coding))
coding->consumed = SBYTES (str);
coding->consumed_char = SCHARS (str);
if (STRING_MULTIBYTE (str))
str = Fstring_as_unibyte (str);
coding->produced = SBYTES (str);
coding->produced_char = SCHARS (str);
return (nocopy ? str : Fcopy_sequence (str));
if (coding->composing != COMPOSITION_DISABLED)
coding_save_composition (coding, from, to, str);
/* Try to skip the leading and trailing ASCII bytes.  */
if (coding->type != coding_type_ccl)
SHRINK_CONVERSION_REGION (&from, &to_byte, coding, SDATA (str),
if (from == to_byte)
return (nocopy ? str : Fcopy_sequence (str));
shrinked_bytes = from + (SBYTES (str) - to_byte);
len = encoding_buffer_size (coding, to_byte - from);
allocate_conversion_buffer (buf, len);
consumed = consumed_char = produced = produced_char = 0;
result = encode_coding (coding, SDATA (str) + from + consumed,
                        buf.data + produced, to_byte - from - consumed,
                        buf.size - produced);
consumed += coding->consumed;
consumed_char += coding->consumed_char;
produced += coding->produced;
produced_char += coding->produced_char;
if (result == CODING_FINISH_NORMAL
    || (result == CODING_FINISH_INSUFFICIENT_SRC
        && coding->consumed == 0))
/* Now result should be CODING_FINISH_INSUFFICIENT_DST.  */
extend_conversion_buffer (&buf);
coding->consumed = consumed;
coding->consumed_char = consumed_char;
coding->produced = produced;
coding->produced_char = produced_char;
newstr = make_uninit_string (produced + shrinked_bytes);
STRING_COPYIN (newstr, 0, SDATA (str), from);
STRING_COPYIN (newstr, from, buf.data, produced);
if (shrinked_bytes > from)
STRING_COPYIN (newstr, from + produced,
               SDATA (str) + to_byte,
               shrinked_bytes - from);
free_conversion_buffer (&buf);
coding_free_composition_data (coding);
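/* encode_coding_string, like decode_coding_string, calls the converter
   in a loop, accumulating CONSUMED and PRODUCED and enlarging the
   buffer with extend_conversion_buffer whenever the result is
   CODING_FINISH_INSUFFICIENT_DST, then resuming where it stopped.  A
   self-contained sketch of that pattern, assuming a toy Latin-1 to
   UTF-8 step function and buffer doubling with realloc; the enum and
   function names are hypothetical stand-ins.  */

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the CODING_FINISH_* result codes.  */
enum finish { FINISH_NORMAL, FINISH_INSUFFICIENT_DST };

/* Toy converter (Latin-1 to UTF-8) that converts as much of SRC as fits
   in DST and reports how many bytes it consumed and produced.  */
static enum finish
convert_step (const unsigned char *src, size_t src_len,
              unsigned char *dst, size_t dst_len,
              size_t *consumed, size_t *produced)
{
  size_t i = 0, k = 0;

  while (i < src_len)
    {
      size_t need = src[i] < 0x80 ? 1 : 2;

      if (k + need > dst_len)
        break;
      if (need == 1)
        dst[k++] = src[i];
      else
        {
          dst[k++] = (unsigned char) (0xC0 | (src[i] >> 6));
          dst[k++] = (unsigned char) (0x80 | (src[i] & 0x3F));
        }
      i++;
    }
  *consumed = i;
  *produced = k;
  return i == src_len ? FINISH_NORMAL : FINISH_INSUFFICIENT_DST;
}

/* Convert all of SRC[0..LEN), resuming after each partial step and
   doubling the output buffer whenever it runs out of room.  Returns a
   malloc'd buffer (length in *OUT_LEN), or NULL if allocation fails.  */
unsigned char *
convert_all (const unsigned char *src, size_t len, size_t *out_len)
{
  size_t size = len > 0 ? len : 1;   /* initial size estimate */
  size_t consumed = 0, produced = 0;
  unsigned char *buf = malloc (size);

  if (!buf)
    return NULL;
  for (;;)
    {
      size_t c, p;
      enum finish r = convert_step (src + consumed, len - consumed,
                                    buf + produced, size - produced,
                                    &c, &p);
      consumed += c;
      produced += p;
      if (r == FINISH_NORMAL)
        break;
      /* Out of room: enlarge the buffer and resume.  */
      unsigned char *bigger = realloc (buf, size * 2);
      if (!bigger)
        {
          free (buf);
          return NULL;
        }
      buf = bigger;
      size *= 2;
    }
  *out_len = produced;
  return buf;
}
```

/* Doubling keeps the number of retries logarithmic in the output size.
   The code above sizes its first buffer with encoding_buffer_size
   rather than the input length used in this sketch.  */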
/*** 8. Emacs Lisp library functions ***/

DEFUN ("coding-system-p", Fcoding_system_p, Scoding_system_p, 1, 1, 0,
       doc: /* Return t if OBJECT is nil or a coding-system.
See the documentation of `make-coding-system' for information
about coding-system objects.  */)
/* Get coding-spec vector for OBJ.  */
obj = Fget (obj, Qcoding_system);
return ((VECTORP (obj) && XVECTOR (obj)->size == 5)
DEFUN ("read-non-nil-coding-system", Fread_non_nil_coding_system,
       Sread_non_nil_coding_system, 1, 1, 0,
       doc: /* Read a coding system from the minibuffer, prompting with string PROMPT.  */)
val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
                        Qt, Qnil, Qcoding_system_history, Qnil, Qnil);
while (SCHARS (val) == 0);
return (Fintern (val, Qnil));

DEFUN ("read-coding-system", Fread_coding_system, Sread_coding_system, 1, 2, 0,
       doc: /* Read a coding system from the minibuffer, prompting with string PROMPT.
If the user enters null input, return second argument DEFAULT-CODING-SYSTEM.  */)
(prompt, default_coding_system)
Lisp_Object prompt, default_coding_system;
if (SYMBOLP (default_coding_system))
default_coding_system = SYMBOL_NAME (default_coding_system);
val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
                        Qt, Qnil, Qcoding_system_history,
                        default_coding_system, Qnil);
return (SCHARS (val) == 0 ? Qnil : Fintern (val, Qnil));

DEFUN ("check-coding-system", Fcheck_coding_system, Scheck_coding_system,
       doc: /* Check validity of CODING-SYSTEM.
If valid, return CODING-SYSTEM, else signal a `coding-system-error' error.
It is valid if it is a symbol with a non-nil `coding-system' property.
The value of the property should be a vector of length 5.  */)
Lisp_Object coding_system;
CHECK_SYMBOL (coding_system);
if (!NILP (Fcoding_system_p (coding_system)))
return coding_system;
Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil));
detect_coding_system (src, src_bytes, highest, multibytep)
const unsigned char *src;
int src_bytes, highest;
int coding_mask, eol_type;
Lisp_Object val, tmp;
coding_mask = detect_coding_mask (src, src_bytes, NULL, &dummy, multibytep);
eol_type = detect_eol_type (src, src_bytes, &dummy);
if (eol_type == CODING_EOL_INCONSISTENT)
eol_type = CODING_EOL_UNDECIDED;
if (eol_type != CODING_EOL_UNDECIDED)
val2 = Fget (Qundecided, Qeol_type);
val = XVECTOR (val2)->contents[eol_type];
return (highest ? val : Fcons (val, Qnil));
/* First, gather possible coding systems in VAL.  */
for (tmp = Vcoding_category_list; CONSP (tmp); tmp = XCDR (tmp))
Lisp_Object category_val, category_index;
category_index = Fget (XCAR (tmp), Qcoding_category_index);
category_val = Fsymbol_value (XCAR (tmp));
if (!NILP (category_val)
    && NATNUMP (category_index)
    && (coding_mask & (1 << XFASTINT (category_index))))
val = Fcons (category_val, val);
val = Fnreverse (val);
/* Then, replace the elements with subsidiary coding systems.  */
for (tmp = val; CONSP (tmp); tmp = XCDR (tmp))
if (eol_type != CODING_EOL_UNDECIDED
    && eol_type != CODING_EOL_INCONSISTENT)
eol = Fget (XCAR (tmp), Qeol_type);
XSETCAR (tmp, XVECTOR (eol)->contents[eol_type]);
return (highest ? XCAR (val) : val);
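/* detect_coding_system keeps each category on Vcoding_category_list
   whose bit (1 << category index) is set in the mask returned by
   detect_coding_mask; building VAL with Fcons and then calling
   Fnreverse leaves the survivors in priority order.  A sketch of the
   same filter over a plain array; the struct, the function name, and
   the category names and bit indices in the usage are made up for
   illustration.  */

```c
#include <stddef.h>

struct category
{
  const char *name;   /* coding system assigned to this category */
  int index;          /* bit position in the detection mask */
};

/* Copy into OUT, in priority order, the names of the categories in
   PRIO[0..N) whose bit is set in MASK.  Returns the number copied.  */
size_t
categories_in_mask (const struct category *prio, size_t n,
                    unsigned long mask, const char **out)
{
  size_t i, k = 0;

  for (i = 0; i < n; i++)
    if (prio[i].index >= 0 && prio[i].index < 32
        && (mask & (1UL << prio[i].index)))
      out[k++] = prio[i].name;
  return k;
}
```

/* With HIGHEST non-nil only the first survivor is wanted, which in
   this sketch is out[0].  */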
DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region,
       doc: /* Detect how the byte sequence in the region is encoded.
Return a list of possible coding systems used on decoding a byte
sequence containing the bytes in the region between START and END when
the coding system `undecided' is specified.  The list is ordered by
priority decided in the current language environment.

If only ASCII characters are found, it returns a list whose single element is
`undecided' or its subsidiary coding system according to a detected

If optional argument HIGHEST is non-nil, return the coding system of
highest priority.  */)
(start, end, highest)
Lisp_Object start, end, highest;
int from_byte, to_byte;
int include_anchor_byte = 0;
CHECK_NUMBER_COERCE_MARKER (start);
CHECK_NUMBER_COERCE_MARKER (end);
validate_region (&start, &end);
from = XINT (start), to = XINT (end);
from_byte = CHAR_TO_BYTE (from);
to_byte = CHAR_TO_BYTE (to);
if (from < GPT && to >= GPT)
move_gap_both (to, to_byte);
/* If an anchor byte `\0' follows the region, we include it in the
   source to be detected.  Then code detectors can handle the trailing
   byte sequence more accurately.

   Fix me: This is not a perfect solution.  It would be better to
   add one more argument, say LAST_BLOCK, to all detect_coding_XXX.
if (to == Z || (to == GPT && GAP_SIZE > 0))
include_anchor_byte = 1;
return detect_coding_system (BYTE_POS_ADDR (from_byte),
                             to_byte - from_byte + include_anchor_byte,
                             !NILP (current_buffer
                                    ->enable_multibyte_characters));

DEFUN ("detect-coding-string", Fdetect_coding_string, Sdetect_coding_string,
       doc: /* Detect how the byte sequence in STRING is encoded.
Return a list of possible coding systems used on decoding a byte
sequence containing the bytes in STRING when the coding system
`undecided' is specified.  The list is ordered by priority decided in
the current language environment.

If only ASCII characters are found, it returns a list whose single element is
`undecided' or its subsidiary coding system according to a detected

If optional argument HIGHEST is non-nil, return the coding system of
highest priority.  */)
Lisp_Object string, highest;
CHECK_STRING (string);
return detect_coding_system (SDATA (string),
                             /* "+ 1" is to include the anchor byte
                                `\0'.  With this, code detectors can
                                handle the trailing bytes more
                             SBYTES (string) + 1,
                             STRING_MULTIBYTE (string));
/* Return the intersection of lists L1 and L2.  */
intersection (l1, l2)
Lisp_Object val = Fcons (Qnil, Qnil), tail;
for (tail = val; CONSP (l1); l1 = XCDR (l1))
if (!NILP (Fmemq (XCAR (l1), l2)))
XSETCDR (tail, Fcons (XCAR (l1), Qnil));
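/* intersection keeps the elements of L1 that are also members of L2
   (tested with Fmemq), in L1's order, appending through a dummy head
   cell made by Fcons (Qnil, Qnil) so that the first match needs no
   special case.  The same technique, sketched on a hypothetical int
   cons cell instead of Lisp_Object; cons, memq and intersection here
   are illustrative helpers, not Emacs functions.  */

```c
#include <stdlib.h>

/* Hypothetical stand-in for a Lisp cons cell holding an int.  */
struct cell
{
  int car;
  struct cell *cdr;
};

static struct cell *
cons (int car, struct cell *cdr)
{
  struct cell *c = malloc (sizeof *c);

  if (!c)
    abort ();
  c->car = car;
  c->cdr = cdr;
  return c;
}

static int
memq (int x, const struct cell *l)
{
  for (; l; l = l->cdr)
    if (l->car == x)
      return 1;
  return 0;
}

/* Return a fresh list of the elements of L1 that also occur in L2,
   in L1's order.  */
static struct cell *
intersection (const struct cell *l1, const struct cell *l2)
{
  struct cell head = { 0, NULL };   /* dummy head, like Fcons (Qnil, Qnil) */
  struct cell *tail = &head;

  for (; l1; l1 = l1->cdr)
    if (memq (l1->car, l2))
      {
        tail->cdr = cons (l1->car, NULL);
        tail = tail->cdr;
      }
  return head.cdr;                  /* drop the dummy head */
}
```

/* Example: the intersection of (1 2 3 4) and (4 2 9) is (2 4).  */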
/* Subroutine for Ffind_coding_systems_region_internal.

   Return a list of coding systems that safely encode the multibyte
   text between P and PEND.  SAFE_CODINGS is the list of coding
   systems found safe so far; t means that no character has been
   checked yet, and nil means that no coding system is safe.

   WORK_TABLE is a copy of the char-table Vchar_coding_system_table.  An
   element of WORK_TABLE is set to t once the element is looked up.

   If a non-ASCII single byte char is found, set
   *single_byte_char_found to 1.  */
find_safe_codings (p, pend, safe_codings, work_table, single_byte_char_found)
unsigned char *p, *pend;
Lisp_Object safe_codings, work_table;
int *single_byte_char_found;
c = STRING_CHAR_AND_LENGTH (p, pend - p, len);
if (ASCII_BYTE_P (c))
/* We can ignore ASCII characters here.  */
if (SINGLE_BYTE_CHAR_P (c))
*single_byte_char_found = 1;
if (NILP (safe_codings))
/* Check the safe coding systems for C.  */
val = char_table_ref_and_index (work_table, c, &idx);
/* This element was already checked.  Ignore it.  */
/* Remember that we checked this element.  */
CHAR_TABLE_SET (work_table, make_number (idx), Qt);
/* If there are some safe coding systems for C and we have
   already found the other set of coding systems for the
   different characters, get the intersection of them.  */
if (!EQ (safe_codings, Qt) && !NILP (val))
val = intersection (safe_codings, val);
return safe_codings;
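/* find_safe_codings narrows the set of candidate coding systems one
   character at a time: ASCII is skipped, each WORK_TABLE entry is
   consulted only once and then marked t, and the per-character lists
   are intersected.  A sketch of the same idea that represents the
   candidate set as a bit mask (so intersection is a bitwise AND), uses
   a hypothetical lookup function instead of the char table, and keeps
   a per-character SEEN array in place of WORK_TABLE's per-element
   marks.  All names are hypothetical.  */

```c
#include <stddef.h>

/* Hypothetical per-character lookup: bit I of the result is set if
   candidate coding system I can encode C.  */
typedef unsigned long (*safe_mask_fn) (int c);

/* Intersect the safe-coding masks of the non-ASCII characters in
   CHARS[0..N).  SEEN[0..SEEN_SIZE) marks characters already looked up.
   Sets *SINGLE_BYTE_FOUND if a character in 0x80..0xFF occurs.  */
unsigned long
find_safe_mask (const int *chars, size_t n, safe_mask_fn lookup,
                unsigned char *seen, size_t seen_size,
                int *single_byte_found)
{
  unsigned long safe = ~0UL;    /* every candidate is still possible */
  size_t i;

  for (i = 0; i < n; i++)
    {
      int c = chars[i];

      if (c < 0x80)
        continue;               /* ASCII never narrows the set */
      if (c < 0x100)
        *single_byte_found = 1;
      if (safe == 0)
        continue;               /* keep scanning only for the flag */
      if ((size_t) c < seen_size)
        {
          if (seen[c])
            continue;           /* already intersected */
          seen[c] = 1;
        }
      safe &= lookup (c);
    }
  return safe;
}

/* Example lookup for the usage below (hypothetical coding systems):
   bit 0 covers characters below U+0100, bit 1 covers everything, and
   bit 2 covers U+3040..U+30FF.  */
unsigned long
demo_lookup (int c)
{
  unsigned long m = 1UL << 1;

  if (c < 0x100)
    m |= 1UL << 0;
  if (c >= 0x3040 && c <= 0x30FF)
    m |= 1UL << 2;
  return m;
}
```

/* Text mixing U+00E9 and U+3042 leaves only bit 1, since no other
   candidate covers both characters.  */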
/* Return a list of coding systems that safely encode the text between
   START and END.  If the text contains only ASCII or is unibyte,
DEFUN ("find-coding-systems-region-internal",
       Ffind_coding_systems_region_internal,
       Sfind_coding_systems_region_internal, 2, 2, 0,
       doc: /* Internal use only.  */)
Lisp_Object start, end;
Lisp_Object work_table, safe_codings;
int non_ascii_p = 0;
int single_byte_char_found = 0;
const unsigned char *p1, *p1end, *p2, *p2end, *p;
if (STRINGP (start))
if (!STRING_MULTIBYTE (start))
p1 = SDATA (start), p1end = p1 + SBYTES (start);
if (SCHARS (start) != SBYTES (start))
CHECK_NUMBER_COERCE_MARKER (start);
CHECK_NUMBER_COERCE_MARKER (end);
if (XINT (start) < BEG || XINT (end) > Z || XINT (start) > XINT (end))
args_out_of_range (start, end);
if (NILP (current_buffer->enable_multibyte_characters))
from = CHAR_TO_BYTE (XINT (start));
to = CHAR_TO_BYTE (XINT (end));
stop = from < GPT_BYTE && GPT_BYTE < to ? GPT_BYTE : to;
p1 = BYTE_POS_ADDR (from), p1end = p1 + (stop - from);
p2 = BYTE_POS_ADDR (stop), p2end = p2 + (to - stop);
if (XINT (end) - XINT (start) != to - from)
/* We are sure that the text contains no multibyte character.
   Check if it contains eight-bit-graphic.  */
for (p = p1; p < p1end && ASCII_BYTE_P (*p); p++);
for (p = p2; p < p2end && ASCII_BYTE_P (*p); p++);
/* The text contains non-ASCII characters.  */
work_table = Fcopy_sequence (Vchar_coding_system_table);
safe_codings = find_safe_codings (p1, p1end, Qt, work_table,
                                  &single_byte_char_found);
safe_codings = find_safe_codings (p2, p2end, safe_codings, work_table,
                                  &single_byte_char_found);
if (EQ (safe_codings, Qt))
; /* Nothing to be done.  */
else if (!single_byte_char_found)
/* Append generic coding systems.  */
Lisp_Object args[2];
args[0] = safe_codings;
args[1] = Fchar_table_extra_slot (Vchar_coding_system_table,
safe_codings = Fappend (2, args);
safe_codings = Fcons (Qraw_text,
                      Fcons (Qno_conversion, safe_codings)));
return safe_codings;
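/* Ffind_coding_systems_region_internal reads a buffer region that may
   straddle the gap as two contiguous pieces, [FROM, STOP) and
   [STOP, TO), where STOP is GPT_BYTE if the gap lies strictly inside
   the region and TO otherwise.  A sketch on a toy gap buffer (struct,
   field and function names are hypothetical) that computes the same
   two pieces, maps positions at or past the gap the way BYTE_POS_ADDR
   does, and reports whether any non-ASCII byte occurs.  */

```c
#include <stddef.h>

/* Hypothetical gap buffer: the logical text is DATA[0..GPT) followed
   by the bytes that start GAP_SIZE bytes later.  */
struct gap_buffer
{
  const unsigned char *data;
  size_t gpt;        /* logical byte position of the gap */
  size_t gap_size;
};

/* Address of logical byte position POS, skipping the gap.  */
static const unsigned char *
pos_addr (const struct gap_buffer *b, size_t pos)
{
  return b->data + pos + (pos >= b->gpt ? b->gap_size : 0);
}

/* Return 1 if logical bytes [FROM, TO) contain a non-ASCII byte.  */
int
region_has_non_ascii (const struct gap_buffer *b, size_t from, size_t to)
{
  size_t stop = (from < b->gpt && b->gpt < to) ? b->gpt : to;
  const unsigned char *p1 = pos_addr (b, from), *p1end = p1 + (stop - from);
  const unsigned char *p2 = pos_addr (b, stop), *p2end = p2 + (to - stop);
  const unsigned char *p;

  for (p = p1; p < p1end; p++)
    if (*p >= 0x80)
      return 1;
  for (p = p2; p < p2end; p++)
    if (*p >= 0x80)
      return 1;
  return 0;
}
```

/* The gap bytes themselves are never read, so a gap filled with 0xFF
   does not count as non-ASCII text.  */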
find_safe_codings_2 (p, pend, safe_codings, work_table, single_byte_char_found)
unsigned char *p, *pend;
Lisp_Object safe_codings, work_table;
int *single_byte_char_found;
Lisp_Object val, ch;
Lisp_Object prev, tail;
c = STRING_CHAR_AND_LENGTH (p, pend - p, len);
if (ASCII_BYTE_P (c))
/* We can ignore ASCII characters here.  */
if (SINGLE_BYTE_CHAR_P (c))
*single_byte_char_found = 1;
if (NILP (safe_codings))
/* All coding systems have already been excluded.  */
/* Check the safe coding systems for C.  */
ch = make_number (c);
val = Faref (work_table, ch);
/* This element was already checked.  Ignore it.  */
/* Remember that we checked this element.  */
Faset (work_table, ch, Qt);
for (prev = tail = safe_codings; CONSP (tail); tail = XCDR (tail))
if (NILP (Faref (XCDR (val), ch)))
/* Exclude this coding system from SAFE_CODINGS.  */
if (EQ (tail, safe_codings))
safe_codings = XCDR (safe_codings);
XSETCDR (prev, XCDR (tail));
return safe_codings;
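/* find_safe_codings_2 removes from SAFE_CODINGS, in place, every coding
   system that cannot encode the current character, treating the head
   of the list separately from later elements through PREV.  This
   sketch swaps in a different technique for the same in-place filter:
   a pointer-to-pointer walk that needs no head special case.  The
   struct, function and predicate names are hypothetical.  */

```c
#include <stddef.h>

/* Hypothetical candidate coding system with a "can encode C" test.  */
struct coding
{
  const char *name;
  int (*can_encode) (int c);
  struct coding *next;
};

/* Unlink from *LIST every coding system that cannot encode C.
   Unlinked nodes are not freed; the caller owns them.  */
void
exclude_unsafe (struct coding **list, int c)
{
  struct coding **link = list;

  while (*link)
    {
      if (!(*link)->can_encode (c))
        *link = (*link)->next;   /* drop this node */
      else
        link = &(*link)->next;   /* keep it and advance */
    }
}

/* Example predicates for the usage below.  */
int latin1_only (int c) { return c < 0x100; }
int any_char (int c) { (void) c; return 1; }
```

/* Because LINK always points at the pointer that refers to the current
   node, removing the first node and removing a later node are the same
   assignment.  */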
DEFUN ("find-coding-systems-region-internal-2",
       Ffind_coding_systems_region_internal_2,
       Sfind_coding_systems_region_internal_2, 2, 2, 0,
       doc: /* Internal use only.  */)
Lisp_Object start, end;
Lisp_Object work_table, safe_codings;
int non_ascii_p = 0;
int single_byte_char_found = 0;
const unsigned char *p1, *p1end, *p2, *p2end, *p;
if (STRINGP (start))
if (!STRING_MULTIBYTE (start))
p1 = SDATA (start), p1end = p1 + SBYTES (start);
if (SCHARS (start) != SBYTES (start))
CHECK_NUMBER_COERCE_MARKER (start);
CHECK_NUMBER_COERCE_MARKER (end);
if (XINT (start) < BEG || XINT (end) > Z || XINT (start) > XINT (end))
args_out_of_range (start, end);
if (NILP (current_buffer->enable_multibyte_characters))
from = CHAR_TO_BYTE (XINT (start));
to = CHAR_TO_BYTE (XINT (end));
stop = from < GPT_BYTE && GPT_BYTE < to ? GPT_BYTE : to;
p1 = BYTE_POS_ADDR (from), p1end = p1 + (stop - from);
p2 = BYTE_POS_ADDR (stop), p2end = p2 + (to - stop);
if (XINT (end) - XINT (start) != to - from)
/* We are sure that the text contains no multibyte character.
   Check if it contains eight-bit-graphic.  */
for (p = p1; p < p1end && ASCII_BYTE_P (*p); p++);
for (p = p2; p < p2end && ASCII_BYTE_P (*p); p++);
/* The text contains non-ASCII characters.  */
work_table = Fmake_char_table (Qchar_coding_system, Qnil);
safe_codings = Fcopy_sequence (XCDR (Vcoding_system_safe_chars));
safe_codings = find_safe_codings_2 (p1, p1end, safe_codings, work_table,
                                    &single_byte_char_found);
safe_codings = find_safe_codings_2 (p2, p2end, safe_codings, work_table,
                                    &single_byte_char_found);
if (EQ (safe_codings, XCDR (Vcoding_system_safe_chars)))
/* Turn safe_codings into a list of coding systems...  */
if (single_byte_char_found)
/* ... and append these for eight-bit chars.  */
val = Fcons (Qraw_text,
             Fcons (Qemacs_mule, Fcons (Qno_conversion, Qnil)));
/* ... and append generic coding systems.  */
val = Fcopy_sequence (XCAR (Vcoding_system_safe_chars));
for (; CONSP (safe_codings); safe_codings = XCDR (safe_codings))
val = Fcons (XCAR (XCAR (safe_codings)), val);
return safe_codings;
/* Search from position POS for characters that are unencodable
   according to SAFE_CHARS, and return a list of their positions.  P
   points to where the character at POS is located in memory.  Stop the
   search at PEND or when N unencodable characters have been found.

   If SAFE_CHARS is a char table, an element for an unencodable

   If SAFE_CHARS is nil, all non-ASCII characters are unencodable.

   Otherwise, SAFE_CHARS is t, and only eight-bit-control and
   eight-bit-graphic characters are unencodable.  */
unencodable_char_position (safe_chars, pos, p, pend, n)
Lisp_Object safe_chars;
unsigned char *p, *pend;
Lisp_Object pos_list;
int c = STRING_CHAR_AND_LENGTH (p, MAX_MULTIBYTE_LENGTH, len);
    && (CHAR_TABLE_P (safe_chars)
        ? NILP (CHAR_TABLE_REF (safe_chars, c))
        : (NILP (safe_chars) || c < 256)))
pos_list = Fcons (make_number (pos), pos_list);
return Fnreverse (pos_list);
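/* unencodable_char_position walks the text from POS, collecting the
   positions of up to N characters that SAFE_CHARS rejects and
   returning them in order (Fcons followed by Fnreverse).  A sketch over
   an array of character codes, assuming a predicate function in place
   of the char table and modelling only the char-table and nil cases of
   SAFE_CHARS; in this sketch ASCII is always treated as encodable.  The
   function names are hypothetical.  */

```c
#include <stddef.h>

/* Scan CHARS[0..LEN), whose first element is at position POS, and store
   in OUT the positions of at most N unencodable characters.  SAFE (C)
   returns nonzero if C is encodable; a NULL SAFE means every non-ASCII
   character is unencodable.  Returns the number of positions stored.  */
size_t
unencodable_positions (const int *chars, size_t len, long pos,
                       int (*safe) (int c), long *out, size_t n)
{
  size_t i, found = 0;

  for (i = 0; i < len && found < n; i++)
    {
      int c = chars[i];

      if (c >= 0x80 && (safe == NULL || !safe (c)))
        out[found++] = pos + (long) i;
    }
  return found;
}

/* Example SAFE predicate for the usage below.  */
int latin1_safe (int c) { return c < 0x100; }
```

/* With COUNT nil, Funencodable_char_position returns only the first
   position, which corresponds to calling this sketch with N == 1.  */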
DEFUN ("unencodable-char-position", Funencodable_char_position,
       Sunencodable_char_position, 3, 5, 0,
Return position of first un-encodable character in a region.
START and END specify the region and CODING-SYSTEM specifies the
encoding to check.  Return nil if CODING-SYSTEM can encode the region.

If optional 4th argument COUNT is non-nil, it specifies the maximum
number of un-encodable characters to search for.  In this case, the value is a

If optional 5th argument STRING is non-nil, it is a string to search
for un-encodable characters.  In that case, START and END are indexes
(start, end, coding_system, count, string)
Lisp_Object start, end, coding_system, count, string;
Lisp_Object safe_chars;
struct coding_system coding;
Lisp_Object positions;
unsigned char *p, *pend;
validate_region (&start, &end);
from = XINT (start);
if (NILP (current_buffer->enable_multibyte_characters))
p = CHAR_POS_ADDR (from);
pend = CHAR_POS_ADDR (to);
CHECK_STRING (string);
CHECK_NATNUM (start);
from = XINT (start);
    || to > SCHARS (string))
args_out_of_range_3 (string, start, end);
if (! STRING_MULTIBYTE (string))
p = SDATA (string) + string_char_to_byte (string, from);
pend = SDATA (string) + string_char_to_byte (string, to);
setup_coding_system (Fcheck_coding_system (coding_system), &coding);
CHECK_NATNUM (count);
if (coding.type == coding_type_no_conversion
    || coding.type == coding_type_raw_text)
if (coding.type == coding_type_undecided)
safe_chars = coding_safe_chars (coding_system);
if (STRINGP (string)
    || from >= GPT || to <= GPT)
positions = unencodable_char_position (safe_chars, from, p, pend, n);
Lisp_Object args[2];
args[0] = unencodable_char_position (safe_chars, from, p, GPT_ADDR, n);
n -= XINT (Flength (args[0]));
positions = args[0];
args[1] = unencodable_char_position (safe_chars, GPT, GAP_END_ADDR,
positions = Fappend (2, args);
return (NILP (count) ? Fcar (positions) : positions);
code_convert_region1 (start, end, coding_system, encodep)
Lisp_Object start, end, coding_system;
struct coding_system coding;
CHECK_NUMBER_COERCE_MARKER (start);
CHECK_NUMBER_COERCE_MARKER (end);
CHECK_SYMBOL (coding_system);
validate_region (&start, &end);
from = XFASTINT (start);
to = XFASTINT (end);
if (NILP (coding_system))
return make_number (to - from);
if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));
coding.mode |= CODING_MODE_LAST_BLOCK;
coding.src_multibyte = coding.dst_multibyte
  = !NILP (current_buffer->enable_multibyte_characters);
code_convert_region (from, CHAR_TO_BYTE (from), to, CHAR_TO_BYTE (to),
                     &coding, encodep, 1);
Vlast_coding_system_used = coding.symbol;
return make_number (coding.produced_char);

DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region,
       3, 3, "r\nzCoding system: ",
       doc: /* Decode the current region from the specified coding system.
When called from a program, takes three arguments:
START, END, and CODING-SYSTEM.  START and END are buffer positions.
This function sets `last-coding-system-used' to the precise coding system
used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
not fully specified.)
It returns the length of the decoded text.  */)
(start, end, coding_system)
Lisp_Object start, end, coding_system;
return code_convert_region1 (start, end, coding_system, 0);

DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region,
       3, 3, "r\nzCoding system: ",
       doc: /* Encode the current region into the specified coding system.
When called from a program, takes three arguments:
START, END, and CODING-SYSTEM.  START and END are buffer positions.
This function sets `last-coding-system-used' to the precise coding system
used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
not fully specified.)
It returns the length of the encoded text.  */)
(start, end, coding_system)
Lisp_Object start, end, coding_system;
return code_convert_region1 (start, end, coding_system, 1);
Lisp_Object
code_convert_string1 (string, coding_system, nocopy, encodep)
     Lisp_Object string, coding_system, nocopy;
     int encodep;
{
  struct coding_system coding;

  CHECK_STRING (string);
  CHECK_SYMBOL (coding_system);

  if (NILP (coding_system))
    return (NILP (nocopy) ? Fcopy_sequence (string) : string);

  if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
    error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));

  coding.mode |= CODING_MODE_LAST_BLOCK;
  string = (encodep
            ? encode_coding_string (string, &coding, !NILP (nocopy))
            : decode_coding_string (string, &coding, !NILP (nocopy)));
  Vlast_coding_system_used = coding.symbol;

  return string;
}

DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string,
       2, 3, 0,
       doc: /* Decode STRING which is encoded in CODING-SYSTEM, and return the result.
Optional arg NOCOPY non-nil means it is OK to return STRING itself
if the decoding operation is trivial.
This function sets `last-coding-system-used' to the precise coding system
used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
not fully specified.)  */)
     (string, coding_system, nocopy)
     Lisp_Object string, coding_system, nocopy;
{
  return code_convert_string1 (string, coding_system, nocopy, 0);
}

DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string,
       2, 3, 0,
       doc: /* Encode STRING to CODING-SYSTEM, and return the result.
Optional arg NOCOPY non-nil means it is OK to return STRING itself
if the encoding operation is trivial.
This function sets `last-coding-system-used' to the precise coding system
used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
not fully specified.)  */)
     (string, coding_system, nocopy)
     Lisp_Object string, coding_system, nocopy;
{
  return code_convert_string1 (string, coding_system, nocopy, 1);
}

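/* Illustration only, not part of coding.c: NOCOPY matters when the
   conversion is trivial, as in code_convert_string1 when CODING-SYSTEM
   is nil (STRING itself is returned if NOCOPY is non-nil, otherwise a
   copy made by Fcopy_sequence).  The sketch below models that contract
   with plain C strings; `trivial_convert' is a hypothetical name and
   malloc stands in for Lisp string allocation.  */

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return STR itself when NOCOPY is nonzero; otherwise return a fresh
   heap copy that the caller must free (NULL if allocation fails).  */
static char *
trivial_convert (char *str, int nocopy)
{
  size_t n;
  char *copy;

  if (nocopy)
    return str;                 /* caller accepts sharing STR */
  n = strlen (str) + 1;         /* include the terminating NUL */
  copy = malloc (n);
  if (copy != NULL)
    memcpy (copy, str, n);
  return copy;
}
```

/* A caller that passes a non-nil NOCOPY must therefore be ready to get
   back the very object it passed in, and must not modify the result
   on the assumption that it is private.  */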
/* Encode or decode STRING according to CODING_SYSTEM.
   Do not set Vlast_coding_system_used.

   This function is called only from macros DECODE_FILE and
   ENCODE_FILE, thus we ignore character composition.  */

Lisp_Object
code_convert_string_norecord (string, coding_system, encodep)
     Lisp_Object string, coding_system;
     int encodep;
{
  struct coding_system coding;

  CHECK_STRING (string);
  CHECK_SYMBOL (coding_system);

  if (NILP (coding_system))
    return string;

  if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
    error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));

  coding.composing = COMPOSITION_DISABLED;
  coding.mode |= CODING_MODE_LAST_BLOCK;
  return (encodep
          ? encode_coding_string (string, &coding, 1)
          : decode_coding_string (string, &coding, 1));
}

DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0,
       doc: /* Decode a Japanese character which has CODE in shift_jis encoding.
Return the corresponding character.  */)
     (code)
     Lisp_Object code;
{
  unsigned char c1, c2, s1, s2;
  Lisp_Object val;

  CHECK_NUMBER (code);
  s1 = (XFASTINT (code)) >> 8, s2 = (XFASTINT (code)) & 0xFF;
  if (s1 == 0)
    {
      if (s2 < 0x80)
        XSETFASTINT (val, s2);
      else if (s2 >= 0xA0 && s2 <= 0xDF)
        XSETFASTINT (val, MAKE_CHAR (charset_katakana_jisx0201, s2, 0));
      else
        error ("Invalid Shift JIS code: %x", XFASTINT (code));
    }
  else
    {
      if ((s1 < 0x80 || (s1 > 0x9F && s1 < 0xE0) || s1 > 0xEF)
          || (s2 < 0x40 || s2 == 0x7F || s2 > 0xFC))
        error ("Invalid Shift JIS code: %x", XFASTINT (code));
      DECODE_SJIS (s1, s2, c1, c2);
      XSETFASTINT (val, MAKE_CHAR (charset_jisx0208, c1, c2));
    }
  return val;
}

DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0,
       doc: /* Encode a Japanese character CHAR to shift_jis encoding.
Return the corresponding code in SJIS.  */)
     (ch)
     Lisp_Object ch;
{
  int charset, c1, c2, s1, s2;
  Lisp_Object val;

  CHECK_NUMBER (ch);
  SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
  if (charset == CHARSET_ASCII)
    {
      val = ch;
    }
  else if (charset == charset_jisx0208
           && c1 > 0x20 && c1 < 0x7F && c2 > 0x20 && c2 < 0x7F)
    {
      ENCODE_SJIS (c1, c2, s1, s2);
      XSETFASTINT (val, (s1 << 8) | s2);
    }
  else if (charset == charset_katakana_jisx0201
           && c1 > 0x20 && c2 < 0xE0)
    {
      XSETFASTINT (val, c1 | 0x80);
    }
  else
    error ("Can't encode to shift_jis: %d", XFASTINT (ch));
  return val;
}

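/* Illustration only: DECODE_SJIS and ENCODE_SJIS are defined elsewhere
   and are not shown here.  The standalone sketch below uses the usual
   arithmetic relating a Shift_JIS byte pair to a JIS X 0208 row and
   column (each 0x21..0x7E): every Shift_JIS lead byte covers two JIS
   rows, and the trail byte selects which one.  The helper names are
   hypothetical.  Its range check is slightly stricter than the one in
   decode-sjis-char: lead byte 0x80, which maps to no JIS row, is
   rejected.  */

```c
#include <assert.h>

/* Decode the Shift_JIS pair S1, S2 into a JIS X 0208 row *C1 and
   column *C2.  Returns 0 on success, -1 if the pair is invalid.  */
static int
sjis_to_jis (int s1, int s2, int *c1, int *c2)
{
  int base;

  if (s1 < 0x81 || (s1 > 0x9F && s1 < 0xE0) || s1 > 0xEF
      || s2 < 0x40 || s2 == 0x7F || s2 > 0xFC)
    return -1;
  /* Lead bytes 0x81..0x9F cover rows 0x21..0x5E and 0xE0..0xEF cover
     rows 0x5F..0x7E; BASE is the even row of the pair.  */
  base = (s1 - (s1 >= 0xE0 ? 0xB0 : 0x70)) * 2;
  if (s2 >= 0x9F)
    {
      /* Trail bytes 0x9F..0xFC select the even row.  */
      *c1 = base;
      *c2 = s2 - 0x7E;
    }
  else
    {
      /* Trail bytes 0x40..0x9E (skipping 0x7F) select the odd row.  */
      *c1 = base - 1;
      *c2 = s2 - (s2 >= 0x80 ? 0x20 : 0x1F);
    }
  return 0;
}

/* Encode JIS X 0208 row C1 and column C2 as a Shift_JIS pair.
   Returns 0 on success, -1 if C1 or C2 is outside 0x21..0x7E.  */
static int
jis_to_sjis (int c1, int c2, int *s1, int *s2)
{
  if (c1 < 0x21 || c1 > 0x7E || c2 < 0x21 || c2 > 0x7E)
    return -1;
  *s1 = ((c1 + 1) >> 1) + (c1 <= 0x5E ? 0x70 : 0xB0);
  if (c1 & 1)
    *s2 = c2 + (c2 <= 0x5F ? 0x1F : 0x20);   /* odd row, skips 0x7F */
  else
    *s2 = c2 + 0x7E;                         /* even row */
  return 0;
}
```

/* For example, the pair 0x88 0x9F decodes to row 0x30, column 0x21,
   and jis_to_sjis maps that position back to 0x88 0x9F.  */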
DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0,
       doc: /* Decode a Big5 character which has CODE in BIG5 coding system.
Return the corresponding character.  */)
     (code)
     Lisp_Object code;
{
  int charset;
  unsigned char b1, b2, c1, c2;
  Lisp_Object val;

  CHECK_NUMBER (code);
  b1 = (XFASTINT (code)) >> 8, b2 = (XFASTINT (code)) & 0xFF;
  if (b1 == 0)
    {
      if (b2 >= 0x80)
        error ("Invalid BIG5 code: %x", XFASTINT (code));
      val = code;
    }
  else
    {
      if ((b1 < 0xA1 || b1 > 0xFE)
          || (b2 < 0x40 || (b2 > 0x7E && b2 < 0xA1) || b2 > 0xFE))
        error ("Invalid BIG5 code: %x", XFASTINT (code));
      DECODE_BIG5 (b1, b2, charset, c1, c2);
      XSETFASTINT (val, MAKE_CHAR (charset, c1, c2));
    }
  return val;
}

DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0,
       doc: /* Encode the Big5 character CHAR to BIG5 coding system.
Return the corresponding character code in Big5.  */)
     (ch)
     Lisp_Object ch;
{
  int charset, c1, c2, b1, b2;
  Lisp_Object val;

  CHECK_NUMBER (ch);
  SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
  if (charset == CHARSET_ASCII)
    {
      val = ch;
    }
  else if ((charset == charset_big5_1
            && (XFASTINT (ch) >= 0x250a1 && XFASTINT (ch) <= 0x271ec))
           || (charset == charset_big5_2
               && XFASTINT (ch) >= 0x290a1 && XFASTINT (ch) <= 0x2bdb2))
    {
      ENCODE_BIG5 (charset, c1, c2, b1, b2);
      XSETFASTINT (val, (b1 << 8) | b2);
    }
  else
    error ("Can't encode to Big5: %d", XFASTINT (ch));
  return val;
}

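/* Illustration only: DECODE_BIG5 and ENCODE_BIG5 are defined elsewhere.
   The sketch below assumes they linearize a Big5 pair (157 trail bytes
   per lead byte: 0x40..0x7E and 0xA1..0xFE) and re-split the index into
   94x94 positions, with lead bytes 0xA1..0xC8 in the first plane
   (big5-1) and 0xC9..0xFE in the second (big5-2).  Under that
   assumption the last positions of the two planes are (0x63, 0x6C) and
   (0x7B, 0x32), which agree with the low 14 bits of the upper limits
   0x271ec and 0x2bdb2 tested in encode-big5-char when read as two 7-bit
   position codes.  All names below are hypothetical.  */

```c
#include <assert.h>

#define BIG5_ROW_SIZE 157      /* trail bytes 0x40..0x7E (63) + 0xA1..0xFE (94) */
#define PLANE_ROW_SIZE 94      /* positions 0x21..0x7E */
#define BIG5_PLANE2_LEAD 0xC9  /* assumed first lead byte of the second plane */

/* Decode Big5 bytes B1, B2 into *PLANE (1 or 2) and a 94x94 position
   *C1, *C2.  Returns 0 on success, -1 for an invalid pair.  */
static int
big5_decode (int b1, int b2, int *plane, int *c1, int *c2)
{
  int idx;

  if (b1 < 0xA1 || b1 > 0xFE
      || b2 < 0x40 || (b2 > 0x7E && b2 < 0xA1) || b2 > 0xFE)
    return -1;
  /* Linear index over all Big5 double-byte codes.  */
  idx = (b1 - 0xA1) * BIG5_ROW_SIZE + (b2 < 0x7F ? b2 - 0x40 : b2 - 0x62);
  if (b1 < BIG5_PLANE2_LEAD)
    *plane = 1;
  else
    {
      *plane = 2;
      idx -= (BIG5_PLANE2_LEAD - 0xA1) * BIG5_ROW_SIZE;
    }
  *c1 = idx / PLANE_ROW_SIZE + 0x21;
  *c2 = idx % PLANE_ROW_SIZE + 0x21;
  return 0;
}

/* Inverse of big5_decode for a valid PLANE, C1, C2.  */
static void
big5_encode (int plane, int c1, int c2, int *b1, int *b2)
{
  int idx = (c1 - 0x21) * PLANE_ROW_SIZE + (c2 - 0x21);
  int col;

  if (plane == 2)
    idx += (BIG5_PLANE2_LEAD - 0xA1) * BIG5_ROW_SIZE;
  *b1 = idx / BIG5_ROW_SIZE + 0xA1;
  col = idx % BIG5_ROW_SIZE;
  *b2 = col < 0x3F ? col + 0x40 : col + 0x62;   /* skip 0x7F..0xA0 */
}
```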
DEFUN ("set-terminal-coding-system-internal", Fset_terminal_coding_system_internal,
       Sset_terminal_coding_system_internal, 1, 1, 0,
       doc: /* Internal use only.  */)
     (coding_system)
     Lisp_Object coding_system;
{
  CHECK_SYMBOL (coding_system);
  setup_coding_system (Fcheck_coding_system (coding_system), &terminal_coding);
  /* We had better not send unsafe characters to terminal.  */
  terminal_coding.flags |= CODING_FLAG_ISO_SAFE;
  /* Character composition should be disabled.  */
  terminal_coding.composing = COMPOSITION_DISABLED;
  /* Error notification should be suppressed.  */
  terminal_coding.suppress_error = 1;
  terminal_coding.src_multibyte = 1;
  terminal_coding.dst_multibyte = 0;
  return Qnil;
}

DEFUN ("set-safe-terminal-coding-system-internal", Fset_safe_terminal_coding_system_internal,
       Sset_safe_terminal_coding_system_internal, 1, 1, 0,
       doc: /* Internal use only.  */)
     (coding_system)
     Lisp_Object coding_system;
{
  CHECK_SYMBOL (coding_system);
  setup_coding_system (Fcheck_coding_system (coding_system),
                       &safe_terminal_coding);
  /* Character composition should be disabled.  */
  safe_terminal_coding.composing = COMPOSITION_DISABLED;
  /* Error notification should be suppressed.  */
  safe_terminal_coding.suppress_error = 1;
  safe_terminal_coding.src_multibyte = 1;
  safe_terminal_coding.dst_multibyte = 0;
  return Qnil;
}

DEFUN ("terminal-coding-system", Fterminal_coding_system,
       Sterminal_coding_system, 0, 0, 0,
       doc: /* Return coding system specified for terminal output.  */)
     ()
{
  return terminal_coding.symbol;
}

DEFUN ("set-keyboard-coding-system-internal", Fset_keyboard_coding_system_internal,
       Sset_keyboard_coding_system_internal, 1, 1, 0,
       doc: /* Internal use only.  */)
     (coding_system)
     Lisp_Object coding_system;
{
  CHECK_SYMBOL (coding_system);
  setup_coding_system (Fcheck_coding_system (coding_system), &keyboard_coding);
  /* Character composition should be disabled.  */
  keyboard_coding.composing = COMPOSITION_DISABLED;
  return Qnil;
}

DEFUN ("keyboard-coding-system", Fkeyboard_coding_system,
       Skeyboard_coding_system, 0, 0, 0,
       doc: /* Return coding system specified for decoding keyboard input.  */)
     ()
{
  return keyboard_coding.symbol;
}

DEFUN ("find-operation-coding-system", Ffind_operation_coding_system,
       Sfind_operation_coding_system, 1, MANY, 0,
       doc: /* Choose a coding system for an operation based on the target name.
The value names a pair of coding systems: (DECODING-SYSTEM . ENCODING-SYSTEM).
DECODING-SYSTEM is the coding system to use for decoding
\(in case OPERATION does decoding), and ENCODING-SYSTEM is the coding system
for encoding (in case OPERATION does encoding).

The first argument OPERATION specifies an I/O primitive:
  For file I/O, `insert-file-contents' or `write-region'.
  For process I/O, `call-process', `call-process-region', or `start-process'.
  For network I/O, `open-network-stream'.

The remaining arguments should be the same arguments that were passed
to the primitive.  Depending on which primitive, one of those arguments
is selected as the TARGET.  For example, if OPERATION does file I/O,
whichever argument specifies the file name is TARGET.

TARGET has a meaning which depends on OPERATION:
  For file I/O, TARGET is a file name.
  For process I/O, TARGET is a process name.
  For network I/O, TARGET is a service name or a port number.

This function looks up what is specified for TARGET in
`file-coding-system-alist', `process-coding-system-alist',
or `network-coding-system-alist' depending on OPERATION.
They may specify a coding system, a cons of coding systems,
or a function symbol to call.
In the last case, we call the function with one argument,
which is a list of all the arguments given to this function.

usage: (find-operation-coding-system OPERATION ARGUMENTS ...)  */)
     (nargs, args)
     int nargs;
     Lisp_Object *args;
{
  Lisp_Object operation, target_idx, target, val;
  register Lisp_Object chain;

  if (nargs < 2)
    error ("Too few arguments");
  operation = args[0];
  if (!SYMBOLP (operation)
      || !INTEGERP (target_idx = Fget (operation, Qtarget_idx)))
    error ("Invalid first argument");
  if (nargs < 1 + XINT (target_idx))
    error ("Too few arguments for operation: %s",
           SDATA (SYMBOL_NAME (operation)));
  /* For write-region, if the 6th argument (i.e. VISIT, the 5th
     argument to write-region) is string, it must be treated as a
     target file name.  */
  if (EQ (operation, Qwrite_region)
      && nargs > 5
      && STRINGP (args[5]))
    target_idx = make_number (4);
  target = args[XINT (target_idx) + 1];
  if (!(STRINGP (target)
        || (EQ (operation, Qopen_network_stream) && INTEGERP (target))))
    error ("Invalid argument %d", XINT (target_idx) + 1);

  chain = ((EQ (operation, Qinsert_file_contents)
            || EQ (operation, Qwrite_region))
           ? Vfile_coding_system_alist
           : (EQ (operation, Qopen_network_stream)
              ? Vnetwork_coding_system_alist
              : Vprocess_coding_system_alist));
  if (NILP (chain))
    return Qnil;

  for (; CONSP (chain); chain = XCDR (chain))
    {
      Lisp_Object elt;
      elt = XCAR (chain);

      if (CONSP (elt)
          && ((STRINGP (target)
               && STRINGP (XCAR (elt))
               && fast_string_match (XCAR (elt), target) >= 0)
              || (INTEGERP (target) && EQ (target, XCAR (elt)))))
        {
          val = XCDR (elt);
          /* Here, if VAL is both a valid coding system and a valid
             function symbol, we return VAL as a coding system.  */
          if (CONSP (val))
            return val;
          if (! SYMBOLP (val))
            return Qnil;
          if (! NILP (Fcoding_system_p (val)))
            return Fcons (val, val);
          if (! NILP (Ffboundp (val)))
            {
              val = call1 (val, Flist (nargs, args));
              if (CONSP (val))
                return val;
              if (SYMBOLP (val) && ! NILP (Fcoding_system_p (val)))
                return Fcons (val, val);
            }
          return Qnil;
        }
    }
  return Qnil;
}

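/* Illustration only: a simplified model of the alist walk in
   find-operation-coding-system.  The first element whose PATTERN
   matches TARGET decides the result, and a single coding system VAL
   is returned as the pair (VAL . VAL).  Assumptions: POSIX extended
   regexps (regcomp/regexec) stand in for Emacs regexps and
   fast_string_match, coding systems are plain strings, and
   function-valued VALs are not modeled.  All names are hypothetical.  */

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* One (PATTERN . VAL) element.  ENCODE == NULL means VAL was a single
   coding system; otherwise VAL was the cons (DECODE . ENCODE).  */
struct alist_entry
{
  const char *pattern;
  const char *decode;
  const char *encode;
};

/* (DECODING-SYSTEM . ENCODING-SYSTEM); two NULLs play the role of nil.  */
struct coding_pair
{
  const char *decode;
  const char *encode;
};

static struct coding_pair
lookup_coding_pair (const struct alist_entry *alist, size_t n,
                    const char *target)
{
  struct coding_pair result = { NULL, NULL };
  size_t i;

  for (i = 0; i < n; i++)
    {
      regex_t re;
      int matched;

      if (regcomp (&re, alist[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
        continue;               /* skip a pattern that fails to compile */
      matched = regexec (&re, target, 0, NULL, 0) == 0;
      regfree (&re);
      if (matched)
        {
          /* The first matching element decides the answer.  */
          result.decode = alist[i].decode;
          result.encode = alist[i].encode ? alist[i].encode : alist[i].decode;
          break;
        }
    }
  return result;
}
```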
DEFUN ("update-coding-systems-internal", Fupdate_coding_systems_internal,
       Supdate_coding_systems_internal, 0, 0, 0,
       doc: /* Update internal database for ISO2022 and CCL based coding systems.
When values of any coding categories are changed, you must
call this function.  */)
     ()
{
  int i;

  for (i = CODING_CATEGORY_IDX_EMACS_MULE; i < CODING_CATEGORY_IDX_MAX; i++)
    {
      Lisp_Object val;

      val = SYMBOL_VALUE (XVECTOR (Vcoding_category_table)->contents[i]);
      if (!NILP (val))
        {
          if (! coding_system_table[i])
            coding_system_table[i] = ((struct coding_system *)
                                      xmalloc (sizeof (struct coding_system)));
          setup_coding_system (val, coding_system_table[i]);
        }
      else if (coding_system_table[i])
        {
          xfree (coding_system_table[i]);
          coding_system_table[i] = NULL;
        }
    }

  return Qnil;
}

DEFUN ("set-coding-priority-internal", Fset_coding_priority_internal,
       Sset_coding_priority_internal, 0, 0, 0,
       doc: /* Update internal database for the current value of `coding-category-list'.
This function is for internal use only.  */)
     ()
{
  int i = 0, idx;
  Lisp_Object val;

  val = Vcoding_category_list;

  while (CONSP (val) && i < CODING_CATEGORY_IDX_MAX)
    {
      if (! SYMBOLP (XCAR (val)))
        break;
      idx = XFASTINT (Fget (XCAR (val), Qcoding_category_index));
      if (idx >= CODING_CATEGORY_IDX_MAX)
        break;
      coding_priorities[i++] = (1 << idx);
      val = XCDR (val);
    }
  /* If coding-category-list is valid and contains all coding
     categories, `i' should be CODING_CATEGORY_IDX_MAX now.  If not,
     the following code saves Emacs from crashing.  */
  while (i < CODING_CATEGORY_IDX_MAX)
    coding_priorities[i++] = CODING_CATEGORY_MASK_RAW_TEXT;

  return Qnil;
}

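/* Illustration only: set-coding-priority-internal turns the symbols in
   `coding-category-list' into one-bit masks (1 << index) in priority
   order, stops at the first invalid entry, and pads the remaining
   slots with the raw-text mask.  The sketch below does the same with
   plain integer indices; N_CATEGORIES and MASK_RAW_TEXT are
   hypothetical stand-ins for CODING_CATEGORY_IDX_MAX and
   CODING_CATEGORY_MASK_RAW_TEXT, whose real values are not shown
   here.  */

```c
#include <assert.h>

#define N_CATEGORIES 4            /* hypothetical category count */
#define MASK_RAW_TEXT (1 << 3)    /* hypothetical raw-text mask */

/* Fill PRIORITIES from the category indices in ORDER (length LEN),
   stopping at the first out-of-range index, then pad the rest with
   MASK_RAW_TEXT.  Returns how many slots came from ORDER.  */
static int
build_priorities (const int *order, int len, int priorities[N_CATEGORIES])
{
  int i = 0, k, filled;

  for (k = 0; k < len && i < N_CATEGORIES; k++)
    {
      if (order[k] < 0 || order[k] >= N_CATEGORIES)
        break;                         /* invalid entry ends the list */
      priorities[i++] = 1 << order[k];
    }
  filled = i;
  while (i < N_CATEGORIES)
    priorities[i++] = MASK_RAW_TEXT;   /* safety padding */
  return filled;
}
```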
DEFUN ("define-coding-system-internal", Fdefine_coding_system_internal,
       Sdefine_coding_system_internal, 1, 1, 0,
       doc: /* Register CODING-SYSTEM as a base coding system.
This function is for internal use only.  */)
     (coding_system)
     Lisp_Object coding_system;
{
  Lisp_Object safe_chars, slot;

  if (NILP (Fcheck_coding_system (coding_system)))
    Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil));
  safe_chars = coding_safe_chars (coding_system);
  if (! EQ (safe_chars, Qt) && ! CHAR_TABLE_P (safe_chars))
    error ("No valid safe-chars property for %s",
           SDATA (SYMBOL_NAME (coding_system)));
  if (EQ (safe_chars, Qt))
    {
      if (NILP (Fmemq (coding_system, XCAR (Vcoding_system_safe_chars))))
        XSETCAR (Vcoding_system_safe_chars,
                 Fcons (coding_system, XCAR (Vcoding_system_safe_chars)));
    }
  else
    {
      slot = Fassq (coding_system, XCDR (Vcoding_system_safe_chars));
      if (NILP (slot))
        XSETCDR (Vcoding_system_safe_chars,
                 nconc2 (XCDR (Vcoding_system_safe_chars),
                         Fcons (Fcons (coding_system, safe_chars), Qnil)));
      else
        XSETCDR (slot, safe_chars);
    }
  return Qnil;
}

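/* Illustration only: define-coding-system-internal keeps
   Vcoding_system_safe_chars as a cons whose car lists the coding
   systems whose safe-chars is t (they can encode everything) and whose
   cdr is an alist of (CODING-SYSTEM . SAFE-CHARS-TABLE).  The sketch
   below models that bookkeeping with fixed-size arrays: table ids are
   plain ints, SAFE_CHARS_T stands in for t, and new generic entries are
   appended rather than consed onto the front.  All names are
   hypothetical.  */

```c
#include <assert.h>
#include <string.h>

#define MAX_SYSTEMS 16
#define SAFE_CHARS_T (-1)      /* stands in for the symbol t */

struct safe_chars_registry
{
  const char *generic[MAX_SYSTEMS];   /* car: systems whose safe-chars is t */
  int n_generic;
  const char *names[MAX_SYSTEMS];     /* cdr: alist keys ...  */
  int tables[MAX_SYSTEMS];            /* ... and their table ids */
  int n_tables;
};

/* Record that coding system NAME has safe-chars TABLE.  Returns 0, or
   -1 if the fixed-size registry is full.  */
static int
register_safe_chars (struct safe_chars_registry *r, const char *name, int table)
{
  int i;

  if (table == SAFE_CHARS_T)
    {
      for (i = 0; i < r->n_generic; i++)
        if (strcmp (r->generic[i], name) == 0)
          return 0;                     /* already listed */
      if (r->n_generic == MAX_SYSTEMS)
        return -1;
      r->generic[r->n_generic++] = name;
      return 0;
    }
  for (i = 0; i < r->n_tables; i++)
    if (strcmp (r->names[i], name) == 0)
      {
        r->tables[i] = table;           /* update the existing slot */
        return 0;
      }
  if (r->n_tables == MAX_SYSTEMS)
    return -1;
  r->names[r->n_tables] = name;         /* append a new (NAME . TABLE) */
  r->tables[r->n_tables++] = table;
  return 0;
}
```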
/*** 9. Post-amble ***/

void
init_coding_once ()
{
  int i;

  /* Emacs' internal format specific initialize routine.  */
  for (i = 0; i <= 0x20; i++)
    emacs_code_class[i] = EMACS_control_code;
  emacs_code_class[0x0A] = EMACS_linefeed_code;
  emacs_code_class[0x0D] = EMACS_carriage_return_code;
  for (i = 0x21 ; i < 0x7F; i++)
    emacs_code_class[i] = EMACS_ascii_code;
  emacs_code_class[0x7F] = EMACS_control_code;
  for (i = 0x80; i < 0xFF; i++)
    emacs_code_class[i] = EMACS_invalid_code;
  emacs_code_class[LEADING_CODE_PRIVATE_11] = EMACS_leading_code_3;
  emacs_code_class[LEADING_CODE_PRIVATE_12] = EMACS_leading_code_3;
  emacs_code_class[LEADING_CODE_PRIVATE_21] = EMACS_leading_code_4;
  emacs_code_class[LEADING_CODE_PRIVATE_22] = EMACS_leading_code_4;

  /* ISO2022 specific initialize routine.  */
  for (i = 0; i < 0x20; i++)
    iso_code_class[i] = ISO_control_0;
  for (i = 0x21; i < 0x7F; i++)
    iso_code_class[i] = ISO_graphic_plane_0;
  for (i = 0x80; i < 0xA0; i++)
    iso_code_class[i] = ISO_control_1;
  for (i = 0xA1; i < 0xFF; i++)
    iso_code_class[i] = ISO_graphic_plane_1;
  iso_code_class[0x20] = iso_code_class[0x7F] = ISO_0x20_or_0x7F;
  iso_code_class[0xA0] = iso_code_class[0xFF] = ISO_0xA0_or_0xFF;
  iso_code_class[ISO_CODE_CR] = ISO_carriage_return;
  iso_code_class[ISO_CODE_SO] = ISO_shift_out;
  iso_code_class[ISO_CODE_SI] = ISO_shift_in;
  iso_code_class[ISO_CODE_SS2_7] = ISO_single_shift_2_7;
  iso_code_class[ISO_CODE_ESC] = ISO_escape;
  iso_code_class[ISO_CODE_SS2] = ISO_single_shift_2;
  iso_code_class[ISO_CODE_SS3] = ISO_single_shift_3;
  iso_code_class[ISO_CODE_CSI] = ISO_control_sequence_introducer;

  setup_coding_system (Qnil, &keyboard_coding);
  setup_coding_system (Qnil, &terminal_coding);
  setup_coding_system (Qnil, &safe_terminal_coding);
  setup_coding_system (Qnil, &default_buffer_file_coding);

  bzero (coding_system_table, sizeof coding_system_table);

  bzero (ascii_skip_code, sizeof ascii_skip_code);
  for (i = 0; i < 128; i++)
    ascii_skip_code[i] = 1;

#if defined (MSDOS) || defined (WINDOWSNT)
  system_eol_type = CODING_EOL_CRLF;
#else
  system_eol_type = CODING_EOL_LF;
#endif

  inhibit_pre_post_conversion = 0;
}

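/* Illustration only: the iso_code_class initialization above, rebuilt
   as a standalone table so the byte ranges are easy to check.  The
   control-code values are the standard ISO 2022 / ISO 6429 ones
   (CR 0x0D, SO 0x0E, SI 0x0F, ESC 0x1B, SS2 0x8E, SS3 0x8F, CSI 0x9B);
   0x19 for the 7-bit single-shift 2 is an assumption about the value of
   ISO_CODE_SS2_7.  The enum and function names are hypothetical.  */

```c
#include <assert.h>

enum iso_class
{
  CLS_CONTROL_0, CLS_GRAPHIC_0, CLS_0x20_OR_0x7F,
  CLS_CONTROL_1, CLS_GRAPHIC_1, CLS_0xA0_OR_0xFF,
  CLS_CR, CLS_SO, CLS_SI, CLS_SS2_7, CLS_ESC, CLS_SS2, CLS_SS3, CLS_CSI
};

static enum iso_class iso_class_table[256];

static void
init_iso_class_table (void)
{
  int i;

  for (i = 0x00; i < 0x20; i++) iso_class_table[i] = CLS_CONTROL_0;  /* C0 */
  for (i = 0x21; i < 0x7F; i++) iso_class_table[i] = CLS_GRAPHIC_0;  /* GL */
  for (i = 0x80; i < 0xA0; i++) iso_class_table[i] = CLS_CONTROL_1;  /* C1 */
  for (i = 0xA1; i < 0xFF; i++) iso_class_table[i] = CLS_GRAPHIC_1;  /* GR */
  /* Bytes whose meaning depends on the charset in use.  */
  iso_class_table[0x20] = iso_class_table[0x7F] = CLS_0x20_OR_0x7F;
  iso_class_table[0xA0] = iso_class_table[0xFF] = CLS_0xA0_OR_0xFF;
  /* Controls that the decoder handles specially.  */
  iso_class_table[0x0D] = CLS_CR;
  iso_class_table[0x0E] = CLS_SO;
  iso_class_table[0x0F] = CLS_SI;
  iso_class_table[0x19] = CLS_SS2_7;
  iso_class_table[0x1B] = CLS_ESC;
  iso_class_table[0x8E] = CLS_SS2;
  iso_class_table[0x8F] = CLS_SS3;
  iso_class_table[0x9B] = CLS_CSI;
}
```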
  Qtarget_idx = intern ("target-idx");
  staticpro (&Qtarget_idx);

  Qcoding_system_history = intern ("coding-system-history");
  staticpro (&Qcoding_system_history);
  Fset (Qcoding_system_history, Qnil);

  /* Target FILENAME is the first argument.  */
  Fput (Qinsert_file_contents, Qtarget_idx, make_number (0));
  /* Target FILENAME is the third argument.  */
  Fput (Qwrite_region, Qtarget_idx, make_number (2));

  Qcall_process = intern ("call-process");
  staticpro (&Qcall_process);
  /* Target PROGRAM is the first argument.  */
  Fput (Qcall_process, Qtarget_idx, make_number (0));

  Qcall_process_region = intern ("call-process-region");
  staticpro (&Qcall_process_region);
  /* Target PROGRAM is the third argument.  */
  Fput (Qcall_process_region, Qtarget_idx, make_number (2));

  Qstart_process = intern ("start-process");
  staticpro (&Qstart_process);
  /* Target PROGRAM is the third argument.  */
  Fput (Qstart_process, Qtarget_idx, make_number (2));

  Qopen_network_stream = intern ("open-network-stream");
  staticpro (&Qopen_network_stream);
  /* Target SERVICE is the fourth argument.  */
  Fput (Qopen_network_stream, Qtarget_idx, make_number (3));

  Qcoding_system = intern ("coding-system");
  staticpro (&Qcoding_system);

  Qeol_type = intern ("eol-type");
  staticpro (&Qeol_type);

  Qbuffer_file_coding_system = intern ("buffer-file-coding-system");
  staticpro (&Qbuffer_file_coding_system);

  Qpost_read_conversion = intern ("post-read-conversion");
  staticpro (&Qpost_read_conversion);

  Qpre_write_conversion = intern ("pre-write-conversion");
  staticpro (&Qpre_write_conversion);

  Qno_conversion = intern ("no-conversion");
  staticpro (&Qno_conversion);

  Qundecided = intern ("undecided");
  staticpro (&Qundecided);

  Qcoding_system_p = intern ("coding-system-p");
  staticpro (&Qcoding_system_p);

  Qcoding_system_error = intern ("coding-system-error");
  staticpro (&Qcoding_system_error);

  Fput (Qcoding_system_error, Qerror_conditions,
        Fcons (Qcoding_system_error, Fcons (Qerror, Qnil)));
  Fput (Qcoding_system_error, Qerror_message,
        build_string ("Invalid coding system"));

  Qcoding_category = intern ("coding-category");
  staticpro (&Qcoding_category);
  Qcoding_category_index = intern ("coding-category-index");
  staticpro (&Qcoding_category_index);

  Vcoding_category_table
    = Fmake_vector (make_number (CODING_CATEGORY_IDX_MAX), Qnil);
  staticpro (&Vcoding_category_table);
  {
    int i;
    for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
      {
        XVECTOR (Vcoding_category_table)->contents[i]
          = intern (coding_category_name[i]);
        Fput (XVECTOR (Vcoding_category_table)->contents[i],
              Qcoding_category_index, make_number (i));
      }
  }

  Vcoding_system_safe_chars = Fcons (Qnil, Qnil);
  staticpro (&Vcoding_system_safe_chars);

  Qtranslation_table = intern ("translation-table");
  staticpro (&Qtranslation_table);
  Fput (Qtranslation_table, Qchar_table_extra_slots, make_number (1));

  Qtranslation_table_id = intern ("translation-table-id");
  staticpro (&Qtranslation_table_id);

  Qtranslation_table_for_decode = intern ("translation-table-for-decode");
  staticpro (&Qtranslation_table_for_decode);

  Qtranslation_table_for_encode = intern ("translation-table-for-encode");
  staticpro (&Qtranslation_table_for_encode);

  Qsafe_chars = intern ("safe-chars");
  staticpro (&Qsafe_chars);

  Qchar_coding_system = intern ("char-coding-system");
  staticpro (&Qchar_coding_system);

  /* Intern this now in case it isn't already done.
     Setting this variable twice is harmless.
     But don't staticpro it here--that is done in alloc.c.  */
  Qchar_table_extra_slots = intern ("char-table-extra-slots");
  Fput (Qsafe_chars, Qchar_table_extra_slots, make_number (0));
  Fput (Qchar_coding_system, Qchar_table_extra_slots, make_number (2));

  Qvalid_codes = intern ("valid-codes");
  staticpro (&Qvalid_codes);

  Qemacs_mule = intern ("emacs-mule");
  staticpro (&Qemacs_mule);

  Qraw_text = intern ("raw-text");
  staticpro (&Qraw_text);

  defsubr (&Scoding_system_p);
  defsubr (&Sread_coding_system);
  defsubr (&Sread_non_nil_coding_system);
  defsubr (&Scheck_coding_system);
  defsubr (&Sdetect_coding_region);
  defsubr (&Sdetect_coding_string);
  defsubr (&Sfind_coding_systems_region_internal);
  defsubr (&Sfind_coding_systems_region_internal_2);
  defsubr (&Sunencodable_char_position);
  defsubr (&Sdecode_coding_region);
  defsubr (&Sencode_coding_region);
  defsubr (&Sdecode_coding_string);
  defsubr (&Sencode_coding_string);
  defsubr (&Sdecode_sjis_char);
  defsubr (&Sencode_sjis_char);
  defsubr (&Sdecode_big5_char);
  defsubr (&Sencode_big5_char);
  defsubr (&Sset_terminal_coding_system_internal);
  defsubr (&Sset_safe_terminal_coding_system_internal);
  defsubr (&Sterminal_coding_system);
  defsubr (&Sset_keyboard_coding_system_internal);
  defsubr (&Skeyboard_coding_system);
  defsubr (&Sfind_operation_coding_system);
  defsubr (&Supdate_coding_systems_internal);
  defsubr (&Sset_coding_priority_internal);
  defsubr (&Sdefine_coding_system_internal);

  DEFVAR_LISP ("coding-system-list", &Vcoding_system_list,
               doc: /* List of coding systems.

Do not alter the value of this variable manually.  This variable should be
updated by the functions `make-coding-system' and
`define-coding-system-alias'.  */);
  Vcoding_system_list = Qnil;

  DEFVAR_LISP ("coding-system-alist", &Vcoding_system_alist,
               doc: /* Alist of coding system names.
Each element is a one-element list containing a coding system name.
This variable is given to `completing-read' as the TABLE argument.

Do not alter the value of this variable manually.  This variable should be
updated by the functions `make-coding-system' and
`define-coding-system-alias'.  */);
  Vcoding_system_alist = Qnil;

  DEFVAR_LISP ("coding-category-list", &Vcoding_category_list,
               doc: /* List of coding-categories (symbols) ordered by priority.

On detecting a coding system, Emacs tries code detection algorithms
associated with each coding-category one by one in this order.  When
one algorithm agrees with a byte sequence of source text, the coding
system bound to the corresponding coding-category is selected.  */);
  {
    int i;

    Vcoding_category_list = Qnil;
    for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--)
      Vcoding_category_list
        = Fcons (XVECTOR (Vcoding_category_table)->contents[i],
                 Vcoding_category_list);
  }

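/* Illustration only: the docstring above describes first-match-wins
   detection over categories in priority order.  The sketch below shows
   just that ordering rule with three toy detectors; the detectors and
   category labels are hypothetical and far simpler than the real
   detection routines.  */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*detector_fn) (const unsigned char *text, size_t len);

/* Accept text containing no byte >= 0x80.  */
static int
accept_ascii_only (const unsigned char *text, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++)
    if (text[i] >= 0x80)
      return 0;
  return 1;
}

/* Accept text containing at least one ESC (0x1B).  */
static int
accept_with_escape (const unsigned char *text, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++)
    if (text[i] == 0x1B)
      return 1;
  return 0;
}

/* Accept anything, as a raw-text-like fallback would.  */
static int
accept_anything (const unsigned char *text, size_t len)
{
  (void) text;
  (void) len;
  return 1;
}

struct category
{
  const char *label;
  detector_fn accepts;
};

/* Return the label of the first category, in PRIORITY order, whose
   detector accepts TEXT; NULL if none does.  */
static const char *
first_accepting_category (const struct category *priority, size_t n,
                          const unsigned char *text, size_t len)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (priority[i].accepts (text, len))
      return priority[i].label;
  return NULL;
}
```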
  DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read,
               doc: /* Specify the coding system for read operations.
It is useful to bind this variable with `let', but do not set it globally.
If the value is a coding system, it is used for decoding on read operations.
If not, an appropriate element is used from one of the coding system alists.
There are three such tables: `file-coding-system-alist',
`process-coding-system-alist', and `network-coding-system-alist'.  */);
  Vcoding_system_for_read = Qnil;

  DEFVAR_LISP ("coding-system-for-write", &Vcoding_system_for_write,
               doc: /* Specify the coding system for write operations.
Programs bind this variable with `let', but you should not set it globally.
If the value is a coding system, it is used for encoding of output,
when writing it to a file and when sending it to a file or subprocess.

If this does not specify a coding system, an appropriate element
is used from one of the coding system alists.
There are three such tables: `file-coding-system-alist',
`process-coding-system-alist', and `network-coding-system-alist'.
For output to files, if the above procedure does not specify a coding system,
the value of `buffer-file-coding-system' is used.  */);
  Vcoding_system_for_write = Qnil;

  DEFVAR_LISP ("last-coding-system-used", &Vlast_coding_system_used,
               doc: /* Coding system used in the latest file or process I/O.  */);
  Vlast_coding_system_used = Qnil;

  DEFVAR_BOOL ("inhibit-eol-conversion", &inhibit_eol_conversion,
               doc: /* *Non-nil means always inhibit code conversion of end-of-line format.
See info node `Coding Systems' and info node `Text and Binary' concerning
such conversion.  */);
  inhibit_eol_conversion = 0;

  DEFVAR_BOOL ("inherit-process-coding-system", &inherit_process_coding_system,
               doc: /* Non-nil means process buffer inherits coding system of process output.
Bind it to t if the process output is to be treated as if it were a file
read from some filesystem.  */);
  inherit_process_coding_system = 0;

  DEFVAR_LISP ("file-coding-system-alist", &Vfile_coding_system_alist,
               doc: /* Alist to decide a coding system to use for a file I/O operation.
The format is ((PATTERN . VAL) ...),
where PATTERN is a regular expression matching a file name,
VAL is a coding system, a cons of coding systems, or a function symbol.
If VAL is a coding system, it is used for both decoding and encoding
the file contents.
If VAL is a cons of coding systems, the car part is used for decoding,
and the cdr part is used for encoding.
If VAL is a function symbol, the function must return a coding system
or a cons of coding systems which are used as above.  The function gets
the arguments with which `find-operation-coding-system' was called.

See also the function `find-operation-coding-system'
and the variable `auto-coding-alist'.  */);
  Vfile_coding_system_alist = Qnil;

  DEFVAR_LISP ("process-coding-system-alist", &Vprocess_coding_system_alist,
               doc: /* Alist to decide a coding system to use for a process I/O operation.
The format is ((PATTERN . VAL) ...),
where PATTERN is a regular expression matching a program name,
VAL is a coding system, a cons of coding systems, or a function symbol.
If VAL is a coding system, it is used for both decoding what is received
from the program and encoding what is sent to the program.
If VAL is a cons of coding systems, the car part is used for decoding,
and the cdr part is used for encoding.
If VAL is a function symbol, the function must return a coding system
or a cons of coding systems which are used as above.

See also the function `find-operation-coding-system'.  */);
  Vprocess_coding_system_alist = Qnil;

  DEFVAR_LISP ("network-coding-system-alist", &Vnetwork_coding_system_alist,
               doc: /* Alist to decide a coding system to use for a network I/O operation.
The format is ((PATTERN . VAL) ...),
where PATTERN is a regular expression matching a network service name
or is a port number to connect to,
VAL is a coding system, a cons of coding systems, or a function symbol.
If VAL is a coding system, it is used for both decoding what is received
from the network stream and encoding what is sent to the network stream.
If VAL is a cons of coding systems, the car part is used for decoding,
and the cdr part is used for encoding.
If VAL is a function symbol, the function must return a coding system
or a cons of coding systems which are used as above.

See also the function `find-operation-coding-system'.  */);
  Vnetwork_coding_system_alist = Qnil;

  DEFVAR_LISP ("locale-coding-system", &Vlocale_coding_system,
               doc: /* Coding system to use with system messages.
Also used for decoding keyboard input on X Window system.  */);
  Vlocale_coding_system = Qnil;

  /* The eol mnemonics are reset in startup.el system-dependently.  */
  DEFVAR_LISP ("eol-mnemonic-unix", &eol_mnemonic_unix,
               doc: /* *String displayed in mode line for UNIX-like (LF) end-of-line format.  */);
  eol_mnemonic_unix = build_string (":");

  DEFVAR_LISP ("eol-mnemonic-dos", &eol_mnemonic_dos,
               doc: /* *String displayed in mode line for DOS-like (CRLF) end-of-line format.  */);
  eol_mnemonic_dos = build_string ("\\");

  DEFVAR_LISP ("eol-mnemonic-mac", &eol_mnemonic_mac,
               doc: /* *String displayed in mode line for MAC-like (CR) end-of-line format.  */);
  eol_mnemonic_mac = build_string ("/");

  DEFVAR_LISP ("eol-mnemonic-undecided", &eol_mnemonic_undecided,
               doc: /* *String displayed in mode line when end-of-line format is not yet determined.  */);
  eol_mnemonic_undecided = build_string (":");

  DEFVAR_LISP ("enable-character-translation", &Venable_character_translation,
               doc: /* *Non-nil enables character translation while encoding and decoding.  */);
  Venable_character_translation = Qt;

  DEFVAR_LISP ("standard-translation-table-for-decode",
               &Vstandard_translation_table_for_decode,
               doc: /* Table for translating characters while decoding.  */);
  Vstandard_translation_table_for_decode = Qnil;

  DEFVAR_LISP ("standard-translation-table-for-encode",
               &Vstandard_translation_table_for_encode,
               doc: /* Table for translating characters while encoding.  */);
  Vstandard_translation_table_for_encode = Qnil;

  DEFVAR_LISP ("charset-revision-table", &Vcharset_revision_alist,
               doc: /* Alist of charsets vs revision numbers.
While encoding, if a charset (car part of an element) is found,
designate it with the escape sequence identifying revision (cdr part of the element).  */);
  Vcharset_revision_alist = Qnil;

  DEFVAR_LISP ("default-process-coding-system",
               &Vdefault_process_coding_system,
               doc: /* Cons of coding systems used for process I/O by default.
The car part is used for decoding process output,
the cdr part is used for encoding text to be sent to a process.  */);
  Vdefault_process_coding_system = Qnil;

  DEFVAR_LISP ("latin-extra-code-table", &Vlatin_extra_code_table,
               doc: /* Table of extra Latin codes in the range 128..159 (inclusive).
This is a vector of length 256.
If the Nth element is non-nil, the existence of code N in a file
\(or output of subprocess) doesn't prevent it from being detected as
a coding system of ISO 2022 variant which has a flag
`accept-latin-extra-code' t (e.g. iso-latin-1) on reading a file
or reading output of a subprocess.
Only the 128th through 159th elements have a meaning.  */);
  Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil);

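/* Illustration only: a toy version of the check described for
   `latin-extra-code-table'.  A byte in 0x80..0x9F rules out the
   ISO 2022 variants that accept Latin extra codes unless the table
   entry for that byte is non-nil.  The function name and the bool
   array standing in for the Lisp vector are hypothetical, and the real
   detector tests many other conditions.  */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* EXTRA models the 256-element table (true = non-nil).  Returns false
   if TEXT contains a byte in 0x80..0x9F whose entry is nil.  */
static bool
c1_bytes_allowed (const bool extra[256], const unsigned char *text, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++)
    {
      unsigned char c = text[i];
      if (c >= 0x80 && c <= 0x9F && !extra[c])
        return false;          /* an unlisted C1 byte */
    }
  return true;
}
```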
  DEFVAR_LISP ("select-safe-coding-system-function",
               &Vselect_safe_coding_system_function,
               doc: /* Function to call to select safe coding system for encoding text.

If set, this function is called to force a user to select a proper
coding system which can encode the text when the default coding
system used in an operation can't encode it.

The default value is `select-safe-coding-system' (which see).  */);
  Vselect_safe_coding_system_function = Qnil;

  DEFVAR_BOOL ("coding-system-require-warning",
               &coding_system_require_warning,
               doc: /* Internal use only.
If non-nil, on writing a file, `select-safe-coding-system-function' is
called even if `coding-system-for-write' is non-nil.  The command
`universal-coding-system-argument' binds this variable to t temporarily.  */);
  coding_system_require_warning = 0;

  DEFVAR_LISP ("char-coding-system-table", &Vchar_coding_system_table,
               doc: /* Char-table containing the safe coding systems for each character.
Each element excludes generic coding systems that can encode any
character; those are stored in the first extra slot.  */);
  Vchar_coding_system_table = Fmake_char_table (Qchar_coding_system, Qnil);

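  /* Illustrative sketch of how this table could be used to find the
     coding systems that can safely encode a whole text.  A coding system
     is safe for character C if it is in C's element or is generic, so
     the set safe for the whole text is the intersection of the
     per-character elements, united with the generic set from the first
     extra slot.  For brevity this assumes a hypothetical representation
     in which each set of coding systems is a bitmask and TABLE is
     indexed directly by character code; the function name is also
     hypothetical.

       static unsigned
       safe_coding_systems (const int *chars, int nchars,
                            const unsigned *table, unsigned generic)
       {
         unsigned common = ~0u;
         int i;

         for (i = 0; i < nchars; i++)
           common &= table[chars[i]];
         return common | generic;
       }  */
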
  DEFVAR_BOOL ("inhibit-iso-escape-detection",
               &inhibit_iso_escape_detection,
               doc: /* If non-nil, Emacs ignores ISO2022 escape sequences during code detection.

By default, on reading a file, Emacs tries to detect how the text is
encoded.  This code detection is sensitive to escape sequences.  If
a sequence is valid in ISO2022, the text is judged to be in one of
the ISO2022 encodings, and the file is decoded by the corresponding
coding system (e.g. `iso-2022-7bit').

However, you may sometimes want to read the escape sequences in a
file as is.  In that case, you can set this variable to non-nil.
Then, since code detection ignores all escape sequences, no file is
detected as being in an ISO2022 encoding, and all escape sequences
remain visible in the buffer.

The default value is nil, and it is strongly recommended not to change
it, because many Emacs Lisp source files in the Emacs distribution
that contain non-ASCII characters are encoded in `iso-2022-7bit', and
they won't be decoded correctly on reading if you suppress escape
sequence detection.

Another way to read the escape sequences in a file without decoding
them is to explicitly specify, with \\[universal-coding-system-argument],
a coding system that doesn't use ISO2022 escape sequences (e.g. `latin-1').  */);
  inhibit_iso_escape_detection = 0;

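  /* Illustrative sketch of the effect of this flag on detection.  The
     helper below is hypothetical and only checks for the start of an
     ISO2022 designation (ESC followed by `$' or `('); the real detector
     validates complete escape sequences, which this sketch does not.

       static int
       iso2022_escape_seen_p (const unsigned char *p, int len)
       {
         int i;

         if (inhibit_iso_escape_detection)
           return 0;
         for (i = 0; i + 2 < len; i++)
           if (p[i] == 0x1B && (p[i + 1] == '$' || p[i + 1] == '('))
             return 1;
         return 0;
       }  */
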
  DEFVAR_LISP ("translation-table-for-input", &Vtranslation_table_for_input,
               doc: /* Char table for translating self-inserting characters.
This is applied to the result of input methods, not their input.  See also
`keyboard-translate-table'.  */);
  Vtranslation_table_for_input = Qnil;
}


char *
emacs_strerror (error_number)
     int error_number;
{
  char *str;

  synchronize_system_messages_locale ();
  str = strerror (error_number);

  if (! NILP (Vlocale_coding_system))
    {
      Lisp_Object dec = code_convert_string_norecord (build_string (str),
                                                      Vlocale_coding_system,
                                                      0);
      str = (char *) SDATA (dec);