1 Network Working Group K. Konishi
2 Request for Comments: 3743 K. Huang
3 Category: Informational H. Qian
4 Y. Ko
5 April 2004
6
7
8 Joint Engineering Team (JET) Guidelines for
9 Internationalized Domain Names (IDN) Registration and
10 Administration for Chinese, Japanese, and Korean
11
12 Status of this Memo
13
14 This memo provides information for the Internet community. It does
15 not specify an Internet standard of any kind. Distribution of this
16 memo is unlimited.
17
18 Copyright Notice
19
20 Copyright (C) The Internet Society (2004). All Rights Reserved.
21
22 IESG Note
23
24 The IESG congratulates the Joint Engineering Team (JET) on developing
25 mechanisms to enforce their desired policy. The Language Variant
26 Table mechanisms described here allow JET to enforce language-based
27 character variant preferences, and they set an example for those who
28 might want to use variant tables for their own policy enforcement.
29
30 The IESG encourages those following this example to take JET's
31 diligence as an example, as well as its technical work. To follow
32 their example, registration authorities may need to articulate
33 policy, develop appropriate procedures and mechanisms for
34 enforcement, and document the relationship between the two. JET's
35 LVT mechanism should be adaptable to different policies, and can be
36 considered during that development process.
37
38 The IETF does not, of course, dictate policy or require the use of
39 any particular mechanisms for the implementation of these policies,
40 as these are matters of sovereignty and contract.
41
42 Abstract
43
44 Achieving internationalized access to domain names raises many
45 complex issues. These are associated not only with basic protocol
46 design, such as how names are represented on the network, compared,
47 and converted to appropriate forms, but also with issues and options
48 for deployment, transition, registration, and administration.
49
50
51
52 Konishi, et al. Informational [Page 1]
53 RFC 3743 JET Guidelines for IDN April 2004
54
55
56 The IETF Standards for Internationalized Domain Names, known as
57 "IDNA", focus on access to domain names in a range of scripts that
58 is broader in scope than the original ASCII. The development process
59 made it clear that use of characters with similar appearances and/or
60 interpretations created potential for confusion, as well as
61 difficulties in deployment and transition. The conclusion was that,
62 while those issues were important, they could best be addressed
63 administratively rather than through restrictions embedded in the
64 protocols. This document defines a set of guidelines for applying
65 restrictions of that type for Chinese, Japanese and Korean (CJK)
66 scripts and the zones that use them and, perhaps, the beginning of a
67 framework for thinking about other zones, languages, and scripts.
68
69 Table of Contents
70
71 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
72 2. Definitions, Context, and Notation . . . . . . . . . . . . . . 5
73 2.1. Definitions and Context. . . . . . . . . . . . . . . . . 5
74 2.2. Notation for Ideographs and Other Non-ASCII CJK
75 Characters . . . . . . . . . . . . . . . . . . . . . . . 9
76 3. Scope of the Administrative Guidelines . . . . . . . . . . . . 9
77 3.1. Principles Underlying These Guidelines . . . . . . . . . 10
78 3.2. Registration of IDL. . . . . . . . . . . . . . . . . . . 13
79 3.2.1. Using the Language Variant Table . . . . . . . . 13
80 3.2.2. IDL Package. . . . . . . . . . . . . . . . . . . 14
81 3.2.3. Procedure for Registering IDLs . . . . . . . . . 14
82 3.3. Deletion and Transfer of IDL and IDL Package . . . . . . 19
83 3.4. Activation and Deactivation of IDL Variants . . . . . . 19
84 3.4.1. Activation Algorithm . . . . . . . . . . . . . . 19
85 3.4.2. Deactivation Algorithm . . . . . . . . . . . . . 20
86 3.5. Managing Changes in Language Associations. . . . . . . . 21
87 3.6. Managing Changes to Language Variant Tables. . . . . . . 21
88 4. Examples of Guideline Use in Zones . . . . . . . . . . . . . . 21
89 5. Syntax Description for the Language Variant Table. . . . . . . 25
90 5.1. ABNF Syntax. . . . . . . . . . . . . . . . . . . . . . . 25
91 5.2. Comments and Explanation of Syntax . . . . . . . . . . . 25
92 6. Security Considerations. . . . . . . . . . . . . . . . . . . . 27
93 7. Index to Terminology . . . . . . . . . . . . . . . . . . . . . 27
94 8. Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . 28
95 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 29
96 9.1. Normative References . . . . . . . . . . . . . . . . . . 29
97 9.2. Informative References . . . . . . . . . . . . . . . . . 30
98 10. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 30
99 10.1. Authors' Addresses . . . . . . . . . . . . . . . . . . . 31
100 10.2. Editors' Addresses . . . . . . . . . . . . . . . . . . . 32
101 11. Full Copyright Statement . . . . . . . . . . . . . . . . . . . 33
102
111 1. Introduction
112
113 Domain names form the fundamental naming architecture of the
114 Internet. Countless Internet protocols and applications rely on
115 them, not just for stability and continuity, but also to avoid
116 ambiguity. They were designed to be identifiers without any language
117 context. However, as domain names have become visible to end users
118 through Web URLs and e-mail addresses, the strings in domain-name
119 labels are being increasingly interpreted as names, words, or
120 phrases. It is likely that users will do the same with languages of
121 differing character sets, such as Chinese, Japanese and Korean (CJK),
122 in which many words or concepts are represented using short sequences
123 of characters.
124
125 The introduction of what are called Internationalized Domain Names
126 (IDN) amplifies both the difficulty of putting names into identifiers
127 and the confusion that exists between scripts and languages.
128 Character symbols that appear (or actually are) identical, or that
129 have similar or identical semantics but that are assigned
130 different code points, further increase the potential for confusion.
131 DNS internationalization also affects a number of Internet protocols
132 and applications and creates additional layers of complexity in terms
133 of technical administration and services. Given the added
134 complications of using a much broader range of characters than the
135 original small ASCII subset, precautions are necessary in the
136 deployment of IDNs in order to minimize confusion and fraud.
137
138 The IETF IDN Working Group [IDN-WG] addressed the problem of handling
139 the encoding and decoding of Unicode strings into and out of Domain
140 Name System (DNS) labels with the goal that its solution would not
141 put the operational DNS at any risk. Its work resulted in one
142 primary protocol and three supporting ones, respectively:
143
144 1. Internationalizing Host Names in Applications [IDNA]
145 2. Preparation of Internationalized Strings [STRINGPREP]
146 3. A Stringprep Profile for Internationalized Domain Names
147 [NAMEPREP]
148 4. Punycode [PUNYCODE]
149
150 IDNA, which calls on the others, normalizes and transforms strings
151 that are intended to be used as IDNs. In combination, the four
152 provide the minimum functions required for internationalization, such
153 as performing case mappings, eliminating character differences that
154 would cause severe problems, and specifying matching (equality).
155 They also convert between the resulting Unicode code points and an
156 ASCII-based form that is more suitable for storing in actual DNS
157 labels. In this way, the IDNA transformations improve a user's
158 chances of getting to the correct IDN.
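
The following is an illustrative sketch only, not part of the JET
guidelines. It assumes Python's built-in "idna" codec, which
implements the 2003 IDNA algorithms (Nameprep followed by Punycode),
and shows case mapping and the conversion between the Unicode and
ASCII-based forms of a label. The helper names to_ace and from_ace
are hypothetical.

```python
# Sketch only: relies on Python's standard "idna" codec (IDNA 2003).
# The helper names below are illustrative assumptions.

def to_ace(label: str) -> bytes:
    """Nameprep a Unicode label and convert it to its ASCII-based form."""
    return label.encode("idna")


def from_ace(ace: bytes) -> str:
    """Convert an ASCII-based label back to its Unicode form."""
    return ace.decode("idna")


# Nameprep case-maps the label before Punycode encoding.
ace = to_ace("B\u00fccher")
print(ace)            # b'xn--bcher-kva'
print(from_ace(ace))  # the case-mapped Unicode form

# A label made of Han ideographs survives a round trip unchanged.
han = "\u4e2d\u6587"
assert from_ace(to_ace(han)) == han
```

Note that this conversion says nothing about variants: two labels
that differ only in variant characters still produce two different
ASCII-based forms, which is the gap these guidelines address
administratively.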
159
166 In addressing the issues around differing character sets, a primary
167 consideration and administrative challenge involves region-specific
168 definitions, interpretations, and the semantics of strings to be used
169 in IDNs. A Unicode string may have a specific meaning as a name,
170 word, or phrase in a particular language but that meaning could vary
171 depending on the country, region, culture, or other context in which
172 the string is used. It might also have different interpretations in
173 different languages that share some or all of the same characters.
174 Therefore, individual zones and zone administrators may find it
175 necessary to impose restrictions and procedures to reduce the
176 likelihood of confusion, and instabilities of reference, within their
177 own environments.
178
179 Over the centuries, the evolution of CJK characters, and the
180 differences in their use in different languages and even in different
181 regions where the same language is spoken, has given rise to the idea
182 of "variants", wherein one conceptual character can be identified
183 with several different Code Points in character sets for computer
184 use. This document provides a framework for handling such variants
185 while minimizing the possibility of serious user confusion in
186 obtaining or using domain names. However, the concept of variants
187 is complex and may require many different layers of solutions. This
188 guideline offers only one of those solution components. It is not
189 sufficient by itself to solve the whole problem, even with zone-
190 specific tables as described below.
191
192 Additionally, because of local language or writing-system
193 differences, it is impossible to create universally accepted
194 definitions for which potential variants are the same and which are
195 not the same. It is even more difficult to define a technical
196 algorithm to generate variants that are linguistically accurate,
197 that is, variant forms that make as much sense in the
198 language as the originally specified forms. It is also possible that
199 variants generated may have no meaning in the associated language or
200 languages. The intention is not to generate meaningful "words" but
201 to generate similar variants to be reserved. So even though the
202 method described in this document may not always be linguistically
203 accurate, nor does it need to be, it increases the chances of getting
204 the right variants while accepting the inherent limitations of the
205 DNS and the complexities of human language.
206
207 This document outlines a model for such conventions for zones in
208 which labels that contain CJK characters are to be registered and a
209 system for implementing that model. It provides a mechanism that
210 allows each zone to define its own local rules for permitted
211 characters and sequences and the handling of IDNs and their variants.
212
221 The document is an effort of the Joint Engineering Team (JET), a
222 group composed of members of CNNIC, TWNIC, KRNIC, and JPNIC as well
223 as other individual experts. It offers guidelines for zone
224 administrators, including but not limited to registry operators and
225 registrars, and information for all domain name holders on the
226 administration of domain names that contain characters drawn from
227 Chinese, Japanese, and Korean scripts. Other language groups are
228 encouraged to develop their own guidelines as needed, based on these
229 guidelines if that is helpful.
230
231 2. Definitions, Context, and Notation
232
233 2.1. Definitions and Context
234
235 This document uses a number of special terms. In this section,
236 definitions and explanations are grouped topically. Some readers may
237 prefer to skip over this material, returning, perhaps via the index
238 to terminology in section 7, when needed.
239
240 2.1.1. IDN
241
242 IDN: The term "IDN" has a number of different uses: (a) as an
243 abbreviation for "Internationalized Domain Name"; (b) as a fully
244 qualified domain name that contains at least one label that contains
245 characters not appearing in ASCII, specifically not in the subset of
246 ASCII recommended for domain names (the so-called "hostname" or "LDH"
247 subset, see RFC1035 [STD13]); (c) as a label of a domain name that
248 contains at least one character beyond ASCII; (d) as a Unicode string
249 to be processed by Nameprep; (e) as a string that is an output from
250 Nameprep; (f) as a string that is the result of processing through
251 both Nameprep and conversion into Punycode; (g) as the abbreviation
252 of an IDN (more properly, IDL) Package, in the terminology of this
253 document; (h) as the abbreviation of the IETF IDN Working Group; (i)
254 as the abbreviation of the ICANN IDN Committee; and (j) as standing
255 for other IDN activities in other companies/organizations.
256
257 Because of the potential confusion, this document uses the term "IDN"
258 as an abbreviation for Internationalized Domain Name and,
259 specifically, in the second sense described in (b) above. It uses
260 "IDL," defined immediately below, to refer to Internationalized
261 Domain Labels.
262
263 2.1.2. IDL
264
265 IDL: This document provides a guideline to be applied on a per-zone
266 basis, one label at a time. Therefore, the term "Internationalized
267 Domain Label" or "IDL" will be used instead of the more general term
268 "IDN" or its equivalents. The processing specifications of this
276 document may be applied, in some zones, to ASCII characters also, if
277 those characters are specified as valid in a Language Variant Table
278 (see below). Hence, in some zones, an IDL may contain or consist
279 entirely of "LDH" characters.
280
281 2.1.3. FQDN
282
283 FQDN: A fully qualified domain name, one that explicitly contains all
284 labels, including a Top-Level Domain (TLD) name. In this context, a
285 TLD name is one whose label appears in a nameserver record in the
286 root zone. The term "Domain Name Label" refers to any label of a
287 FQDN.
288
289 2.1.4. Registrations
290
291 Registration: In this document, the term "registration" refers to the
292 process by which a potential domain name holder requests that a label
293 be placed in the DNS either as an individual name within a domain or
294 as a subdomain delegation from another domain name holder. In the
295 case of a successful registration, the label or delegation records
296 are placed in the relevant zone file, or, more specifically, they are
297 "activated" or made "active" and additional IDLs may be reserved as
298 part of an "IDL Package" (see below). The guidelines presented here
299 are recommended for all zones, at any hierarchy level, in which CJK
300 characters are to appear and not just domains at the first or second
301 level.
302
303 2.1.5. RFC3066
304
305 RFC3066: A system, widely used in the Internet, for coding and
306 representing names of languages [RFC3066]. It is based on an
307 International Organization for Standardization (ISO) standard for
308 coding language names [ISO639], but expands it to provide additional
309 precision.
310
311 2.1.6. ISO/IEC 10646
312
313 ISO/IEC 10646: The international standard universal multiple-octet
314 coded character set ("UCS") [IS10646]. The Code Point definitions of
315 this standard are identical to those of corresponding versions of the
316 Unicode standard (see below). Consequently, the characters and their
317 coding are often referred to as "Unicode characters."
318
319 2.1.7. Unicode Character
320
321 Unicode Character: The term "Unicode character" is used here in
322 reference to characters chosen from the Unicode Standard Version 3.2
323 [UNICODE] (and hence from ISO/IEC 10646). In this document, the
331 characters are identified by their positions, or "Code Points." The
332 notation U+12AB, for example, indicates the character at the position
333 12AB (hexadecimal) in the Unicode 3.2 table. For characters in
334 positions above FFFF, i.e., requiring more than sixteen bits to
335 represent, a five- to eight-character string is used, such as U+112AB
336 for the character in position 12AB of plane 1.
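
As an illustration of the notation (a sketch, not part of the
guidelines), the following converts between the "U+" form and an
integer Code Point. The function names are hypothetical.

```python
import re

# Four to eight hexadecimal digits after "U+", as described above.
_NOTATION = re.compile(r"U\+([0-9A-Fa-f]{4,8})")


def parse_code_point(text: str) -> int:
    """Return the integer Code Point for a string such as 'U+12AB'."""
    match = _NOTATION.fullmatch(text)
    if match is None:
        raise ValueError(f"not in U+XXXX notation: {text!r}")
    return int(match.group(1), 16)


def format_code_point(value: int) -> str:
    """Format a Code Point with at least four uppercase hex digits."""
    return f"U+{value:04X}"


cp = parse_code_point("U+112AB")
print(cp >> 16)             # plane number: 1
print(hex(cp & 0xFFFF))     # position within the plane: 0x12ab
print(format_code_point(cp))
```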
337
338 2.1.8. Unicode String
339
340 Unicode String: "Unicode string" refers to a string of Unicode
341 characters. The Unicode string is identified by the sequence of the
342 Unicode characters regardless of the encoding scheme.
343
344 2.1.9. CJK Characters
345
346 CJK Characters: CJK characters are characters commonly used in the
347 Chinese, Japanese, or Korean languages, including but not limited to
348 those defined in the Unicode Standard as ASCII (U+0020 to U+007F),
349 Han ideographs (U+3400 to U+9FAF and U+20000 to U+2A6DF), Bopomofo
350 (U+3100 to U+312F and U+31A0 to U+31BF), Kana (U+3040 to U+30FF),
351 Jamo (U+1100 to U+11FF and U+3130 to U+318F), Hangul (U+AC00 to U+D7AF
352 and U+3130 to U+318F), and the respective compatibility forms. The
353 particular characters that are permitted in a given zone are
354 specified in the Language Variant Table(s) for that zone.
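
The ranges listed above can be written down as data. The sketch below
is illustrative only: it covers just the ranges named in this
definition (not the compatibility forms, and the definition itself is
"not limited to" these ranges), and the names CJK_RANGES and
scripts_of are hypothetical. Because U+3130 to U+318F is listed under
both Jamo and Hangul, the lookup returns a set of names.

```python
from typing import List, Set, Tuple

# (name, first Code Point, last Code Point), copied from the text above.
CJK_RANGES: List[Tuple[str, int, int]] = [
    ("ASCII", 0x0020, 0x007F),
    ("Han", 0x3400, 0x9FAF),
    ("Han", 0x20000, 0x2A6DF),
    ("Bopomofo", 0x3100, 0x312F),
    ("Bopomofo", 0x31A0, 0x31BF),
    ("Kana", 0x3040, 0x30FF),
    ("Jamo", 0x1100, 0x11FF),
    ("Jamo", 0x3130, 0x318F),
    ("Hangul", 0xAC00, 0xD7AF),
    ("Hangul", 0x3130, 0x318F),
]


def scripts_of(code_point: int) -> Set[str]:
    """Return every listed range name that contains the Code Point."""
    return {name for name, first, last in CJK_RANGES
            if first <= code_point <= last}


print(scripts_of(0x4E2D))   # a Han ideograph
print(scripts_of(0x0410))   # a Cyrillic letter: empty set
```

Membership in one of these ranges does not make a Code Point
registrable; only the Language Variant Table(s) for the zone decide
that.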
355
356 2.1.10. Label String
357
358 Label String: A generic term referring to a string of characters that
359 is a candidate for registration in the DNS or such a string, once
360 registered. A label string may or may not be valid according to the
361 rules of this specification and may even be invalid for IDNA use.
362 The term "label", by itself, refers to a string that has been
363 validated and may be formatted to appear in a DNS zone file.
364
365 2.1.11. Language Variant Table
366
367 Language Variant Table: The key mechanisms of this specification
368 utilize a three-column table, called a Language Variant Table, for
369 each language permitted to be registered in the zone. Those columns
370 are known, respectively, as "Valid Code Point", "Preferred Variant",
371 and "Character Variant", which are defined separately below. The
372 Language Variant Tables are critical to the success of the guideline
373 described in this document. However, the principles to be used to
374 generate the tables are not within the scope of this document and
375 should be worked out by each registry separately (perhaps by adopting
376 or adapting the work of some other registry). In this document,
377 "Table" and "Variant Table" are used as short forms for Language
378 Variant Table.
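
One possible in-memory representation of such a three-column table is
sketched below. It is an assumption made for illustration, not a
format defined by this document: the class names, the use of an
RFC3066 tag as the table key, and the sample row are all
hypothetical, and the sample Code Points are not taken from any
registry's table.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VariantRow:
    valid: int                        # column 1: Valid Code Point
    preferred: Tuple[int, ...] = ()   # column 2: Preferred Variant(s)
    character: Tuple[int, ...] = ()   # column 3: Character Variant(s)


@dataclass
class LanguageVariantTable:
    language: str                     # e.g. an RFC3066 language tag
    rows: Dict[int, VariantRow] = field(default_factory=dict)

    def add(self, row: VariantRow) -> None:
        """Add a row; each Valid Code Point may appear only once."""
        if row.valid in self.rows:
            raise ValueError(f"duplicate row for U+{row.valid:04X}")
        self.rows[row.valid] = row

    def lookup(self, code_point: int) -> Optional[VariantRow]:
        """Return the row indexed by this Code Point, or None."""
        return self.rows.get(code_point)


# Illustrative row only.
table = LanguageVariantTable(language="zh-TW")
table.add(VariantRow(valid=0x6E05, preferred=(0x6E05,), character=(0x6DF8,)))
print(table.lookup(0x6E05))
print(table.lookup(0x0041))   # not a Valid Code Point in this table: None
```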
379
386 2.1.12. Valid Code Point
387
388 Valid Code Point: In a Language Variant Table, the list of Code
389 Points that is permitted for that language. Any other Code Points,
390 or any string containing them, will be rejected by this
391 specification. The Valid Code Point list appears as the first column
392 of the Language Variant Table.
393
394 2.1.13. Preferred Variant
395
396 Preferred Variant: In a Language Variant Table, a list of Code Points
397 corresponding to each Valid Code Point and providing possible
398 substitutions for it. These substitutions are "preferred" in the
399 sense that the variant labels generated using them are normally
400 registered in the zone file, or "activated." The Preferred Code
401 Points appear in column 2 of the Language Variant Table. "Preferred
402 Code Point" is used interchangeably with this term.
403
404 2.1.14. Character Variant
405
406 Character Variant: In a Language Variant Table, a second list of Code
407 Points corresponding to each Valid Code Point and providing possible
408 substitutions for it. Unlike the Preferred Variants, substitutions
409 based on Character Variants are normally reserved but not actually
410 registered (or "activated"). Character Variants appear in column 3
411 of the Language Variant Table. The term "Code Point Variants" is
412 used interchangeably with this term.
413
414 2.1.15. Preferred Variant Label
415
416 Preferred Variant Label: A label generated by use of Preferred
417 Variants (or Preferred Code Points).
418
419 2.1.16. Character Variant Label
420
421 Character Variant Label: A label generated by use of Character
422 Variants.
423
424 2.1.17. Zone Variant
425
426 Zone Variant: A Preferred or Character Variant Label that is actually
427 to be entered (registered) into the DNS, that is, into the zone file
428 for the relevant zone. Zone Variants are also referred to as Zone
429 Variant Labels or Active (or Activated) Labels.
430
441 2.1.18. IDL Package
442
443 IDL Package: A collection of IDLs as determined by these Guidelines.
444 All labels in the package are "reserved", meaning they cannot be
445 registered by anyone other than the holder of the Package. These
446 reserved IDLs may be "activated", meaning they are actually entered
447 into a zone file as a "Zone Variant". The IDL Package also contains
448 identification of the language(s) associated with the registration
449 process. The IDL and its variant labels form a single, atomic unit.
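
The relationship between "reserved" and "activated" labels can be
sketched as follows. This is an illustrative model under stated
assumptions, not a prescribed data format: the class IDLPackage, its
field names, and the sample holder and labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass
class IDLPackage:
    holder: str                     # the domain name holder
    languages: FrozenSet[str]       # language(s) used at registration
    reserved: FrozenSet[str]        # every label in the package
    active: Set[str] = field(default_factory=set)  # Zone Variants

    def activate(self, label: str) -> None:
        """Enter a reserved label into the zone file."""
        if label not in self.reserved:
            raise ValueError("only labels in the package can be activated")
        self.active.add(label)

    def deactivate(self, label: str) -> None:
        """Remove a label from the zone file; it stays reserved."""
        self.active.discard(label)


package = IDLPackage(
    holder="example-holder",
    languages=frozenset({"zh-TW"}),
    reserved=frozenset({"\u6e05\u771f", "\u6df8\u771f"}),
)
package.activate("\u6e05\u771f")
print(sorted(package.active))
```

In this model an active label is always also reserved, and
deactivation never releases a label to other registrants.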
450
451 2.2. Notation for Ideographs and Other Non-ASCII CJK Characters
452
453 For purposes of clarity, particularly in regard to examples, Han
454 ideographs appear in several places in this document. However, they
455 do not appear in the ASCII version of this document. For the
456 convenience of readers of the ASCII version, and some readers not
457 familiar with recognizing and distinguishing Chinese characters, most
458 uses of these characters will be associated with both their Unicode
459 Code Points and an "asterisk tag" with its corresponding Chinese
460 Romanization [ISO7098], with the tone mark represented by a number
461 from 1 to 4. Those tags have no meaning outside this document; they
462 are a quick visual and reading reference to help facilitate the
463 combinations and transformations of characters in the guideline and
464 table excerpts.
465
466 3. Scope of the Administrative Guidelines
467
468 Zone administrators are responsible for the administration of the
469 domain name labels under their control. A zone administrator might
470 be responsible for a large zone, such as a top-level domain (TLD),
471 whether generic or country code, or a smaller one, such as a typical
472 second- or third-level domain. A large zone is often more complex
473 than its smaller counterpart. However, actual technical
474 administrative tasks, such as addition, deletion, delegation, and
475 transfer of zones between domain name holders, are similar for all
476 zones.
477
478 This document provides guidelines for the ways CJK characters should
479 be handled within a zone, for how language issues should be
480 considered and incorporated, and for how Domain Name Labels
481 containing CJK characters should be administered (including
482 registration, deletion, and transfer of labels).
483
484 Other IDN policies, such as the creation of new top-level domains
485 (TLDs), the cost structure for registrations, and how the processes
486 described here get allocated between registrar and registry if the
487 zone makes that distinction, also are outside the scope of this
488 document.
489
496 Technical implementation issues are not discussed here either. For
497 example, deciding which guidelines should be implemented as registry
498 actions and which should be registrar actions is left to zone
499 administrators, with the possibility that it will differ from zone to
500 zone.
501
502 3.1. Principles Underlying These Guidelines
503
504 In many places, in the event of a dispute over rights to a name (or,
505 more accurately, DNS label string), this document assumes "first-
506 come, first-served" (FCFS) as a resolution policy even though FCFS is
507 not listed below as one of the principles for this document. If
508 policies are already in place governing priorities and "rights", one
509 can use the guidelines here by replacing uses of FCFS in this
510 document with policies specific to the zone. Some of the guidelines
511 here may not be applicable to other policies for determining rights
512 to labels. Still other alternatives, such as use of UDRP [UDRP] or
513 mutual exclusion, might have little impact on other aspects of these
514 guidelines.
515
516 (a) Although some Unicode strings may be pure identifiers made up of
517 an assortment of characters from many languages and scripts, IDLs are
518 likely to be "words" or "names" or "phrases" that have specific
519 meaning in a language. While a zone administration might or might
520 not require "meaning" as a registration criterion, meaning could
521 prove to be a useful tool for avoiding user confusion.
522
523 Each IDL to be registered should be associated administratively
524 with one or more languages.
525
526 Language associations should either be predetermined by the zone
527 administrator and applied to the entire zone or be chosen by the
528 registrants on a per-IDL basis. The latter may be necessary for some
529 zones, but it will make administration more difficult and will
530 increase the likelihood of conflicts in variant forms.
531
532 A given zone might have multiple languages associated with it or it
533 may have no language specified at all. Omitting specification of a
534 language may provide additional opportunities for user confusion and
535 is therefore NOT recommended.
536
537 (b) Each language uses only a subset of Unicode characters.
538 Therefore, if an IDL is associated with a language, it is not
539 permitted to contain any Unicode character that is not within the
540 valid subset for that language.
541
542 Each IDL to be registered must be verified against the valid
543 subset of Unicode for the language(s) associated with the IDL.
551 That subset is specified by the list of characters appearing in
552 the first column of the language and zone-specific tables as
553 described later in this document.
554
555 If the IDL fails this test for any of its associated languages, the
556 IDL is not valid for registration.
557
558 Note that this verification is not necessarily linguistically
559 accurate, because some languages have special rules. For example,
560 some languages impose restrictions on the order in which particular
561 combinations of characters may appear. Characters that are valid for
562 the language, and hence permitted by this specification, might still
563 not form valid words or even strings in the language.
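
The check described in (b) can be sketched as below. This is a
minimal illustration, assuming each language's Valid Code Points are
available as a set of integers; the function names, the language
keys "lang-a" and "lang-b", and the sample Code Points are
hypothetical. As noted above, passing this check does not mean that
the label is a valid word in the language.

```python
from typing import Dict, Iterable, List, Set


def invalid_code_points(label: str, valid: Set[int]) -> List[int]:
    """Return the Code Points in the label that are not in column 1."""
    return [ord(ch) for ch in label if ord(ch) not in valid]


def is_registrable(label: str,
                   tables: Dict[str, Set[int]],
                   languages: Iterable[str]) -> bool:
    """The label must pass the test for every associated language."""
    languages = list(languages)
    if not languages:
        raise ValueError("at least one associated language is expected")
    return all(not invalid_code_points(label, tables[lang])
               for lang in languages)


# Illustrative Valid Code Point sets for two hypothetical languages.
tables = {
    "lang-a": {0x6E05, 0x771F},
    "lang-b": {0x6E05},
}
label = "\u6e05\u771f"
print(is_registrable(label, tables, ["lang-a"]))            # True
print(is_registrable(label, tables, ["lang-a", "lang-b"]))  # False
```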
564
565 (c) When an IDL is associated with a language, it may have Character
566 Variants that depend on that language associated with it in addition
567 to any Preferred Variants. These variants are potential sources of
568 confusion with the Code Points in the original label string.
569 Consequently, the labels generated from them should be unavailable to
570 registrants of other names, words, or phrases.
571
572 During registration, all labels generated from the Character
573 Variants for the associated language(s) of the IDL should be
574 reserved.
575
576 IDL reservations of the type described here normally do not appear in
577 the distributed DNS zone file. In other words, these reserved IDLs
578 may not resolve. Domain name holders could request that these
579 reserved IDLs be placed in the zone file and made active and
580 resolvable.
581
582 Zones will need to establish local policies about how they are to be
583 made active. Specifically, many zones, especially at the top level,
584 have prohibited or restricted the use of "CNAME" DNS aliases,
585 especially CNAMEs that point to nameserver delegation records (NS
586 records). Long-term use of aliases for domain hierarchies, rather
587 than for single names ("DNAME records"), is also considered
588 problematic because of the recursion it can introduce
589 into DNS lookups.
590
591 (d) When an IDL is a "name", "word", or "phrase", it will have
592 Character Variants depending on the associated language.
593 Furthermore, one or more of those Character Variants will be used
594 more often than others for linguistic, political, or other reasons.
595
596 These more commonly used variants are distinguished from ordinary
597 Character Variants and are known as Preferred Variant(s) for the
598 particular language.
599
606 To increase the likelihood of correct and predictable resolution
607 of the IDN by end users, all labels generated from the Preferred
608 Variants for the associated language(s) should be resolvable.
609
610 In other words, the Preferred Variant Labels should appear in the
611 distributed DNS zone file.
612
613 (e) IDLs associated with one or more languages may have a large
614 number of Character Variant Labels or Preferred Variant Labels. Some
615 of these labels may include combinations of characters that are
616 meaningless or invalid linguistically. It may therefore be
617 appropriate for a zone to adopt procedures that include only
618 linguistically acceptable labels in the IDL Package.
619
620 A zone administrator may impose additional rules and other
621 processing activities to limit the number of Character Variant
622 Labels or Preferred Variant Labels that are actually reserved or
623 registered.
624
625 These additional rules and other processing activities are based on
626 policies and/or procedures imposed on a per-zone basis and therefore
627 are not within the scope of this document. Such policies or
628 procedures might be used, for example, to restrict the number of
629 Preferred Variant Labels actually reserved or to prevent certain
630 words from being registered at all.
631
632 (f) There are some Character Variant Labels and Preferred Variant
633 Labels that are associated with each IDL. These labels are
634 considered "equivalent" to one another. To avoid confusion, they
635 all should be assigned to a single domain name holder.
636
637 The IDL and its variant labels should be grouped together into a
638 single atomic unit, known in this document as an "IDL Package".
639
640 The IDL Package is created upon registration and is atomic: Transfer
641 and deletion of an IDL is performed on the IDL Package as a whole.
642 That is, an IDL within the IDL Package may not be transferred or
643 deleted individually; any re-registration, transfers, or other
644 actions that impact the IDL should also affect the other variants.
645
646 The name-conflict resolution policy associated with this zone could
647 result in a conflict with the principle of IDL Package atomicity. In
648 such a case, the policy must be defined to make the precedence clear.
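
Package atomicity can be illustrated with a small registry model in
which every label points at its package, so that transfer and
deletion always act on the package as a whole. The sketch is an
assumption-laden illustration, not an implementation required by
these guidelines: the class Registry, its methods, and the sample
identifiers are hypothetical, and conflicts are resolved first-come,
first-served.

```python
from typing import Dict, FrozenSet, Iterable


class Registry:
    def __init__(self) -> None:
        self._package_of: Dict[str, str] = {}        # label -> package id
        self._holder: Dict[str, str] = {}            # package id -> holder
        self._labels: Dict[str, FrozenSet[str]] = {}  # package id -> labels

    def register(self, package_id: str, holder: str,
                 labels: Iterable[str]) -> None:
        """Reserve all labels together, or none of them (FCFS)."""
        labels = frozenset(labels)
        taken = [label for label in labels if label in self._package_of]
        if taken:
            raise ValueError("label already belongs to another package")
        self._holder[package_id] = holder
        self._labels[package_id] = labels
        for label in labels:
            self._package_of[label] = package_id

    def holder_of(self, label: str) -> str:
        return self._holder[self._package_of[label]]

    def transfer(self, label: str, new_holder: str) -> None:
        """Transferring any label transfers its whole package."""
        self._holder[self._package_of[label]] = new_holder

    def delete(self, label: str) -> None:
        """Deleting any label releases its whole package."""
        package_id = self._package_of[label]
        for member in self._labels.pop(package_id):
            del self._package_of[member]
        del self._holder[package_id]


registry = Registry()
registry.register("pkg-1", "holder-a", ["\u6e05\u771f", "\u6df8\u771f"])
registry.transfer("\u6df8\u771f", "holder-b")
print(registry.holder_of("\u6e05\u771f"))   # holder-b
```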
649
661 3.2. Registration of IDL
662
663 To conform to the principles described in 3.1, this document
664 introduces two concepts: the Language Variant Table and the IDL
665 Package. These are described in the next two subsections, followed
666 by a description of the algorithm that is used to interpret the table
667 and generate variant labels.
668
669 3.2.1. Using the Language Variant Table
670
671 Within each zone, every language that the zone supports should have
672 its own Language Variant Table. The table consists of a header
673 section that identifies references and version information, followed
674 by a section with one row for each Code Point that is valid for the
675 language and three columns.
676
677 (1) The first column contains the subset of Unicode characters
678 that is valid to be registered ("Valid Code Point"). This is
679 used to verify the IDL to be registered (see 3.1b). As in the
680 registration procedure described later, this column is used as
681 an index to examine characters that appear in a proposed IDL
682 to be processed. The collection of Valid Code Points in the
683 table for a particular language can be thought of as defining
684 the script for that language, although the normal definition
685 of a script would not include, for example, ASCII characters
686 with CJK ones.
687
688 (2) The second column contains the Preferred Variant(s) of the
689 corresponding Unicode character in column one ("Valid Code
690 Point"). These variant characters are used to generate the
691 Preferred Variant Labels for the IDL. Those labels should be
692 resolvable (see 3.1d). Under normal circumstances, all of
693 those Preferred Variant Labels will be activated in the
694 relevant zone file so that they will resolve when the DNS is
695 queried for them.
696
697 (3) The third column contains the Character Variant(s) for the
698 corresponding Valid Code Point. These are used to generate
699 the Character Variant Labels of the IDL, which are then to be
700 reserved (see 3.1c). Registration, or activation, of labels
701 generated from Character Variants will normally be a
702 registrant decision, subject to local policy.
703
Each entry in a column consists of one or more Code Points, each
expressed as its hexadecimal number in the Unicode table and
optionally followed by a parenthetical reference. The first column,
or Valid Code Point, may have only one Code Point specified in a
given row. The other columns may have more than one.
709
716 Any row may be terminated with an optional comment, starting with
717 "#".
718
719 The formal syntax of the table and more-precise definitions of some
720 of its organization appear in Section 5.
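
As an illustration only, a single row of the table could be read as
in the following Python sketch. The names parse_row and _variant are
assumptions of this sketch and are not defined by these guidelines;
parenthetical references are simply discarded, and the syntax given in
Section 5 remains authoritative.

```python
import re


def _variant(text):
    """Turn 'XXXX' or 'XXXX YYYY' (hexadecimal code points) into a string."""
    return "".join(chr(int(cp, 16)) for cp in text.split())


def parse_row(line):
    """Return (valid_code_point, preferred_variants, character_variants)."""
    body = line.split("#", 1)[0]             # drop the optional comment
    body = re.sub(r"\([^)]*\)", "", body)    # drop "(n)" or "(n,m)" references
    columns = [col.strip() for col in body.split(";")]
    if len(columns) != 3:
        raise ValueError(f"expected three columns: {line!r}")
    valid = _variant(columns[0])
    if len(valid) != 1:
        raise ValueError("column one must hold exactly one Code Point")
    preferred = [_variant(v) for v in columns[1].split(",") if v.strip()]
    variants = [_variant(v) for v in columns[2].split(",") if v.strip()]
    return valid, preferred, variants
```

An empty second or third column yields an empty list, so a caller can
tell "no variants" apart from a malformed row, which raises ValueError.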
721
722 The Language Variant Table should be provided by a relevant group,
723 organization, or body. However, the question of who is relevant or
724 has the authority to create this table and the rules that define it
725 is beyond the scope of this document.
726
727 3.2.2. IDL Package
728
729 The IDL Package is created on successful registration and consists
730 of:
731
732 (1) the IDL registered
733
734 (2) the language(s) associated with the IDL
735
(3) the version of the associated Language Variant Table
737
738 (4) the reserved IDLs
739
740 (5) active IDLs, that is, "Zone Variant Labels" that are to appear
741 in the DNS zone file
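
The five components above could be held in a record such as the one
in the following Python sketch. The class and field names are
hypothetical, and keeping one table version per language tag is an
assumption made here for IDLs that are associated with more than one
language.

```python
from dataclasses import dataclass, field


@dataclass
class IDLPackage:
    """Hypothetical record for the five components of an IDL Package."""
    idl: str                                     # (1) the IDL registered
    languages: frozenset                         # (2) associated language tag(s)
    table_versions: dict                         # (3) language tag -> table version
    reserved: set = field(default_factory=set)   # (4) reserved IDLs
    active: set = field(default_factory=set)     # (5) Zone Variant Labels

    def all_labels(self):
        """Every label held by the package, whether active or reserved."""
        return self.active | self.reserved
```

Because transfer and deletion apply to the package as a whole, a
registry following this sketch would operate on the IDLPackage object
and never on an individual member of all_labels().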
742
743 3.2.3. Procedure for Registering IDLs
744
745 An explanation follows each step.
746
747 Step 1. IN <= IDL to be registered and
748 {L} <= Set of languages associated with IN
749
750 Start the process with the label string (prospective IDL) to be
751 registered and the associated language(s) as input.
752
753 Step 2. Generate the Nameprep-processed version of the IN,
754 applying all mappings and canonicalization required by
755 IDNA.
756
757 The prospective IDL is processed by using Nameprep to apply the
758 normalizations and exclusions globally required to use IDNA. If the
759 Nameprep processing fails, then the IDL is invalid and the
760 registration process must stop.
761
771 Step 2.1. NP(IN) <= Nameprep processed IN
772 Step 2.2. Check availability of NP(IN). If not available, route to
773 conflict policy.
774
775 The Nameprep-processed IDL is then checked against the contents of
776 the zone file and previously created IDL Packages. If it is already
777 registered or reserved, then a conflict exists that must be resolved
778 by applying whatever policy is applicable for the zone. For example,
779 if FCFS is used, the registration process terminates unless the
780 conflict resolution policy provides another alternative.
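
Steps 2 through 2.2 can be sketched in Python as follows. The sketch
relies on the nameprep function of Python's encodings.idna module,
which its documentation describes as assuming query strings
(AllowUnassigned is true); a registry would probably want the stricter
stored-string behavior. The names prepare_label and is_available, and
the representation of existing IDL Packages as plain sets of labels,
are assumptions of this sketch.

```python
from encodings.idna import nameprep


def prepare_label(label):
    """Step 2 / 2.1: return NP(IN), or None if Nameprep rejects the label."""
    try:
        return nameprep(label)
    except UnicodeError:
        return None


def is_available(np_label, existing_packages):
    """Step 2.2: True if no existing IDL Package already holds the label.

    `existing_packages` is an iterable of sets, each holding every active
    and reserved label of one IDL Package.
    """
    return all(np_label not in labels for labels in existing_packages)
```

When is_available returns False, the decision is handed to the zone's
conflict policy; the sketch deliberately does not model that policy.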
781
782 Step 3. Process each language.
783 For each language (AL) in {L}
784
785 Step 3 goes through all languages associated with the proposed IDL
786 and checks each character (after Nameprep has been applied) for
787 validity in each of them. It then applies the Preferred Variants
788 (column 2 values) and the Character Variants (column 3 values) to
789 generate candidate labels.
790
791 Step 3.1. Check validity of NP(IN) in AL. If failed, stop
792 processing.
793
794 In step 3.1, IDL validation is done by checking that every Code Point
795 in the Nameprep-processed IDL is a Code Point allowed by the "Valid
Code Point" column of the Language Variant Table for the language.
797 This is then repeated for any other languages (and hence, Language
798 Variant Tables) specified in the registration. If one or more Code
799 Points are not valid, the registration process terminates.
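
A minimal Python sketch of the validity check follows. It assumes that
column one of each Language Variant Table has already been loaded into
a set of characters; the function names are hypothetical.

```python
def invalid_code_points(np_label, valid_code_points):
    """Step 3.1: list the characters of NP(IN) that column one does not allow."""
    return [ch for ch in np_label if ch not in valid_code_points]


def is_valid_in_all(np_label, valid_by_language):
    """True only if NP(IN) is valid in every associated language.

    `valid_by_language` maps a language tag to its set of Valid Code Points.
    """
    return all(
        not invalid_code_points(np_label, valid)
        for valid in valid_by_language.values()
    )
```

Returning the offending characters, rather than a bare failure, makes
it easier to report why a registration was refused.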
800
801 Step 3.2. PV(IN,AL) <= Set of available Nameprep-processed Preferred
802 Variants of NP(IN) in AL
803
804 Step 3.2 generates the list of Preferred Variant Labels of the IDL by
805 doing a combination (see Step 3.2A below) of all possible variants
806 listed in the "Preferred Variant(s)" column for each Code Point in
807 the Nameprep-processed IDL. The generated Preferred Variant Labels
808 must be processed through Nameprep. If the Nameprep processing fails
809 for any Preferred Variant Label (this is unlikely to occur if the
810 Preferred Variants are processed through Nameprep before being placed
811 in the table), then that variant label will be removed from the list.
812 The remaining Preferred Variant Labels in the list are then checked
813 to see whether they are already registered or reserved. If any are
814 registered or reserved, then the conflict resolution policy will
815 apply. In general, this will not prevent the originally requested
816 IDL from being registered unless the policy prevents such
817 registration. For example, if FCFS is applied, then the conflicting
818 variants will be removed from the list, but the originally requested
826 IDL and any remaining variants will be registered (see steps 5 and 8
827 below).
828
829 Step 3.2A Generating variant labels from Variant Code Points.
830
831 Steps 3.2 and 3.3 require that the Preferred Variants and Character
832 Variants be combined with the original IDL to form sets of variant
833 labels. Conceptually, one starts with the original, Nameprep-
834 processed, IDL and examines each of its characters in turn. If a
835 character is encountered for which there is a corresponding Preferred
836 Variant or Character Variant, a new variant label is produced with
837 the Variant Code Point substituted for the original one. If variant
838 labels already exist as the result of the processing of characters
839 that appeared earlier in the original IDL, then the substitutions are
840 made in them as well, resulting in additional generated variant
841 labels. This operation is repeated separately for the Preferred
842 Variants (in Step 3.2) and Character Variants (in Step 3.3). Of
843 course, equivalent results could be achieved by processing the
844 original IDL's characters in order, building the Preferred Variant
845 Label set and Character Variant Label set in parallel.
846
847 This process will sometimes generate a very large number of labels.
848 For example, if only two of the characters in the original IDL are
849 associated with Preferred Variants and if the first of those
850 characters has three Preferred Variants and the second has two, one
851 ends up with 12 variant labels to be placed in the IDL Package and,
852 normally, in the zone file. Repeating the process for Character
853 Variants, if any exist, would further increase the number of labels.
854 And if more than one language is specified for the original IDL, then
855 repetition of the process for additional languages (see step 4,
856 below) might further increase the size of the set.
857
881 For illustrative purposes, the "combination" process could be
882 achieved by a recursive function similar to the following pseudocode:
883
Function Combination(Str)
   F <= first code point of Str
   SStr <= substring of Str, without the first code point
   NSC <= {}

   If SStr is empty then
      For each V in (Variants of code point F)
         NSC <= NSC set-union (the string with the code point V)
      End of Loop
   Else
      SubCom <= Combination(SStr)
      For each V in (Variants of code point F)
         For each SC in SubCom
            NSC <= NSC set-union (the string with the
                   first code point V followed by the string SC)
         End of Loop
      End of Loop
   Endif

   Return NSC
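
The same result can be obtained without recursion. The following
Python sketch is one possible rendering of the pseudocode above, using
a Cartesian product; the name combination and the dictionary that maps
a code point to its list of variants are assumptions of this sketch.

```python
from itertools import product


def combination(label, variants):
    """Return every label formed by choosing one variant per position.

    `variants` maps a code point to the list of its variants (each a
    string of one or more code points). As in the pseudocode, a code
    point with no variants at all makes the result empty.
    """
    choices = [variants.get(ch, []) for ch in label]
    return {"".join(pick) for pick in product(*choices)}
```

The function is called once per language with the "Preferred
Variant(s)" column (Step 3.2) and once with the "Character Variant(s)"
column (Step 3.3). Unlike the pseudocode, which does not define the
case, it returns a set holding only the empty string when given an
empty label.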
904
905 Step 3.3. CV(IN,AL) <= Set of available Nameprep-processed Character
906 Variants of NP(IN) in AL
907
908 This step generates the list of Character Variant Labels by doing a
909 combination (see Step 3.2A above) of all the possible variants listed
910 in the "Character Variant(s)" column for each Code Point in the
911 Nameprep-processed original IDL. As with the Preferred Variant
912 Labels, the generated Character Variant Labels must be processed by,
913 and acceptable to, Nameprep. If the Nameprep processing fails for a
914 Character Variant Label, then that variant label will be removed from
915 the list. The remaining Character Variant Labels are then checked to
916 be sure they are not registered or reserved. If one or more are,
917 then the conflict resolution policy is applied. As with Preferred
918 Variant Labels, a conflict that is resolved in favor of the earlier
919 registrant does not, in general, prevent the IDL from being
920 registered, nor the remaining variants from being reserved in step 6
921 below.
922
923 Step 3.4. End of Loop
924
936 Step 4. Let PVall be the set-union of all PV(IN,AL)
937
Step 4 generates the Preferred Variant Labels for all languages. In
939 this step, and again in step 6 below, the zone administrator may
940 impose additional rules and processing activities to restrict the
941 number of Preferred (tentatively to be reserved and activated) and
942 Character (tentatively to be reserved) Label Variants. These
943 additional rules and processing activities are zone policy specific
944 and therefore are not specified in this document.
945
946 Step 5. {ZV} <= PVall set-union NP(IN)
947
948 Step 5 generates the initial Zone Variants. The set includes all
949 Preferred Variants for all languages and the original Nameprep-
processed IDL. Unless excluded by further processing, these Zone
Variants will be activated, that is, placed into the DNS zone. Note
that the "set-union" operation will eliminate any duplicates.
953
954 Step 6. Let CVall be the set-union of all CV(IN,AL), set-minus
955 {ZV}
956
957 Step 6 generates the Reserved Label Variants (the Character Variant
958 Label set). These labels are normally reserved but not activated.
959 The set includes all Character Variant Labels for all languages, but
960 not the Zone Variants defined in the previous step. The set-union
961 and set-minus operations eliminate any duplicates.
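
Steps 4 through 6 are plain set algebra, as the following Python
sketch suggests. It assumes that the per-language label sets PV(IN,AL)
and CV(IN,AL) have already been computed and filtered; the function
and parameter names are hypothetical.

```python
def build_label_sets(np_label, pv_by_language, cv_by_language):
    """Steps 4-6: return ({ZV}, CVall).

    `pv_by_language` and `cv_by_language` map a language tag to the set
    of Preferred Variant Labels and Character Variant Labels respectively.
    """
    pv_all = set().union(*pv_by_language.values())                   # Step 4
    zone_variants = pv_all | {np_label}                              # Step 5
    cv_all = set().union(*cv_by_language.values()) - zone_variants   # Step 6
    return zone_variants, cv_all
```

Any zone-specific filtering would be applied to the two returned sets
before the IDL Package is assembled in Step 7.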
962
963 Step 7. Create IDL Package for IN using IN, {L}, {ZV} and CVall
964
965 In Step 7, the "IDL Package" is created using the original IDL, the
966 associated language(s), the Zone Variant Labels, and the Reserved
967 Variant Labels. If zone-specific additional processing or filtering
968 is to be applied to eliminate linguistically inappropriate or other
969 forms, it should be applied before the IDL Package is actually
970 assembled.
971
972 Step 8. Put {ZV} into zone file
973
The activated IDLs are converted via ToASCII with UseSTD3ASCIIRules
975 [IDNA] before being placed into the zone file. This conversion
976 results in the IDLs being in the actual IDNA ("Punycode") form used
977 in zone files, while the IDLs have been carried in Unicode form up to
978 this point. If ToASCII fails for any of the activated IDLs, that IDL
979 must not be placed into the zone file. If the IDL is a subdomain
980 name, it will be delegated.
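
As a rough illustration, the conversion in Step 8 could be sketched in
Python with the standard "idna" codec. That codec is documented as
assuming that UseSTD3ASCIIRules is false, so a registry would need to
add its own check of that rule; the function name to_zone_form is an
assumption of this sketch, and each input is taken to be a single
label without dots.

```python
def to_zone_form(zone_variants):
    """Step 8: map each Unicode label in {ZV} to its ACE ("xn--") form.

    Labels for which the conversion fails are left out of the result and
    therefore stay out of the zone file.
    """
    zone = {}
    for label in sorted(zone_variants):
        try:
            zone[label] = label.encode("idna").decode("ascii")
        except UnicodeError:
            continue   # ToASCII failed for this label
    return zone
```

The returned dictionary keeps the Unicode form as the key, so that the
IDL Package can continue to refer to labels in Unicode while the zone
file receives only the ASCII-compatible values.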
981
991 3.3. Deletion and Transfer of IDL and IDL Package
992
993 In traditional domain administration, every Domain Name Label is
994 independent of all other Domain Name Labels. Registration, deletion,
995 and transfer of labels is done on a per-label basis. However, with
996 the guidelines discussed here, each IDL is associated with specific
997 languages, with all label variants, both active (zone) and reserved,
998 together in an IDL Package. This quite deliberately prohibits labels
999 that contain sufficient mixtures of characters from different scripts
1000 to make them impossible as words in any given language. If a zone
1001 chooses to not impose that restriction--that is, to permit labels to
1002 be constructed by picking characters from several different languages
1003 and scripts--then the guidelines described here would be
1004 inappropriate.
1005
1006 As stated earlier, the IDL package should be treated as a single
1007 atomic unit and all variants of the IDL should belong to a single
1008 domain-name holder. If the local policy related to the handling of
disagreements requires a particular IDL to be transferred or deleted
1010 independently of the IDL Package, the conflict policy would take
1011 precedence. In such an event, the conflict policy should include a
1012 transfer or delete procedure that takes the nature of IDL Packages
1013 into consideration.
1014
1015 When an IDL Package is deleted, all of the Zone and Reserved Label
1016 Variants again become available. The deletion of one IDL Package
1017 does not change any other IDL Packages.
1018
1019 3.4. Activation and Deactivation of IDL variants
1020
1021 Because there are active (registered) IDLs and inactive (reserved but
1022 not registered) IDLs within an IDL package, processes are required to
1023 activate or deactivate IDL variants within an IDL Package.
1024
1025 3.4.1. Activation Algorithm
1026
1027 Step 1. IN <= IDL to be activated and PA <= IDL Package
1028
1029 Start with the IDL to be activated and the IDL Package of which it is
1030 a member.
1031
1032 Step 2. NP(IN) <= Nameprep processed IN
1033
1034 Process the IDL through Nameprep. This step should never cause a
1035 problem, or even a change, since all labels that become part of the
1036 IDL Package are processed through Nameprep in Step 3.2 or 3.3 of the
1037 Registration procedure (section 3.2.3).
1038
1046 Step 3. If NP(IN) not in CVall then stop
1047
1048 Verify that the Nameprep-processed version of the IDL appears as a
1049 still-unactivated label in the IDL Package, i.e., in the list of
1050 Reserved Label Variants, CVall. It might be a useful "sanity check"
1051 to also verify that it does not already appear in the zone file.
1052
1053 Step 4. CVall <= CVall set-minus NP(IN) and {ZV} <= {ZV} set-union
1054 NP(IN)
1055
1056 Within the IDL Package, remove the Nameprep-processed version of the
1057 IDL from the list of Reserved Label Variants and add it to the list
1058 of active (zone) label variants.
1059
1060 Step 5. Put {ZV} into the zone file
1061
1062 Actually register (activate) the Zone Variant Labels.
1063
1064 3.4.2. Deactivation Algorithm
1065
1066 Step 1. IN <= IDL to be deactivated and PA <= IDL Package
1067
1068 As with activation, start with the IDL to be deactivated and the IDL
1069 Package of which it is a member.
1070
1071 Step 2. NP(IN) <= Nameprep processed IN
1072
1073 Get the Nameprep-processed version of the name (see discussion in the
1074 previous section).
1075
1076 Step 3. If NP(IN) not in {ZV} then stop
1077
1078 Verify that the Nameprep-processed version of the IDL appears as an
1079 activated (zone) label variant in the IDL Package. It might be a
1080 useful "sanity check" at this point to also verify that it actually
1081 appears in the zone file.
1082
1083 Step 4. CVall <= CVall set-union NP(IN) and {ZV} <= {ZV} set-minus
1084 NP(IN)
1085
1086 Within the IDL Package, remove the Nameprep-processed version of the
1087 IDL from the list of Active (Zone) Label Variants and add it to the
1088 list of Reserved (but inactive) Label Variants.
1089
1090 Step 5. Put {ZV} into the zone file
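
Steps 3 and 4 of the two algorithms mirror each other, as the
following Python sketch shows. It assumes that the label has already
been processed through Nameprep and that {ZV} and CVall are held as
mutable sets; the function names are hypothetical, and rewriting the
zone file (Step 5) is left to the caller.

```python
def activate(np_label, zone_variants, reserved):
    """3.4.1, Steps 3-4: move a reserved label into {ZV}."""
    if np_label not in reserved:        # Step 3: not a reserved variant
        return False
    reserved.remove(np_label)           # Step 4
    zone_variants.add(np_label)
    return True


def deactivate(np_label, zone_variants, reserved):
    """3.4.2, Steps 3-4: move an active label back into CVall."""
    if np_label not in zone_variants:   # Step 3: not an active variant
        return False
    zone_variants.remove(np_label)      # Step 4
    reserved.add(np_label)
    return True
```

Both functions leave the two sets untouched when the check in Step 3
fails, so the total membership of the IDL Package never changes.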
1091
1101 3.5. Managing Changes in Language Associations
1102
1103 Since the IDL package is an atomic unit and the associated list of
1104 variants must not be changed after creation, this document does not
1105 include a mechanism for adding and deleting language associations
1106 within the IDL package. Instead, it recommends deleting the IDL
1107 package entirely, followed by a registration with the new set of
1108 languages. Zone administrators may find it desirable to devise
1109 procedures that prevent other parties from capturing the labels in
1110 the IDL Package during these operations.
1111
1112 3.6. Managing Changes to the Language Variant Tables
1113
1114 Language Variant Tables are subject to changes over time, and these
1115 changes may or may not be backward compatible. It is possible that
1116 updated Language Variant Tables may produce a different set of
1117 Preferred Variants and Reserved Variants.
1118
1119 In order to preserve the atomicity of the IDL Package, when the
1120 Language Variant Table is changed, IDL Packages created using the
1121 previous version of the Language Variant Table must not be updated or
1122 affected.
1123
1124 4. Examples of Guideline Use in Zones
1125
1126 To provide a meaningful example, some Language Variant Tables must be
1127 defined. Assume, then, for the purpose of giving examples, that the
1128 following four Language Variant Tables are defined:
1129
1130 Note: these tables are not a representation of the actual tables, and
1131 they do not contain sufficient entries to be used in any actual
1132 implementation. IANA maintains a voluntary registry of actual tables
1133 [IANA-LVTABLES] which may be consulted for complete examples.
1134
1135 a) Language Variant Table for zh-cn and zh-sg
1136
1137 Reference 1 CP936 (commonly known as GBK)
1138 Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt [UNIHAN]
Reference 3 List of Simplified Character Table (Simplified column)
Reference 4 zSimpVariant in Unihan.txt [UNIHAN]
Reference 5 Variant that exists in GB2312, common simplified Hanzi
1142
1143 Version 1 20020701 # July 2002
1144
1145 56E2(1);56E2(5);5718(2) # sphere, ball, circle; mass, lump
1146 5718(1);56E2(4);56E2(2),56E3(2) # sphere, ball, circle; mass, lump
1147 60F3(1);60F3(5); # think, speculate, plan, consider
1148 654E(1);6559(5);6559(2) # teach
1156 6559(1);6559(5);654E(2) # teach, class
1157 6DF8(1);6E05(5);6E05(2) # clear
1158 6E05(1);6E05(5);6DF8(2) # clear, pure, clean; peaceful
1159 771E(1);771F(5);771F(2) # real, actual, true, genuine
1160 771F(1);771F(5);771E(2) # real, actual, true, genuine
1161 8054(1);8054(3);806F(2) # connect, join; associate, ally
1162 806F(1);8054(3);8054(2),8068(2) # connect, join; associate, ally
1163 96C6(1);96C6(5); # assemble, collect together
1164
1165 b) Language Variant Table for zh-tw
1166
1167 Reference 1 CP950 (commonly known as BIG5)
1168 Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
1169 Reference 3 List of Simplified Character Table (Traditional column)
1170 Reference 4 zTradVariant in Unihan.txt
1171
1172 Version 1 20020701 # July 2002
1173
1174 5718(1);5718(4);56E2(2),56E3(2) # sphere, ball, circle; mass, lump
1175 60F3(1);60F3(1); # think, speculate, plan, consider
1176 6559(1);6559(1);654E(2) # teach, class
1177 6E05(1);6E05(1);6DF8(2) # clear, pure, clean; peaceful
1178 771F(1);771F(1);771E(2) # real, actual, true, genuine
1179 806F(1);806F(3);8054(2),8068(2) # connect, join; associate, ally
1180 96C6(1);96C6(1); # assemble, collect together
1181
1182 c) Language Variant Table for ja
1183
1184 Reference 1 CP932 (commonly known as Shift-JIS)
1185 Reference 2 zVariant in Unihan.txt
1186 Reference 3 variant that exists in JIS X0208, commonly used Kanji
1187
1188 Version 1 20020701 # July 2002
1189
1190 5718(1);5718(3);56E3(2) # sphere, ball, circle; mass, lump
1191 60F3(1);60F3(3); # think, speculate, plan, consider
1192 654E(1);6559(3);6559(2) # teach
1193 6559(1);6559(3);654E(2) # teach, class
1194 6DF8(1);6E05(3);6E05(2) # clear
1195 6E05(1);6E05(3);6DF8(2) # clear, pure, clean; peaceful
1196 771E(1);771E(1);771F(2) # real, actual, true, genuine
1197 771F(1);771F(1);771E(2) # real, actual, true, genuine
1198 806F(1);806F(1);8068(2) # connect, join; associate, ally
1199 96C6(1);96C6(3); # assemble, collect together
1200
1201 d) Language Variant Table for ko
1202
1203 Reference 1 CP949 (commonly known as EUC-KR)
1211 Reference 2 zVariant and K-source in Unihan.txt
1212
1213 Version 1 20020701 # July 2002
1214
1215 5718(1);5718(1);56E3(2) # sphere, ball, circle; mass, lump
1216 60F3(1);60F3(1); # think, speculate, plan, consider
1217 654E(1);654E(1);6559(2) # teach
1218 6DF8(1);6DF8(1);6E05(2) # clear
1219 771E(1);771E(1);771F(2) # real, actual, true, genuine
1220 806F(1);806F(1);8068(2) # connect, join; associate, ally
1221 96C6(1);96C6(1); # assemble, collect together
1222
1223 Example 1: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
1224 {L} = {zh-cn, zh-sg, zh-tw}
1225
1226 NP(IN) = (U+6E05 U+771F U+6559)
1227 PV(IN,zh-cn) = (U+6E05 U+771F U+6559)
1228 PV(IN,zh-sg) = (U+6E05 U+771F U+6559)
1229 PV(IN,zh-tw) = (U+6E05 U+771F U+6559)
1230
1231 {ZV} = {(U+6E05 U+771F U+6559)}
1232 CVall = {(U+6E05 U+771E U+6559),
1233 (U+6E05 U+771E U+654E),
1234 (U+6E05 U+771F U+654E),
1235 (U+6DF8 U+771E U+6559),
1236 (U+6DF8 U+771E U+654E),
1237 (U+6DF8 U+771F U+6559),
1238 (U+6DF8 U+771F U+654E)}
1239
1240 Example 2: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
1241 {L} = {ja}
1242
1243 NP(IN) = (U+6E05 U+771F U+6559)
1244 PV(IN,ja) = (U+6E05 U+771F U+6559)
1245 {ZV} = {(U+6E05 U+771F U+6559)}
1246
1247 CVall = {(U+6E05 U+771E U+6559),
1248 (U+6E05 U+771E U+654E),
1249 (U+6E05 U+771F U+654E),
1250 (U+6DF8 U+771E U+6559),
1251 (U+6DF8 U+771E U+654E),
1252 (U+6DF8 U+771F U+6559),
1253 (U+6DF8 U+771F U+654E)}
1254
1255 Example 3: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
1256 {L} = {zh-cn, zh-sg, zh-tw, ja, ko}
1257
1258 NP(IN) = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
1266 Invalid registration because U+6E05 is invalid in L = ko
1267
1268 Example 4: IDL = (U+806F U+60F3 U+96C6 U+5718)
1269 *lian2 xiang3 ji2 tuan2*
1270 {L} = {zh-cn, zh-sg, zh-tw}
1271
1272 NP(IN) = (U+806F U+60F3 U+96C6 U+5718)
1273 PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
1274 PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
1275 PV(IN,zh-tw) = (U+806F U+60F3 U+96C6 U+5718)
1276 {ZV} = {(U+8054 U+60F3 U+96C6 U+56E2),
1277 (U+806F U+60F3 U+96C6 U+5718)}
1278 CVall = {(U+8054 U+60F3 U+96C6 U+56E3),
1279 (U+8054 U+60F3 U+96C6 U+5718),
1280 (U+806F U+60F3 U+96C6 U+56E2),
(U+806F U+60F3 U+96C6 U+56E3),
(U+8068 U+60F3 U+96C6 U+56E2),
(U+8068 U+60F3 U+96C6 U+56E3),
(U+8068 U+60F3 U+96C6 U+5718)}
1285
1286 Example 5: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
1287 *lian2 xiang3 ji2 tuan2*
1288 {L} = {zh-cn, zh-sg}
1289
1290 NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2)
1291 PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
1292 PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
1293 {ZV} = {(U+8054 U+60F3 U+96C6 U+56E2)}
1294 CVall = {(U+8054 U+60F3 U+96C6 U+56E3),
1295 (U+8054 U+60F3 U+96C6 U+5718),
1296 (U+806F U+60F3 U+96C6 U+56E2),
(U+806F U+60F3 U+96C6 U+56E3),
1298 (U+806F U+60F3 U+96C6 U+5718),
1299 (U+8068 U+60F3 U+96C6 U+56E2),
1300 (U+8068 U+60F3 U+96C6 U+56E3),
1301 (U+8068 U+60F3 U+96C6 U+5718)}
1302
1303 Example 6: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
1304 *lian2 xiang3 ji2 tuan2*
1305 {L} = {zh-cn, zh-sg, zh-tw}
1306
1307 NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2)
1308 Invalid registration because U+8054 is invalid in L = zh-tw
1309
1310 Example 7: IDL = (U+806F U+60F3 U+96C6 U+5718)
1311 *lian2 xiang3 ji2 tuan2*
1312 {L} = {ja,ko}
1313
1321 NP(IN) = (U+806F U+60F3 U+96C6 U+5718)
1322 PV(IN,ja) = (U+806F U+60F3 U+96C6 U+5718)
1323 PV(IN,ko) = (U+806F U+60F3 U+96C6 U+5718)
1324 {ZV} = {(U+806F U+60F3 U+96C6 U+5718)}
1325
1326 CVall = {(U+806F U+60F3 U+96C6 U+56E3),
1327 (U+8068 U+60F3 U+96C6 U+5718),
1328 (U+8068 U+60F3 U+96C6 U+56E3)}
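
As a cross-check, Example 4 can be reproduced with the following
Python sketch. It copies only the rows of tables a) and b) that the
example needs, represents labels as tuples of code point numbers, and
omits Nameprep and conflict checking (the code points involved are
assumed to be unchanged by Nameprep). All names are hypothetical.

```python
from itertools import product

# code point -> (Preferred Variants, Character Variants)
ZH_CN = {
    0x806F: ([0x8054], [0x8054, 0x8068]),
    0x60F3: ([0x60F3], []),
    0x96C6: ([0x96C6], []),
    0x5718: ([0x56E2], [0x56E2, 0x56E3]),
}
ZH_TW = {
    0x806F: ([0x806F], [0x8054, 0x8068]),
    0x60F3: ([0x60F3], []),
    0x96C6: ([0x96C6], []),
    0x5718: ([0x5718], [0x56E2, 0x56E3]),
}


def variant_labels(idl, table, column):
    """Combine one column's variants over every code point of the IDL."""
    choices = []
    for cp in idl:
        variants = list(table[cp][column])
        if column == 1 and cp not in variants:
            variants.append(cp)   # a Code Point is always a variant of itself
        choices.append(variants)
    return set(product(*choices))


def register(idl, tables):
    """Return ({ZV}, CVall), or None if the IDL is invalid in any table."""
    pv_all, cv_all = set(), set()
    for table in tables:
        if any(cp not in table for cp in idl):
            return None                           # Step 3.1
        pv_all |= variant_labels(idl, table, 0)   # Steps 3.2 and 4
        cv_all |= variant_labels(idl, table, 1)   # Steps 3.3 and 6
    zone = pv_all | {tuple(idl)}                  # Step 5
    return zone, cv_all - zone


IDL = (0x806F, 0x60F3, 0x96C6, 0x5718)
zone, reserved = register(IDL, [ZH_CN, ZH_TW])
print(len(zone), len(reserved))
```

Under these assumptions the sketch yields two Zone Variant Labels and
seven reserved labels, which agrees with the {ZV} and CVall listed for
Example 4.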
1329
1330 5. Syntax Description for the Language Variant Table
1331
1332 The formal syntax for the Language Variant Table is as follows, using
1333 the IETF "ABNF" metalanguage [ABNF]. Some comments on this syntax
1334 appear immediately after it.
1335
1336 5.1. ABNF Syntax
1337
LanguageVariantTable = 1*ReferenceLine VersionLine 1*EntryLine
ReferenceLine = "Reference" SP RefNo SP RefDescription [ Comment ] CRLF
RefNo = 1*DIGIT
RefDescription = *( VCHAR / SP )
VersionLine = "Version" SP VersionNo SP VersionDate [ Comment ] CRLF
VersionNo = 1*DIGIT
VersionDate = YYYYMMDD
EntryLine = ( VariantEntry / Comment ) CRLF

VariantEntry = ValidCodePoint ";"
               [ PreferredVariant ] ";" [ CharacterVariant ] [ Comment ]
ValidCodePoint = CodePoint
RefList = RefNo 0*( "," RefNo )
PreferredVariant = CodePointSet 0*( "," CodePointSet )
CharacterVariant = CodePointSet 0*( "," CodePointSet )
CodePointSet = CodePoint 0*( SP CodePoint )
CodePoint = 4*8HEXDIG [ "(" RefList ")" ]
Comment = "#" *( VCHAR / WSP )
1356
YYYYMMDD is a date written as eight decimal digits, where YYYY is
the 4-digit year, MM is the 2-digit month, and DD is the 2-digit
day.
1360
1361 5.2. Comments and Explanation of Syntax
1362
1363 Any lines starting with, or portions of lines after, the hash
symbol ("#") are treated as comments. Comments have no significance
1365 in the processing of the tables; nor are there any syntax
1366 requirements between the hash symbol and the end of the line. Blank
1367 lines in the tables are ignored completely.
1368
1376 Every language should have its own Language Variant Table provided by
1377 a relevant group, organization, or other body. That table will
1378 normally be based on some established standard or standards. The
1379 group that defines a Language Variant Table should document
1380 references to the appropriate standards at the beginning of the
1381 table, tagged with the word "Reference" followed by an integer (the
1382 reference number) followed by the description of the reference. For
1383 example:
1384
1385 Reference 1 CP936 (commonly known as GBK)
1386 Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
1387 Reference 3 List of Simplified Character Table (Simplified column)
1388 Reference 4 zSimpVariant in Unihan.txt
1389 Reference 5 Variant that exists in GB2312, common simplified Hanzi
1390
1391 Each Language Variant Table must have a version number and its
1392 release date. This is tagged with the word "Version" followed by an
1393 integer then followed by the date in the format YYYYMMDD, where YYYY
1394 is the 4-digit year, MM is the 2-digit month, and DD is the 2-digit
1395 day of the publication date of the table.
1396
1397 Version 1 20020701 # July 2002 Version 1
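
The two kinds of header line could be read as in the following Python
sketch, which also rejects dates that are not valid calendar dates.
The function names are assumptions of this sketch, and it is more
lenient about whitespace than the ABNF in Section 5.1.

```python
from datetime import datetime


def _fields(line, keyword, count):
    """Split a header line into `count` fields after dropping any comment."""
    fields = line.split("#", 1)[0].split(None, count - 1)
    if len(fields) != count or fields[0] != keyword:
        raise ValueError(f"not a {keyword} line: {line!r}")
    return fields


def parse_reference_line(line):
    """Return (reference_number, description) from a "Reference" line."""
    _, number, description = _fields(line, "Reference", 3)
    return int(number), description.strip()


def parse_version_line(line):
    """Return (version_number, release_date) from a "Version" line."""
    _, number, stamp = _fields(line, "Version", 3)
    stamp = stamp.strip()
    if len(stamp) != 8 or not (stamp.isascii() and stamp.isdigit()):
        raise ValueError(f"date is not in YYYYMMDD form: {stamp!r}")
    return int(number), datetime.strptime(stamp, "%Y%m%d").date()
```

Recording the parsed version number in each IDL Package is what allows
a later table revision to be told apart from the one used at
registration time.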
1398
1399 The table has three columns, separated by semicolons: "Valid Code
1400 Point"; "Preferred Variant(s)"; and "Character Variant(s)".
1401
1402 The "Valid Code Point" is the subset of Unicode characters that are
1403 valid to be registered.
1404
There can be more than one Preferred Variant; hence there could be
multiple entries in the "Preferred Variant(s)" column. If the
"Preferred Variant(s)" column is empty, then there is no
corresponding Preferred Variant: the Preferred Variant is null, and
no processing to add labels for Preferred Variants occurs. Unless
local policy dictates otherwise, the procedures above will result in
only those labels that reflect the valid code point being activated
(registered) into the zone file.
1414
1415 The "Character Variant(s)" column contains all Character Variants of
1416 the Code Point. Since the Code Point is always a variant of itself,
1417 to avoid redundancy, the Code Point is assumed to be part of the
1418 "Character Variant(s)" and need not be repeated in the "Character
1419 Variant(s)" column.
1420
If the variant in the "Preferred Variant(s)" or the "Character
Variant(s)" column is composed of a sequence of Code Points, then
the Code Points of the sequence are listed separated by spaces.
1424
1431 If there are multiple variants in the "Preferred Variant(s)" or the
1432 "Character Variant(s)" column, then each variant is separated by a
1433 comma.
1434
1435 Any Code Point listed in the "Preferred Variant(s)" column must be
1436 allowed by the rules for the relevant language to be registered.
1437 However, this is not a requirement for the entries in the "Character
1438 Variant(s)" column; it is possible that some of those entries may not
1439 be allowed to be registered.
1440
1441 Every Code Point in the table should have a corresponding reference
1442 number (associated with the references) specified to justify the
1443 entry. The reference number is placed in parentheses after the Code
1444 Point. If there is more than one reference, then the numbers are
1445 placed within a single set of parentheses and separated by commas.
1446
1447 6. Security Considerations
1448
1449 As discussed in the Introduction, substantially-unrestricted use of
1450 international (non-ASCII) characters in domain name labels may cause
1451 user confusion and invite various types of attacks. In particular,
1452 in the case of CJK languages, an attacker has an opportunity to
1453 divert or confuse users as a result of different characters (or, more
1454 specifically, assigned code points) with identical or similar
1455 semantics. These Guidelines provide a partial remedy for those risks
1456 by supplying a framework for prohibiting inappropriate characters
1457 from being registered at all and for permitting "variant" characters
1458 to be grouped together and reserved, so that they can only be
1459 registered in the DNS by the same owner. However, the system they
1460 suggest is no better or worse than the per-zone and per-language
1461 tables whose format and use this document specifies. Specific
1462 tables, and any additional local processing, will reflect per-zone
1463 decisions about the balance between risk and flexibility of
1464 registrations. And, of course, errors in construction of those
1465 tables may significantly reduce the quality of protection provided.
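
The "same owner" property described above can be pictured with a very small bookkeeping sketch. It is illustrative only and rests on assumptions: the variant labels of a registration are taken as already computed from the relevant tables, and the class name, method names, owner identifiers, and labels are all hypothetical placeholders.

```python
class VariantReservations:
    """Tracks which owner holds each label of a reserved variant group."""

    def __init__(self):
        self._owner_of = {}  # label -> owner

    def reserve_group(self, owner, labels):
        """Reserve every label in the group for one owner, or fail as a whole."""
        conflicts = sorted(
            label for label in labels
            if self._owner_of.get(label, owner) != owner
        )
        if conflicts:
            raise ValueError(f"already reserved by another owner: {conflicts}")
        for label in labels:
            self._owner_of[label] = owner

    def may_register(self, owner, label):
        """True if the label is unreserved or reserved for this owner."""
        return self._owner_of.get(label, owner) == owner


registry = VariantReservations()
registry.reserve_group("owner-1", ["label-a", "label-a-variant"])
same_owner_ok = registry.may_register("owner-1", "label-a-variant")
other_owner_ok = registry.may_register("owner-2", "label-a-variant")
```

The protection is only as complete as the list of labels passed to reserve_group, which mirrors the point that errors in table construction reduce the protection provided.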
1466
1467 7. Index to Terminology
1468
1469 As a convenience to the reader, this section lists all of the special
1470 terminology used in this document, with a pointer to the section in
1471 which it is defined.
1472
1473 Activated Label 2.1.17
1474 Activation 2.1.4
1475 Active Label 2.1.17
1476 Character Variant 2.1.14
1477 Character Variant Label 2.1.16
1478 CJK Characters 2.1.9
1486 Code point 2.1.7
1487 Code Point Variant 2.1.14
1488 FQDN 2.1.3
1489 Hostname 2.1.1
1490 IDL 2.1.2
1491 IDL Package 2.1.18
1492 IDN 2.1.1
1493 Internationalized Domain Label 2.1.2
1494 ISO/IEC 10646 2.1.6
1495 Label String 2.1.10
1496 Language name codes 2.1.5
1497 Language Variant Table 2.1.11
1498 LDH Subset 2.1.1
1499 Preferred Code Point 2.1.13
1500 Preferred Variant 2.1.13
1501 Preferred Variant Label 2.1.15
1502 Registration 2.1.4
1503 Reserved 2.1.18
1504 RFC3066 2.1.5
1505 Table 2.1.11
1506 UCS 2.1.6
1507 Unicode Character 2.1.7
1508 Unicode String 2.1.8
1509 Valid Code Point 2.1.12
1510 Variant Table 2.1.11
1511 Zone Variant 2.1.17
1512
1513 8. Acknowledgments
1514
1515 The authors gratefully acknowledge the contributions of:
1516
1517 - V. CHEN, N. HSU, H. HOTTA, S. TASHIRO, Y. YONEYA, and other Joint
1518 Engineering Team members at the JET meeting in Bangkok, Thailand.
1519
1520 - Yves Arrouye, an observer at the JET meeting in Bangkok, for his
1521 contribution on the IDL Package.
1522
1523 - Those who commented on, and made suggestions about, earlier
1524 versions, including Harald ALVESTRAND, Erin CHEN, Patrik
1525 FALTSTROM, Paul HOFFMAN, Soobok LEE, LEE Xiaodong, MAO Wei, Erik
1526 NORDMARK, and L.M. TSENG.
1527
1541 9. References
1542
1543 9.1. Normative References
1544
1545 [ABNF] Crocker, D. and P. Overell, Eds., "Augmented BNF for
1546 Syntax Specifications: ABNF", RFC 2234, November
1547 1997.
1548
1549 [STD13] Mockapetris, P., "Domain names - concepts and
1550 facilities", STD 13, RFC 1034, November 1987.
1551 Mockapetris, P., "Domain names - implementation and
1552 specification", STD 13, RFC 1035, November 1987.
1553
1554 [RFC3066] Alvestrand, H., "Tags for the Identification of
1555 Languages", BCP 47, RFC 3066, January 2001.
1556
1557 [IDNA] Faltstrom, P., Hoffman, P. and A. M. Costello,
1558 "Internationalizing Domain Names in Applications
1559 (IDNA)", RFC 3490, March 2003.
1560
1561 [PUNYCODE] Costello, A.M., "Punycode: A Bootstring encoding of
1562 Unicode for Internationalized Domain Names in
1563 Applications (IDNA)", RFC 3492, March 2003.
1564
1565 [STRINGPREP] Hoffman, P. and M. Blanchet, "Preparation of
1566 Internationalized Strings ("stringprep")", RFC 3454,
1567 December 2002.
1568
1569 [NAMEPREP] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
1570 Profile for Internationalized Domain Names (IDN)",
1571 RFC 3491, March 2003.
1572
1573 [IS10646] A product of ISO/IEC JTC1/SC2/WG2, Work Item
1574 JTC1.02.18 (ISO/IEC 10646). It is a multipart
1575 standard: Part 1, published as ISO/IEC 10646-
1576 1:2000(E), covers the Architecture and Basic
1577 Multilingual Plane, and Part 2, published as ISO/IEC
1578 10646-2:2001(E), covers the supplementary
1579 (additional) planes.
1580
1581 [UNIHAN] Unicode Han Database, Unicode Consortium
1582 ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt.
1583
1584 [UNICODE] The Unicode Consortium, "The Unicode Standard Version
1585 3.0", ISBN 0-201-61633-5. Unicode Standard Annex #28
1586 (http://www.unicode.org/unicode/reports/tr28/)
1587 defines Version 3.2 of the Unicode Standard, which is
1588 definitive for IDNA and this document.
1589
1596 [ISO7098] ISO 7098:1991, "Information and documentation -
1597 Romanization of Chinese", ISO/TC46/SC2.
1598
1599 9.2. Informative References
1600
1601 [IANA-LVTABLES] Internet Assigned Numbers Authority (IANA), IDN
1602 Character Registry.
1603 http://www.iana.org/assignments/idn/
1604
1605 [IDN-WG] IETF Internationalized Domain Names Working Group,
1606 now concluded, idn@ops.ietf.org, James Seng, Marc
1607 Blanchet, co-chairs, http://www.i-d-n.net/.
1608
1609 [UDRP] ICANN, "Uniform Domain Name Dispute Resolution
1610 Policy", October 1999,
1611 http://www.icann.org/udrp/udrp-policy-24oct99.htm
1612
1613 [ISO639] "ISO 639:1988 (E/F) Code for the representation of names
1614 of languages", International Organization for
1615 Standardization, 1st edition, 1988-04-01.
1616
1617 10. Contributors
1618
1619 The formal responsibility for this document and the ideas it contains
1620 lies with K. Konishi, K. Huang, H. Qian, and Y. Ko. These authors are
1621 listed on the first page as authors of record, and they are the
1622 appropriate long-term contacts for questions and comments on this
1623 RFC. On the other hand, J. Seng, J. Klensin, and W. Rickard served
1624 as editors of the document, transcribing and translating the ideas of
1625 the four authors and the teams they represented into the current
1626 written form. They were the primary contacts during the editing
1627 process, but not in the long term.
1628
1651 10.1. Authors' Addresses
1652
1653 Kazunori KONISHI
1654 JPNIC
1655 Kokusai-Kougyou-Kanda Bldg 6F
1656 2-3-4 Uchi-Kanda, Chiyoda-ku
1657 Tokyo 101-0047
1658 Japan
1659
1660 Phone: +81 49-278-7313
1661 EMail: konishi@jp.apan.net
1662
1663
1664 Kenny HUANG
1665 TWNIC
1666 3F, 16, Kang Hwa Street, Taipei
1667 Taiwan
1668
1669 Phone: 886-2-2658-6510
1670 EMail: huangk@alum.sinica.edu
1671
1672
1673 QIAN Hualin
1674 CNNIC
1675 No.6 Branch-box of No.349 Mailbox, Beijing 100080
1676 People's Republic of China
1677
1678 EMail: Hlqian@cnnic.net.cn
1679
1680
1681 KO YangWoo
1682 PeaceNet
1683 Yangchun P.O. Box 81 Seoul 158-600
1684 Korea
1685
1686 EMail: yw@mrko.pe.kr
1687
1706 10.2. Editors' Addresses
1707
1708 James SENG
1709 180 Lompang Road
1710 #22-07 Singapore 670180
1711 Phone: +65 9638-7085
1712
1713 EMail: jseng@pobox.org.sg
1714
1715
1716 John C KLENSIN
1717 1770 Massachusetts Avenue, No. 322
1718 Cambridge, MA 02140
1719 U.S.A.
1720
1721 EMail: Klensin+ietf@jck.com
1722
1723
1724 Wendy RICKARD
1725 The Rickard Group
1726 16 Seminary Ave
1727 Hopewell, NJ 08525
1728 USA
1729
1730 EMail: rickard@rickardgroup.com
1731
1761 11. Full Copyright Statement
1762
1763 Copyright (C) The Internet Society (2004). This document is subject
1764 to the rights, licenses and restrictions contained in BCP 78 and
1765 except as set forth therein, the authors retain all their rights.
1766
1767 This document and the information contained herein are provided on an
1768 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1769 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1770 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1771 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1772 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1773 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
1774
1775 Intellectual Property
1776
1777 The IETF takes no position regarding the validity or scope of any
1778 Intellectual Property Rights or other rights that might be claimed to
1779 pertain to the implementation or use of the technology described in
1780 this document or the extent to which any license under such rights
1781 might or might not be available; nor does it represent that it has
1782 made any independent effort to identify any such rights. Information
1783 on the procedures with respect to rights in RFC documents can be
1784 found in BCP 78 and BCP 79.
1785
1786 Copies of IPR disclosures made to the IETF Secretariat and any
1787 assurances of licenses to be made available, or the result of an
1788 attempt made to obtain a general license or permission for the use of
1789 such proprietary rights by implementers or users of this
1790 specification can be obtained from the IETF on-line IPR repository at
1791 http://www.ietf.org/ipr.
1792
1793 The IETF invites any interested party to bring to its attention any
1794 copyrights, patents or patent applications, or other proprietary
1795 rights that may cover technology that may be required to implement
1796 this standard. Please address the information to the IETF at
1797 ietf-ipr@ietf.org.
1798
1799 Acknowledgement
1800
1801 Funding for the RFC Editor function is currently provided by the
1802 Internet Society.
1803
CodePoint = 4*8DIGIT [ "(" Reflist ")" ]
CodePoint = 4*8HEXDIGIT [ "(" Reflist ")" ]