1 Internet Engineering Task Force (IETF) J. Klensin
2 Request for Comments: 5894 August 2010
3 Category: Informational
4 ISSN: 2070-1721
5
6
7 Internationalized Domain Names for Applications (IDNA):
8 Background, Explanation, and Rationale
9
10 Abstract
11
12 Several years have passed since the original protocol for
13 Internationalized Domain Names (IDNs) was completed and deployed.
14 During that time, a number of issues have arisen, including the need
15 to update the system to deal with newer versions of Unicode. Some of
16 these issues require tuning of the existing protocols and the tables
17 on which they depend. This document provides an overview of a
18 revised system and provides explanatory material for its components.
19
20 Status of This Memo
21
22 This document is not an Internet Standards Track specification; it is
23 published for informational purposes.
24
25 This document is a product of the Internet Engineering Task Force
26 (IETF). It represents the consensus of the IETF community. It has
27 received public review and has been approved for publication by the
28 Internet Engineering Steering Group (IESG). Not all documents
29 approved by the IESG are a candidate for any level of Internet
30 Standard; see Section 2 of RFC 5741.
31
32 Information about the current status of this document, any errata,
33 and how to provide feedback on it may be obtained at
34 http://www.rfc-editor.org/info/rfc5894.
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52 Klensin Informational [Page 1]
53 RFC 5894 IDNA Rationale August 2010
54
55
56 Copyright Notice
57
58 Copyright (c) 2010 IETF Trust and the persons identified as the
59 document authors. All rights reserved.
60
61 This document is subject to BCP 78 and the IETF Trust's Legal
62 Provisions Relating to IETF Documents
63 (http://trustee.ietf.org/license-info) in effect on the date of
64 publication of this document. Please review these documents
65 carefully, as they describe your rights and restrictions with respect
66 to this document. Code Components extracted from this document must
67 include Simplified BSD License text as described in Section 4.e of
68 the Trust Legal Provisions and are provided without warranty as
69 described in the Simplified BSD License.
70
71 This document may contain material from IETF Documents or IETF
72 Contributions published or made publicly available before November
73 10, 2008. The person(s) controlling the copyright in some of this
74 material may not have granted the IETF Trust the right to allow
75 modifications of such material outside the IETF Standards Process.
76 Without obtaining an adequate license from the person(s) controlling
77 the copyright in such materials, this document may not be modified
78 outside the IETF Standards Process, and derivative works of it may
79 not be created outside the IETF Standards Process, except to format
80 it for publication as an RFC or to translate it into languages other
81 than English.
82
83 Table of Contents
84
85 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
86 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4
87 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5
88 1.2.1. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5
89 1.2.2. New Terminology and Restrictions . . . . . . . . . . . 6
90 1.3. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6
91 1.4. Applicability and Function of IDNA . . . . . . . . . . . . 7
92 1.5. Comprehensibility of IDNA Mechanisms and Processing . . . 8
93 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9
94 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9
95 3.1. A Tiered Model of Permitted Characters and Labels . . . . 10
96 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10
97 3.1.2. CONTEXTUAL RULE REQUIRED . . . . . . . . . . . . . . . 11
98 3.1.2.1. Contextual Restrictions . . . . . . . . . . . . . 11
99 3.1.2.2. Rules and Their Application . . . . . . . . . . . 12
100 3.1.3. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12
101 3.1.4. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13
102 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 14
103
104
105
106
107 Klensin Informational [Page 2]
108 RFC 5894 IDNA Rationale August 2010
109
110
111 3.3. Layered Restrictions: Tables, Context, Registration, and
112 Applications . . . . . . . . . . . . . . . . . . . . . . . 15
113 4. Application-Related Issues . . . . . . . . . . . . . . . . . . 15
114 4.1. Display and Network Order . . . . . . . . . . . . . . . . 15
115 4.2. Entry and Display in Applications . . . . . . . . . . . . 16
116 4.3. Linguistic Expectations: Ligatures, Digraphs, and
117 Alternate Character Forms . . . . . . . . . . . . . . . . 19
118 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 20
119 4.5. Right-to-Left Text . . . . . . . . . . . . . . . . . . . . 21
120 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 22
121 6. Front-end and User Interface Processing for Lookup . . . . . . 22
122 7. Migration from IDNA2003 and Unicode Version Synchronization . 25
123 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 25
124 7.1.1. Summary and Discussion of IDNA Validity Criteria . . . 25
125 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 26
126 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 27
127 7.2. Changes in Character Interpretations . . . . . . . . . . . 28
128 7.2.1. Character Changes: Eszett and Final Sigma . . . . . . 28
129 7.2.2. Character Changes: Zero Width Joiner and Zero
130 Width Non-Joiner . . . . . . . . . . . . . . . . . . . 29
131 7.2.3. Character Changes and the Need for Transition . . . . 29
132 7.2.4. Transition Strategies . . . . . . . . . . . . . . . . 30
133 7.3. Elimination of Character Mapping . . . . . . . . . . . . . 31
134 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 31
135 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 31
136 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 32
137 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 32
138 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 33
139 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 33
140 7.7. Migration between Unicode Versions: Unassigned Code
141 Points . . . . . . . . . . . . . . . . . . . . . . . . . . 35
142 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 36
143 8. Name Server Considerations . . . . . . . . . . . . . . . . . . 37
144 8.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 37
145 8.2. Root and Other DNS Server Considerations . . . . . . . . . 37
146 9. Internationalization Considerations . . . . . . . . . . . . . 38
147 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 38
148 10.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 38
149 10.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 39
150 10.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 39
151 11. Security Considerations . . . . . . . . . . . . . . . . . . . 39
152 11.1. General Security Issues with IDNA . . . . . . . . . . . . 39
153 12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 39
154 13. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 40
155 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 40
156 14.1. Normative References . . . . . . . . . . . . . . . . . . . 40
157 14.2. Informative References . . . . . . . . . . . . . . . . . . 41
158
159
160
161
162 Klensin Informational [Page 3]
163 RFC 5894 IDNA Rationale August 2010
164
165
166 1. Introduction
167
168 1.1. Context and Overview
169
170 Internationalized Domain Names in Applications (IDNA) is a collection
171 of standards that allow client applications to convert some mnemonic
172 strings expressed in Unicode to an ASCII-compatible encoding form
173 ("ACE") that is a valid DNS label containing only LDH syntax (see the
174 Definitions document [RFC5890]). The specific form of ACE label used
175 by IDNA is called an "A-label". A client can look up an exact
176 A-label in the existing DNS, so A-labels do not require any
177 extensions to DNS, upgrades of DNS servers, or updates to low-level
178 client libraries. An A-label is recognizable from the prefix "xn--"
179 before the characters produced by the Punycode algorithm [RFC3492];
180 thus, a user application can identify an A-label and convert it into
181 Unicode (or some local coded character set) for display.
182
183 On the registry side, IDNA allows a registry to offer
184 Internationalized Domain Names (IDNs) for registration as A-labels.
185 A registry may offer any subset of valid IDNs, and may apply any
186 restrictions or bundling (grouping of similar labels together in one
187 registration) appropriate for the context of that registry.
188 Registration of labels is sometimes discussed separately from lookup,
189 and it is subject to a few specific requirements that do not apply to
190 lookup.
191
192 DNS clients and registries are subject to some differences in
193 requirements for handling IDNs. In particular, registries are urged
194 to register only exact, valid A-labels, while clients might do some
195 mapping to get from otherwise-invalid user input to a valid A-label.
196
197 The first version of IDNA was published in 2003 and is referred to
198 here as IDNA2003 to contrast it with the current version, which is
199 known as IDNA2008 (after the year in which IETF work started on it).
200 IDNA2003 consists of four documents: the IDNA base specification
201 [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and Stringprep
202 [RFC3454]. The current set of documents, IDNA2008, is not dependent
203 on any of the IDNA2003 specifications other than the one for Punycode
204 encoding. References to "IDNA2008", "these specifications", or
205 "these documents" are to the entire IDNA2008 set listed in a separate
206 Definitions document [RFC5890]. The characters that are valid in
207 A-labels are identified from rules listed in the Tables document
208 [RFC5892], but validity can be derived from the Unicode properties of
209 those characters with a very few exceptions.
210
211 Traditionally, DNS labels are matched case-insensitively (as
212 described in the DNS specifications [RFC1034][RFC1035]). That
213 convention was preserved in IDNA2003 by a case-folding operation that
214
215
216
217 Klensin Informational [Page 4]
218 RFC 5894 IDNA Rationale August 2010
219
220
221 generally maps capital letters into lowercase ones. However, if case
222 rules are enforced from one language, another language sometimes
223 loses the ability to treat two characters separately. Case-
224 insensitivity is treated slightly differently in IDNA2008.
225
226 IDNA2003 used Unicode version 3.2 only. In order to keep up with new
227 characters added in new versions of Unicode, IDNA2008 decouples its
228 rules from any particular version of Unicode. Instead, the
229 attributes of new characters in Unicode, supplemented by a small
230 number of exception cases, determine how and whether the characters
231 can be used in IDNA labels.
232
233 This document provides informational context for IDNA2008, including
234 terminology, background, and policy discussions. It contains no
235 normative material; specifications for conformance to the IDNA2008
236 protocols appears entirely in the other documents in the series.
237
238 1.2. Terminology
239
240 Terminology for IDNA2008 appears in the Definitions document
241 [RFC5890]. That document also contains a road map to the IDNA2008
242 document collection. No attempt should be made to understand this
243 document without the definitions and concepts that appear there.
244
245 1.2.1. DNS "Name" Terminology
246
247 In the context of IDNs, the DNS term "name" has introduced some
248 confusion as people speak of DNS labels in terms of the words or
249 phrases of various natural languages. Historically, many of the
250 "names" in the DNS have been mnemonics to identify some particular
251 concept, object, or organization. They are typically rooted in some
252 language because most people think in language-based ways. But,
253 because they are mnemonics, they need not obey the orthographic
254 conventions of any language: it is not a requirement that it be
255 possible for them to be "words".
256
257 This distinction is important because the reasonable goal of an IDN
258 effort is not to be able to write the great Klingon (or language of
259 one's choice) novel in DNS labels but to be able to form a usefully
260 broad range of mnemonics in ways that are as natural as possible in a
261 very broad range of scripts.
262
263
264
265
266
267
268
269
270
271
272 Klensin Informational [Page 5]
273 RFC 5894 IDNA Rationale August 2010
274
275
276 1.2.2. New Terminology and Restrictions
277
278 IDNA2008 introduces new terminology. Precise definitions are
279 provided in the Definitions document for the terms U-label, A-Label,
280 LDH label (to which all valid pre-IDNA hostnames conformed), Reserved
281 LDH label (R-LDH label), XN-label, Fake A-label, and Non-Reserved LDH
282 label (NR-LDH label).
283
284 In addition, the term "putative label" has been adopted to refer to a
285 label that may appear to meet certain definitional constraints but
286 has not yet been sufficiently tested for validity.
287
288 These definitions are also illustrated in Figure 1 of the Definitions
289 document. R-LDH labels contain "--" in the third and fourth
290 character positions from the beginning of the label. In IDNA-aware
291 applications, only a subset of these reserved labels is permitted to
292 be used, namely the A-label subset. A-labels are a subset of the
293 R-LDH labels that begin with the case-insensitive string "xn--".
294 Labels that bear this prefix but that are not otherwise valid fall
295 into the "Fake A-label" category. The Non-Reserved labels (NR-LDH
296 labels) are implicitly valid since they do not bear any resemblance
297 to the labels specified by IDNA.
298
299 The creation of the Reserved-LDH category is required for three
300 reasons:
301
302 o to prevent confusion with pre-IDNA coding forms;
303
304 o to permit future extensions that would require changing the
305 prefix, no matter how unlikely those might be (see Section 7.4);
306 and
307
308 o to reduce the opportunities for attacks via the Punycode encoding
309 algorithm itself.
310
311 As with other documents in the IDNA2008 set, this document uses the
312 term "registry" to describe any zone in the DNS. That term, and the
313 terms "zone" or "zone administration", are interchangeable.
314
315 1.3. Objectives
316
317 These are the main objectives in revising IDNA.
318
319 o Use a more recent version of Unicode and allow IDNA to be
320 independent of Unicode versions, so that IDNA2008 need not be
321 updated for implementations to adopt code points from new Unicode
322 versions.
323
324
325
326
327 Klensin Informational [Page 6]
328 RFC 5894 IDNA Rationale August 2010
329
330
331 o Fix a very small number of code point categorizations that have
332 turned out to cause problems in the communities that use those
333 code points.
334
335 o Reduce the dependency on mapping, in favor of valid A-labels.
336 This will result in pre-mapped forms that are not valid IDNA
337 labels appearing less often in various contexts.
338
339 o Fix some details in the bidirectional code point handling
340 algorithms.
341
342 1.4. Applicability and Function of IDNA
343
344 The IDNA specification solves the problem of extending the repertoire
345 of characters that can be used in domain names to include a large
346 subset of the Unicode repertoire.
347
348 IDNA does not extend DNS. Instead, the applications (and, by
349 implication, the users) continue to see an exact-match lookup
350 service. Either there is a single name that matches exactly (subject
351 to the base DNS requirement of case-insensitive ASCII matching) or
352 there is no match. This model has served the existing applications
353 well, but it requires, with or without internationalized domain
354 names, that users know the exact spelling of the domain names that
355 are to be typed into applications such as web browsers and mail user
356 agents. The introduction of the larger repertoire of characters
357 potentially makes the set of misspellings larger, especially given
358 that in some cases the same appearance, for example on a business
359 card, might visually match several Unicode code points or several
360 sequences of code points.
361
362 The IDNA standard does not require any applications to conform to it,
363 nor does it retroactively change those applications. An application
364 can elect to use IDNA in order to support IDNs while maintaining
365 interoperability with existing infrastructure. For applications that
366 want to use non-ASCII characters in public DNS domain names, IDNA is
367 the only option that is defined at the time this specification is
368 published. Adding IDNA support to an existing application entails
369 changes to the application only, and leaves room for flexibility in
370 front-end processing and more specifically in the user interface (see
371 Section 6).
372
373 A great deal of the discussion of IDN solutions has focused on
374 transition issues and how IDNs will work in a world where not all of
375 the components have been updated. Proposals that were not chosen by
376 the original IDN Working Group would have depended on updating user
377 applications, DNS resolvers, and DNS servers in order for a user to
378 apply an internationalized domain name in any form or coding
379
380
381
382 Klensin Informational [Page 7]
383 RFC 5894 IDNA Rationale August 2010
384
385
386 acceptable under that method. While processing must be performed
387 prior to or after access to the DNS, IDNA requires no changes to the
388 DNS protocol, any DNS servers, or the resolvers on users' computers.
389
390 IDNA allows the graceful introduction of IDNs not only by avoiding
391 upgrades to existing infrastructure (such as DNS servers and mail
392 transport agents), but also by allowing some limited use of IDNs in
393 applications by using the ASCII-encoded representation of the labels
394 containing non-ASCII characters. While such names are user-
395 unfriendly to read and type, and hence not optimal for user input,
396 they can be used as a last resort to allow rudimentary IDN usage.
397 For example, they might be the best choice for display if it were
398 known that relevant fonts were not available on the user's computer.
399 In order to allow user-friendly input and output of the IDNs and
400 acceptance of some characters as equivalent to those to be processed
401 according to the protocol, the applications need to be modified to
402 conform to this specification.
403
404 This version of IDNA uses the Unicode character repertoire for
405 continuity with the original version of IDNA.
406
407 1.5. Comprehensibility of IDNA Mechanisms and Processing
408
409 One goal of IDNA2008, which is aided by the main goal of reducing the
410 dependency on mapping, is to improve the general understanding of how
411 IDNA works and what characters are permitted and what happens to
412 them. Comprehensibility and predictability to users and registrants
413 are important design goals for this effort. End-user applications
414 have an important role to play in increasing this comprehensibility.
415
416 Any system that tries to handle international characters encounters
417 some common problems. For example, a User Interface (UI) cannot
418 display a character if no font containing that character is
419 available. In some cases, internationalization enables effective
420 localization while maintaining some global uniformity but losing some
421 universality.
422
423 It is difficult to even make suggestions as to how end-user
424 applications should cope when characters and fonts are not available.
425 Because display functions are rarely controlled by the types of
426 applications that would call upon IDNA, such suggestions will rarely
427 be very effective.
428
429 Conversion between local character sets and normalized Unicode, if
430 needed, is part of this set of user interface issues. Those
431 conversions introduce complexity in a system that does not use
432 Unicode as its primary (or only) internal character coding system.
433 If a label is converted to a local character set that does not have
434
435
436
437 Klensin Informational [Page 8]
438 RFC 5894 IDNA Rationale August 2010
439
440
441 all the needed characters, or that uses different character-coding
442 principles, the user interface program may have to add special logic
443 to avoid or reduce loss of information.
444
445 The major difficulty may lie in accurately identifying the incoming
446 character set and applying the correct conversion routine. Even more
447 difficult, the local character coding system could be based on
448 conceptually different assumptions than those used by Unicode (e.g.,
449 choice of font encodings used for publications in some Indic
450 scripts). Those differences may not easily yield unambiguous
451 conversions or interpretations even if each coding system is
452 internally consistent and adequate to represent the local language
453 and script.
454
455 IDNA2008 shifts responsibility for character mapping and other
456 adjustments from the protocol (where it was located in IDNA2003) to
457 pre-processing before invoking IDNA itself. The intent is that this
458 change will lead to greater usage of fully-valid A-Labels or U-labels
459 in display, transit, and storage, which should aid comprehensibility
460 and predictability. A careful look at pre-processing raises issues
461 about what that pre-processing should do and at what point
462 pre-processing becomes harmful; how universally consistent
463 pre-processing algorithms can be; and how to be compatible with
464 labels prepared in an IDNA2003 context. Those issues are discussed
465 in Section 6 and in the Mapping document [IDNA2008-Mapping].
466
467 2. Processing in IDNA2008
468
469 IDNA2008 separates Domain Name Registration and Lookup in the
470 protocol specification (RFC 5891, Sections 4 and 5 [RFC5891]).
471 Although most steps in the two processes are similar, the separation
472 reflects current practice in which per-registry (DNS zone)
473 restrictions and special processing are applied at registration time
474 but not during lookup. Another significant benefit is that
475 separation facilitates incremental addition of permitted character
476 groups to avoid freezing on one particular version of Unicode.
477
478 The actual registration and lookup protocols for IDNA2008 are
479 specified in the Protocol document.
480
481 3. Permitted Characters: An Inclusion List
482
483 IDNA2008 adopts the inclusion model. A code point is assumed to be
484 invalid for IDN use unless it is included as part of a Unicode
485 property-based rule or, in rare cases, included individually by an
486 exception. When an implementation moves to a new version of Unicode,
487 the rules may indicate new valid code points.
488
489
490
491
492 Klensin Informational [Page 9]
493 RFC 5894 IDNA Rationale August 2010
494
495
496 This section provides an overview of the model used to establish the
497 algorithm and character lists of the Tables document [RFC5892] and
498 describes the names and applicability of the categories used there.
499 Note that the inclusion of a character in the PROTOCOL-VALID category
500 group (Section 3.1.1) does not imply that it can be used
501 indiscriminately; some characters are associated with contextual
502 rules that must be applied as well.
503
504 The information given in this section is provided to make the rules,
505 tables, and protocol easier to understand. The normative generating
506 rules that correspond to this informal discussion appear in the
507 Tables document, and the rules that actually determine what labels
508 can be registered or looked up are in the Protocol document.
509
510 3.1. A Tiered Model of Permitted Characters and Labels
511
512 Moving to an inclusion model involves a new specification for the
513 list of characters that are permitted in IDNs. In IDNA2003,
514 character validity is independent of context and fixed forever (or
515 until the standard is replaced). However, globally context-
516 independent rules have proved to be impractical because some
517 characters, especially those that are called "Join_Controls" in
518 Unicode, are needed to make reasonable use of some scripts but have
519 no visible effect in others. IDNA2003 prohibited those types of
520 characters entirely by discarding them. We now have a consensus that
521 under some conditions, these "joiner" characters are legitimately
522 needed to allow useful mnemonics for some languages and scripts. In
523 general, context-dependent rules help deal with characters (generally
524 characters that would otherwise be prohibited entirely) that are used
525 differently or perceived differently across different scripts, and
526 allow the standard to be applied more appropriately in cases where a
527 string is not universally handled the same way.
528
529 IDNA2008 divides all possible Unicode code points into four
530 categories: PROTOCOL-VALID, CONTEXTUAL RULE REQUIRED, DISALLOWED, and
531 UNASSIGNED.
532
533 3.1.1. PROTOCOL-VALID
534
535 Characters identified as PROTOCOL-VALID (often abbreviated PVALID)
536 are permitted in IDNs. Their use may be restricted by rules about
537 the context in which they appear or by other rules that apply to the
538 entire label in which they are to be embedded. For example, any
539 label that contains a character in this category that has a
540 "right-to-left" property must be used in context with the Bidi rules
541 [RFC5893]. The term PROTOCOL-VALID is used to stress the fact that
542 the presence of a character in this category does not imply that a
543 given registry need accept registrations containing any of the
544
545
546
547 Klensin Informational [Page 10]
548 RFC 5894 IDNA Rationale August 2010
549
550
551 characters in the category. Registries are still expected to apply
552 judgment about labels they will accept and to maintain rules
553 consistent with those judgments (see the Protocol document [RFC5891]
554 and Section 3.3).
555
556 Characters that are placed in the PROTOCOL-VALID category are
557 expected to never be removed from it or reclassified. While
558 theoretically characters could be removed from Unicode, such removal
559 would be inconsistent with the Unicode stability principles (see
560 UTR 39: Unicode Security Mechanisms [Unicode52], Appendix F) and
561 hence should never occur.
562
563 3.1.2. CONTEXTUAL RULE REQUIRED
564
565 Some characters may be unsuitable for general use in IDNs but
566 necessary for the plausible support of some scripts. The two most
567 commonly cited examples are the ZERO WIDTH JOINER and ZERO WIDTH
568 NON-JOINER characters (ZWJ, U+200D and ZWNJ, U+200C), but other
569 characters may require special treatment because they would otherwise
570 be DISALLOWED (typically because Unicode considers them punctuation
571 or special symbols) but need to be permitted in limited contexts.
572 Other characters are given this special treatment because they pose
573 exceptional danger of being used to produce misleading labels or to
574 cause unacceptable ambiguity in label matching and interpretation.
575
576 3.1.2.1. Contextual Restrictions
577
578 Characters with contextual restrictions are identified as CONTEXTUAL
579 RULE REQUIRED and are associated with a rule. The rule defines
580 whether the character is valid in a particular string, and also
581 whether the rule itself is to be applied on lookup as well as
582 registration.
583
584 A distinction is made between characters that indicate or prohibit
585 joining and ones similar to them (known as CONTEXT-JOINER or
586 CONTEXTJ) and other characters requiring contextual treatment
587 (CONTEXT-OTHER or CONTEXTO). Only the former require full testing at
588 lookup time.
589
590 It is important to note that these contextual rules cannot prevent
591 all uses of the relevant characters that might be confusing or
592 problematic. What they are expected to do is to confine
593 applicability of the characters to scripts (and narrower contexts)
594 where zone administrators are knowledgeable enough about the use of
595 those characters to be prepared to deal with them appropriately.
596
597
598
599
600
601
602 Klensin Informational [Page 11]
603 RFC 5894 IDNA Rationale August 2010
604
605
606 For example, a registry dealing with an Indic script that requires
607 ZWJ and/or ZWNJ as part of the writing system is expected to
608 understand where the characters have visible effect and where they do
609 not and to make registration rules accordingly. By contrast, a
610 registry dealing primarily with Latin or Cyrillic script might not be
611 actively aware that the characters exist, much less about the
612 consequences of embedding them in labels drawn from those scripts and
613 therefore should avoid accepting registrations containing those
614 characters, at least in labels using characters from the Latin or
615 Cyrillic scripts.
616
617 3.1.2.2. Rules and Their Application
618
619 Rules have descriptions such as "Must follow a character from Script
620 XYZ", "Must occur only if the entire label is in Script ABC", or
621 "Must occur only if the previous and subsequent characters have the
622 DFG property". The actual rules may be DEFINED or NULL. If present,
623 they may have values of "True" (character may be used in any position
624 in any label), "False" (character may not be used in any label), or
625 may be a set of procedural rules that specify the context in which
626 the character is permitted.
627
628 Because it is easier to identify these characters than to know that
629 they are actually needed in IDNs or how to establish exactly the
630 right rules for each one, a rule may have a null value in a given
631 version of the tables. Characters associated with null rules are not
632 permitted to appear in putative labels for either registration or
633 lookup. Of course, a later version of the tables might contain a
634 non-null rule.
635
636 The actual rules and their descriptions are in Sections 2 and 3 of
637 the Tables document [RFC5892]. That document also specifies the
638 creation of a registry for future rules.
639
640 3.1.3. DISALLOWED
641
642 Some characters are inappropriate for use in IDNs and are thus
643 excluded for both registration and lookup (i.e., IDNA-conforming
644 applications performing name lookup should verify that these
645 characters are absent; if they are present, the label strings should
646 be rejected rather than converted to A-labels and looked up. Some of
647 these characters are problematic for use in IDNs (such as the
648 FRACTION SLASH character, U+2044), while some of them (such as the
649 various HEART symbols, e.g., U+2665, U+2661, and U+2765, see
650 Section 7.6) simply fall outside the conventions for typical
651 identifiers (basically letters and numbers).
652
653
654
655
656
657 Klensin Informational [Page 12]
658 RFC 5894 IDNA Rationale August 2010
659
660
661 Of course, this category would include code points that had been
662 removed entirely from Unicode should such removals ever occur.
663
664 Characters that are placed in the DISALLOWED category are expected to
665 never be removed from it or reclassified. If a character is
666 classified as DISALLOWED in error and the error is sufficiently
667 problematic, the only recourse would be either to introduce a new
668 code point into Unicode and classify it as PROTOCOL-VALID or for the
669 IETF to accept the considerable costs of an incompatible change and
670 replace the relevant RFC with one containing appropriate exceptions.
671
672 There is provision for exception cases but, in general, characters
673 are placed into DISALLOWED if they fall into one or more of the
674 following groups:
675
676 o The character is a compatibility equivalent for another character.
677 In slightly more precise Unicode terms, application of
678 Normalization Form KC (NFKC) to the character yields some other
679 character.
680
681 o The character is an uppercase form or some other form that is
682 mapped to another character by Unicode case folding.
683
684 o The character is a symbol or punctuation form or, more generally,
685 something that is not a letter, digit, or a mark that is used to
686 form a letter or digit.
687
688 3.1.4. UNASSIGNED
689
690 For convenience in processing and table-building, code points that do
691 not have assigned values in a given version of Unicode are treated as
692 belonging to a special UNASSIGNED category. Such code points are
693 prohibited in labels to be registered or looked up. The category
694 differs from DISALLOWED in that code points are moved out of it by
695 the simple expedient of being assigned in a later version of Unicode
696 (at which point, they are classified into one of the other categories
697 as appropriate).
698
699 The rationale for restricting the processing of UNASSIGNED characters
700 is simply that the properties of such code points cannot be
701 completely known until actual characters are assigned to them. For
702 example, assume that an UNASSIGNED code point were included in a
703 label to be looked up. Assume that the code point was later assigned
704 to a character that required some set of contextual rules. With that
705 combination, un-updated instances of IDNA-aware software might permit
706 lookup of labels containing the previously unassigned characters
707 while updated versions of the software might restrict use of the same
708
709
710
711
712 Klensin Informational [Page 13]
713 RFC 5894 IDNA Rationale August 2010
714
715
716 label in lookup, depending on the contextual rules. It should be
717 clear that under no circumstance should an UNASSIGNED character be
718 permitted in a label to be registered as part of a domain name.
719
720 3.2. Registration Policy
721
722 While these recommendations cannot and should not define registry
723 policies, registries should develop and apply additional restrictions
724 as needed to reduce confusion and other problems. For example, it is
725 generally believed that labels containing characters from more than
726 one script are a bad practice although there may be some important
727 exceptions to that principle. Some registries may choose to restrict
728 registrations to characters drawn from a very small number of
729 scripts. For many scripts, the use of variant techniques such as
730 those as described in the JET specification for the CJK script
731 [RFC3743] and its generalization [RFC4290], and illustrated for
732 Chinese by the tables provided by the Chinese Domain Name Consortium
733 [RFC4713] may be helpful in reducing problems that might be perceived
734 by users.
735
736 In general, users will benefit if registries only permit characters
737 from scripts that are well-understood by the registry or its
738 advisers. If a registry decides to reduce opportunities for
739 confusion by constructing policies that disallow characters used in
740 historic writing systems or characters whose use is restricted to
741 specialized, highly technical contexts, some relevant information may
742 be found in Section 2.4 (Specific Character Adjustments) of Unicode
743 Identifier and Pattern Syntax [Unicode-UAX31], especially Table 4
744 (Candidate Characters for Exclusion from Identifiers), and Section
745 3.1 (General Security Profile for Identifiers) in Unicode Security
746 Mechanisms [Unicode-UTS39].
747
748 The requirement (in Section 4.1 of the Protocol document [RFC5891])
749 that registration procedures use only U-labels and/or A-labels is
750 intended to ensure that registrants are fully aware of exactly what
751 is being registered as well as encouraging use of those canonical
752 forms. That provision should not be interpreted as requiring that
753 registrants need to provide characters in a particular code sequence.
754 Registrant input conventions and management are part of registrant-
755 registrar interactions and relationships between registries and
756 registrars and are outside the scope of these standards.
757
758 It is worth stressing that these principles of policy development and
759 application apply at all levels of the DNS, not only, e.g., top level
760 domain (TLD) or second level domain (SLD) registrations. Even a
761 trivial, "anything is permitted that is valid under the protocol"
762 policy is helpful in that it helps users and application developers
763 know what to expect.
764
765
766
767 Klensin Informational [Page 14]
768 RFC 5894 IDNA Rationale August 2010
769
770
771 3.3. Layered Restrictions: Tables, Context, Registration, and
772 Applications
773
774 The character rules in IDNA2008 are based on the realization that
775 there is no single magic bullet for any of the security,
776 confusability, or other issues associated with IDNs. Instead, the
777 specifications define a variety of approaches. The character tables
778 are the first mechanism, protocol rules about how those characters
779 are applied or restricted in context are the second, and those two in
780 combination constitute the limits of what can be done in the
781 protocol. As discussed in the previous section (Section 3.2),
782 registries are expected to restrict what they permit to be
783 registered, devising and using rules that are designed to optimize
784 the balance between confusion and risk on the one hand and maximum
785 expressiveness in mnemonics on the other.
786
787 In addition, there is an important role for user interface programs
788 in warning against label forms that appear problematic given their
789 knowledge of local contexts and conventions. Of course, no approach
790 based on naming or identifiers alone can protect against all threats.
791
792 4. Application-Related Issues
793
794 4.1. Display and Network Order
795
796 Domain names are always transmitted in network order (the order in
797 which the code points are sent in protocols), but they may have a
798 different display order (the order in which the code points are
799 displayed on a screen or paper). When a domain name contains
800 characters that are normally written right to left, display order may
801 be affected although network order is not. It gets even more
802 complicated if left-to-right and right-to-left labels are adjacent to
803 each other within a domain name. The decision about the display
804 order is ultimately under the control of user agents -- including Web
805 browsers, mail clients, hosted Web applications and many more --
806 which may be highly localized. Should a domain name abc.def, in
807 which both labels are represented in scripts that are written right
808 to left, be displayed as fed.cba or cba.fed? Applications that are
809 in deployment today are already diverse, and one can find examples of
810 either choice.
811
812 The picture changes once again when an IDN appears in an
813 Internationalized Resource Identifier (IRI) [RFC3987]. An IRI or
814 internationalized email address contains elements other than the
815 domain name. For example, IRIs contain protocol identifiers and
816 field delimiter syntax such as "http://" or "mailto:" while email
817 addresses contain the "@" to separate local parts from domain names.
818
819
820
821
822 Klensin Informational [Page 15]
823 RFC 5894 IDNA Rationale August 2010
824
825
826 An IRI in network order begins with "http://" followed by domain
827 labels in network order, thus "http://abc.def".
828
829 User interface programs are not required to display and allow input
830 of IRIs directly but often do so. Implementers have to choose
831 whether the overall direction of these strings will always be left to
832 right (or right to left) for an IRI or email address. The natural
833 order for a user typing a domain name on a right-to-left system is
834 fed.cba. Should the right-to-left (RTL) user interface reverse the
835 entire domain name each time a domain name is typed? Does this
836 change if the user types "http://" right before typing a domain name,
837 thus implying that the user is beginning at the beginning of the
838 network-order IRI? Experience in the 1980s and 1990s with mixing
839 systems in which domain name labels were read in network order (left
840 to right) and those in which those labels were read right to left
841 would predict a great deal of confusion.
842
843 If each implementation of each application makes its own decisions on
844 these issues, users will develop heuristics that will sometimes fail
845 when switching applications. However, while some display order
846 conventions, voluntarily adopted, would be desirable to reduce
847 confusion, such suggestions are beyond the scope of these
848 specifications.
849
850 4.2. Entry and Display in Applications
851
852 Applications can accept and display domain names using any character
853 set or character coding system. The IDNA protocol does not
854 necessarily affect the interface between users and applications. An
855 IDNA-aware application can accept and display internationalized
856 domain names in two formats: as the internationalized character
857 set(s) supported by the application (i.e., an appropriate local
858 representation of a U-label) and as an A-label. Applications may
859 allow the display of A-labels, but are encouraged not to do so except
860 as an interface for special purposes, possibly for debugging, or to
861 cope with display limitations. In general, they should allow, but
862 not encourage, user input of A-labels. A-labels are opaque and ugly,
863 and malicious variations on them are not easily detected by users.
864 Where possible, they should thus only be exposed when they are
865 absolutely needed. Because IDN labels can be rendered either as
866 A-labels or U-labels, the application may reasonably have an option
867 for the user to select the preferred method of display. Rendering
868 the U-label should normally be the default.
869
870 Domain names are often stored and transported in many places. For
871 example, they are part of documents such as mail messages and web
872 pages. They are transported in many parts of many protocols, such as
873 both the control commands of SMTP and associated message body parts,
874
875
876
877 Klensin Informational [Page 16]
878 RFC 5894 IDNA Rationale August 2010
879
880
881 and in the headers and the body content in HTTP. It is important to
882 remember that domain names appear both in domain name slots and in
883 the content that is passed over protocols, and it would be helpful if
884 protocols explicitly define what their domain name slots are.
885
886 In protocols and document formats that define how to handle
887 specification or negotiation of charsets, labels can be encoded in
888 any charset allowed by the protocol or document format. If a
889 protocol or document format only allows one charset, the labels must
890 be given in that charset. Of course, not all charsets can properly
891 represent all labels. If a U-label cannot be displayed in its
892 entirety, the only choice (without loss of information) may be to
893 display the A-label.
894
895 Where a protocol or document format allows IDNs, labels should be in
896 whatever character encoding and escape mechanism the protocol or
897 document format uses in the local environment. This provision is
898 intended to prevent situations in which, e.g., UTF-8 domain names
899 appear embedded in text that is otherwise in some other character
900 coding.
901
902 All protocols that use domain name slots (see Section 2.3.2.6 in the
903 Definitions document [RFC5890]) already have the capacity for
904 handling domain names in the ASCII charset. Thus, A-labels can
905 inherently be handled by those protocols.
906
907 IDNA2008 does not specify required mappings between one character or
908 code point and others. An extended discussion of mapping issues
909 appears in Section 6 and specific recommendations appear in the
910 Mapping document [IDNA2008-Mapping]. In general, IDNA2008 prohibits
911 characters that would be mapped to others by normalization or other
912 rules. As examples, while mathematical characters based on Latin
913 ones are accepted as input to IDNA2003, they are prohibited in
914 IDNA2008. Similarly, uppercase characters, double-width characters,
915 and other variations are prohibited as IDNA input although mapping
916 them as needed in user interfaces is strongly encouraged.
917
918 Since the rules in the Tables document [RFC5892] have the effect that
919 only strings that are not transformed by NFKC are valid, if an
920 application chooses to perform NFKC normalization before lookup, that
921 operation is safe since this will never make the application unable
922 to look up any valid string. However, as discussed above, the
923 application cannot guarantee that any other application will perform
924 that mapping, so it should be used only with caution and for informed
925 users.
926
927
928
929
930
931
932 Klensin Informational [Page 17]
933 RFC 5894 IDNA Rationale August 2010
934
935
936 In many cases, these prohibitions should have no effect on what the
937 user can type as input to the lookup process. It is perfectly
938 reasonable for systems that support user interfaces to perform some
939 character mapping that is appropriate to the local environment. This
940 would normally be done prior to actual invocation of IDNA. At least
941 conceptually, the mapping would be part of the Unicode conversions
942 discussed above and in the Protocol document [RFC5891]. However,
943 those changes will be local ones only -- local to environments in
944 which users will clearly understand that the character forms are
945 equivalent. For use in interchanges among systems, it appears to be
946 much more important that U-labels and A-labels can be mapped back and
947 forth without loss of information.
948
949 One specific, and very important, instance of this strategy arises
950 with case folding. In the ASCII-only DNS, names are looked up and
951 matched in a case-independent way, but no actual case folding occurs.
952 Names can be placed in the DNS in either uppercase or lowercase form
953 (or any mixture of them) and that form is preserved, returned in
954 queries, and so on. IDNA2003 approximated that behavior for
955 non-ASCII strings by performing case folding at registration time
956 (resulting in only lowercase IDNs in the DNS) and when names were
957 looked up.
958
959 As suggested earlier in this section, it appears to be desirable to
960 do as little character mapping as possible as long as Unicode works
961 correctly (e.g., Normalization Form C (NFC) mapping to resolve
962 different codings for the same character is still necessary although
963 the specifications require that it be performed prior to invoking the
964 protocol) in order to make the mapping between A-labels and U-labels
965 idempotent. Case mapping is not an exception to this principle. If
966 only lowercase characters can be registered in the DNS (i.e., be
967 present in a U-label), then IDNA2008 should prohibit uppercase
968 characters as input even though user interfaces to applications
969 should probably map those characters. Some other considerations
970 reinforce this conclusion. For example, in ASCII case mapping for
971 individual characters, uppercase(character) is always equal to
972 uppercase(lowercase(character)). That may not be true with IDNs. In
973 some scripts that use case distinctions, there are a few characters
974 that do not have counterparts in one case or the other. The
975 relationship between uppercase and lowercase may even be language-
976 dependent, with different languages (or even the same language in
977 different areas) expecting different mappings. User interface
978 programs can meet the expectations of users who are accustomed to the
979 case-insensitive DNS environment by performing case folding prior to
980 IDNA processing, but the IDNA procedures themselves should neither
981 require such mapping nor expect them when they are not natural to the
982 localized environment.
983
984
985
986
987 Klensin Informational [Page 18]
988 RFC 5894 IDNA Rationale August 2010
989
990
991 4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate
992 Character Forms
993
994 Users have expectations about character matching or equivalence that
995 are based on their own languages and the orthography of those
996 languages. These expectations may not always be met in a global
997 system, especially if multiple languages are written using the same
998 script but using different conventions. Some examples:
999
1000 o A Norwegian user might expect a label with the ae-ligature to be
1001 treated as the same label as one using the Swedish spelling with
1002 a-diaeresis even though applying that mapping to English would be
1003 astonishing to users.
1004
1005 o A German user might expect a label with an o-umlaut and a label
1006 that had "oe" substituted, but was otherwise the same, to be
1007 treated as equivalent even though that substitution would be a
1008 clear error in Swedish.
1009
1010 o A Chinese user might expect automatic matching of Simplified and
1011 Traditional Chinese characters, but applying that matching for
1012 Korean or Japanese text would create considerable confusion.
1013
1014 o An English user might expect "theater" and "theatre" to match.
1015
1016 A number of languages use alphabetic scripts in which single phonemes
1017 are written using two characters, termed a "digraph", for example,
1018 the "ph" in "pharmacy" and "telephone". (Such characters can also
1019 appear consecutively without forming a digraph, as in "tophat".)
1020 Certain digraphs may be indicated typographically by setting the two
1021 characters closer together than they would be if used consecutively
1022 to represent different phonemes. Some digraphs are fully joined as
1023 ligatures. For example, the word "encyclopaedia" is sometimes set
1024 with a U+00E6 LATIN SMALL LIGATURE AE. When ligature and digraph
1025 forms have the same interpretation across all languages that use a
1026 given script, application of Unicode normalization generally resolves
1027 the differences and causes them to match. When they have different
1028 interpretations, matching must utilize other methods, presumably
1029 chosen at the registry level, or users must be educated to understand
1030 that matching will not occur.
1031
1032 The nature of the problem can be illustrated by many words in the
1033 Norwegian language, where the "ae" ligature is the 27th letter of a
1034 29-letter extended Latin alphabet. It is equivalent to the 28th
1035 letter of the Swedish alphabet (also containing 29 letters),
1036 U+00E4 LATIN SMALL LETTER A WITH DIAERESIS, for which an "ae" cannot
1037 be substituted according to current orthographic standards. That
1038 character (U+00E4) is also part of the German alphabet where, unlike
1039
1040
1041
1042 Klensin Informational [Page 19]
1043 RFC 5894 IDNA Rationale August 2010
1044
1045
1046 in the Nordic languages, the two-character sequence "ae" is usually
1047 treated as a fully acceptable alternate orthography for the "umlauted
1048 a" character. The inverse is however not true, and those two
1049 characters cannot necessarily be combined into an "umlauted a". This
1050 also applies to another German character, the "umlauted o"
1051 (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS) which, for example,
1052 cannot be used for writing the name of the author "Goethe". It is
1053 also a letter in the Swedish alphabet where, like the "a with
1054 diaeresis", it cannot be correctly represented as "oe" and in the
1055 Norwegian alphabet, where it is represented, not as "o with
1056 diaeresis", but as "slashed o", U+00F8.
1057
1058 Some of the ligatures that have explicit code points in Unicode were
1059 given special handling in IDNA2003 and now pose additional problems
1060 in transition. See Section 7.2.
1061
1062 Additional cases with alphabets written right to left are described
1063 in Section 4.5.
1064
1065 Matching and comparison algorithm selection often requires
1066 information about the language being used, context, or both --
1067 information that is not available to IDNA or the DNS. Consequently,
1068 IDNA2008 makes no attempt to treat combined characters in any special
1069 way. A registry that is aware of the language context in which
1070 labels are to be registered, and where that language sometimes (or
1071 always) treats the two-character sequences as equivalent to the
1072 combined form, should give serious consideration to applying a
1073 "variant" model [RFC3743][RFC4290] or to prohibiting registration of
1074 one of the forms entirely, to reduce the opportunities for user
1075 confusion and fraud that would result from the related strings being
1076 registered to different parties.
1077
1078 4.4. Case Mapping and Related Issues
1079
1080 In the DNS, ASCII letters are stored with their case preserved.
1081 Matching during the query process is case-independent, but none of
1082 the information that might be represented by choices of case has been
1083 lost. That model has been accidentally helpful because, as people
1084 have created DNS labels by catenating words (or parts of words) to
1085 form labels, case has often been used to distinguish among components
1086 and make the labels more memorable.
1087
1088 Since DNS servers do not get involved in parsing IDNs, they cannot do
1089 case-independent matching. Thus, keeping the cases separate in
1090 lookup or registration, and doing matching at the server, is not
1091 feasible with IDNA or any similar approach. Matching of characters
1092 that are considered to differ only by case must be done, if desired,
1093 by programs invoking IDNA lookup even though it wasn't done by ASCII-
1094
1095
1096
1097 Klensin Informational [Page 20]
1098 RFC 5894 IDNA Rationale August 2010
1099
1100
1101 only DNS clients. That situation was recognized in IDNA2003 and
1102 nothing in IDNA2008 fundamentally changes it or could do so. In
1103 IDNA2003, all characters are case folded and mapped by clients in a
1104 standardized step.
1105
1106 Even in scripts that generally support case distinctions, some
1107 characters do not have uppercase forms. For example, the Unicode
1108 case-folding operation maps Greek Final Form Sigma (U+03C2) to the
1109 medial form (U+03C3) and maps Eszett (German Sharp S, U+00DF) to
1110 "ss". Neither of these mappings is reversible because the uppercase
1111 of U+03C3 is the uppercase Sigma (U+03A3) and "ss" is an ASCII
1112 string. IDNA2008 permits, at the risk of some incompatibility,
1113 slightly more flexibility in this area by avoiding case folding and
1114 treating these characters as themselves. Approaches to handling one-
1115 way mappings are discussed in Section 7.2.
1116
1117 Because IDNA2003 maps Final Sigma and Eszett to other characters, and
1118 the reverse mapping is never possible, neither Final Sigma nor Eszett
1119 can be represented in the ACE form of IDNA2003 IDN nor in the native
1120 character (U-label) form derived from it. With IDNA2008, both
1121 characters can be used in an IDN and so the A-label used for lookup
1122 for any U-label containing those characters is now different. See
1123 Section 7.1 for a discussion of what kinds of changes might require
1124 the IDNA prefix to change; after extended discussions, the IDNABIS
1125 Working Group came to consensus that the change for these characters
1126 did not justify a prefix change.
1127
1128 4.5. Right-to-Left Text
1129
1130 In order to be sure that the directionality of right-to-left text is
1131 unambiguous, IDNA2003 required that any label in which right-to-left
1132 characters appear both starts and ends with them and that it does not
1133 include any characters with strong left-to-right properties (that
1134 excludes other alphabetic characters but permits European digits).
1135 Any other string that contains a right-to-left character and does not
1136 meet those requirements is rejected. This is one of the few places
1137 where the IDNA algorithms (both in IDNA2003 and in IDNA2008) examine
1138 an entire label, not just individual characters. The algorithmic
1139 model used in IDNA2003 rejects the label when the final character in
1140 a right-to-left string requires a combining mark in order to be
1141 correctly represented.
1142
1143 That prohibition is not acceptable for writing systems for languages
1144 written with consonantal alphabets to which diacritical vocalic
1145 systems are applied, and for languages with orthographies derived
1146 from them where the combining marks may have different functionality.
1147 In both cases, the combining marks can be essential components of the
1148 orthography. Examples of this are Yiddish, written with an extended
1149
1150
1151
1152 Klensin Informational [Page 21]
1153 RFC 5894 IDNA Rationale August 2010
1154
1155
1156 Hebrew script, and Dhivehi (the official language of Maldives), which
1157 is written in the Thaana script (which is, in turn, derived from the
1158 Arabic script). IDNA2008 removes the restriction on final combining
1159 characters with a new set of rules for right-to-left scripts and
1160 their characters. Those new rules are specified in the Bidi document
1161 [RFC5893].
1162
1163 5. IDNs and the Robustness Principle
1164
1165 The "Robustness Principle" is often stated as "Be conservative about
1166 what you send and liberal in what you accept" (see, e.g., Section
1167 1.2.2 of the applications-layer Host Requirements specification
1168 [RFC1123]). This principle applies to IDNA. In applying the
1169 principle to registries as the source ("sender") of all registered
1170 and useful IDNs, registries are responsible for being conservative
1171 about what they register and put out in the Internet. For IDNs to
1172 work well, zone administrators (registries) must have and require
1173 sensible policies about what is registered -- conservative policies
1174 -- and implement and enforce them.
1175
1176 Conversely, lookup applications are expected to reject labels that
1177 clearly violate global (protocol) rules (no one has ever seriously
1178 claimed that being liberal in what is accepted requires being
1179 stupid). However, once one gets past such global rules and deals
1180 with anything sensitive to script or locale, it is necessary to
1181 assume that garbage has not been placed into the DNS, i.e., one must
1182 be liberal about what one is willing to look up in the DNS rather
1183 than guessing about whether it should have been permitted to be
1184 registered.
1185
1186 If a string cannot be successfully found in the DNS after the lookup
1187 processing described here, it makes no difference whether it simply
1188 wasn't registered or was prohibited by some rule at the registry.
1189 Application implementers should be aware that where DNS wildcards are
1190 used, the ability to successfully resolve a name does not guarantee
1191 that it was actually registered.
1192
1193 6. Front-end and User Interface Processing for Lookup
1194
1195 Domain names may be identified and processed in many contexts. They
1196 may be typed in by users themselves or embedded in an identifier such
1197 as an email address, URI, or IRI. They may occur in running text or
1198 be processed by one system after being provided in another. Systems
1199 may try to normalize URLs to determine (or guess) whether a reference
1200 is valid or if two references point to the same object without
1201 actually looking the objects up (comparison without lookup is
1202 necessary for URI types that are not intended to be resolved). Some
1203 of these goals may be more easily and reliably satisfied than others.
1204
1205
1206
1207 Klensin Informational [Page 22]
1208 RFC 5894 IDNA Rationale August 2010
1209
1210
1211 While there are strong arguments for any domain name that is placed
1212 "on the wire" -- transmitted between systems -- to be in the zero-
1213 ambiguity forms of A-labels, it is inevitable that programs that
1214 process domain names will encounter U-labels or variant forms.
1215
1216 An application that implements the IDNA protocol [RFC5891] will
1217 always take any user input and convert it to a set of Unicode code
1218 points. That user input may be acquired by any of several different
1219 input methods, all with differing conversion processes to be taken
1220 into consideration (e.g., typed on a keyboard, written by hand onto
1221 some sort of digitizer, spoken into a microphone and interpreted by a
1222 speech-to-text engine, etc.). The process of taking any particular
1223 user input and mapping it into a Unicode code point may be a simple
1224 one: if a user strikes the "A" key on a US English keyboard, without
1225 any modifiers such as the "Shift" key held down, in order to draw a
1226 Latin small letter A ("a"), many (perhaps most) modern operating
1227 system input methods will produce to the calling application the code
1228 point U+0061, encoded in a single octet.
1229
1230 Sometimes the process is somewhat more complicated: a user might
1231 strike a particular set of keys to represent a combining macron
1232 followed by striking the "A" key in order to draw a Latin small
1233 letter A with a macron above it. Depending on the operating system,
1234 the input method chosen by the user, and even the parameters with
1235 which the application communicates with the input method, the result
1236 might be the code point U+0101 (encoded as two octets in UTF-8 or
1237 UTF-16, four octets in UTF-32, etc.), the code point U+0061 followed
1238 by the code point U+0304 (again, encoded in three or more octets,
1239 depending upon the encoding used) or even the code point U+FF41
1240 followed by the code point U+0304 (and encoded in some form). These
1241 examples leave aside the issue of operating systems and input methods
1242 that do not use Unicode code points for their character set.
1243
1244 In every case, applications (with the help of the operating systems
1245 on which they run and the input methods used) need to perform a
1246 mapping from user input into Unicode code points.
1247
1248 IDNA2003 used a model whereby input was taken from the user, mapped
1249 (via whatever input method mechanisms were used) to a set of Unicode
1250 code points, and then further mapped to a set of Unicode code points
1251 using the Nameprep profile [RFC3491]. In this procedure, there are
1252 two separate mapping steps: first, a mapping done by the input method
1253 (which might be controlled by the operating system, the application,
1254 or some combination) and then a second mapping performed by the
1255 Nameprep portion of the IDNA protocol. The mapping done in Nameprep
1256 includes a particular mapping table to re-map some characters to
1257 other characters, a particular normalization, and a set of prohibited
1258 characters.
1259
1260
1261
1262 Klensin Informational [Page 23]
1263 RFC 5894 IDNA Rationale August 2010
1264
1265
1266 Note that the result of the two-step mapping process means that the
1267 mapping chosen by the operating system or application in the first
1268 step might differ significantly from the mapping supplied by the
1269 Nameprep profile in the second step. This has advantages and
1270 disadvantages. Of course, the second mapping regularizes what gets
1271 looked up in the DNS, making for better interoperability between
1272 implementations that use the Nameprep mapping. However, the
1273 application or operating system may choose mappings in their input
1274 methods, which when passed through the second (Nameprep) mapping
1275 result in characters that are "surprising" to the end user.
1276
1277 The other important feature of IDNA2003 is that, with very few
1278 exceptions, it assumes that any set of Unicode code points provided
1279 to the Nameprep mapping can be mapped into a string of Unicode code
1280 points that are "sensible", even if that means mapping some code
1281 points to nothing (that is, removing the code points from the
1282 string). This allowed maximum flexibility in input strings.
1283
1284 The present version of IDNA (IDNA2008) differs significantly in
1285 approach from the original version. First and foremost, it does not
1286 provide explicit mapping instructions. Instead, it assumes that the
1287 application (perhaps via an operating system input method) will do
1288 whatever mapping it requires to convert input into Unicode code
1289 points. This has the advantage of giving flexibility to the
1290 application to choose a mapping that is suitable for its user given
1291 specific user requirements, and avoids the two-step mapping of the
1292 original protocol. Instead of a mapping, IDNA2008 provides a set of
1293 categories that can be used to specify the valid code points allowed
1294 in a domain name.
1295
1296 In principle, an application ought to take user input of a domain
1297 name and convert it to the set of Unicode code points that represent
1298 the domain name the user intends. As a practical matter, of course,
1299 determining user intent is a tricky business, so an application needs
1300 to choose a reasonable mapping from user input. That may differ
1301 based on the particular circumstances of a user, depending on locale,
1302 language, type of input method, etc. It is up to the application to
1303 make a reasonable choice.
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317 Klensin Informational [Page 24]
1318 RFC 5894 IDNA Rationale August 2010
1319
1320
1321 7. Migration from IDNA2003 and Unicode Version Synchronization
1322
1323 7.1. Design Criteria
1324
1325 As mentioned above and in the IAB review and recommendations for IDNs
1326 [RFC4690], two key goals of the IDNA2008 design are:
1327
1328 o to enable applications to be agnostic about whether they are being
1329 run in environments supporting any Unicode version from 3.2
1330 onward.
1331
1332 o to permit incrementally adding new characters, character groups,
1333 scripts, and other character collections as they are incorporated
1334 into Unicode, doing so without disruption and, in the long term,
1335 without "heavy" processes (an IETF consensus process is required
1336 by the IDNA2008 specifications and is expected to be required and
1337 used until significant experience accumulates with IDNA operations
1338 and new versions of Unicode).
1339
1340 7.1.1. Summary and Discussion of IDNA Validity Criteria
1341
1342 The general criteria for a label to be considered valid under IDNA
1343 are (the actual rules are rigorously defined in the Protocol
1344 [RFC5891] and Tables [RFC5892] documents):
1345
1346 o The characters are "letters", marks needed to form letters,
1347 numerals, or other code points used to write words in some
1348 language. Symbols, drawing characters, and various notational
1349 characters are intended to be permanently excluded. There is no
1350 evidence that they are important enough to Internet operations or
1351 internationalization to justify expansion of domain names beyond
1352 the general principle of "letters, digits, and hyphen".
1353 (Additional discussion and rationale for the symbol decision
1354 appears in Section 7.6.)
1355
1356 o Other than in very exceptional cases, e.g., where they are needed
1357 to write substantially any word of a given language, punctuation
1358 characters are excluded. The fact that a word exists is not proof
1359 that it should be usable in a DNS label, and DNS labels are not
1360 expected to be usable for multiple-word phrases (although they are
1361 certainly not prohibited if the conventions and orthography of a
1362 particular language cause that to be possible).
1363
1364 o Characters that are unassigned (have no character assignment at
1365 all) in the version of Unicode being used by the registry or
1366 application are not permitted, even on lookup. The issues
1367 involved in this decision are discussed in Section 7.7.
1368
1369
1370
1371
1372 Klensin Informational [Page 25]
1373 RFC 5894 IDNA Rationale August 2010
1374
1375
1376 o Any character that is mapped to another character by a current
1377 version of NFKC is prohibited as input to IDNA (for either
1378 registration or lookup). With a few exceptions, this principle
1379 excludes any character mapped to another by Nameprep [RFC3491].
1380
1381 The principles above drive the design of rules that are specified
1382 exactly in the Tables document. Those rules identify the characters
1383 that are valid under IDNA. The rules themselves are normative, and
1384 the tables are derived from them, rather than vice versa.
1385
1386 7.1.2. Labels in Registration
1387
1388 Any label registered in a DNS zone must be validated -- i.e., the
1389 criteria for that label must be met -- in order for applications to
1390 work as intended. This principle is not new. For example, since the
1391 DNS was first deployed, zone administrators have been expected to
1392 verify that names meet "hostname" requirements [RFC0952] where those
1393 requirements are imposed by the expected applications. Other
1394 applications contexts, such as the later addition of special service
1395 location formats [RFC2782] imposed new requirements on zone
1396 administrators. For zones that will contain IDNs, support for
1397 Unicode version-independence requires restrictions on all strings
1398 placed in the zone. In particular, for such zones (the exact rules
1399 appear in Section 4 of the Protocol document [RFC5891]):
1400
1401 o Any label that appears to be an A-label, i.e., any label that
1402 starts in "xn--", must be valid under IDNA, i.e., they must be
1403 valid A-labels, as discussed in Section 2 above.
1404
1405 o The Unicode tables (i.e., tables of code points, character
1406 classes, and properties) and IDNA tables (i.e., tables of
1407 contextual rules such as those that appear in the Tables
1408 document), must be consistent on the systems performing or
1409 validating labels to be registered. Note that this does not
1410 require that tables reflect the latest version of Unicode, only
1411 that all tables used on a given system are consistent with each
1412 other.
1413
1414 Under this model, registry tables will need to be updated (both the
1415 Unicode-associated tables and the tables of permitted IDN characters)
1416 to enable a new script or other set of new characters. The registry
1417 will not be affected by newer versions of Unicode, or newly
1418 authorized characters, until and unless it wishes to support them.
1419 The zone administrator is responsible for verifying validity for IDNA
1420 as well as its local policies -- a more extensive set of checks than
1421 are required for looking up the labels. Systems looking up or
1422
1423
1424
1425
1426
1427 Klensin Informational [Page 26]
1428 RFC 5894 IDNA Rationale August 2010
1429
1430
1431 resolving DNS labels, especially IDN DNS labels, must be able to
1432 assume that applicable registration rules were followed for names
1433 entered into the DNS.
1434
1435 7.1.3. Labels in Lookup
1436
1437 Any application processing a label through IDNA so it can be looked
1438 up in a DNS zone is required to (the exact rules appear in Section 5
1439 of the Protocol document [RFC5891]):
1440
1441 o Maintain IDNA and Unicode tables that are consistent with regard
1442 to versions, i.e., unless the application actually executes the
1443 classification rules in the Tables document [RFC5892], its IDNA
1444 tables must be derived from the version of Unicode that is
1445 supported more generally on the system. As with registration, the
1446 tables need not reflect the latest version of Unicode, but they
1447 must be consistent.
1448
1449 o Validate the characters in labels to be looked up only to the
1450 extent of determining that the U-label does not contain
1451 "DISALLOWED" code points or code points that are unassigned in its
1452 version of Unicode.
1453
1454 o Validate the label itself for conformance with a small number of
1455 whole-label rules. In particular, it must verify that:
1456
1457 * there are no leading combining marks,
1458
1459 * the Bidi conditions are met if right-to-left characters appear,
1460
1461 * any required contextual rules are available, and
1462
1463 * any contextual rules that are associated with joiner characters
1464 (and CONTEXTJ characters more generally) are tested.
1465
1466 o Do not reject labels based on other contextual rules about
1467 characters, including mixed-script label prohibitions. Such rules
1468 may be used to influence presentation decisions in the user
1469 interface, but not to avoid looking up domain names.
1470
1471 To further clarify the rules about handling characters that require
1472 contextual rules, note that one can have a context-required character
1473 (i.e., one that requires a rule), but no rule. In that case, the
1474 character is treated the same way DISALLOWED characters are treated,
1475 until and unless a rule is supplied. That state is more or less
1476 equivalent to "the idea of permitting this character is accepted in
1477 principle, but it won't be permitted in practice until consensus is
1478 reached on a safe way to use it".
1479
1480
1481
1482 Klensin Informational [Page 27]
1483 RFC 5894 IDNA Rationale August 2010
1484
1485
1486 The ability to add a rule more or less exempts these characters from
1487 the prohibition against reclassifying characters from DISALLOWED to
1488 PVALID.
1489
1490 And, obviously, "no rule" is different from "have a rule, but the
1491 test either succeeds or fails".
1492
1493 Lookup applications that follow these rules, rather than having their
1494 own criteria for rejecting lookup attempts, are not sensitive to
1495 version incompatibilities with the particular zone registry
1496 associated with the domain name except for labels containing
1497 characters recently added to Unicode.
1498
1499 An application or client that processes names according to this
1500 protocol and then resolves them in the DNS will be able to locate any
1501 name that is registered, as long as those registrations are valid
1502 under IDNA and its version of the IDNA tables is sufficiently up to
1503 date to interpret all of the characters in the label. Messages to
1504 users should distinguish between "label contains an unallocated code
1505 point" and other types of lookup failures. A failure on the basis of
1506 an old version of Unicode may lead the user to a desire to upgrade to
1507 a newer version, but will have no other ill effects (this is
1508 consistent with behavior in the transition to the DNS when some hosts
1509 could not yet handle some forms of names or record types).
1510
1511 7.2. Changes in Character Interpretations
1512
1513 As a consequence of the elimination of mapping, the current version
1514 of IDNA changes the interpretation of a few characters relative to
1515 its predecessors. This subsection outlines the issues and discusses
1516 possible transition strategies.
1517
1518 7.2.1. Character Changes: Eszett and Final Sigma
1519
1520 In those scripts that make case distinctions, there are a few
1521 characters for which an obvious and unique uppercase character has
1522 not historically been available to match a lowercase one, or vice
1523 versa. For those characters, the mappings used in constructing the
1524 Stringprep tables for IDNA2003, performed using the Unicode
1525 toCaseFold operation (see Section 5.18 of the Unicode Standard
1526 [Unicode52]), generate different characters or sets of characters.
1527 Those operations are not reversible and lose even more information
1528 than traditional uppercase or lowercase transformations, but are more
1529 useful than those transformations for comparison purposes. Two
1530 notable characters of this type are the German character Eszett
1531 (Sharp S, U+00DF) and the Greek Final Form Sigma (U+03C2). The
1532 former is case folded to the ASCII string "ss", the latter to a
1533 medial (lowercase) Sigma (U+03C3).
1534
1535
1536
1537 Klensin Informational [Page 28]
1538 RFC 5894 IDNA Rationale August 2010
1539
1540
1541 7.2.2. Character Changes: Zero Width Joiner and Zero Width Non-Joiner
1542
1543 IDNA2003 mapped both ZERO WIDTH JOINER (ZWJ, U+200D) and ZERO WIDTH
1544 NON-JOINER (ZWNJ, U+200C) to nothing, effectively dropping these
1545 characters from any label in which they appeared and treating strings
1546 containing them as identical to strings that did not. As discussed
1547 in Section 3.1.2 above, those characters are essential for writing
1548 many reasonable mnemonics for certain scripts. However, treating
1549 them as valid in IDNA2008, even with contextual restrictions, raises
1550 approximately the same problem as exists with Eszett and Final Sigma:
1551 strings that were valid under IDNA2003 have different interpretations
1552 as labels, and different A-labels, than the same strings under this
1553 newer version.
1554
1555 7.2.3. Character Changes and the Need for Transition
1556
1557 The decision to eliminate mandatory and standardized mappings,
1558 including case folding, from the IDNA2008 protocol in order to make
1559 A-labels and U-labels idempotent made these characters problematic.
1560 If they were to be disallowed, important words and mnemonics could
1561 not be written in orthographically reasonable ways. If they were to
1562 be permitted as distinct characters, there would be no information
1563 loss and registries would have more flexibility, but IDNA2003 and
1564 IDNA2008 lookups might result in different A-labels.
1565
1566 With the understanding that there would be incompatibility either way
1567 but a judgment that the incompatibility was not significant enough to
1568 justify a prefix change, the Working Group concluded that Eszett and
1569 Final Form Sigma should be treated as distinct and Protocol-Valid
1570 characters.
1571
1572 Since these characters are interpreted in different ways under the
1573 older and newer versions of IDNA, transition strategies and policies
1574 will be necessary. Some actions can reasonably be taken by
1575 applications' client programs (those that perform lookup operations
1576 or cause them to be performed), but because of the diversity of
1577 situations and uses of the DNS, much of the responsibility will need
1578 to fall on registries.
1579
1580 Registries, especially those maintaining zones for third parties,
1581 must decide how to introduce a new service in a way that does not
1582 create confusion or significantly weaken or invalidate existing
1583 identifiers. This is not a new problem; registries were faced with
1584 similar issues when IDNs were introduced (potentially, and especially
1585 for Latin-based scripts, in conflict with existing labels that had
1586 been rendered in ASCII characters by applying more or less
1587 standardized conventions) and when other new forms of strings have
1588 been permitted as labels.
1589
1590
1591
1592 Klensin Informational [Page 29]
1593 RFC 5894 IDNA Rationale August 2010
1594
1595
1596 7.2.4. Transition Strategies
1597
1598 There are several approaches to the introduction of new characters or
1599 changes in interpretation of existing characters from their mapped
1600 forms in the earlier version of IDNA. The transition issue is
1601 complicated because the forms of these labels after the
1602 ToUnicode(ToASCII()) translation in IDNA2003 not only remain valid
1603 but do not provide strong indications of what the registrant
1604 intended: a string containing "ss" could have simply been intended to
1605 be that string or could have been intended to contain an Eszett; a
1606 string containing lowercase Sigma could have been intended to contain
1607 Final Sigma (one might make heuristic guesses based on position in a
1608 string, but the long tradition of forming labels by concatenating
1609 words makes such heuristics unreliable), and strings that do not
1610 contain ZWJ or ZWNJ might have been intended to contain them.
1611 Without any preference or claim to completeness, some of these, all
1612 of which have been used by registries in the past for similar
1613 transitions, are:
1614
1615 1. Do not permit use of the newly available character at the
1616 registry level. This might cause lookup failures if a domain
1617 name were to be written with the expectation of the IDNA2003
1618 mapping behavior, but would eliminate any possibility of false
1619 matches.
1620
1621 2. Hold a "sunrise"-like arrangement in which holders of labels
1622 containing "ss" in the Eszett case, lowercase Sigma in that case,
1623 or that might have contained ZWJ or ZWNJ in context, are given
1624 priority (and perhaps other benefits) for registering the
1625 corresponding string containing Eszett, Final Sigma, or the
1626 appropriate zero-width character respectively.
1627
1628 3. Adopt some sort of "variant" approach in which registrants obtain
1629 labels with both character forms.
1630
1631 4. Adopt a different form of "variant" approach in which
1632 registration of additional strings that would produce the same
1633 A-label if interpreted according to IDNA2003 is either not
1634 permitted at all or permitted only by the registrant who already
1635 has one of the names.
1636
1637 5. Ignore the issue and assume that the marketplace or other
1638 mechanisms will sort things out.
1639
1640 In any event, a registry (at any level of the DNS tree) that chooses
1641 to permit labels to be registered that contains these characters, or
1642 considers doing so, will have to address the relationship with
1643 existing, possibly conflicting, labels in some way, just as
1644
1645
1646
1647 Klensin Informational [Page 30]
1648 RFC 5894 IDNA Rationale August 2010
1649
1650
1651 registries that already had a considerable number of labels did when
1652 IDNs were first introduced.
1653
1654 7.3. Elimination of Character Mapping
1655
1656 As discussed at length in Section 6, IDNA2003, via Nameprep (see
1657 Section 7.5), mapped many characters into related ones. Those
1658 mappings no longer exist as requirements in IDNA2008. These
1659 specifications strongly prefer that only A-labels or U-labels be used
1660 in protocol contexts and as much as practical more generally.
1661 IDNA2008 does anticipate situations in which some mapping at the time
1662 of user input into lookup applications is appropriate and desirable.
1663 The issues are discussed in Section 6 and specific recommendations
1664 are made in the Mapping document [IDNA2008-Mapping].
1665
1666 7.4. The Question of Prefix Changes
1667
1668 The conditions that would have required a change in the IDNA ACE
1669 prefix ("xn--", used in IDNA2003) were of great concern to the
1670 community. A prefix change would have clearly been necessary if the
1671 algorithms were modified in a manner that would have created serious
1672 ambiguities during subsequent transition in registrations. This
1673 section summarizes the working group's conclusions about the
1674 conditions under which a change in the prefix would have been
1675 necessary and the implications of such a change.
1676
1677 7.4.1. Conditions Requiring a Prefix Change
1678
1679 An IDN prefix change would have been needed if a given string would
1680 be looked up or otherwise interpreted differently depending on the
1681 version of the protocol or tables being used. This IDNA upgrade
1682 would have required a prefix change if, and only if, one of the
1683 following four conditions were met:
1684
1685 1. The conversion of an A-label to Unicode (i.e., a U-label) would
1686 have yielded one string under IDNA2003 and a different string
1687 under IDNA2008.
1688
1689 2. In a significant number of cases, an input string that was valid
1690 under IDNA2003 and also valid under IDNA2008 would have yielded
1691 two different A-labels with the different versions. This
1692 condition is believed to be essentially equivalent to the one
1693 above except for a very small number of edge cases that were not
1694 found to justify a prefix change (see Section 7.2).
1695
1696 Note that if the input string was valid under one version and not
1697 valid under the other, this condition would not apply. See the
1698 first item in Section 7.4.2, below.
1699
1700
1701
1702 Klensin Informational [Page 31]
1703 RFC 5894 IDNA Rationale August 2010
1704
1705
1706 3. A fundamental change was made to the semantics of the string that
1707 would be inserted in the DNS, e.g., if a decision were made to
1708 try to include language or script information in the encoding in
1709 addition to the string itself.
1710
1711 4. A sufficiently large number of characters were added to Unicode
1712 so that the Punycode mechanism for block offsets would no longer
1713 reference the higher-numbered planes and blocks. This condition
1714 is unlikely even in the long term and certain not to arise in the
1715 next several years.
1716
1717 7.4.2. Conditions Not Requiring a Prefix Change
1718
1719 As a result of the principles described above, none of the following
1720 changes required a new prefix:
1721
1722 1. Prohibition of some characters as input to IDNA. Such a
1723 prohibition might make names that were previously registered
1724 inaccessible, but did not change those names.
1725
1726 2. Adjustments in IDNA tables or actions, including normalization
1727 definitions, that affected characters that were already invalid
1728 under IDNA2003.
1729
1730 3. Changes in the style of the IDNA definition that did not alter
1731 the actions performed by IDNA.
1732
1733 7.4.3. Implications of Prefix Changes
1734
1735 While it might have been possible to make a prefix change, the costs
1736 of such a change are considerable. Registries could not have
1737 converted all IDNA2003 ("xn--") registrations to a new form at the
1738 same time and synchronize that change with applications supporting
1739 lookup. Unless all existing registrations were simply to be declared
1740 invalid (and perhaps even then), systems that needed to support both
1741 labels with old prefixes and labels with new ones would be required
1742 to first process a putative label under the IDNA2008 rules and try to
1743 look it up and then, if it were not found, would be required to
1744 process the label under IDNA2003 rules and look it up again. That
1745 process would probably have significantly slowed down all processing
1746 that involved IDNs in the DNS, especially since a fully-qualified
1747 name might contain a mixture of labels that were registered with the
1748 old and new prefixes. That would have made DNS caching very
1749 difficult. In addition, looking up the same input string as two
1750 separate A-labels would have created some potential for confusion and
1751 attacks, since the labels could map to different targets and then
1752 resolve to different entries in the DNS.
1753
1754
1755
1756
1757 Klensin Informational [Page 32]
1758 RFC 5894 IDNA Rationale August 2010
1759
1760
1761 Consequently, a prefix change should have been, and was, avoided if
1762 at all possible, even if it means accepting some IDNA2003 decisions
1763 about character distinctions as irreversible and/or giving special
1764 treatment to edge cases.
1765
1766 7.5. Stringprep Changes and Compatibility
1767
1768 The Nameprep specification [RFC3491], a key part of IDNA2003, is a
1769 profile of Stringprep [RFC3454]. While Nameprep is a Stringprep
1770 profile specific to IDNA, Stringprep is used by a number of other
1771 protocols. Were Stringprep to have been modified by IDNA2008, those
1772 changes to improve the handling of IDNs could cause problems for
1773 non-DNS uses, most notably if they affected identification and
1774 authentication protocols. Several elements of IDNA2008 give
1775 interpretations to strings prohibited under IDNA2003 or prohibit
1776 strings that IDNA2003 permitted. Those elements include the new
1777 inclusion information in the Tables document [RFC5892], the reduction
1778 in the number of characters permitted as input for registration or
1779 lookup (Section 3), and even the changes in handling of right-to-left
1780 strings as described in the Bidi document [RFC5893]. IDNA2008 does
1781 not use Nameprep or Stringprep at all, so there are no side-effect
1782 changes to other protocols.
1783
1784 It is particularly important to keep IDNA processing separate from
1785 processing for various security protocols because some of the
1786 constraints that are necessary for smooth and comprehensible use of
1787 IDNs may be unwanted or undesirable in other contexts. For example,
1788 the criteria for good passwords or passphrases are very different
1789 from those for desirable IDNs: passwords should be hard to guess,
1790 while domain names should normally be easily memorable. Similarly,
1791 internationalized Small Computer System Interface (SCSI) identifiers
1792 and other protocol components are likely to have different
1793 requirements than IDNs.
1794
1795 7.6. The Symbol Question
1796
1797 One of the major differences between this specification and the
1798 original version of IDNA is that IDNA2003 permitted non-letter
1799 symbols of various sorts, including punctuation and line-drawing
1800 symbols, in the protocol. They were always discouraged in practice.
1801 In particular, both the "IESG Statement" about IDNA and all versions
1802 of the ICANN Guidelines specify that only language characters be used
1803 in labels. This specification disallows symbols entirely. There are
1804 several reasons for this, which include:
1805
1806 1. As discussed elsewhere, the original IDNA specification assumed
1807 that as many Unicode characters as possible should be permitted,
1808 directly or via mapping to other characters, in IDNs. This
1809
1810
1811
1812 Klensin Informational [Page 33]
1813 RFC 5894 IDNA Rationale August 2010
1814
1815
1816 specification operates on an inclusion model, extrapolating from
1817 the original "hostname" rules (LDH, see the Definitions document
1818 [RFC5890]) -- which have served the Internet very well -- to a
1819 Unicode base rather than an ASCII base.
1820
1821 2. Symbol names are more problematic than letters because there may
1822 be no general agreement on whether a particular glyph matches a
1823 symbol; there are no uniform conventions for naming; variations
1824 such as outline, solid, and shaded forms may or may not exist;
1825 and so on. As just one example, consider a "heart" symbol as it
1826 might appear in a logo that might be read as "I love...". While
1827 the user might read such a logo as "I love..." or "I heart...",
1828 considerable knowledge of the coding distinctions made in Unicode
1829 is needed to know that there is more than one "heart" character
1830 (e.g., U+2665, U+2661, and U+2765) and how to describe it. These
1831 issues are of particular importance if strings are expected to be
1832 understood or transcribed by the listener after being read out
1833 loud.
1834
1835 3. Design of a screen reader used by blind Internet users who must
1836 listen to renderings of IDN domain names and possibly reproduce
1837 them on the keyboard becomes considerably more complicated when
1838 the names of characters are not obvious and intuitive to anyone
1839 familiar with the language in question.
1840
1841 4. As a simplified example of this, assume one wanted to use a
1842 "heart" or "star" symbol in a label. This is problematic because
1843 those names are ambiguous in the Unicode system of naming (the
1844 actual Unicode names require far more qualification). A user or
1845 would-be registrant has no way to know -- absent careful study of
1846 the code tables -- whether it is ambiguous (e.g., where there are
1847 multiple "heart" characters) or not. Conversely, the user seeing
1848 the hypothetical label doesn't know whether to read it -- try to
1849 transmit it to a colleague by voice -- as "heart", as "love", as
1850 "black heart", or as any of the other examples below.
1851
1852 5. The actual situation is even worse than this. There is no
1853 possible way for a normal, casual, user to tell the difference
1854 between the hearts of U+2665 and U+2765 and the stars of U+2606
1855 and U+2729 without somehow knowing to look for a distinction. We
1856 have a white heart (U+2661) and few black hearts. Consequently,
1857 describing a label as containing a heart is hopelessly ambiguous:
1858 we can only know that it contains one of several characters that
1859 look like hearts or have "heart" in their names. In cities where
1860 "Square" is a popular part of a location name, one might well
1861 want to use a square symbol in a label as well and there are far
1862 more squares of various flavors in Unicode than there are hearts
1863 or stars.
1864
1865
1866
1867 Klensin Informational [Page 34]
1868 RFC 5894 IDNA Rationale August 2010
1869
1870
1871 The consequence of these ambiguities is that symbols are a very poor
1872 basis for reliable communication. Consistent with this conclusion,
1873 the Unicode standard recommends that strings used in identifiers not
1874 contain symbols or punctuation [Unicode-UAX31]. Of course, these
1875 difficulties with symbols do not arise with actual pictographic
1876 languages and scripts which would be treated like any other language
1877 characters; the two should not be confused.
1878
1879 7.7. Migration between Unicode Versions: Unassigned Code Points
1880
1881 In IDNA2003, labels containing unassigned code points are looked up
1882 on the assumption that, if they appear in labels and can be mapped
1883 and then resolved, the relevant standards must have changed and the
1884 registry has properly allocated only assigned values.
1885
1886 In the IDNA2008 protocol, strings containing unassigned code points
1887 must not be either looked up or registered. In summary, the status
1888 of an unassigned character with regard to the DISALLOWED,
1889 PROTOCOL-VALID, and CONTEXTUAL RULE REQUIRED categories cannot be
1890 evaluated until a character is actually assigned and known. There
1891 are several reasons for this, with the most important ones being:
1892
1893 o Tests involving the context of characters (e.g., some characters
1894 being permitted only adjacent to others of specific types) and
1895 integrity tests on complete labels are needed. Unassigned code
1896 points cannot be permitted because one cannot determine whether
1897 particular code points will require contextual rules (and what
1898 those rules should be) before characters are assigned to them and
1899 the properties of those characters fully understood.
1900
1901 o It cannot be known in advance, and with sufficient reliability,
1902 whether a newly assigned code point will be associated with a
1903 character that would be disallowed by the rules in the Tables
1904 document [RFC5892] (such as a compatibility character). In
1905 IDNA2003, since there is no direct dependency on NFKC (many of the
1906 entries in Stringprep's tables are based on NFKC, but IDNA2003
1907 depends only on Stringprep), allocation of a compatibility
1908 character might produce some odd situations, but it would not be a
1909 problem. In IDNA2008, where compatibility characters are
1910 DISALLOWED unless character-specific exceptions are made,
1911 permitting strings containing unassigned characters to be looked
1912 up would violate the principle that characters in DISALLOWED are
1913 not looked up.
1914
1915 o The Unicode Standard specifies that an unassigned code point
1916 normalizes (and, where relevant, case folds) to itself. If the
1917 code point is later assigned to a character, and particularly if
1918 the newly assigned code point has a combining class that
1919
1920
1921
1922 Klensin Informational [Page 35]
1923 RFC 5894 IDNA Rationale August 2010
1924
1925
1926 determines its placement relative to other combining characters,
1927 it could normalize to some other code point or sequence.
1928
1929 It is possible to argue that the issues above are not important and
1930 that, as a consequence, it is better to retain the principle of
1931 looking up labels even if they contain unassigned characters because
1932 all of the important scripts and characters have been coded as of
1933 Unicode 5.2 (or even earlier), and hence unassigned code points will
1934 be assigned only to obscure characters or archaic scripts.
1935 Unfortunately, that does not appear to be a safe assumption for at
1936 least two reasons. First, much the same claim of completeness has
1937 been made for earlier versions of Unicode. The reality is that a
1938 script that is obscure to much of the world may still be very
1939 important to those who use it. Cultural and linguistic preservation
1940 principles make it inappropriate to declare the script of no
1941 importance in IDNs. Second, we already have counterexamples, e.g.,
1942 in the relationships associated with new Han characters being added
1943 (whether in the BMP or in Unicode Plane 2).
1944
1945 Independent of the technical transition issues identified above, it
1946 can be observed that any addition of characters to an existing script
1947 to make it easier to use or to better accommodate particular
1948 languages may lead to transition issues. Such additions may change
1949 the preferred form for writing a particular string, changes that may
1950 be reflected, e.g., in keyboard transition modules that would
1951 necessarily be different from those for earlier versions of Unicode
1952 where the newer characters may not exist. This creates an inherent
1953 transition problem because attempts to access labels may use either
1954 the old or the new conventions, requiring registry action whether or
1955 not the older conventions were used in labels. The need to consider
1956 transition mechanisms is inherent to evolution of Unicode to better
1957 accommodate writing systems and is independent of how IDNs are
1958 represented in the DNS or how transitions among versions of those
1959 mechanisms occur. The requirement for transitions of this type is
1960 illustrated by the addition of Malayalam Chillu in Unicode 5.1.0.
1961
1962 7.8. Other Compatibility Issues
1963
1964 The 2003 IDNA model includes several odd artifacts of the context in
1965 which it was developed. Many, if not all, of these are potential
1966 avenues for exploits, especially if the registration process permits
1967 "source" names (names that have not been processed through IDNA and
1968 Nameprep) to be registered. As one example, since the character
1969 Eszett, used in German, is mapped by IDNA2003 into the sequence "ss"
1970 rather than being retained as itself or prohibited, a string
1971 containing that character, but that is otherwise in ASCII, is not
1972 really an IDN (in the U-label sense defined above). After Nameprep
1973 maps out the Eszett, the result is an ASCII string and so it does not
1974
1975
1976
1977 Klensin Informational [Page 36]
1978 RFC 5894 IDNA Rationale August 2010
1979
1980
1981 get an xn-- prefix, but the string that can be displayed to a user
1982 appears to be an IDN. IDNA2008 eliminates this artifact. A
1983 character is either permitted as itself or it is prohibited; special
1984 cases that make sense only in a particular linguistic or cultural
1985 context can be dealt with as localization matters where appropriate.
1986
1987 8. Name Server Considerations
1988
1989 8.1. Processing Non-ASCII Strings
1990
1991 Existing DNS servers do not know the IDNA rules for handling
1992 non-ASCII forms of IDNs, and therefore need to be shielded from them.
1993 All existing channels through which names can enter a DNS server
1994 database (for example, master files (as described in RFC 1034) and
1995 DNS update messages [RFC2136]) could not be IDNA-aware because they
1996 predate IDNA. Other sections of this document provide the needed
1997 shielding by ensuring that internationalized domain names entering
1998 DNS server databases through such channels have already been
1999 converted to their equivalent ASCII A-label forms.
2000
2001 Because of the distinction made between the algorithms for
2002 Registration and Lookup in Sections 4 and 5 (respectively) of the
2003 Protocol document [RFC5891] (a domain name containing only ASCII code
2004 points cannot be converted to an A-label), there cannot be more than
2005 one A-label form for any given U-label.
2006
2007 As specified in clarifications to the DNS specification [RFC2181],
2008 the DNS protocol explicitly allows domain labels to contain octets
2009 beyond the ASCII range (0000..007F), and this document does not
2010 change that. However, although the interpretation of octets
2011 0080..00FF is well-defined in the DNS, many application protocols
2012 support only ASCII labels and there is no defined interpretation of
2013 these non-ASCII octets as characters and, in particular, no
2014 interpretation of case-independent matching for them (e.g., see the
2015 clarification on DNS case insensitivity [RFC4343]). If labels
2016 containing these octets are returned to applications, unpredictable
2017 behavior could result. The A-label form, which cannot contain those
2018 characters, is the only standard representation for internationalized
2019 labels in the DNS protocol.
2020
2021 8.2. Root and Other DNS Server Considerations
2022
2023 IDNs in A-label form will generally be somewhat longer than current
2024 domain names, so the bandwidth needed by the root servers is likely
2025 to go up by a small amount. Also, queries and responses for IDNs
2026 will probably be somewhat longer than typical queries historically,
2027
2028
2029
2030
2031
2032 Klensin Informational [Page 37]
2033 RFC 5894 IDNA Rationale August 2010
2034
2035
2036 so Extension Mechanisms for DNS (EDNS0) [RFC2671] support may be more
2037 important (otherwise, queries and responses may be forced to go to
2038 TCP instead of UDP).
2039
2040 9. Internationalization Considerations
2041
2042 DNS labels and fully-qualified domain names provide mnemonics that
2043 assist in identifying and referring to resources on the Internet.
2044 IDNs expand the range of those mnemonics to include those based on
2045 languages and character sets other than Western European and Roman-
2046 derived ones. But domain "names" are not, in general, words in any
2047 language. The recommendations of the IETF policy on character sets
2048 and languages (BCP 18 [RFC2277]) are applicable to situations in
2049 which language identification is used to provide language-specific
2050 contexts. The DNS is, by contrast, global and international and
2051 ultimately has nothing to do with languages. Adding languages (or
2052 similar context) to IDNs generally, or to DNS matching in particular,
2053 would imply context-dependent matching in DNS, which would be a very
2054 significant change to the DNS protocol itself. It would also imply
2055 that users would need to identify the language associated with a
2056 particular label in order to look that label up. That knowledge is
2057 generally not available because many labels are not words in any
2058 language and some may be words in more than one.
2059
2060 10. IANA Considerations
2061
2062 This section gives an overview of IANA registries required for IDNA.
2063 The actual definitions of, and specifications for, the first two,
2064 which have been newly created for IDNA2008, appear in the Tables
2065 document [RFC5892]. This document describes the registries, but it
2066 does not specify any IANA actions.
2067
2068 10.1. IDNA Character Registry
2069
2070 The distinction among the major categories "UNASSIGNED",
2071 "DISALLOWED", "PROTOCOL-VALID", and "CONTEXTUAL RULE REQUIRED" is
2072 made by special categories and rules that are integral elements of
2073 the Tables document. While not normative, an IANA registry of
2074 characters and scripts and their categories, updated for each new
2075 version of Unicode and the characters it contains, are convenient for
2076 programming and validation purposes. The details of this registry
2077 are specified in the Tables document.
2078
2079
2080
2081
2082
2083
2084
2085
2086
2087 Klensin Informational [Page 38]
2088 RFC 5894 IDNA Rationale August 2010
2089
2090
2091 10.2. IDNA Context Registry
2092
2093 IANA has created and now maintains a list of approved contextual
2094 rules for characters that are defined in the IDNA Character Registry
2095 list as requiring a Contextual Rule (i.e., the types of rules
2096 described in Section 3.1.2). The details for those rules appear in
2097 the Tables document.
2098
2099 10.3. IANA Repository of IDN Practices of TLDs
2100
2101 This registry, historically described as the "IANA Language Character
2102 Set Registry" or "IANA Script Registry" (both somewhat misleading
2103 terms), is maintained by IANA at the request of ICANN. It is used to
2104 provide a central documentation repository of the IDN policies used
2105 by top level domain (TLD) registries who volunteer to contribute to
2106 it and is used in conjunction with ICANN Guidelines for IDN use.
2107
2108 It is not an IETF-managed registry and, while the protocol changes
2109 specified here may call for some revisions to the tables, IDNA2008
2110 has no direct effect on that registry and no IANA action is required
2111 as a result.
2112
2113 11. Security Considerations
2114
2115 11.1. General Security Issues with IDNA
2116
2117 This document is purely explanatory and informational and
2118 consequently introduces no new security issues. It would, of course,
2119 be a poor idea for someone to try to implement from it; such an
2120 attempt would almost certainly lead to interoperability problems and
2121 might lead to security ones. A discussion of security issues with
2122 IDNA, including some relevant history, appears in the Definitions
2123 document [RFC5890].
2124
2125 12. Acknowledgments
2126
2127 The editor and contributors would like to express their thanks to
2128 those who contributed significant early (pre-working group) review
2129 comments, sometimes accompanied by text, Paul Hoffman, Simon
2130 Josefsson, and Sam Weiler. In addition, some specific ideas were
2131 incorporated from suggestions, text, or comments about sections that
2132 were unclear supplied by Vint Cerf, Frank Ellerman, Michael Everson,
2133 Asmus Freytag, Erik van der Poel, Michel Suignard, and Ken Whistler.
2134 Thanks are also due to Vint Cerf, Lisa Dusseault, Debbie Garside, and
2135 Jefsey Morfin for conversations that led to considerable improvements
2136 in the content of this document and to several others, including Ben
2137
2138
2139
2140
2141
2142 Klensin Informational [Page 39]
2143 RFC 5894 IDNA Rationale August 2010
2144
2145
2146 Campbell, Martin Duerst, Subramanian Moonesamy, Peter Saint-Andre,
2147 and Dan Winship, for catching specific errors and recommending
2148 corrections.
2149
2150 A meeting was held on 30 January 2008 to attempt to reconcile
2151 differences in perspective and terminology about this set of
2152 specifications between the design team and members of the Unicode
2153 Technical Consortium. The discussions at and subsequent to that
2154 meeting were very helpful in focusing the issues and in refining the
2155 specifications. The active participants at that meeting were (in
2156 alphabetic order, as usual) Harald Alvestrand, Vint Cerf, Tina Dam,
2157 Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary
2158 Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel,
2159 Michel Suignard, and Ken Whistler. We express our thanks to Google
2160 for support of that meeting and to the participants for their
2161 contributions.
2162
2163 Useful comments and text on the working group versions of the working
2164 draft were received from many participants in the IETF "IDNABIS"
2165 working group and a number of document changes resulted from mailing
2166 list discussions made by that group. Marcos Sanz provided specific
2167 analysis and suggestions that were exceptionally helpful in refining
2168 the text, as did Vint Cerf, Martin Duerst, Andrew Sullivan, and Ken
2169 Whistler. Lisa Dusseault provided extensive editorial suggestions
2170 during the spring of 2009, most of which were incorporated.
2171
2172 13. Contributors
2173
2174 While the listed editor held the pen, the core of this document and
2175 the initial working group version represents the joint work and
2176 conclusions of an ad hoc design team consisting of the editor and, in
2177 alphabetic order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and
2178 Cary Karp. Considerable material describing mapping principles has
2179 been incorporated from a draft of the Mapping document
2180 [IDNA2008-Mapping] by Pete Resnick and Paul Hoffman. In addition,
2181 there were many specific contributions and helpful comments from
2182 those listed in the Acknowledgments section and others who have
2183 contributed to the development and use of the IDNA protocols.
2184
2185 14. References
2186
2187 14.1. Normative References
2188
2189 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
2190 "Internationalizing Domain Names in Applications
2191 (IDNA)", RFC 3490, March 2003.
2192
2193
2194
2195
2196
2197 Klensin Informational [Page 40]
2198 RFC 5894 IDNA Rationale August 2010
2199
2200
2201 [RFC3492] Costello, A., "Punycode: A Bootstring encoding of
2202 Unicode for Internationalized Domain Names in
2203 Applications (IDNA)", RFC 3492, March 2003.
2204
2205 [RFC5890] Klensin, J., "Internationalized Domain Names for
2206 Applications (IDNA): Definitions and Document
2207 Framework", RFC 5890, August 2010.
2208
2209 [RFC5891] Klensin, J., "Internationalized Domain Names in
2210 Applications (IDNA): Protocol", RFC 5891, August 2010.
2211
2212 [RFC5892] Faltstrom, P., "The Unicode Code Points and
2213 Internationalized Domain Names for Applications (IDNA)",
2214 RFC 5892, August 2010.
2215
2216 [RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
2217 Internationalized Domain Names for Applications (IDNA)",
2218 RFC 5893, August 2010.
2219
2220 [Unicode52] The Unicode Consortium. The Unicode Standard, Version
2221 5.2.0, defined by: "The Unicode Standard, Version
2222 5.2.0", (Mountain View, CA: The Unicode Consortium,
2223 2009. ISBN 978-1-936213-00-9).
2224 <http://www.unicode.org/versions/Unicode5.2.0/>.
2225
2226 14.2. Informative References
2227
2228 [IDNA2008-Mapping]
2229 Resnick, P. and P. Hoffman, "Mapping Characters in
2230 Internationalized Domain Names for Applications (IDNA)",
2231 Work in Progress, April 2010.
2232
2233 [RFC0952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD
2234 Internet host table specification", RFC 952,
2235 October 1985.
2236
2237 [RFC1034] Mockapetris, P., "Domain names - concepts and
2238 facilities", STD 13, RFC 1034, November 1987.
2239
2240 [RFC1035] Mockapetris, P., "Domain names - implementation and
2241 specification", STD 13, RFC 1035, November 1987.
2242
2243 [RFC1123] Braden, R., "Requirements for Internet Hosts -
2244 Application and Support", STD 3, RFC 1123, October 1989.
2245
2246 [RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
2247 "Dynamic Updates in the Domain Name System (DNS
2248 UPDATE)", RFC 2136, April 1997.
2249
2250
2251
2252 Klensin Informational [Page 41]
2253 RFC 5894 IDNA Rationale August 2010
2254
2255
2256 [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
2257 Specification", RFC 2181, July 1997.
2258
2259 [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
2260 Languages", BCP 18, RFC 2277, January 1998.
2261
2262 [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
2263 RFC 2671, August 1999.
2264
2265 [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for
2266 specifying the location of services (DNS SRV)",
2267 RFC 2782, February 2000.
2268
2269 [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
2270 Internationalized Strings ("stringprep")", RFC 3454,
2271 December 2002.
2272
2273 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
2274 Profile for Internationalized Domain Names (IDN)",
2275 RFC 3491, March 2003.
2276
2277 [RFC3743] Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint
2278 Engineering Team (JET) Guidelines for Internationalized
2279 Domain Names (IDN) Registration and Administration for
2280 Chinese, Japanese, and Korean", RFC 3743, April 2004.
2281
2282 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
2283 Identifiers (IRIs)", RFC 3987, January 2005.
2284
2285 [RFC4290] Klensin, J., "Suggested Practices for Registration of
2286 Internationalized Domain Names (IDN)", RFC 4290,
2287 December 2005.
2288
2289 [RFC4343] Eastlake, D., "Domain Name System (DNS) Case
2290 Insensitivity Clarification", RFC 4343, January 2006.
2291
2292 [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
2293 and Recommendations for Internationalized Domain Names
2294 (IDNs)", RFC 4690, September 2006.
2295
2296 [RFC4713] Lee, X., Mao, W., Chen, E., Hsu, N., and J. Klensin,
2297 "Registration and Administration Recommendations for
2298 Chinese Domain Names", RFC 4713, October 2006.
2299
2300
2301
2302
2303
2304
2305
2306
2307 Klensin Informational [Page 42]
2308 RFC 5894 IDNA Rationale August 2010
2309
2310
2311 [Unicode-UAX31]
2312 The Unicode Consortium, "Unicode Standard Annex #31:
2313 Unicode Identifier and Pattern Syntax, Revision 11",
2314 September 2009,
2315 <http://www.unicode.org/reports/tr31/tr31-11.html>.
2316
2317 [Unicode-UTS39]
2318 The Unicode Consortium, "Unicode Technical Standard #39:
2319 Unicode Security Mechanisms, Revision 2", August 2006,
2320 <http://www.unicode.org/reports/tr39/tr39-2.html>.
2321
2322 Author's Address
2323
2324 John C Klensin
2325 1770 Massachusetts Ave, Ste 322
2326 Cambridge, MA 02140
2327 USA
2328
2329 Phone: +1 617 245 1457
2330 EMail: john+ietf@jck.com
2331
2332
2333
2334
2335
2336
2337
2338
2339
2340
2341
2342
2343
2344
2345
2346
2347
2348
2349
2350
2351
2352
2353
2354
2355
2356
2357
2358
2359
2360
2361
2362 Klensin Informational [Page 43]
2363
The IETF is responsible for the creation and maintenance of the DNS RFCs. The ICANN DNS RFC annotation project provides a forum for collecting community annotations on these RFCs as an aid to understanding for implementers and any interested parties. The annotations displayed here are not the result of the IETF consensus process.
This RFC is included in the DNS RFCs annotation project whose home page is here.