Internet Engineering Task Force (IETF)                        J. Klensin
Request for Comments: 5894                                   August 2010
Category: Informational
ISSN: 2070-1721


        Internationalized Domain Names for Applications (IDNA):
                 Background, Explanation, and Rationale

Abstract

   Several years have passed since the original protocol for
   Internationalized Domain Names (IDNs) was completed and deployed.
   During that time, a number of issues have arisen, including the need
   to update the system to deal with newer versions of Unicode. Some of
   these issues require tuning of the existing protocols and the tables
   on which they depend. This document provides an overview of a
   revised system and provides explanatory material for its components.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for informational purposes.

   This document is a product of the Internet Engineering Task Force
   (IETF). It represents the consensus of the IETF community. It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG). Not all documents
   approved by the IESG are a candidate for any level of Internet
   Standard; see Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc5894.

Klensin                       Informational                     [Page 1]
RFC 5894                     IDNA Rationale                  August 2010

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1. Introduction
      1.1. Context and Overview
      1.2. Terminology
         1.2.1. DNS "Name" Terminology
         1.2.2. New Terminology and Restrictions
      1.3. Objectives
      1.4. Applicability and Function of IDNA
      1.5. Comprehensibility of IDNA Mechanisms and Processing
   2. Processing in IDNA2008
   3. Permitted Characters: An Inclusion List
      3.1. A Tiered Model of Permitted Characters and Labels
         3.1.1. PROTOCOL-VALID
         3.1.2. CONTEXTUAL RULE REQUIRED
            3.1.2.1. Contextual Restrictions
            3.1.2.2. Rules and Their Application
         3.1.3. DISALLOWED
         3.1.4. UNASSIGNED
      3.2. Registration Policy
      3.3. Layered Restrictions: Tables, Context, Registration, and
           Applications
   4. Application-Related Issues
      4.1. Display and Network Order
      4.2. Entry and Display in Applications
      4.3. Linguistic Expectations: Ligatures, Digraphs, and
           Alternate Character Forms
      4.4. Case Mapping and Related Issues
      4.5. Right-to-Left Text
   5. IDNs and the Robustness Principle
   6. Front-end and User Interface Processing for Lookup
   7. Migration from IDNA2003 and Unicode Version Synchronization
      7.1. Design Criteria
         7.1.1. Summary and Discussion of IDNA Validity Criteria
         7.1.2. Labels in Registration
         7.1.3. Labels in Lookup
      7.2. Changes in Character Interpretations
         7.2.1. Character Changes: Eszett and Final Sigma
         7.2.2. Character Changes: Zero Width Joiner and Zero
                Width Non-Joiner
         7.2.3. Character Changes and the Need for Transition
         7.2.4. Transition Strategies
      7.3. Elimination of Character Mapping
      7.4. The Question of Prefix Changes
         7.4.1. Conditions Requiring a Prefix Change
         7.4.2. Conditions Not Requiring a Prefix Change
         7.4.3. Implications of Prefix Changes
      7.5. Stringprep Changes and Compatibility
      7.6. The Symbol Question
      7.7. Migration between Unicode Versions: Unassigned Code
           Points
      7.8. Other Compatibility Issues
   8. Name Server Considerations
      8.1. Processing Non-ASCII Strings
      8.2. Root and Other DNS Server Considerations
   9. Internationalization Considerations
   10. IANA Considerations
      10.1. IDNA Character Registry
      10.2. IDNA Context Registry
      10.3. IANA Repository of IDN Practices of TLDs
   11. Security Considerations
      11.1. General Security Issues with IDNA
   12. Acknowledgments
   13. Contributors
   14. References
      14.1. Normative References
      14.2. Informative References

1. Introduction

1.1. Context and Overview

   Internationalized Domain Names in Applications (IDNA) is a collection
   of standards that allow client applications to convert some mnemonic
   strings expressed in Unicode to an ASCII-compatible encoding form
   ("ACE") that is a valid DNS label containing only LDH syntax (see the
   Definitions document [RFC5890]). The specific form of ACE label used
   by IDNA is called an "A-label". A client can look up an exact
   A-label in the existing DNS, so A-labels do not require any
   extensions to DNS, upgrades of DNS servers, or updates to low-level
   client libraries. An A-label is recognizable from the prefix "xn--"
   before the characters produced by the Punycode algorithm [RFC3492];
   thus, a user application can identify an A-label and convert it into
   Unicode (or some local coded character set) for display.

   On the registry side, IDNA allows a registry to offer
   Internationalized Domain Names (IDNs) for registration as A-labels.
   A registry may offer any subset of valid IDNs, and may apply any
   restrictions or bundling (grouping of similar labels together in one
   registration) appropriate for the context of that registry.
   Registration of labels is sometimes discussed separately from lookup,
   and it is subject to a few specific requirements that do not apply to
   lookup.

   DNS clients and registries are subject to some differences in
   requirements for handling IDNs. In particular, registries are urged
   to register only exact, valid A-labels, while clients might do some
   mapping to get from otherwise-invalid user input to a valid A-label.

   The first version of IDNA was published in 2003 and is referred to
   here as IDNA2003 to contrast it with the current version, which is
   known as IDNA2008 (after the year in which IETF work started on it).
   IDNA2003 consists of four documents: the IDNA base specification
   [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and Stringprep
   [RFC3454]. The current set of documents, IDNA2008, is not dependent
   on any of the IDNA2003 specifications other than the one for Punycode
   encoding. References to "IDNA2008", "these specifications", or
   "these documents" are to the entire IDNA2008 set listed in a separate
   Definitions document [RFC5890]. The characters that are valid in
   A-labels are identified from rules listed in the Tables document
   [RFC5892], but validity can be derived from the Unicode properties of
   those characters with a very few exceptions.

   Traditionally, DNS labels are matched case-insensitively (as
   described in the DNS specifications [RFC1034][RFC1035]). That
   convention was preserved in IDNA2003 by a case-folding operation that
   generally maps capital letters into lowercase ones. However, if case
   rules are enforced from one language, another language sometimes
   loses the ability to treat two characters separately. Case-
   insensitivity is treated slightly differently in IDNA2008.

   IDNA2003 used Unicode version 3.2 only. In order to keep up with new
   characters added in new versions of Unicode, IDNA2008 decouples its
   rules from any particular version of Unicode. Instead, the
   attributes of new characters in Unicode, supplemented by a small
   number of exception cases, determine how and whether the characters
   can be used in IDNA labels.

   This document provides informational context for IDNA2008, including
   terminology, background, and policy discussions. It contains no
   normative material; specifications for conformance to the IDNA2008
   protocols appear entirely in the other documents in the series.

1.2. Terminology

   Terminology for IDNA2008 appears in the Definitions document
   [RFC5890]. That document also contains a road map to the IDNA2008
   document collection. No attempt should be made to understand this
   document without the definitions and concepts that appear there.

1.2.1. DNS "Name" Terminology

   In the context of IDNs, the DNS term "name" has introduced some
   confusion as people speak of DNS labels in terms of the words or
   phrases of various natural languages. Historically, many of the
   "names" in the DNS have been mnemonics to identify some particular
   concept, object, or organization. They are typically rooted in some
   language because most people think in language-based ways. But,
   because they are mnemonics, they need not obey the orthographic
   conventions of any language: it is not a requirement that it be
   possible for them to be "words".

   This distinction is important because the reasonable goal of an IDN
   effort is not to be able to write the great Klingon (or language of
   one's choice) novel in DNS labels but to be able to form a usefully
   broad range of mnemonics in ways that are as natural as possible in a
   very broad range of scripts.

1.2.2. New Terminology and Restrictions

   IDNA2008 introduces new terminology. Precise definitions are
   provided in the Definitions document for the terms U-label, A-label,
   LDH label (to which all valid pre-IDNA hostnames conformed), Reserved
   LDH label (R-LDH label), XN-label, Fake A-label, and Non-Reserved LDH
   label (NR-LDH label).

   In addition, the term "putative label" has been adopted to refer to a
   label that may appear to meet certain definitional constraints but
   has not yet been sufficiently tested for validity.
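
   As an informal illustration (not part of the Definitions document),
   the purely syntactic part of these categories can be sketched in
   Python. The function name classify_ldh and its return strings are
   hypothetical. The sketch assumes the definitions in [RFC5890]: an
   LDH label is 1 to 63 ASCII letters, digits, or hyphens with no
   leading or trailing hyphen; an R-LDH label has "--" in the third and
   fourth positions; and an XN-label is an R-LDH label that starts with
   "xn" in any case. It does not try to separate A-labels from Fake
   A-labels, which would require Punycode decoding and the full
   IDNA2008 validity tests.

```python
import re

# 1 to 63 ASCII letters, digits, or hyphens; no leading or trailing hyphen.
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify_ldh(label: str) -> str:
    """Return a rough syntactic category for a putative ASCII label."""
    if not _LDH.fullmatch(label):
        return "not LDH"
    # Reserved LDH labels carry "--" in the third and fourth positions.
    if label[2:4] != "--":
        return "NR-LDH"
    # The "xn" prefix is matched case-insensitively.
    if label[:2].lower() == "xn":
        return "XN-label"  # may be an A-label or a Fake A-label
    return "R-LDH (other reserved)"
```

   A label reported as "XN-label" remains a putative A-label until the
   remaining tests have been applied.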

   These definitions are also illustrated in Figure 1 of the Definitions
   document. R-LDH labels contain "--" in the third and fourth
   character positions from the beginning of the label. In IDNA-aware
   applications, only a subset of these reserved labels is permitted to
   be used, namely the A-label subset. A-labels are a subset of the
   R-LDH labels that begin with the case-insensitive string "xn--".
   Labels that bear this prefix but that are not otherwise valid fall
   into the "Fake A-label" category. The Non-Reserved labels (NR-LDH
   labels) are implicitly valid since they do not bear any resemblance
   to the labels specified by IDNA.

   The creation of the Reserved-LDH category is required for three
   reasons:

   o  to prevent confusion with pre-IDNA coding forms;

   o  to permit future extensions that would require changing the
      prefix, no matter how unlikely those might be (see Section 7.4);
      and

   o  to reduce the opportunities for attacks via the Punycode encoding
      algorithm itself.

   As with other documents in the IDNA2008 set, this document uses the
   term "registry" to describe any zone in the DNS. That term, and the
   terms "zone" or "zone administration", are interchangeable.

1.3. Objectives

   These are the main objectives in revising IDNA.

   o  Use a more recent version of Unicode and allow IDNA to be
      independent of Unicode versions, so that IDNA2008 need not be
      updated for implementations to adopt code points from new Unicode
      versions.

   o  Fix a very small number of code point categorizations that have
      turned out to cause problems in the communities that use those
      code points.

   o  Reduce the dependency on mapping, in favor of valid A-labels.
      This will result in pre-mapped forms that are not valid IDNA
      labels appearing less often in various contexts.

   o  Fix some details in the bidirectional code point handling
      algorithms.

1.4. Applicability and Function of IDNA

   The IDNA specification solves the problem of extending the repertoire
   of characters that can be used in domain names to include a large
   subset of the Unicode repertoire.

   IDNA does not extend DNS. Instead, the applications (and, by
   implication, the users) continue to see an exact-match lookup
   service. Either there is a single name that matches exactly (subject
   to the base DNS requirement of case-insensitive ASCII matching) or
   there is no match. This model has served the existing applications
   well, but it requires, with or without internationalized domain
   names, that users know the exact spelling of the domain names that
   are to be typed into applications such as web browsers and mail user
   agents. The introduction of the larger repertoire of characters
   potentially makes the set of misspellings larger, especially given
   that in some cases the same appearance, for example on a business
   card, might visually match several Unicode code points or several
   sequences of code points.

   The IDNA standard does not require any applications to conform to it,
   nor does it retroactively change those applications. An application
   can elect to use IDNA in order to support IDNs while maintaining
   interoperability with existing infrastructure. For applications that
   want to use non-ASCII characters in public DNS domain names, IDNA is
   the only option that is defined at the time this specification is
   published. Adding IDNA support to an existing application entails
   changes to the application only, and leaves room for flexibility in
   front-end processing and more specifically in the user interface (see
   Section 6).
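
   The application-only nature of the change can be illustrated with a
   minimal Python sketch of the encoding step alone. The function names
   to_a_label and to_u_label are hypothetical. The sketch relies on the
   "punycode" codec in the Python standard library, which implements
   the algorithm of [RFC3492]; the standard library's "idna" codec
   follows IDNA2003 and is deliberately not used. The sketch assumes
   its input is already a valid, lowercase U-label (or an ASCII label)
   and performs none of the IDNA2008 validity tests.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Encode a label for lookup; assumes the label was validated elsewhere."""
    if u_label.isascii():
        return u_label  # ASCII labels are passed to the DNS unchanged
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode a label for display; non-prefixed labels are returned as-is."""
    if a_label.lower().startswith(ACE_PREFIX):
        return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return a_label


# The encoded form is an ordinary LDH string, so the DNS itself is unchanged.
print(to_a_label("b\u00fccher"))    # xn--bcher-kva
print(to_u_label("xn--bcher-kva"))  # bücher
```

   Everything here runs inside the application before or after the DNS
   query; resolvers and servers see only the ASCII form.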

   A great deal of the discussion of IDN solutions has focused on
   transition issues and how IDNs will work in a world where not all of
   the components have been updated. Proposals that were not chosen by
   the original IDN Working Group would have depended on updating user
   applications, DNS resolvers, and DNS servers in order for a user to
   apply an internationalized domain name in any form or coding
   acceptable under that method. While processing must be performed
   prior to or after access to the DNS, IDNA requires no changes to the
   DNS protocol, any DNS servers, or the resolvers on users' computers.

   IDNA allows the graceful introduction of IDNs not only by avoiding
   upgrades to existing infrastructure (such as DNS servers and mail
   transport agents), but also by allowing some limited use of IDNs in
   applications by using the ASCII-encoded representation of the labels
   containing non-ASCII characters. While such names are user-
   unfriendly to read and type, and hence not optimal for user input,
   they can be used as a last resort to allow rudimentary IDN usage.
   For example, they might be the best choice for display if it were
   known that relevant fonts were not available on the user's computer.
   In order to allow user-friendly input and output of the IDNs and
   acceptance of some characters as equivalent to those to be processed
   according to the protocol, the applications need to be modified to
   conform to this specification.

   This version of IDNA uses the Unicode character repertoire for
   continuity with the original version of IDNA.

1.5. Comprehensibility of IDNA Mechanisms and Processing

   One goal of IDNA2008, which is aided by the main goal of reducing the
   dependency on mapping, is to improve the general understanding of how
   IDNA works and what characters are permitted and what happens to
   them. Comprehensibility and predictability to users and registrants
   are important design goals for this effort. End-user applications
   have an important role to play in increasing this comprehensibility.

   Any system that tries to handle international characters encounters
   some common problems. For example, a User Interface (UI) cannot
   display a character if no font containing that character is
   available. In some cases, internationalization enables effective
   localization while maintaining some global uniformity but losing some
   universality.

   It is difficult to even make suggestions as to how end-user
   applications should cope when characters and fonts are not available.
   Because display functions are rarely controlled by the types of
   applications that would call upon IDNA, such suggestions will rarely
   be very effective.

   Conversion between local character sets and normalized Unicode, if
   needed, is part of this set of user interface issues. Those
   conversions introduce complexity in a system that does not use
   Unicode as its primary (or only) internal character coding system.
   If a label is converted to a local character set that does not have
   all the needed characters, or that uses different character-coding
   principles, the user interface program may have to add special logic
   to avoid or reduce loss of information.

   The major difficulty may lie in accurately identifying the incoming
   character set and applying the correct conversion routine.
   Even more
   difficult, the local character coding system could be based on
   conceptually different assumptions than those used by Unicode (e.g.,
   choice of font encodings used for publications in some Indic
   scripts). Those differences may not easily yield unambiguous
   conversions or interpretations even if each coding system is
   internally consistent and adequate to represent the local language
   and script.

   IDNA2008 shifts responsibility for character mapping and other
   adjustments from the protocol (where it was located in IDNA2003) to
   pre-processing before invoking IDNA itself. The intent is that this
   change will lead to greater usage of fully-valid A-labels or U-labels
   in display, transit, and storage, which should aid comprehensibility
   and predictability. A careful look at pre-processing raises issues
   about what that pre-processing should do and at what point
   pre-processing becomes harmful; how universally consistent
   pre-processing algorithms can be; and how to be compatible with
   labels prepared in an IDNA2003 context. Those issues are discussed
   in Section 6 and in the Mapping document [IDNA2008-Mapping].

2. Processing in IDNA2008

   IDNA2008 separates Domain Name Registration and Lookup in the
   protocol specification (RFC 5891, Sections 4 and 5 [RFC5891]).
   Although most steps in the two processes are similar, the separation
   reflects current practice in which per-registry (DNS zone)
   restrictions and special processing are applied at registration time
   but not during lookup. Another significant benefit is that
   separation facilitates incremental addition of permitted character
   groups to avoid freezing on one particular version of Unicode.

   The actual registration and lookup protocols for IDNA2008 are
   specified in the Protocol document.

3. Permitted Characters: An Inclusion List

   IDNA2008 adopts the inclusion model.
   A code point is assumed to be
   invalid for IDN use unless it is included as part of a Unicode
   property-based rule or, in rare cases, included individually by an
   exception. When an implementation moves to a new version of Unicode,
   the rules may indicate new valid code points.

   This section provides an overview of the model used to establish the
   algorithm and character lists of the Tables document [RFC5892] and
   describes the names and applicability of the categories used there.
   Note that the inclusion of a character in the PROTOCOL-VALID category
   group (Section 3.1.1) does not imply that it can be used
   indiscriminately; some characters are associated with contextual
   rules that must be applied as well.

   The information given in this section is provided to make the rules,
   tables, and protocol easier to understand. The normative generating
   rules that correspond to this informal discussion appear in the
   Tables document, and the rules that actually determine what labels
   can be registered or looked up are in the Protocol document.

3.1. A Tiered Model of Permitted Characters and Labels

   Moving to an inclusion model involves a new specification for the
   list of characters that are permitted in IDNs. In IDNA2003,
   character validity is independent of context and fixed forever (or
   until the standard is replaced). However, globally context-
   independent rules have proved to be impractical because some
   characters, especially those that are called "Join_Controls" in
   Unicode, are needed to make reasonable use of some scripts but have
   no visible effect in others. IDNA2003 prohibited those types of
   characters entirely by discarding them.
   We now have a consensus that
   under some conditions, these "joiner" characters are legitimately
   needed to allow useful mnemonics for some languages and scripts. In
   general, context-dependent rules help deal with characters (generally
   characters that would otherwise be prohibited entirely) that are used
   differently or perceived differently across different scripts, and
   allow the standard to be applied more appropriately in cases where a
   string is not universally handled the same way.

   IDNA2008 divides all possible Unicode code points into four
   categories: PROTOCOL-VALID, CONTEXTUAL RULE REQUIRED, DISALLOWED, and
   UNASSIGNED.

3.1.1. PROTOCOL-VALID

   Characters identified as PROTOCOL-VALID (often abbreviated PVALID)
   are permitted in IDNs. Their use may be restricted by rules about
   the context in which they appear or by other rules that apply to the
   entire label in which they are to be embedded. For example, any
   label that contains a character in this category that has a
   "right-to-left" property must be used in context with the Bidi rules
   [RFC5893]. The term PROTOCOL-VALID is used to stress the fact that
   the presence of a character in this category does not imply that a
   given registry need accept registrations containing any of the
   characters in the category. Registries are still expected to apply
   judgment about labels they will accept and to maintain rules
   consistent with those judgments (see the Protocol document [RFC5891]
   and Section 3.3).

   Characters that are placed in the PROTOCOL-VALID category are
   expected to never be removed from it or reclassified.
   While
   theoretically characters could be removed from Unicode, such removal
   would be inconsistent with the Unicode stability principles (see
   UTR 39: Unicode Security Mechanisms [Unicode52], Appendix F) and
   hence should never occur.

3.1.2. CONTEXTUAL RULE REQUIRED

   Some characters may be unsuitable for general use in IDNs but
   necessary for the plausible support of some scripts. The two most
   commonly cited examples are the ZERO WIDTH JOINER and ZERO WIDTH
   NON-JOINER characters (ZWJ, U+200D and ZWNJ, U+200C), but other
   characters may require special treatment because they would otherwise
   be DISALLOWED (typically because Unicode considers them punctuation
   or special symbols) but need to be permitted in limited contexts.
   Other characters are given this special treatment because they pose
   exceptional danger of being used to produce misleading labels or to
   cause unacceptable ambiguity in label matching and interpretation.

3.1.2.1. Contextual Restrictions

   Characters with contextual restrictions are identified as CONTEXTUAL
   RULE REQUIRED and are associated with a rule. The rule defines
   whether the character is valid in a particular string, and also
   whether the rule itself is to be applied on lookup as well as
   registration.

   A distinction is made between characters that indicate or prohibit
   joining and ones similar to them (known as CONTEXT-JOINER or
   CONTEXTJ) and other characters requiring contextual treatment
   (CONTEXT-OTHER or CONTEXTO). Only the former require full testing at
   lookup time.

   It is important to note that these contextual rules cannot prevent
   all uses of the relevant characters that might be confusing or
   problematic.
   What they are expected to do is to confine
   applicability of the characters to scripts (and narrower contexts)
   where zone administrators are knowledgeable enough about the use of
   those characters to be prepared to deal with them appropriately.

   For example, a registry dealing with an Indic script that requires
   ZWJ and/or ZWNJ as part of the writing system is expected to
   understand where the characters have visible effect and where they do
   not and to make registration rules accordingly. By contrast, a
   registry dealing primarily with Latin or Cyrillic script might not be
   actively aware that the characters exist, much less about the
   consequences of embedding them in labels drawn from those scripts and
   therefore should avoid accepting registrations containing those
   characters, at least in labels using characters from the Latin or
   Cyrillic scripts.

3.1.2.2. Rules and Their Application

   Rules have descriptions such as "Must follow a character from Script
   XYZ", "Must occur only if the entire label is in Script ABC", or
   "Must occur only if the previous and subsequent characters have the
   DFG property". The actual rules may be DEFINED or NULL. If present,
   they may have values of "True" (character may be used in any position
   in any label), "False" (character may not be used in any label), or
   may be a set of procedural rules that specify the context in which
   the character is permitted.

   Because it is easier to identify these characters than to know that
   they are actually needed in IDNs or how to establish exactly the
   right rules for each one, a rule may have a null value in a given
   version of the tables. Characters associated with null rules are not
   permitted to appear in putative labels for either registration or
   lookup.
   Of course, a later version of the tables might contain a
   non-null rule.

   The actual rules and their descriptions are in Sections 2 and 3 of
   the Tables document [RFC5892]. That document also specifies the
   creation of a registry for future rules.

3.1.3. DISALLOWED

   Some characters are inappropriate for use in IDNs and are thus
   excluded for both registration and lookup (i.e., IDNA-conforming
   applications performing name lookup should verify that these
   characters are absent; if they are present, the label strings should
   be rejected rather than converted to A-labels and looked up). Some of
   these characters are problematic for use in IDNs (such as the
   FRACTION SLASH character, U+2044), while some of them (such as the
   various HEART symbols, e.g., U+2665, U+2661, and U+2765, see
   Section 7.6) simply fall outside the conventions for typical
   identifiers (basically letters and numbers).

   Of course, this category would include code points that had been
   removed entirely from Unicode should such removals ever occur.

   Characters that are placed in the DISALLOWED category are expected to
   never be removed from it or reclassified. If a character is
   classified as DISALLOWED in error and the error is sufficiently
   problematic, the only recourse would be either to introduce a new
   code point into Unicode and classify it as PROTOCOL-VALID or for the
   IETF to accept the considerable costs of an incompatible change and
   replace the relevant RFC with one containing appropriate exceptions.

   There is provision for exception cases but, in general, characters
   are placed into DISALLOWED if they fall into one or more of the
   following groups:

   o  The character is a compatibility equivalent for another character.
      In slightly more precise Unicode terms, application of
      Normalization Form KC (NFKC) to the character yields some other
      character.

   o  The character is an uppercase form or some other form that is
      mapped to another character by Unicode case folding.

   o  The character is a symbol or punctuation form or, more generally,
      something that is not a letter, digit, or a mark that is used to
      form a letter or digit.

3.1.4. UNASSIGNED

   For convenience in processing and table-building, code points that do
   not have assigned values in a given version of Unicode are treated as
   belonging to a special UNASSIGNED category. Such code points are
   prohibited in labels to be registered or looked up. The category
   differs from DISALLOWED in that code points are moved out of it by
   the simple expedient of being assigned in a later version of Unicode
   (at which point, they are classified into one of the other categories
   as appropriate).

   The rationale for restricting the processing of UNASSIGNED characters
   is simply that the properties of such code points cannot be
   completely known until actual characters are assigned to them. For
   example, assume that an UNASSIGNED code point were included in a
   label to be looked up. Assume that the code point was later assigned
   to a character that required some set of contextual rules. With that
   combination, un-updated instances of IDNA-aware software might permit
   lookup of labels containing the previously unassigned characters
   while updated versions of the software might restrict use of the same
   label in lookup, depending on the contextual rules. It should be
   clear that under no circumstance should an UNASSIGNED character be
   permitted in a label to be registered as part of a domain name.

3.2. Registration Policy

   While these recommendations cannot and should not define registry
   policies, registries should develop and apply additional restrictions
   as needed to reduce confusion and other problems. For example, it is
   generally believed that labels containing characters from more than
   one script are a bad practice although there may be some important
   exceptions to that principle. Some registries may choose to restrict
   registrations to characters drawn from a very small number of
   scripts. For many scripts, the use of variant techniques such as
   those described in the JET specification for the CJK script
   [RFC3743] and its generalization [RFC4290], and illustrated for
   Chinese by the tables provided by the Chinese Domain Name Consortium
   [RFC4713], may be helpful in reducing problems that might be
   perceived by users.

   In general, users will benefit if registries only permit characters
   from scripts that are well-understood by the registry or its
   advisers. If a registry decides to reduce opportunities for
   confusion by constructing policies that disallow characters used in
   historic writing systems or characters whose use is restricted to
   specialized, highly technical contexts, some relevant information may
   be found in Section 2.4 (Specific Character Adjustments) of Unicode
   Identifier and Pattern Syntax [Unicode-UAX31], especially Table 4
   (Candidate Characters for Exclusion from Identifiers), and Section
   3.1 (General Security Profile for Identifiers) in Unicode Security
   Mechanisms [Unicode-UTS39].

   The requirement (in Section 4.1 of the Protocol document [RFC5891])
   that registration procedures use only U-labels and/or A-labels is
   intended to ensure that registrants are fully aware of exactly what
   is being registered as well as encouraging use of those canonical
   forms.
   That provision should not be interpreted as requiring
   registrants to provide characters in a particular code sequence.
   Registrant input conventions and management are part of registrant-
   registrar interactions and relationships between registries and
   registrars and are outside the scope of these standards.

   It is worth stressing that these principles of policy development and
   application apply at all levels of the DNS, not only, e.g., top level
   domain (TLD) or second level domain (SLD) registrations. Even a
   trivial, "anything is permitted that is valid under the protocol"
   policy is helpful in that it helps users and application developers
   know what to expect.

3.3. Layered Restrictions: Tables, Context, Registration, and
     Applications

   The character rules in IDNA2008 are based on the realization that
   there is no single magic bullet for any of the security,
   confusability, or other issues associated with IDNs. Instead, the
   specifications define a variety of approaches. The character tables
   are the first mechanism, protocol rules about how those characters
   are applied or restricted in context are the second, and those two in
   combination constitute the limits of what can be done in the
   protocol. As discussed in the previous section (Section 3.2),
   registries are expected to restrict what they permit to be
   registered, devising and using rules that are designed to optimize
   the balance between confusion and risk on the one hand and maximum
   expressiveness in mnemonics on the other.

   In addition, there is an important role for user interface programs
   in warning against label forms that appear problematic given their
   knowledge of local contexts and conventions. Of course, no approach
   based on naming or identifiers alone can protect against all threats.
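
   As a rough illustration of the first layer (the character tables),
   the grouping criteria of Sections 3.1.3 and 3.1.4 can be approximated
   with generic Unicode properties. The following Python sketch is not
   the algorithm of the Tables document [RFC5892]: the function name
   rough_category is hypothetical, the result depends on the Unicode
   version bundled with the Python interpreter, and the exceptions and
   contextual rules that the real tables define are ignored.

```python
import unicodedata


def rough_category(ch: str) -> str:
    """Approximate the tier of a single code point (illustration only)."""
    general = unicodedata.category(ch)
    # Code points with no assignment in this Unicode version (Section 3.1.4).
    if general == "Cn":
        return "UNASSIGNED"
    # Compatibility equivalents: NFKC yields some other character.
    if unicodedata.normalize("NFKC", ch) != ch:
        return "DISALLOWED: changed by NFKC"
    # Uppercase and other forms that case folding maps elsewhere.
    if ch.casefold() != ch:
        return "DISALLOWED: changed by case folding"
    # Anything that is not a letter (L*), mark (M*), or number (N*).
    if general[0] not in "LMN":
        return "DISALLOWED: not a letter, digit, or mark"
    return "candidate for PROTOCOL-VALID"


for sample in ["a", "A", "\uff41", "\u2044", "\u2665"]:
    print(f"U+{ord(sample):04X} {rough_category(sample)}")
```

   Under this approximation, the hyphen and U+00DF (Eszett) would be
   reported as DISALLOWED, although the actual tables handle such cases
   separately. That gap is one reason the real rules cannot be reduced
   to a few property tests.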

4. Application-Related Issues

4.1. Display and Network Order

   Domain names are always transmitted in network order (the order in
   which the code points are sent in protocols), but they may have a
   different display order (the order in which the code points are
   displayed on a screen or paper). When a domain name contains
   characters that are normally written right to left, display order may
   be affected although network order is not. It gets even more
   complicated if left-to-right and right-to-left labels are adjacent to
   each other within a domain name. The decision about the display
   order is ultimately under the control of user agents -- including Web
   browsers, mail clients, hosted Web applications and many more --
   which may be highly localized. Should a domain name abc.def, in
   which both labels are represented in scripts that are written right
   to left, be displayed as fed.cba or cba.fed? Applications that are
   in deployment today are already diverse, and one can find examples of
   either choice.

   The picture changes once again when an IDN appears in an
   Internationalized Resource Identifier (IRI) [RFC3987]. An IRI or
   internationalized email address contains elements other than the
   domain name. For example, IRIs contain protocol identifiers and
   field delimiter syntax such as "http://" or "mailto:" while email
   addresses contain the "@" to separate local parts from domain names.

   An IRI in network order begins with "http://" followed by domain
   labels in network order, thus "http://abc.def".

   User interface programs are not required to display and allow input
   of IRIs directly but often do so. Implementers have to choose
   whether the overall direction of these strings will always be left to
   right (or right to left) for an IRI or email address.
   The natural
   order for a user typing a domain name on a right-to-left system is
   fed.cba. Should the right-to-left (RTL) user interface reverse the
   entire domain name each time a domain name is typed? Does this
   change if the user types "http://" right before typing a domain name,
   thus implying that the user is beginning at the beginning of the
   network-order IRI? Experience in the 1980s and 1990s with mixing
   systems in which domain name labels were read in network order (left
   to right) and those in which those labels were read right to left
   would predict a great deal of confusion.

   If each implementation of each application makes its own decisions on
   these issues, users will develop heuristics that will sometimes fail
   when switching applications. However, while some display order
   conventions, voluntarily adopted, would be desirable to reduce
   confusion, such suggestions are beyond the scope of these
   specifications.

4.2. Entry and Display in Applications

   Applications can accept and display domain names using any character
   set or character coding system. The IDNA protocol does not
   necessarily affect the interface between users and applications. An
   IDNA-aware application can accept and display internationalized
   domain names in two formats: as the internationalized character
   set(s) supported by the application (i.e., an appropriate local
   representation of a U-label) and as an A-label. Applications may
   allow the display of A-labels, but are encouraged not to do so except
   as an interface for special purposes, possibly for debugging, or to
   cope with display limitations. In general, they should allow, but
   not encourage, user input of A-labels. A-labels are opaque and ugly,
   and malicious variations on them are not easily detected by users.
   Where possible, they should thus only be exposed when they are
   absolutely needed.
   Because IDN labels can be rendered either as
   A-labels or U-labels, the application may reasonably have an option
   for the user to select the preferred method of display. Rendering
   the U-label should normally be the default.

   Domain names are often stored and transported in many places. For
   example, they are part of documents such as mail messages and web
   pages. They are transported in many parts of many protocols, such as
   both the control commands of SMTP and associated message body parts,
   and in the headers and the body content in HTTP. It is important to
   remember that domain names appear both in domain name slots and in
   the content that is passed over protocols, and it would be helpful if
   protocols explicitly define what their domain name slots are.

   In protocols and document formats that define how to handle
   specification or negotiation of charsets, labels can be encoded in
   any charset allowed by the protocol or document format. If a
   protocol or document format only allows one charset, the labels must
   be given in that charset. Of course, not all charsets can properly
   represent all labels. If a U-label cannot be displayed in its
   entirety, the only choice (without loss of information) may be to
   display the A-label.

   Where a protocol or document format allows IDNs, labels should be in
   whatever character encoding and escape mechanism the protocol or
   document format uses in the local environment. This provision is
   intended to prevent situations in which, e.g., UTF-8 domain names
   appear embedded in text that is otherwise in some other character
   coding.

   All protocols that use domain name slots (see Section 2.3.2.6 in the
   Definitions document [RFC5890]) already have the capacity for
   handling domain names in the ASCII charset.
   Thus, A-labels can
   inherently be handled by those protocols.

   IDNA2008 does not specify required mappings between one character or
   code point and others. An extended discussion of mapping issues
   appears in Section 6 and specific recommendations appear in the
   Mapping document [IDNA2008-Mapping]. In general, IDNA2008 prohibits
   characters that would be mapped to others by normalization or other
   rules. As examples, while mathematical characters based on Latin
   ones are accepted as input to IDNA2003, they are prohibited in
   IDNA2008. Similarly, uppercase characters, double-width characters,
   and other variations are prohibited as IDNA input although mapping
   them as needed in user interfaces is strongly encouraged.

   Since the rules in the Tables document [RFC5892] have the effect that
   only strings that are not transformed by NFKC are valid, if an
   application chooses to perform NFKC normalization before lookup, that
   operation is safe since this will never make the application unable
   to look up any valid string. However, as discussed above, the
   application cannot guarantee that any other application will perform
   that mapping, so it should be used only with caution and for informed
   users.

   In many cases, these prohibitions should have no effect on what the
   user can type as input to the lookup process. It is perfectly
   reasonable for systems that support user interfaces to perform some
   character mapping that is appropriate to the local environment. This
   would normally be done prior to actual invocation of IDNA. At least
   conceptually, the mapping would be part of the Unicode conversions
   discussed above and in the Protocol document [RFC5891].
   However,
   those changes will be local ones only -- local to environments in
   which users will clearly understand that the character forms are
   equivalent. For use in interchanges among systems, it appears to be
   much more important that U-labels and A-labels can be mapped back and
   forth without loss of information.

   One specific, and very important, instance of this strategy arises
   with case folding. In the ASCII-only DNS, names are looked up and
   matched in a case-independent way, but no actual case folding occurs.
   Names can be placed in the DNS in either uppercase or lowercase form
   (or any mixture of them) and that form is preserved, returned in
   queries, and so on. IDNA2003 approximated that behavior for
   non-ASCII strings by performing case folding at registration time
   (resulting in only lowercase IDNs in the DNS) and when names were
   looked up.

   As suggested earlier in this section, it appears to be desirable to
   do as little character mapping as possible as long as Unicode works
   correctly (e.g., Normalization Form C (NFC) mapping to resolve
   different codings for the same character is still necessary although
   the specifications require that it be performed prior to invoking the
   protocol) in order to make the mapping between A-labels and U-labels
   idempotent. Case mapping is not an exception to this principle. If
   only lowercase characters can be registered in the DNS (i.e., be
   present in a U-label), then IDNA2008 should prohibit uppercase
   characters as input even though user interfaces to applications
   should probably map those characters. Some other considerations
   reinforce this conclusion. For example, in ASCII case mapping for
   individual characters, uppercase(character) is always equal to
   uppercase(lowercase(character)). That may not be true with IDNs.
   In
   some scripts that use case distinctions, there are a few characters
   that do not have counterparts in one case or the other. The
   relationship between uppercase and lowercase may even be language-
   dependent, with different languages (or even the same language in
   different areas) expecting different mappings. User interface
   programs can meet the expectations of users who are accustomed to the
   case-insensitive DNS environment by performing case folding prior to
   IDNA processing, but the IDNA procedures themselves should neither
   require such mapping nor expect it when it is not natural to the
   localized environment.

4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate
     Character Forms

   Users have expectations about character matching or equivalence that
   are based on their own languages and the orthography of those
   languages. These expectations may not always be met in a global
   system, especially if multiple languages are written using the same
   script but using different conventions. Some examples:

   o  A Norwegian user might expect a label with the ae-ligature to be
      treated as the same label as one using the Swedish spelling with
      a-diaeresis even though applying that mapping to English would be
      astonishing to users.

   o  A German user might expect a label with an o-umlaut and a label
      that had "oe" substituted, but was otherwise the same, to be
      treated as equivalent even though that substitution would be a
      clear error in Swedish.

   o  A Chinese user might expect automatic matching of Simplified and
      Traditional Chinese characters, but applying that matching for
      Korean or Japanese text would create considerable confusion.

   o  An English user might expect "theater" and "theatre" to match.
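
   The expectations listed above are language conventions and are not
   captured by generic Unicode operations. The Python sketch below,
   offered as an illustration under assumptions rather than as part of
   IDNA, compares a few pairs after NFKC normalization and case folding.
   The helper name generic_match and the sample strings (including the
   Simplified/Traditional pair U+56FD and U+570B) are chosen for this
   example only.

```python
import unicodedata


def generic_match(a: str, b: str) -> bool:
    """Compare two strings after NFKC normalization and case folding."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFKC", s).casefold()
    return fold(a) == fold(b)


pairs = [
    ("\u00e6", "\u00e4"),      # ae-ligature vs. a-diaeresis
    ("\u00f6", "oe"),          # o-umlaut vs. "oe"
    ("\u56fd", "\u570b"),      # Simplified vs. Traditional form
    ("theater", "theatre"),    # spelling variants
    ("a\u0304", "\u0101"),     # combining sequence vs. precomposed a-macron
]
for left, right in pairs:
    print(repr(left), repr(right), generic_match(left, right))
```

   Only the last pair compares equal, because it involves two codings of
   the same character. The other four pairs reflect language-specific
   conventions that normalization does not address.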

   A number of languages use alphabetic scripts in which single phonemes
   are written using two characters, termed a "digraph", for example,
   the "ph" in "pharmacy" and "telephone". (Such characters can also
   appear consecutively without forming a digraph, as in "tophat".)
   Certain digraphs may be indicated typographically by setting the two
   characters closer together than they would be if used consecutively
   to represent different phonemes. Some digraphs are fully joined as
   ligatures. For example, the word "encyclopaedia" is sometimes set
   with a U+00E6 LATIN SMALL LIGATURE AE. When ligature and digraph
   forms have the same interpretation across all languages that use a
   given script, application of Unicode normalization generally resolves
   the differences and causes them to match. When they have different
   interpretations, matching must utilize other methods, presumably
   chosen at the registry level, or users must be educated to understand
   that matching will not occur.

   The nature of the problem can be illustrated by many words in the
   Norwegian language, where the "ae" ligature is the 27th letter of a
   29-letter extended Latin alphabet. It is equivalent to the 28th
   letter of the Swedish alphabet (also containing 29 letters),
   U+00E4 LATIN SMALL LETTER A WITH DIAERESIS, for which an "ae" cannot
   be substituted according to current orthographic standards. That
   character (U+00E4) is also part of the German alphabet where, unlike
   in the Nordic languages, the two-character sequence "ae" is usually
   treated as a fully acceptable alternate orthography for the "umlauted
   a" character. The inverse is however not true, and those two
   characters cannot necessarily be combined into an "umlauted a".
   This
   also applies to another German character, the "umlauted o"
   (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS) which, for example,
   cannot be used for writing the name of the author "Goethe". It is
   also a letter in the Swedish alphabet where, like the "a with
   diaeresis", it cannot be correctly represented as "oe" and in the
   Norwegian alphabet, where it is represented, not as "o with
   diaeresis", but as "slashed o", U+00F8.

   Some of the ligatures that have explicit code points in Unicode were
   given special handling in IDNA2003 and now pose additional problems
   in transition. See Section 7.2.

   Additional cases with alphabets written right to left are described
   in Section 4.5.

   Matching and comparison algorithm selection often requires
   information about the language being used, context, or both --
   information that is not available to IDNA or the DNS. Consequently,
   IDNA2008 makes no attempt to treat combined characters in any special
   way. A registry that is aware of the language context in which
   labels are to be registered, and where that language sometimes (or
   always) treats the two-character sequences as equivalent to the
   combined form, should give serious consideration to applying a
   "variant" model [RFC3743][RFC4290] or to prohibiting registration of
   one of the forms entirely, to reduce the opportunities for user
   confusion and fraud that would result from the related strings being
   registered to different parties.

4.4. Case Mapping and Related Issues

   In the DNS, ASCII letters are stored with their case preserved.
   Matching during the query process is case-independent, but none of
   the information that might be represented by choices of case has been
   lost.
   That model has been accidentally helpful because, as people
   have created DNS labels by catenating words (or parts of words) to
   form labels, case has often been used to distinguish among components
   and make the labels more memorable.

   Since DNS servers do not get involved in parsing IDNs, they cannot do
   case-independent matching. Thus, keeping the cases separate in
   lookup or registration, and doing matching at the server, is not
   feasible with IDNA or any similar approach. Matching of characters
   that are considered to differ only by case must be done, if desired,
   by programs invoking IDNA lookup even though it wasn't done by
   ASCII-only DNS clients. That situation was recognized in IDNA2003 and
   nothing in IDNA2008 fundamentally changes it or could do so. In
   IDNA2003, all characters are case folded and mapped by clients in a
   standardized step.

   Even in scripts that generally support case distinctions, some
   characters do not have uppercase forms. For example, the Unicode
   case-folding operation maps Greek Final Form Sigma (U+03C2) to the
   medial form (U+03C3) and maps Eszett (German Sharp S, U+00DF) to
   "ss". Neither of these mappings is reversible because the uppercase
   of U+03C3 is the uppercase Sigma (U+03A3) and "ss" is an ASCII
   string. IDNA2008 permits, at the risk of some incompatibility,
   slightly more flexibility in this area by avoiding case folding and
   treating these characters as themselves. Approaches to handling one-
   way mappings are discussed in Section 7.2.

   Because IDNA2003 maps Final Sigma and Eszett to other characters, and
   the reverse mapping is never possible, neither Final Sigma nor Eszett
   can be represented in the ACE form of an IDNA2003 IDN or in the
   native character (U-label) form derived from it.
   With IDNA2008, both
   characters can be used in an IDN and so the A-label used for lookup
   for any U-label containing those characters is now different. See
   Section 7.1 for a discussion of what kinds of changes might require
   the IDNA prefix to change; after extended discussions, the IDNABIS
   Working Group came to consensus that the change for these characters
   did not justify a prefix change.

4.5. Right-to-Left Text

   In order to be sure that the directionality of right-to-left text is
   unambiguous, IDNA2003 required that any label in which right-to-left
   characters appear both starts and ends with them and that it does not
   include any characters with strong left-to-right properties (that
   excludes other alphabetic characters but permits European digits).
   Any other string that contains a right-to-left character and does not
   meet those requirements is rejected. This is one of the few places
   where the IDNA algorithms (both in IDNA2003 and in IDNA2008) examine
   an entire label, not just individual characters. The algorithmic
   model used in IDNA2003 rejects the label when the final character in
   a right-to-left string requires a combining mark in order to be
   correctly represented.

   That prohibition is not acceptable for writing systems for languages
   written with consonantal alphabets to which diacritical vocalic
   systems are applied, and for languages with orthographies derived
   from them where the combining marks may have different functionality.
   In both cases, the combining marks can be essential components of the
   orthography.
   Examples of this are Yiddish, written with an extended
   Hebrew script, and Dhivehi (the official language of Maldives), which
   is written in the Thaana script (which is, in turn, derived from the
   Arabic script). IDNA2008 removes the restriction on final combining
   characters with a new set of rules for right-to-left scripts and
   their characters. Those new rules are specified in the Bidi document
   [RFC5893].

5. IDNs and the Robustness Principle

   The "Robustness Principle" is often stated as "Be conservative about
   what you send and liberal in what you accept" (see, e.g., Section
   1.2.2 of the applications-layer Host Requirements specification
   [RFC1123]). This principle applies to IDNA. In applying the
   principle to registries as the source ("sender") of all registered
   and useful IDNs, registries are responsible for being conservative
   about what they register and put out in the Internet. For IDNs to
   work well, zone administrators (registries) must have and require
   sensible policies about what is registered -- conservative policies
   -- and implement and enforce them.

   Conversely, lookup applications are expected to reject labels that
   clearly violate global (protocol) rules (no one has ever seriously
   claimed that being liberal in what is accepted requires being
   stupid). However, once one gets past such global rules and deals
   with anything sensitive to script or locale, it is necessary to
   assume that garbage has not been placed into the DNS, i.e., one must
   be liberal about what one is willing to look up in the DNS rather
   than guessing about whether it should have been permitted to be
   registered.
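
   The division of responsibility described above can be sketched as two
   checks of different strictness. In this hypothetical Python
   illustration, violates_global_rules is a crude stand-in for the
   protocol-level rules (it is not the test defined in the Protocol or
   Tables documents), scripts_used guesses a script from the first word
   of each letter's Unicode character name, and the single-script policy
   is just one example of a rule a registry might choose.

```python
import unicodedata


def violates_global_rules(label: str) -> bool:
    """Crude stand-in for protocol-level checks; not the RFC 5892 tables."""
    if unicodedata.normalize("NFC", label) != label:
        return True
    return any(
        unicodedata.category(ch) == "Cn" or ch != ch.lower()
        for ch in label
    )


def scripts_used(label: str) -> set:
    """Guess scripts from Unicode names, e.g. 'LATIN' or 'CYRILLIC'."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def lookup_accepts(label: str) -> bool:
    """Liberal side: reject only clear violations of global rules."""
    return not violates_global_rules(label)


def registry_accepts(label: str) -> bool:
    """Conservative side: global rules plus a local single-script policy."""
    return lookup_accepts(label) and len(scripts_used(label)) <= 1


mixed = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A
print(lookup_accepts(mixed), registry_accepts(mixed))
```

   In this sketch, the mixed-script label passes the lookup check but
   fails the registry check: the lookup side does not second-guess
   script-sensitive policy, while the registry side is expected to
   apply it.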

   If a string cannot be successfully found in the DNS after the lookup
   processing described here, it makes no difference whether it simply
   wasn't registered or was prohibited by some rule at the registry.
   Application implementers should be aware that where DNS wildcards are
   used, the ability to successfully resolve a name does not guarantee
   that it was actually registered.

6. Front-end and User Interface Processing for Lookup

   Domain names may be identified and processed in many contexts. They
   may be typed in by users themselves or embedded in an identifier such
   as an email address, URI, or IRI. They may occur in running text or
   be processed by one system after being provided in another. Systems
   may try to normalize URLs to determine (or guess) whether a reference
   is valid or if two references point to the same object without
   actually looking the objects up (comparison without lookup is
   necessary for URI types that are not intended to be resolved). Some
   of these goals may be more easily and reliably satisfied than others.
   While there are strong arguments for any domain name that is placed
   "on the wire" -- transmitted between systems -- to be in the zero-
   ambiguity forms of A-labels, it is inevitable that programs that
   process domain names will encounter U-labels or variant forms.

   An application that implements the IDNA protocol [RFC5891] will
   always take any user input and convert it to a set of Unicode code
   points. That user input may be acquired by any of several different
   input methods, all with differing conversion processes to be taken
   into consideration (e.g., typed on a keyboard, written by hand onto
   some sort of digitizer, spoken into a microphone and interpreted by a
   speech-to-text engine, etc.).
   The process of taking any particular
   user input and mapping it into a Unicode code point may be a simple
   one: if a user strikes the "A" key on a US English keyboard, without
   any modifiers such as the "Shift" key held down, in order to draw a
   Latin small letter A ("a"), many (perhaps most) modern operating
   system input methods will produce to the calling application the code
   point U+0061, encoded in a single octet.

   Sometimes the process is somewhat more complicated: a user might
   strike a particular set of keys to represent a combining macron
   followed by striking the "A" key in order to draw a Latin small
   letter A with a macron above it. Depending on the operating system,
   the input method chosen by the user, and even the parameters with
   which the application communicates with the input method, the result
   might be the code point U+0101 (encoded as two octets in UTF-8 or
   UTF-16, four octets in UTF-32, etc.), the code point U+0061 followed
   by the code point U+0304 (again, encoded in three or more octets,
   depending upon the encoding used) or even the code point U+FF41
   followed by the code point U+0304 (and encoded in some form). These
   examples leave aside the issue of operating systems and input methods
   that do not use Unicode code points for their character set.

   In every case, applications (with the help of the operating systems
   on which they run and the input methods used) need to perform a
   mapping from user input into Unicode code points.

   IDNA2003 used a model whereby input was taken from the user, mapped
   (via whatever input method mechanisms were used) to a set of Unicode
   code points, and then further mapped to a set of Unicode code points
   using the Nameprep profile [RFC3491].
   In this procedure, there are
   two separate mapping steps: first, a mapping done by the input method
   (which might be controlled by the operating system, the application,
   or some combination) and then a second mapping performed by the
   Nameprep portion of the IDNA protocol. The mapping done in Nameprep
   includes a particular mapping table to re-map some characters to
   other characters, a particular normalization, and a set of prohibited
   characters.

   Note that the two-step mapping process means that the
   mapping chosen by the operating system or application in the first
   step might differ significantly from the mapping supplied by the
   Nameprep profile in the second step. This has advantages and
   disadvantages. Of course, the second mapping regularizes what gets
   looked up in the DNS, making for better interoperability between
   implementations that use the Nameprep mapping. However, the
   application or operating system may choose mappings in their input
   methods, which when passed through the second (Nameprep) mapping
   result in characters that are "surprising" to the end user.

   The other important feature of IDNA2003 is that, with very few
   exceptions, it assumes that any set of Unicode code points provided
   to the Nameprep mapping can be mapped into a string of Unicode code
   points that are "sensible", even if that means mapping some code
   points to nothing (that is, removing the code points from the
   string). This allowed maximum flexibility in input strings.

   The present version of IDNA (IDNA2008) differs significantly in
   approach from the original version. First and foremost, it does not
   provide explicit mapping instructions.
   Instead, it assumes that the application (perhaps via an operating system input method) will do whatever mapping it requires to convert input into Unicode code points. This has the advantage of giving flexibility to the application to choose a mapping that is suitable for its user given specific user requirements, and avoids the two-step mapping of the original protocol. Instead of a mapping, IDNA2008 provides a set of categories that can be used to specify the valid code points allowed in a domain name.

   In principle, an application ought to take user input of a domain name and convert it to the set of Unicode code points that represent the domain name the user intends. As a practical matter, of course, determining user intent is a tricky business, so an application needs to choose a reasonable mapping from user input. That may differ based on the particular circumstances of a user, depending on locale, language, type of input method, etc. It is up to the application to make a reasonable choice.

7. Migration from IDNA2003 and Unicode Version Synchronization

7.1. Design Criteria

   As mentioned above and in the IAB review and recommendations for IDNs [RFC4690], two key goals of the IDNA2008 design are:

   o  to enable applications to be agnostic about whether they are being run in environments supporting any Unicode version from 3.2 onward.

   o  to permit incrementally adding new characters, character groups, scripts, and other character collections as they are incorporated into Unicode, doing so without disruption and, in the long term, without "heavy" processes (an IETF consensus process is required by the IDNA2008 specifications and is expected to be required and used until significant experience accumulates with IDNA operations and new versions of Unicode).

7.1.1. Summary and Discussion of IDNA Validity Criteria

   The general criteria for a label to be considered valid under IDNA are (the actual rules are rigorously defined in the Protocol [RFC5891] and Tables [RFC5892] documents):

   o  The characters are "letters", marks needed to form letters, numerals, or other code points used to write words in some language. Symbols, drawing characters, and various notational characters are intended to be permanently excluded. There is no evidence that they are important enough to Internet operations or internationalization to justify expansion of domain names beyond the general principle of "letters, digits, and hyphen". (Additional discussion and rationale for the symbol decision appears in Section 7.6.)

   o  Other than in very exceptional cases, e.g., where they are needed to write substantially any word of a given language, punctuation characters are excluded. The fact that a word exists is not proof that it should be usable in a DNS label, and DNS labels are not expected to be usable for multiple-word phrases (although they are certainly not prohibited if the conventions and orthography of a particular language cause that to be possible).

   o  Characters that are unassigned (have no character assignment at all) in the version of Unicode being used by the registry or application are not permitted, even on lookup.
      The issues involved in this decision are discussed in Section 7.7.

   o  Any character that is mapped to another character by a current version of NFKC is prohibited as input to IDNA (for either registration or lookup). With a few exceptions, this principle excludes any character mapped to another by Nameprep [RFC3491].

   The principles above drive the design of rules that are specified exactly in the Tables document. Those rules identify the characters that are valid under IDNA. The rules themselves are normative, and the tables are derived from them, rather than vice versa.

7.1.2. Labels in Registration

   Any label registered in a DNS zone must be validated -- i.e., the criteria for that label must be met -- in order for applications to work as intended. This principle is not new. For example, since the DNS was first deployed, zone administrators have been expected to verify that names meet "hostname" requirements [RFC0952] where those requirements are imposed by the expected applications. Other application contexts, such as the later addition of special service location formats [RFC2782], imposed new requirements on zone administrators. For zones that will contain IDNs, support for Unicode version-independence requires restrictions on all strings placed in the zone. In particular, for such zones (the exact rules appear in Section 4 of the Protocol document [RFC5891]):

   o  Any label that appears to be an A-label, i.e., any label that starts with "xn--", must be valid under IDNA, i.e., it must be a valid A-label, as discussed in Section 2 above.

   o  The Unicode tables (i.e., tables of code points, character classes, and properties) and IDNA tables (i.e., tables of contextual rules such as those that appear in the Tables document) must be consistent on the systems performing or validating labels to be registered. Note that this does not require that tables reflect the latest version of Unicode, only that all tables used on a given system are consistent with each other.

   Under this model, registry tables will need to be updated (both the Unicode-associated tables and the tables of permitted IDN characters) to enable a new script or other set of new characters. The registry will not be affected by newer versions of Unicode, or newly authorized characters, until and unless it wishes to support them. The zone administrator is responsible for verifying validity for IDNA as well as its local policies -- a more extensive set of checks than is required for looking up the labels. Systems looking up or resolving DNS labels, especially IDN DNS labels, must be able to assume that applicable registration rules were followed for names entered into the DNS.

7.1.3. Labels in Lookup

   Any application processing a label through IDNA so it can be looked up in a DNS zone is required to (the exact rules appear in Section 5 of the Protocol document [RFC5891]):

   o  Maintain IDNA and Unicode tables that are consistent with regard to versions, i.e., unless the application actually executes the classification rules in the Tables document [RFC5892], its IDNA tables must be derived from the version of Unicode that is supported more generally on the system.
      As with registration, the tables need not reflect the latest version of Unicode, but they must be consistent.

   o  Validate the characters in labels to be looked up only to the extent of determining that the U-label does not contain "DISALLOWED" code points or code points that are unassigned in its version of Unicode.

   o  Validate the label itself for conformance with a small number of whole-label rules. In particular, it must verify that:

      *  there are no leading combining marks,

      *  the Bidi conditions are met if right-to-left characters appear,

      *  any required contextual rules are available, and

      *  any contextual rules that are associated with joiner characters (and CONTEXTJ characters more generally) are tested.

   o  Do not reject labels based on other contextual rules about characters, including mixed-script label prohibitions. Such rules may be used to influence presentation decisions in the user interface, but not to avoid looking up domain names.

   To further clarify the rules about handling characters that require contextual rules, note that one can have a context-required character (i.e., one that requires a rule), but no rule. In that case, the character is treated the same way DISALLOWED characters are treated, until and unless a rule is supplied. That state is more or less equivalent to "the idea of permitting this character is accepted in principle, but it won't be permitted in practice until consensus is reached on a safe way to use it".

   The ability to add a rule more or less exempts these characters from the prohibition against reclassifying characters from DISALLOWED to PVALID.
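
   The lookup-side checks listed in this section can be sketched roughly as follows. This is only an illustration, not the algorithm of the Protocol document [RFC5891]: the sets DISALLOWED and CONTEXT_REQUIRED and the dictionary CONTEXT_RULES are hypothetical, hand-filled stand-ins for the tables derived in the Tables document [RFC5892], the Bidi conditions are omitted, and the function name lookup_check is an assumption. Python's unicodedata module supplies the "unassigned" and "combining mark" tests for whatever Unicode version the interpreter ships with.

```python
import unicodedata

# Hypothetical, hand-filled stand-ins for the tables derived in RFC 5892.
DISALLOWED = {0x2665}                 # BLACK HEART SUIT (a symbol)
CONTEXT_REQUIRED = {0x200C, 0x200D}   # ZWNJ and ZWJ
CONTEXT_RULES = {}                    # code point -> rule(label, index) -> bool


def lookup_check(label):
    """Return "ok" or a short reason why the label must not be looked up."""
    if label and unicodedata.category(label[0]).startswith("M"):
        return "leading combining mark"
    for index, char in enumerate(label):
        cp = ord(char)
        # "Cn" covers code points with no assignment in this Unicode version.
        if unicodedata.category(char) == "Cn":
            return "unassigned code point"
        if cp in DISALLOWED:
            return "DISALLOWED code point"
        if cp in CONTEXT_REQUIRED:
            rule = CONTEXT_RULES.get(cp)
            if rule is None:
                # Context-required but no rule yet: treated like DISALLOWED.
                return "no contextual rule available"
            if not rule(label, index):
                return "contextual rule failed"
    # The Bidi conditions are deliberately omitted from this sketch.
    return "ok"
```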

   And, obviously, "no rule" is different from "have a rule, but the test either succeeds or fails".

   Lookup applications that follow these rules, rather than having their own criteria for rejecting lookup attempts, are not sensitive to version incompatibilities with the particular zone registry associated with the domain name except for labels containing characters recently added to Unicode.

   An application or client that processes names according to this protocol and then resolves them in the DNS will be able to locate any name that is registered, as long as those registrations are valid under IDNA and its version of the IDNA tables is sufficiently up to date to interpret all of the characters in the label. Messages to users should distinguish between "label contains an unallocated code point" and other types of lookup failures. A failure on the basis of an old version of Unicode may lead the user to a desire to upgrade to a newer version, but will have no other ill effects (this is consistent with behavior in the transition to the DNS when some hosts could not yet handle some forms of names or record types).

7.2. Changes in Character Interpretations

   As a consequence of the elimination of mapping, the current version of IDNA changes the interpretation of a few characters relative to its predecessors. This subsection outlines the issues and discusses possible transition strategies.

7.2.1. Character Changes: Eszett and Final Sigma

   In those scripts that make case distinctions, there are a few characters for which an obvious and unique uppercase character has not historically been available to match a lowercase one, or vice versa.
   For those characters, the mappings used in constructing the Stringprep tables for IDNA2003, performed using the Unicode toCaseFold operation (see Section 5.18 of the Unicode Standard [Unicode52]), generate different characters or sets of characters. Those operations are not reversible and lose even more information than traditional uppercase or lowercase transformations, but are more useful than those transformations for comparison purposes. Two notable characters of this type are the German character Eszett (Sharp S, U+00DF) and the Greek Final Form Sigma (U+03C2). The former is case folded to the ASCII string "ss", the latter to a medial (lowercase) Sigma (U+03C3).

7.2.2. Character Changes: Zero Width Joiner and Zero Width Non-Joiner

   IDNA2003 mapped both ZERO WIDTH JOINER (ZWJ, U+200D) and ZERO WIDTH NON-JOINER (ZWNJ, U+200C) to nothing, effectively dropping these characters from any label in which they appeared and treating strings containing them as identical to strings that did not. As discussed in Section 3.1.2 above, those characters are essential for writing many reasonable mnemonics for certain scripts. However, treating them as valid in IDNA2008, even with contextual restrictions, raises approximately the same problem as exists with Eszett and Final Sigma: strings that were valid under IDNA2003 have different interpretations as labels, and different A-labels, than the same strings under this newer version.

7.2.3. Character Changes and the Need for Transition

   The decision to eliminate mandatory and standardized mappings, including case folding, from the IDNA2008 protocol in order to make A-labels and U-labels idempotent made these characters problematic.
   If they were to be disallowed, important words and mnemonics could not be written in orthographically reasonable ways. If they were to be permitted as distinct characters, there would be no information loss and registries would have more flexibility, but IDNA2003 and IDNA2008 lookups might result in different A-labels.

   With the understanding that there would be incompatibility either way but a judgment that the incompatibility was not significant enough to justify a prefix change, the Working Group concluded that Eszett and Final Form Sigma should be treated as distinct and Protocol-Valid characters.

   Since these characters are interpreted in different ways under the older and newer versions of IDNA, transition strategies and policies will be necessary. Some actions can reasonably be taken by applications' client programs (those that perform lookup operations or cause them to be performed), but because of the diversity of situations and uses of the DNS, much of the responsibility will need to fall on registries.

   Registries, especially those maintaining zones for third parties, must decide how to introduce a new service in a way that does not create confusion or significantly weaken or invalidate existing identifiers. This is not a new problem; registries were faced with similar issues when IDNs were introduced (potentially, and especially for Latin-based scripts, in conflict with existing labels that had been rendered in ASCII characters by applying more or less standardized conventions) and when other new forms of strings have been permitted as labels.

7.2.4. Transition Strategies

   There are several approaches to the introduction of new characters or changes in interpretation of existing characters from their mapped forms in the earlier version of IDNA. The transition issue is complicated because the forms of these labels after the ToUnicode(ToASCII()) translation in IDNA2003 not only remain valid but do not provide strong indications of what the registrant intended: a string containing "ss" could have simply been intended to be that string or could have been intended to contain an Eszett; a string containing lowercase Sigma could have been intended to contain Final Sigma (one might make heuristic guesses based on position in a string, but the long tradition of forming labels by concatenating words makes such heuristics unreliable), and strings that do not contain ZWJ or ZWNJ might have been intended to contain them. Without any preference or claim to completeness, some of these, all of which have been used by registries in the past for similar transitions, are:

   1.  Do not permit use of the newly available character at the registry level. This might cause lookup failures if a domain name were to be written with the expectation of the IDNA2003 mapping behavior, but would eliminate any possibility of false matches.

   2.  Hold a "sunrise"-like arrangement in which holders of labels containing "ss" in the Eszett case, lowercase Sigma in that case, or that might have contained ZWJ or ZWNJ in context, are given priority (and perhaps other benefits) for registering the corresponding string containing Eszett, Final Sigma, or the appropriate zero-width character respectively.

   3.  Adopt some sort of "variant" approach in which registrants obtain labels with both character forms.

   4.  Adopt a different form of "variant" approach in which registration of additional strings that would produce the same A-label if interpreted according to IDNA2003 is either not permitted at all or permitted only by the registrant who already has one of the names.

   5.  Ignore the issue and assume that the marketplace or other mechanisms will sort things out.

   In any event, a registry (at any level of the DNS tree) that chooses to permit labels that contain these characters to be registered, or considers doing so, will have to address the relationship with existing, possibly conflicting, labels in some way, just as registries that already had a considerable number of labels did when IDNs were first introduced.

7.3. Elimination of Character Mapping

   As discussed at length in Section 6, IDNA2003, via Nameprep (see Section 7.5), mapped many characters into related ones. Those mappings no longer exist as requirements in IDNA2008. These specifications strongly prefer that only A-labels or U-labels be used in protocol contexts and as much as practical more generally. IDNA2008 does anticipate situations in which some mapping at the time of user input into lookup applications is appropriate and desirable. The issues are discussed in Section 6 and specific recommendations are made in the Mapping document [IDNA2008-Mapping].

7.4. The Question of Prefix Changes

   The conditions that would have required a change in the IDNA ACE prefix ("xn--", used in IDNA2003) were of great concern to the community. A prefix change would have clearly been necessary if the algorithms were modified in a manner that would have created serious ambiguities during subsequent transition in registrations.
   This section summarizes the working group's conclusions about the conditions under which a change in the prefix would have been necessary and the implications of such a change.

7.4.1. Conditions Requiring a Prefix Change

   An IDN prefix change would have been needed if a given string would be looked up or otherwise interpreted differently depending on the version of the protocol or tables being used. This IDNA upgrade would have required a prefix change if, and only if, one of the following four conditions were met:

   1.  The conversion of an A-label to Unicode (i.e., a U-label) would have yielded one string under IDNA2003 and a different string under IDNA2008.

   2.  In a significant number of cases, an input string that was valid under IDNA2003 and also valid under IDNA2008 would have yielded two different A-labels with the different versions. This condition is believed to be essentially equivalent to the one above except for a very small number of edge cases that were not found to justify a prefix change (see Section 7.2).

       Note that if the input string was valid under one version and not valid under the other, this condition would not apply. See the first item in Section 7.4.2, below.

   3.  A fundamental change was made to the semantics of the string that would be inserted in the DNS, e.g., if a decision were made to try to include language or script information in the encoding in addition to the string itself.

   4.  A sufficiently large number of characters were added to Unicode so that the Punycode mechanism for block offsets would no longer reference the higher-numbered planes and blocks. This condition is unlikely even in the long term and certain not to arise in the next several years.
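
   The edge cases mentioned under condition 2 (Eszett and Final Sigma, Section 7.2) can be observed directly. The sketch below assumes Python, whose built-in "idna" codec implements the IDNA2003 ToASCII operation including Nameprep, and uses the "punycode" codec to approximate what an implementation without a mapping step would produce for a string that is already a valid U-label. The helper names are hypothetical, and no IDNA2008 validation is performed.

```python
def alabel_idna2003(label):
    # Python's built-in "idna" codec applies Nameprep and ToASCII (RFC 3490).
    return label.encode("idna").decode("ascii")


def alabel_unmapped(label):
    # No mapping step: Punycode-encode the label as given and add the prefix.
    # Assumes the label is already a valid, non-ASCII U-label.
    return "xn--" + label.encode("punycode").decode("ascii")


# Eszett inside an otherwise ASCII label, and a lone Final Sigma.
for label in ("stra\u00dfe", "\u03c2"):
    print(repr(label), alabel_idna2003(label), alabel_unmapped(label))
```

   Under these assumptions the first function folds Eszett to "ss" and Final Sigma to medial Sigma before encoding, while the second keeps both characters, so the two functions return different ASCII forms for the same input.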

7.4.2. Conditions Not Requiring a Prefix Change

   As a result of the principles described above, none of the following changes required a new prefix:

   1.  Prohibition of some characters as input to IDNA. Such a prohibition might make names that were previously registered inaccessible, but did not change those names.

   2.  Adjustments in IDNA tables or actions, including normalization definitions, that affected characters that were already invalid under IDNA2003.

   3.  Changes in the style of the IDNA definition that did not alter the actions performed by IDNA.

7.4.3. Implications of Prefix Changes

   While it might have been possible to make a prefix change, the costs of such a change are considerable. Registries could not have converted all IDNA2003 ("xn--") registrations to a new form at the same time and synchronized that change with applications supporting lookup. Unless all existing registrations were simply to be declared invalid (and perhaps even then), systems that needed to support both labels with old prefixes and labels with new ones would be required to first process a putative label under the IDNA2008 rules and try to look it up and then, if it were not found, would be required to process the label under IDNA2003 rules and look it up again. That process would probably have significantly slowed down all processing that involved IDNs in the DNS, especially since a fully-qualified name might contain a mixture of labels that were registered with the old and new prefixes. That would have made DNS caching very difficult. In addition, looking up the same input string as two separate A-labels would have created some potential for confusion and attacks, since the labels could map to different targets and then resolve to different entries in the DNS.
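
   The two-pass lookup described above can be sketched as follows, purely to make its cost visible. Everything here is hypothetical: no new prefix was ever defined, so "zz--" is invented for illustration, the zone is a toy dictionary rather than a resolver, and new_form performs no validation. Python's built-in "idna" codec stands in for IDNA2003 processing.

```python
HYPOTHETICAL_NEW_PREFIX = "zz--"  # invented for illustration; never defined


def old_form(label):
    # IDNA2003 form via Python's built-in codec ("xn--" prefix).
    return label.encode("idna").decode("ascii")


def new_form(label):
    # Stand-in for a new-prefix encoding; performs no validation.
    if label.isascii():
        return label
    return HYPOTHETICAL_NEW_PREFIX + label.encode("punycode").decode("ascii")


def resolve_with_fallback(label, zone):
    """Try the new form first, then the old one; return (answer, queries)."""
    queries = []
    for candidate in (new_form(label), old_form(label)):
        if candidate in queries:
            continue  # pure-ASCII labels have only one form
        queries.append(candidate)
        if candidate in zone:
            return zone[candidate], queries
    return None, queries


zone = {"xn--bcher-kva": "192.0.2.1"}  # toy zone holding an old-prefix name
print(resolve_with_fallback("b\u00fccher", zone))
```

   In this sketch, a name registered only under the old prefix costs two queries per label, and a name that does not exist at all also costs two.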

   Consequently, a prefix change should have been, and was, avoided if at all possible, even if it means accepting some IDNA2003 decisions about character distinctions as irreversible and/or giving special treatment to edge cases.

7.5. Stringprep Changes and Compatibility

   The Nameprep specification [RFC3491], a key part of IDNA2003, is a profile of Stringprep [RFC3454]. While Nameprep is a Stringprep profile specific to IDNA, Stringprep is used by a number of other protocols. Were Stringprep to have been modified by IDNA2008, those changes to improve the handling of IDNs could have caused problems for non-DNS uses, most notably if they affected identification and authentication protocols. Several elements of IDNA2008 give interpretations to strings prohibited under IDNA2003 or prohibit strings that IDNA2003 permitted. Those elements include the new inclusion information in the Tables document [RFC5892], the reduction in the number of characters permitted as input for registration or lookup (Section 3), and even the changes in handling of right-to-left strings as described in the Bidi document [RFC5893]. IDNA2008 does not use Nameprep or Stringprep at all, so there are no side-effect changes to other protocols.

   It is particularly important to keep IDNA processing separate from processing for various security protocols because some of the constraints that are necessary for smooth and comprehensible use of IDNs may be unwanted or undesirable in other contexts. For example, the criteria for good passwords or passphrases are very different from those for desirable IDNs: passwords should be hard to guess, while domain names should normally be easily memorable.
   Similarly, internationalized Small Computer System Interface (SCSI) identifiers and other protocol components are likely to have different requirements than IDNs.

7.6. The Symbol Question

   One of the major differences between this specification and the original version of IDNA is that IDNA2003 permitted non-letter symbols of various sorts, including punctuation and line-drawing symbols, in the protocol. They were always discouraged in practice. In particular, both the "IESG Statement" about IDNA and all versions of the ICANN Guidelines specify that only language characters be used in labels. This specification disallows symbols entirely. There are several reasons for this, which include:

   1.  As discussed elsewhere, the original IDNA specification assumed that as many Unicode characters as possible should be permitted, directly or via mapping to other characters, in IDNs. This specification operates on an inclusion model, extrapolating from the original "hostname" rules (LDH, see the Definitions document [RFC5890]) -- which have served the Internet very well -- to a Unicode base rather than an ASCII base.

   2.  Symbol names are more problematic than letters because there may be no general agreement on whether a particular glyph matches a symbol; there are no uniform conventions for naming; variations such as outline, solid, and shaded forms may or may not exist; and so on. As just one example, consider a "heart" symbol as it might appear in a logo that might be read as "I love...". While the user might read such a logo as "I love..." or "I heart...", considerable knowledge of the coding distinctions made in Unicode is needed to know that there is more than one "heart" character (e.g., U+2665, U+2661, and U+2765) and how to describe it. These issues are of particular importance if strings are expected to be understood or transcribed by the listener after being read out loud.

   3.  Design of a screen reader used by blind Internet users who must listen to renderings of IDN domain names and possibly reproduce them on the keyboard becomes considerably more complicated when the names of characters are not obvious and intuitive to anyone familiar with the language in question.

   4.  As a simplified example of this, assume one wanted to use a "heart" or "star" symbol in a label. This is problematic because those names are ambiguous in the Unicode system of naming (the actual Unicode names require far more qualification). A user or would-be registrant has no way to know -- absent careful study of the code tables -- whether it is ambiguous (e.g., where there are multiple "heart" characters) or not. Conversely, the user seeing the hypothetical label doesn't know whether to read it -- try to transmit it to a colleague by voice -- as "heart", as "love", as "black heart", or as any of the other examples below.

   5.  The actual situation is even worse than this. There is no possible way for a normal, casual, user to tell the difference between the hearts of U+2665 and U+2765 and the stars of U+2606 and U+2729 without somehow knowing to look for a distinction. We have a white heart (U+2661) and a few black hearts. Consequently, describing a label as containing a heart is hopelessly ambiguous: we can only know that it contains one of several characters that look like hearts or have "heart" in their names.
       In cities where "Square" is a popular part of a location name, one might well want to use a square symbol in a label as well and there are far more squares of various flavors in Unicode than there are hearts or stars.

   The consequence of these ambiguities is that symbols are a very poor basis for reliable communication. Consistent with this conclusion, the Unicode standard recommends that strings used in identifiers not contain symbols or punctuation [Unicode-UAX31]. Of course, these difficulties with symbols do not arise with actual pictographic languages and scripts which would be treated like any other language characters; the two should not be confused.

7.7. Migration between Unicode Versions: Unassigned Code Points

   In IDNA2003, labels containing unassigned code points are looked up on the assumption that, if they appear in labels and can be mapped and then resolved, the relevant standards must have changed and the registry has properly allocated only assigned values.

   In the IDNA2008 protocol, strings containing unassigned code points must not be either looked up or registered. In summary, the status of an unassigned character with regard to the DISALLOWED, PROTOCOL-VALID, and CONTEXTUAL RULE REQUIRED categories cannot be evaluated until a character is actually assigned and known. There are several reasons for this, with the most important ones being:

   o  Tests involving the context of characters (e.g., some characters being permitted only adjacent to others of specific types) and integrity tests on complete labels are needed.
      Unassigned code points cannot be permitted because one cannot determine whether particular code points will require contextual rules (and what those rules should be) before characters are assigned to them and the properties of those characters fully understood.

   o  It cannot be known in advance, and with sufficient reliability, whether a newly assigned code point will be associated with a character that would be disallowed by the rules in the Tables document [RFC5892] (such as a compatibility character). In IDNA2003, since there is no direct dependency on NFKC (many of the entries in Stringprep's tables are based on NFKC, but IDNA2003 depends only on Stringprep), allocation of a compatibility character might produce some odd situations, but it would not be a problem. In IDNA2008, where compatibility characters are DISALLOWED unless character-specific exceptions are made, permitting strings containing unassigned characters to be looked up would violate the principle that characters in DISALLOWED are not looked up.

   o  The Unicode Standard specifies that an unassigned code point normalizes (and, where relevant, case folds) to itself. If the code point is later assigned to a character, and particularly if the newly assigned code point has a combining class that determines its placement relative to other combining characters, it could normalize to some other code point or sequence.
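
   The last point can be illustrated with Python's unicodedata module, which reflects whichever Unicode version the interpreter was built with. This is a sketch under assumptions: the helper name describe is hypothetical, and U+0378 is used only because it is unassigned in the Unicode versions current when this example was written. A later version could assign it, which is precisely the hazard described.

```python
import unicodedata


def describe(cp):
    """Report whether a code point is assigned and whether NFKC leaves it alone."""
    char = chr(cp)
    return {
        # General category "Cn" means "no character assigned" in this version.
        "assigned": unicodedata.category(char) != "Cn",
        "nfkc_stable": unicodedata.normalize("NFKC", char) == char,
    }


print("Unicode version:", unicodedata.unidata_version)
print(describe(0x0378))  # assumed unassigned: normalizes to itself
print(describe(0xFF41))  # FULLWIDTH LATIN SMALL LETTER A: NFKC maps it to "a"
print(describe(0x0061))  # LATIN SMALL LETTER A: assigned and stable
```

   An application that looked up a label containing the first code point today could not know whether a future version would place it in the second category.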

   It is possible to argue that the issues above are not important and that, as a consequence, it is better to retain the principle of looking up labels even if they contain unassigned characters because all of the important scripts and characters have been coded as of Unicode 5.2 (or even earlier), and hence unassigned code points will be assigned only to obscure characters or archaic scripts. Unfortunately, that does not appear to be a safe assumption for at least two reasons. First, much the same claim of completeness has been made for earlier versions of Unicode. The reality is that a script that is obscure to much of the world may still be very important to those who use it. Cultural and linguistic preservation principles make it inappropriate to declare the script of no importance in IDNs. Second, we already have counterexamples, e.g., in the relationships associated with new Han characters being added (whether in the BMP or in Unicode Plane 2).

   Independent of the technical transition issues identified above, it can be observed that any addition of characters to an existing script to make it easier to use or to better accommodate particular languages may lead to transition issues. Such additions may change the preferred form for writing a particular string, changes that may be reflected, e.g., in keyboard transition modules that would necessarily be different from those for earlier versions of Unicode where the newer characters may not exist. This creates an inherent transition problem because attempts to access labels may use either the old or the new conventions, requiring registry action whether or not the older conventions were used in labels.
   The need to consider transition mechanisms is inherent to evolution
   of Unicode to better accommodate writing systems and is independent
   of how IDNs are represented in the DNS or how transitions among
   versions of those mechanisms occur. The requirement for transitions
   of this type is illustrated by the addition of Malayalam Chillu in
   Unicode 5.1.0.

7.8. Other Compatibility Issues

   The 2003 IDNA model includes several odd artifacts of the context in
   which it was developed. Many, if not all, of these are potential
   avenues for exploits, especially if the registration process permits
   "source" names (names that have not been processed through IDNA and
   Nameprep) to be registered. As one example, since the character
   Eszett, used in German, is mapped by IDNA2003 into the sequence "ss"
   rather than being retained as itself or prohibited, a string
   containing that character, but that is otherwise in ASCII, is not
   really an IDN (in the U-label sense defined above). After Nameprep
   maps out the Eszett, the result is an ASCII string and so it does not
   get an xn-- prefix, but the string that can be displayed to a user
   appears to be an IDN. IDNA2008 eliminates this artifact. A
   character is either permitted as itself or it is prohibited; special
   cases that make sense only in a particular linguistic or cultural
   context can be dealt with as localization matters where appropriate.

8. Name Server Considerations

8.1. Processing Non-ASCII Strings

   Existing DNS servers do not know the IDNA rules for handling
   non-ASCII forms of IDNs, and therefore need to be shielded from them.
   All existing channels through which names can enter a DNS server
   database (for example, master files (as described in RFC 1034) and
   DNS update messages [RFC2136]) could not be IDNA-aware because they
   predate IDNA. Other sections of this document provide the needed
   shielding by ensuring that internationalized domain names entering
   DNS server databases through such channels have already been
   converted to their equivalent ASCII A-label forms.

   Because of the distinction made between the algorithms for
   Registration and Lookup in Sections 4 and 5 (respectively) of the
   Protocol document [RFC5891] (a domain name containing only ASCII code
   points cannot be converted to an A-label), there cannot be more than
   one A-label form for any given U-label.

   As specified in clarifications to the DNS specification [RFC2181],
   the DNS protocol explicitly allows domain labels to contain octets
   beyond the ASCII range (0000..007F), and this document does not
   change that. However, although the interpretation of octets
   0080..00FF is well-defined in the DNS, many application protocols
   support only ASCII labels and there is no defined interpretation of
   these non-ASCII octets as characters and, in particular, no
   interpretation of case-independent matching for them (e.g., see the
   clarification on DNS case insensitivity [RFC4343]). If labels
   containing these octets are returned to applications, unpredictable
   behavior could result. The A-label form, which cannot contain those
   characters, is the only standard representation for internationalized
   labels in the DNS protocol.

8.2. Root and Other DNS Server Considerations

   IDNs in A-label form will generally be somewhat longer than current
   domain names, so the bandwidth needed by the root servers is likely
   to go up by a small amount.
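
   The length difference can be seen in a minimal Python sketch, given
   here only as an illustration. The helper to_a_label is hypothetical
   and assumes its input is already a valid, lowercase, NFC-normalized
   label; it applies the Punycode encoding [RFC3492] through Python's
   standard "punycode" codec and performs none of the validity checks
   required by the Protocol document [RFC5891].

```python
ACE_PREFIX = "xn--"


def to_a_label(label: str) -> str:
    """Return the ASCII form of a single label.

    ASCII-only labels are returned unchanged, since they have no
    A-label form. Other labels are Punycode-encoded and prefixed.
    No IDNA2008 validity checking is done here.
    """
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


u_label = "b\u00fccher"  # "bücher"
a_label = to_a_label(u_label)

print(a_label)
print(len(u_label), len(u_label.encode("utf-8")), len(a_label))
```

   For this label the A-label is longer than the U-label whether the
   latter is counted in characters or in UTF-8 octets, because of the
   four-octet prefix and the encoded position information.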
   Also, queries and responses for IDNs will probably be somewhat
   longer than typical queries historically, so Extension Mechanisms
   for DNS (EDNS0) [RFC2671] support may be more important (otherwise,
   queries and responses may be forced to go to TCP instead of UDP).

9. Internationalization Considerations

   DNS labels and fully-qualified domain names provide mnemonics that
   assist in identifying and referring to resources on the Internet.
   IDNs expand the range of those mnemonics to include those based on
   languages and character sets other than Western European and
   Roman-derived ones. But domain "names" are not, in general, words in
   any language. The recommendations of the IETF policy on character
   sets and languages (BCP 18 [RFC2277]) are applicable to situations in
   which language identification is used to provide language-specific
   contexts. The DNS is, by contrast, global and international and
   ultimately has nothing to do with languages. Adding languages (or
   similar context) to IDNs generally, or to DNS matching in particular,
   would imply context-dependent matching in DNS, which would be a very
   significant change to the DNS protocol itself. It would also imply
   that users would need to identify the language associated with a
   particular label in order to look that label up. That knowledge is
   generally not available because many labels are not words in any
   language and some may be words in more than one.

10. IANA Considerations

   This section gives an overview of IANA registries required for IDNA.
   The actual definitions of, and specifications for, the first two,
   which have been newly created for IDNA2008, appear in the Tables
   document [RFC5892].
   This document describes the registries, but it does not specify any
   IANA actions.

10.1. IDNA Character Registry

   The distinction among the major categories "UNASSIGNED",
   "DISALLOWED", "PROTOCOL-VALID", and "CONTEXTUAL RULE REQUIRED" is
   made by special categories and rules that are integral elements of
   the Tables document. While not normative, an IANA registry of
   characters and scripts and their categories, updated for each new
   version of Unicode and the characters it contains, is convenient for
   programming and validation purposes. The details of this registry
   are specified in the Tables document.

10.2. IDNA Context Registry

   IANA has created and now maintains a list of approved contextual
   rules for characters that are defined in the IDNA Character Registry
   list as requiring a Contextual Rule (i.e., the types of rules
   described in Section 3.1.2). The details for those rules appear in
   the Tables document.

10.3. IANA Repository of IDN Practices of TLDs

   This registry, historically described as the "IANA Language Character
   Set Registry" or "IANA Script Registry" (both somewhat misleading
   terms), is maintained by IANA at the request of ICANN. It is used to
   provide a central documentation repository of the IDN policies used
   by top level domain (TLD) registries who volunteer to contribute to
   it and is used in conjunction with ICANN Guidelines for IDN use.

   It is not an IETF-managed registry and, while the protocol changes
   specified here may call for some revisions to the tables, IDNA2008
   has no direct effect on that registry and no IANA action is required
   as a result.

11. Security Considerations

11.1. General Security Issues with IDNA

   This document is purely explanatory and informational and
   consequently introduces no new security issues. It would, of course,
   be a poor idea for someone to try to implement from it; such an
   attempt would almost certainly lead to interoperability problems and
   might lead to security ones. A discussion of security issues with
   IDNA, including some relevant history, appears in the Definitions
   document [RFC5890].

12. Acknowledgments

   The editor and contributors would like to express their thanks to
   those who contributed significant early (pre-working group) review
   comments, sometimes accompanied by text, Paul Hoffman, Simon
   Josefsson, and Sam Weiler. In addition, some specific ideas were
   incorporated from suggestions, text, or comments about sections that
   were unclear supplied by Vint Cerf, Frank Ellerman, Michael Everson,
   Asmus Freytag, Erik van der Poel, Michel Suignard, and Ken Whistler.
   Thanks are also due to Vint Cerf, Lisa Dusseault, Debbie Garside, and
   Jefsey Morfin for conversations that led to considerable improvements
   in the content of this document and to several others, including Ben
   Campbell, Martin Duerst, Subramanian Moonesamy, Peter Saint-Andre,
   and Dan Winship, for catching specific errors and recommending
   corrections.

   A meeting was held on 30 January 2008 to attempt to reconcile
   differences in perspective and terminology about this set of
   specifications between the design team and members of the Unicode
   Technical Consortium. The discussions at and subsequent to that
   meeting were very helpful in focusing the issues and in refining the
   specifications.
   The active participants at that meeting were (in alphabetic order,
   as usual) Harald Alvestrand, Vint Cerf, Tina Dam, Mark Davis, Lisa
   Dusseault, Patrik Faltstrom (by telephone), Cary Karp, John Klensin,
   Warren Kumari, Lisa Moore, Erik van der Poel, Michel Suignard, and
   Ken Whistler. We express our thanks to Google for support of that
   meeting and to the participants for their contributions.

   Useful comments and text on the working group versions of the working
   draft were received from many participants in the IETF "IDNABIS"
   working group and a number of document changes resulted from mailing
   list discussions made by that group. Marcos Sanz provided specific
   analysis and suggestions that were exceptionally helpful in refining
   the text, as did Vint Cerf, Martin Duerst, Andrew Sullivan, and Ken
   Whistler. Lisa Dusseault provided extensive editorial suggestions
   during the spring of 2009, most of which were incorporated.

13. Contributors

   While the listed editor held the pen, the core of this document and
   the initial working group version represents the joint work and
   conclusions of an ad hoc design team consisting of the editor and, in
   alphabetic order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and
   Cary Karp. Considerable material describing mapping principles has
   been incorporated from a draft of the Mapping document
   [IDNA2008-Mapping] by Pete Resnick and Paul Hoffman. In addition,
   there were many specific contributions and helpful comments from
   those listed in the Acknowledgments section and others who have
   contributed to the development and use of the IDNA protocols.

14. References

14.1. Normative References

   [RFC3490]   Faltstrom, P., Hoffman, P., and A. Costello,
               "Internationalizing Domain Names in Applications
               (IDNA)", RFC 3490, March 2003.

   [RFC3492]   Costello, A., "Punycode: A Bootstring encoding of
               Unicode for Internationalized Domain Names in
               Applications (IDNA)", RFC 3492, March 2003.

   [RFC5890]   Klensin, J., "Internationalized Domain Names for
               Applications (IDNA): Definitions and Document
               Framework", RFC 5890, August 2010.

   [RFC5891]   Klensin, J., "Internationalized Domain Names in
               Applications (IDNA): Protocol", RFC 5891, August 2010.

   [RFC5892]   Faltstrom, P., "The Unicode Code Points and
               Internationalized Domain Names for Applications (IDNA)",
               RFC 5892, August 2010.

   [RFC5893]   Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
               Internationalized Domain Names for Applications (IDNA)",
               RFC 5893, August 2010.

   [Unicode52] The Unicode Consortium. The Unicode Standard, Version
               5.2.0, defined by: "The Unicode Standard, Version
               5.2.0", (Mountain View, CA: The Unicode Consortium,
               2009. ISBN 978-1-936213-00-9).
               <http://www.unicode.org/versions/Unicode5.2.0/>.

14.2. Informative References

   [IDNA2008-Mapping]
               Resnick, P. and P. Hoffman, "Mapping Characters in
               Internationalized Domain Names for Applications (IDNA)",
               Work in Progress, April 2010.

   [RFC0952]   Harrenstien, K., Stahl, M., and E. Feinler, "DoD
               Internet host table specification", RFC 952,
               October 1985.

   [RFC1034]   Mockapetris, P., "Domain names - concepts and
               facilities", STD 13, RFC 1034, November 1987.

   [RFC1035]   Mockapetris, P., "Domain names - implementation and
               specification", STD 13, RFC 1035, November 1987.

   [RFC1123]   Braden, R., "Requirements for Internet Hosts -
               Application and Support", STD 3, RFC 1123, October 1989.

   [RFC2136]   Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
               "Dynamic Updates in the Domain Name System (DNS
               UPDATE)", RFC 2136, April 1997.

   [RFC2181]   Elz, R. and R. Bush, "Clarifications to the DNS
               Specification", RFC 2181, July 1997.

   [RFC2277]   Alvestrand, H., "IETF Policy on Character Sets and
               Languages", BCP 18, RFC 2277, January 1998.

   [RFC2671]   Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
               RFC 2671, August 1999.

   [RFC2782]   Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for
               specifying the location of services (DNS SRV)",
               RFC 2782, February 2000.

   [RFC3454]   Hoffman, P. and M. Blanchet, "Preparation of
               Internationalized Strings ("stringprep")", RFC 3454,
               December 2002.

   [RFC3491]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
               Profile for Internationalized Domain Names (IDN)",
               RFC 3491, March 2003.

   [RFC3743]   Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint
               Engineering Team (JET) Guidelines for Internationalized
               Domain Names (IDN) Registration and Administration for
               Chinese, Japanese, and Korean", RFC 3743, April 2004.

   [RFC3987]   Duerst, M. and M. Suignard, "Internationalized Resource
               Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC4290]   Klensin, J., "Suggested Practices for Registration of
               Internationalized Domain Names (IDN)", RFC 4290,
               December 2005.

   [RFC4343]   Eastlake, D., "Domain Name System (DNS) Case
               Insensitivity Clarification", RFC 4343, January 2006.

   [RFC4690]   Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
               and Recommendations for Internationalized Domain Names
               (IDNs)", RFC 4690, September 2006.

   [RFC4713]   Lee, X., Mao, W., Chen, E., Hsu, N., and J. Klensin,
               "Registration and Administration Recommendations for
               Chinese Domain Names", RFC 4713, October 2006.

   [Unicode-UAX31]
               The Unicode Consortium, "Unicode Standard Annex #31:
               Unicode Identifier and Pattern Syntax, Revision 11",
               September 2009,
               <http://www.unicode.org/reports/tr31/tr31-11.html>.

   [Unicode-UTS39]
               The Unicode Consortium, "Unicode Technical Standard #39:
               Unicode Security Mechanisms, Revision 2", August 2006,
               <http://www.unicode.org/reports/tr39/tr39-2.html>.

Author's Address

   John C Klensin
   1770 Massachusetts Ave, Ste 322
   Cambridge, MA 02140
   USA

   Phone: +1 617 245 1457
   EMail: email@example.com