1 Internet Engineering Task Force (IETF) J. Klensin
2 Request for Comments: 5891 August 2010
3 Obsoletes: 3490, 3491
4 Updates: 3492
5 Category: Standards Track
6 ISSN: 2070-1721
7
8
9 Internationalized Domain Names in Applications (IDNA): Protocol
10
11 Abstract
12
13 This document is the revised protocol definition for
14 Internationalized Domain Names (IDNs). The rationale for changes,
15 the relationship to the older specification, and important
16 terminology are provided in other documents. This document specifies
17 the protocol mechanism, called Internationalized Domain Names in
18 Applications (IDNA), for registering and looking up IDNs in a way
19 that does not require changes to the DNS itself. IDNA is only meant
20 for processing domain names, not free text.
21
22 Status of This Memo
23
24 This is an Internet Standards Track document.
25
26 This document is a product of the Internet Engineering Task Force
27 (IETF). It represents the consensus of the IETF community. It has
28 received public review and has been approved for publication by the
29 Internet Engineering Steering Group (IESG). Further information on
30 Internet Standards is available in Section 2 of RFC 5741.
31
32 Information about the current status of this document, any errata,
33 and how to provide feedback on it may be obtained at
34 http://www.rfc-editor.org/info/rfc5891.
35
52 Klensin Standards Track [Page 1]
53 RFC 5891 IDNA2008 Protocol August 2010
54
55
56 Copyright Notice
57
58 Copyright (c) 2010 IETF Trust and the persons identified as the
59 document authors. All rights reserved.
60
61 This document is subject to BCP 78 and the IETF Trust's Legal
62 Provisions Relating to IETF Documents
63 (http://trustee.ietf.org/license-info) in effect on the date of
64 publication of this document. Please review these documents
65 carefully, as they describe your rights and restrictions with respect
66 to this document. Code Components extracted from this document must
67 include Simplified BSD License text as described in Section 4.e of
68 the Trust Legal Provisions and are provided without warranty as
69 described in the Simplified BSD License.
70
71 This document may contain material from IETF Documents or IETF
72 Contributions published or made publicly available before November
73 10, 2008. The person(s) controlling the copyright in some of this
74 material may not have granted the IETF Trust the right to allow
75 modifications of such material outside the IETF Standards Process.
76 Without obtaining an adequate license from the person(s) controlling
77 the copyright in such materials, this document may not be modified
78 outside the IETF Standards Process, and derivative works of it may
79 not be created outside the IETF Standards Process, except to format
80 it for publication as an RFC or to translate it into languages other
81 than English.
82
111 Table of Contents
112
1. Introduction
2. Terminology
3. Requirements and Applicability
  3.1. Requirements
  3.2. Applicability
    3.2.1. DNS Resource Records
    3.2.2. Non-Domain-Name Data Types Stored in the DNS
4. Registration Protocol
  4.1. Input to IDNA Registration
  4.2. Permitted Character and Label Validation
    4.2.1. Input Format
    4.2.2. Rejection of Characters That Are Not Permitted
    4.2.3. Label Validation
    4.2.4. Registration Validation Requirements
  4.3. Registry Restrictions
  4.4. Punycode Conversion
  4.5. Insertion in the Zone
5. Domain Name Lookup Protocol
  5.1. Label String Input
  5.2. Conversion to Unicode
  5.3. A-label Input
  5.4. Validation and Character List Testing
  5.5. Punycode Conversion
  5.6. DNS Name Resolution
6. Security Considerations
7. IANA Considerations
8. Contributors
9. Acknowledgments
10. References
  10.1. Normative References
  10.2. Informative References
Appendix A. Summary of Major Changes from IDNA2003
145
166 1. Introduction
167
168 This document supplies the protocol definition for Internationalized
169 Domain Names in Applications (IDNA), with the version specified here
170 known as IDNA2008. Essential definitions and terminology for
171 understanding this document and a road map of the collection of
172 documents that make up IDNA2008 appear in a separate Definitions
173 document [RFC5890]. Appendix A discusses the relationship between
174 this specification and the earlier version of IDNA (referred to here
175 as "IDNA2003"). The rationale for these changes, along with
176 considerable explanatory material and advice to zone administrators
177 who support IDNs, is provided in another document, known informally
178 in this series as the "Rationale document" [RFC5894].
179
180 IDNA works by allowing applications to use certain ASCII [ASCII]
181 string labels (beginning with a special prefix) to represent
182 non-ASCII name labels. Lower-layer protocols need not be aware of
183 this; therefore, IDNA does not change any infrastructure. In
184 particular, IDNA does not depend on any changes to DNS servers,
185 resolvers, or DNS protocol elements, because the ASCII name service
186 provided by the existing DNS can be used for IDNA.
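
As a non-normative illustration of the prefix mechanism, the sketch
below converts one label in each direction. It assumes Python and its
built-in "punycode" codec; the function names are hypothetical and no
validation of any kind is performed.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Encode a Unicode label as ACE prefix + Punycode (no validation)."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode the part after the ACE prefix back to Unicode (no validation)."""
    if not a_label.lower().startswith(ACE_PREFIX):
        raise ValueError("label does not carry the ACE prefix")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


# The ASCII form is what the DNS sees; the Unicode form is what users see.
print(to_a_label("b\u00fccher"))     # xn--bcher-kva
print(to_u_label("xn--bcher-kva"))   # the label b, u-umlaut, c, h, e, r
```

Only the string with the "xn--" prefix ever reaches DNS servers and
resolvers, which is why they need no changes.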
187
188 IDNA applies only to a specific subset of DNS labels. The base DNS
189 standards [RFC1034] [RFC1035] and their various updates specify how
190 to combine labels into fully-qualified domain names and parse labels
191 out of those names.
192
193 This document describes two separate protocols, one for IDN
194 registration (Section 4) and one for IDN lookup (Section 5). These
195 two protocols share some terminology, reference data, and operations.
196
197 2. Terminology
198
199 As mentioned above, terminology used as part of the definition of
200 IDNA appears in the Definitions document [RFC5890]. It is worth
201 noting that some of this terminology overlaps with, and is consistent
202 with, that used in Unicode or other character set standards and the
203 DNS. Readers of this document are assumed to be familiar with the
204 associated Definitions document and with the DNS-specific terminology
205 in RFC 1034 [RFC1034].
206
207 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
208 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
209 document are to be interpreted as described in BCP 14, RFC 2119
210 [RFC2119].
211
221 3. Requirements and Applicability
222
223 3.1. Requirements
224
225 IDNA makes the following requirements:
226
227 1. Whenever a domain name is put into a domain name slot that is not
228 IDNA-aware (see Section 2.3.2.6 of the Definitions document
229 [RFC5890]), it MUST contain only ASCII characters (i.e., its
230 labels must be either A-labels or NR-LDH labels), unless the DNS
231 application is not subject to historical recommendations for
232 "hostname"-style names (see RFC 1034 [RFC1034] and
233 Section 3.2.1).
234
235 2. Labels MUST be compared using equivalent forms: either both
236 A-label forms or both U-label forms. Because A-labels and
237 U-labels can be transformed into each other without loss of
238 information, these comparisons are equivalent (however, in
239 practice, comparison of U-labels requires first verifying that
240 they actually are U-labels and not just Unicode strings). A pair
241 of A-labels MUST be compared as case-insensitive ASCII (as with
242 all comparisons of ASCII DNS labels). U-labels MUST be compared
243 as-is, without case folding or other intermediate steps. While
244 it is not necessary to validate labels in order to compare them,
245 successful comparison does not imply validity. In many cases,
246 not limited to comparison, validation may be important for other
247 reasons and SHOULD be performed.
248
249 3. Labels being registered MUST conform to the requirements of
250 Section 4. Labels being looked up and the lookup process MUST
251 conform to the requirements of Section 5.
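
The comparison rule in requirement 2 can be sketched as follows. This
is an illustrative Python fragment, not part of the protocol; the
function name is hypothetical, and the caller is assumed to have
already brought both labels into the same form.

```python
def labels_match(a: str, b: str) -> bool:
    """Compare two labels that are in the same form (both A or both U)."""
    if a.isascii() and b.isascii():
        # A-labels, like all ASCII DNS labels, compare case-insensitively.
        return a.lower() == b.lower()
    # U-labels compare code point for code point, with no case folding.
    # A mixed A/U pair also lands here and simply compares unequal.
    return a == b
```

A result of True says nothing about validity: two identical invalid
strings also compare equal.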
252
253 3.2. Applicability
254
255 IDNA applies to all domain names in all domain name slots in
256 protocols except where it is explicitly excluded. It does not apply
257 to domain name slots that do not use the LDH syntax rules as
258 described in the Definitions document [RFC5890].
259
260 Because it uses the DNS, IDNA applies to many protocols that were
261 specified before it was designed. IDNs occupying domain name slots
262 in those older protocols MUST be in A-label form until and unless
263 those protocols and their implementations are explicitly upgraded to
264 be aware of IDNs and to accept the U-label form. IDNs actually
265 appearing in DNS queries or responses MUST be A-labels.
266
276 IDNA-aware protocols and implementations MAY accept U-labels,
277 A-labels, or both as those particular protocols specify. IDNA is not
278 defined for extended label types (see RFC 2671 [RFC2671], Section 3).
279
280 3.2.1. DNS Resource Records
281
282 IDNA applies only to domain names in the NAME and RDATA fields of DNS
283 resource records whose CLASS is IN. See the DNS specification
284 [RFC1035] for precise definitions of these terms.
285
286 The application of IDNA to DNS resource records depends entirely on
287 the CLASS of the record, and not on the TYPE except as noted below.
288 This will remain true, even as new TYPEs are defined, unless a new
289 TYPE defines TYPE-specific rules. Special naming conventions for SRV
290 records (and "underscore labels" more generally) are incompatible
291 with IDNA coding as discussed in the Definitions document [RFC5890],
292 especially Section 2.3.2.3. Of course, underscore labels may be part
293 of a domain that uses IDN labels at higher levels in the tree.
294
295 3.2.2. Non-Domain-Name Data Types Stored in the DNS
296
297 Although IDNA enables the representation of non-ASCII characters in
298 domain names, that does not imply that IDNA enables the
299 representation of non-ASCII characters in other data types that are
300 stored in domain names, specifically in the RDATA field for types
301 that have structured RDATA format. For example, an email address
302 local part is stored in a domain name in the RNAME field as part of
303 the RDATA of an SOA record (e.g., hostmaster@example.com would be
304 represented as hostmaster.example.com). IDNA does not update the
305 existing email standards, which allow only ASCII characters in local
306 parts. Even though work is in progress to define
307 internationalization for email addresses [RFC4952], changes to the
308 email address part of the SOA RDATA would require action in, or
309 updates to, other standards, specifically those that specify the
310 format of the SOA RR.
311
312 4. Registration Protocol
313
314 This section defines the model for registering an IDN. The model is
315 implementation independent; any sequence of steps that produces
316 exactly the same result for all labels is considered a valid
317 implementation.
318
319 Note that, while the registration (this section) and lookup protocols
320 (Section 5) are very similar in most respects, they are not
321 identical, and implementers should carefully follow the steps
322 described in this specification.
323
331 4.1. Input to IDNA Registration
332
333 Registration processes, especially processing by entities (often
334 called "registrars") who deal with registrants before the request
335 actually reaches the zone manager ("registry") are outside the scope
336 of this definition and may differ significantly depending on local
337 needs. By the time a string enters the IDNA registration process as
338 described in this specification, it MUST be in Unicode and in
339 Normalization Form C (NFC [Unicode-UAX15]). Entities responsible for
340 zone files ("registries") MUST accept only the exact string for which
341 registration is requested, free of any mappings or local adjustments.
342 They MAY accept that input in any of three forms:
343
344 1. As a pair of A-label and U-label.
345
346 2. As an A-label only.
347
348 3. As a U-label only.
349
350 The first two of these forms are RECOMMENDED because the use of
351 A-labels avoids any possibility of ambiguity. The first is normally
352 preferred over the second because it permits further verification of
353 user intent (see Section 4.2.1).
354
355 4.2. Permitted Character and Label Validation
356
357 4.2.1. Input Format
358
359 If both the U-label and A-label forms are available, the registry
360 MUST ensure that the A-label form is in lowercase, perform a
361 conversion to a U-label, perform the steps and tests described below
362 on that U-label, and then verify that the A-label produced by the
363 step in Section 4.4 matches the one provided as input. In addition,
364 the U-label that was provided as input and the one obtained by
365 conversion of the A-label MUST match exactly. If, for some reason,
366 these tests fail, the registration MUST be rejected.
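
A minimal sketch of the cross-check between a supplied A-label and
U-label pair is shown here, assuming Python and its built-in
"punycode" codec. The function name is hypothetical. The sketch
lowercases the A-label itself (one reading of "ensure that the A-label
form is in lowercase") and covers only the matching tests, not the
character and label tests of Sections 4.2.2 through 4.3.

```python
def pair_is_consistent(a_label: str, u_label: str) -> bool:
    """Check that a supplied A-label/U-label pair describe the same label."""
    a_lower = a_label.lower()
    if not a_lower.startswith("xn--"):
        return False
    try:
        # A-label -> U-label, then back again via the Section 4.4 step.
        decoded = a_lower[4:].encode("ascii").decode("punycode")
        reencoded = "xn--" + decoded.encode("punycode").decode("ascii")
    except UnicodeError:
        return False  # not ASCII, or not decodable as Punycode
    # Both matches are required; otherwise the registration is rejected.
    return decoded == u_label and reencoded == a_lower
```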
367
368 If only an A-label was provided and the conversion to a U-label is
369 not performed, the registry MUST still verify that the A-label is
370 superficially valid, i.e., that it does not violate any of the rules
371 of Punycode encoding [RFC3492] such as the prohibition on trailing
372 hyphen-minus, the requirement that all characters be ASCII, and so
373 on. Strings that appear to be A-labels (e.g., they start with
374 "xn--") and strings that are supplied to the registry in a context
375 reserved for A-labels (such as a field in a form to be filled out),
376 but that are not valid A-labels as described in this paragraph, MUST
377 NOT be placed in DNS zones that support IDNA.
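
The superficial checks might look like the following Python sketch.
The function name is hypothetical, and the exact set of checks is an
assumption: the text above names only the trailing hyphen-minus and
ASCII requirements as examples, so the letter-digit-hyphen and
63-octet tests are added here on the basis of the general DNS label
rules.

```python
import string

ACE_PREFIX = "xn--"
LDH_CHARS = set(string.ascii_lowercase + string.digits + "-")


def superficially_valid_a_label(label: str) -> bool:
    """Cheap syntax checks on a string offered as an A-label."""
    if not label.isascii():
        return False
    lowered = label.lower()
    return (
        lowered.startswith(ACE_PREFIX)
        and len(lowered) > len(ACE_PREFIX)   # something follows the prefix
        and len(lowered) <= 63               # DNS label length limit
        and not lowered.endswith("-")        # no trailing hyphen-minus
        and set(lowered) <= LDH_CHARS        # letters, digits, hyphen only
    )
```

Passing these checks does not show that the string decodes to a valid
U-label; it only filters out strings that cannot be A-labels at all.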
378
386 If only an A-label is provided, the conversion to a U-label is not
387 performed, but the superficial tests described in the previous
388 paragraph are performed, registration procedures MAY, and usually
389 will, bypass the tests and actions in the balance of Section 4.2 and
390 in Sections 4.3 and 4.4.
391
392 4.2.2. Rejection of Characters That Are Not Permitted
393
394 The candidate Unicode string MUST NOT contain characters that appear
395 in the "DISALLOWED" and "UNASSIGNED" lists specified in the Tables
396 document [RFC5892].
397
398 4.2.3. Label Validation
399
400 The proposed label (in the form of a Unicode string, i.e., a string
401 that at least superficially appears to be a U-label) is then examined
402 using tests that require examination of more than one character.
403 Character order is considered to be the on-the-wire order. That
404 order may not be the same as the display order.
405
406 4.2.3.1. Hyphen Restrictions
407
408 The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
409 the third and fourth character positions and MUST NOT start or end
410 with a "-" (hyphen).
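
Expressed as a short, non-normative Python sketch (the function name
is hypothetical), the three hyphen tests are:

```python
def violates_hyphen_rules(label: str) -> bool:
    """True if the label breaks the registration hyphen restrictions."""
    return (
        label[2:4] == "--"        # third and fourth character positions
        or label.startswith("-")
        or label.endswith("-")
    )
```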
411
412 4.2.3.2. Leading Combining Marks
413
414 The Unicode string MUST NOT begin with a combining mark or combining
415 character (see The Unicode Standard, Section 2.11 [Unicode] for an
416 exact definition).
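
One way to approximate this test in Python is to look at the General
Category of the first character. Treating every "M" category (Mn, Mc,
Me) as a combining mark is an assumption of this sketch; the exact
definition is the one in The Unicode Standard. The function name is
hypothetical.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """True if the first character has General Category Mn, Mc, or Me."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")
```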
417
418 4.2.3.3. Contextual Rules
419
420 The Unicode string MUST NOT contain any characters whose validity is
421 context-dependent, unless the validity is positively confirmed by a
422 contextual rule. To check this, each code point identified as
423 CONTEXTJ or CONTEXTO in the Tables document [RFC5892] MUST have a
424 non-null rule. If such a code point is missing a rule, the label is
425 invalid. If the rule exists but the result of applying the rule is
426 negative or inconclusive, the proposed label is invalid.
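
The rule lookup can be sketched as below. The table is a two-entry
excerpt chosen for illustration (ZERO WIDTH JOINER, a CONTEXTJ code
point, and MIDDLE DOT, a CONTEXTO code point); the full list of
contextual code points and their rules is in the Tables document
[RFC5892]. All names in the sketch are hypothetical.

```python
import unicodedata

VIRAMA_CCC = 9  # canonical combining class shared by virama characters


def zwj_rule(label: str, i: int) -> bool:
    # U+200D is confirmed only when the preceding character is a virama.
    return i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA_CCC


def middle_dot_rule(label: str, i: int) -> bool:
    # U+00B7 is confirmed only between two "l" (U+006C) characters.
    return (0 < i < len(label) - 1
            and label[i - 1] == "l" and label[i + 1] == "l")


# Illustrative excerpt only, not the complete RFC 5892 data.
CONTEXTUAL = {0x200D, 0x00B7}
RULES = {0x200D: zwj_rule, 0x00B7: middle_dot_rule}


def passes_contextual_rules(label, contextual=CONTEXTUAL, rules=RULES):
    """False if any contextual code point lacks a rule or fails its rule."""
    for i, ch in enumerate(label):
        cp = ord(ch)
        if cp not in contextual:
            continue
        rule = rules.get(cp)
        # A missing rule, or any result other than a positive
        # confirmation, makes the label invalid.
        if rule is None or rule(label, i) is not True:
            return False
    return True
```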
427
428 4.2.3.4. Labels Containing Characters Written Right to Left
429
430 If the proposed label contains any characters from scripts that are
431 written from right to left, it MUST meet the Bidi criteria [RFC5893].
432
441 4.2.4. Registration Validation Requirements
442
443 Strings that contain at least one non-ASCII character, have been
444 produced by the steps above, whose contents pass all of the tests in
445 Section 4.2.3, and are 63 or fewer characters long in
446 ASCII-compatible encoding (ACE) form (see Section 4.4), are U-labels.
447
448 To summarize, tests are made in Section 4.2 for invalid characters,
449 invalid combinations of characters, for labels that are invalid even
450 if the characters they contain are valid individually, and for labels
451 that do not conform to the restrictions for strings containing
452 right-to-left characters.
453
454 4.3. Registry Restrictions
455
456 In addition to the rules and tests above, there are many reasons why
457 a registry could reject a label. Registries at all levels of the
458 DNS, not just the top level, are expected to establish policies about
459 label registrations. Policies are likely to be informed by the local
460 languages and the scripts that are used to write them and may depend
461 on many factors including what characters are in the label (for
462 example, a label may be rejected based on other labels already
463 registered). See the Rationale document [RFC5894], Section 3.2, for
464 further discussion and recommendations about registry policies.
465
466 The string produced by the steps in Section 4.2 is checked and
467 processed as appropriate to local registry restrictions. Application
468 of those registry restrictions may result in the rejection of some
469 labels or the application of special restrictions to others.
470
471 4.4. Punycode Conversion
472
473 The resulting U-label is converted to an A-label (defined in Section
474 2.3.2.1 of the Definitions document [RFC5890]). The A-label is the
475 encoding of the U-label according to the Punycode algorithm [RFC3492]
476 with the ACE prefix "xn--" added at the beginning of the string. The
477 resulting string must, of course, conform to the length limits
478 imposed by the DNS. This document does not update or alter the
479 Punycode algorithm specified in RFC 3492 in any way. RFC 3492 does
480 make a non-normative reference to the information about the value and
481 construction of the ACE prefix that appears in RFC 3490 or Nameprep
482 [RFC3491]. For consistency and reader convenience, IDNA2008
483 effectively updates that reference to point to this document. That
484 change does not alter the prefix itself. The prefix, "xn--", is the
485 same in both sets of documents.
486
496 With the exception of the maximum string length test on Punycode
497 output, the failure conditions identified in the Punycode encoding
498 procedure cannot occur if the input is a U-label as determined by the
499 steps in Sections 4.1 through 4.3 above.
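
A minimal sketch of this step, including the length test that can
still fail for a valid U-label, is given here. It assumes Python and
its built-in "punycode" codec; the function and constant names are
hypothetical.

```python
MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label


def u_label_to_a_label(u_label: str) -> str:
    """Punycode-encode a validated U-label and add the ACE prefix."""
    a_label = "xn--" + u_label.encode("punycode").decode("ascii")
    # The limit applies to the ASCII form, prefix included.
    if len(a_label) > MAX_LABEL_OCTETS:
        raise ValueError("A-label exceeds the 63-octet DNS label limit")
    return a_label
```

Because the limit is measured on the encoded form, a Unicode string
well under 63 characters can still be too long once encoded.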
500
501 4.5. Insertion in the Zone
502
503 The label is registered in the DNS by inserting the A-label into a
504 zone.
505
506 5. Domain Name Lookup Protocol
507
508 Lookup is different from registration and different tests are applied
509 on the client. Although some validity checks are necessary to avoid
510 serious problems with the protocol, the lookup-side tests are more
511 permissive and rely on the assumption that names that are present in
512 the DNS are valid. That assumption is, however, a weak one because
513 the presence of wildcards in the DNS might cause a string that is not
514 actually registered in the DNS to be successfully looked up.
515
516 5.1. Label String Input
517
518 The user supplies a string in the local character set, for example,
519 by typing it, clicking on it, or copying and pasting it from a
520 resource identifier, e.g., a Uniform Resource Identifier (URI)
521 [RFC3986] or an Internationalized Resource Identifier (IRI)
522 [RFC3987], from which the domain name is extracted. Alternately,
523 some process not directly involving the user may read the string from
524 a file or obtain it in some other way. Processing in this step and
525 the one specified in Section 5.2 are local matters, to be
526 accomplished prior to actual invocation of IDNA.
527
528 5.2. Conversion to Unicode
529
530 The string is converted from the local character set into Unicode, if
531 it is not already in Unicode. Depending on local needs, this
532 conversion may involve mapping some characters into other characters
533 as well as coding conversions. Those issues are discussed in the
534 mapping-related sections (Sections 4.2, 4.4, 6, and 7.3) of the
535 Rationale document [RFC5894] and in the separate Mapping document
536 [IDNA2008-Mapping]. The result MUST be a Unicode string in NFC form.
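
The NFC requirement can be met or checked with a standard
normalization routine. The sketch below assumes Python's
"unicodedata" module; the function names are hypothetical, and any
local mapping is assumed to have been applied beforehand.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return the string in Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """True if the string is already in Normalization Form C."""
    return text == unicodedata.normalize("NFC", text)
```

For example, "u" followed by U+0308 (COMBINING DIAERESIS) is not in
NFC; normalizing it yields the single code point U+00FC.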
537
538 5.3. A-label Input
539
540 If the input to this procedure appears to be an A-label (i.e., it
541 starts in "xn--", interpreted case-insensitively), the lookup
542 application MAY attempt to convert it to a U-label, first ensuring
543 that the A-label is entirely in lowercase (converting it to lowercase
551 if necessary), and apply the tests of Section 5.4 and the conversion
552 of Section 5.5 to that form. If the label is converted to Unicode
553 (i.e., to U-label form) using the Punycode decoding algorithm, then
554 the processing specified in those two sections MUST be performed, and
555 the label MUST be rejected if the resulting label is not identical to
556 the original. See Section 8.1 of the Rationale document [RFC5894]
557 for additional discussion on this topic.
558
559 Conversion from the A-label and testing that the result is a U-label
560 SHOULD be performed if the domain name will later be presented to the
561 user in native character form (this requires that the lookup
562 application be IDNA-aware). If those steps are not performed, the
563 lookup process SHOULD at least test to determine that the string is
564 actually an A-label, examining it for the invalid formats specified
565 in the Punycode decoding specification. Applications that are not
566 IDNA-aware will obviously omit that testing; others MAY treat the
567 string as opaque to avoid the additional processing at the expense of
568 providing less protection and information to users.
569
570 5.4. Validation and Character List Testing
571
572 As with the registration procedure described in Section 4, the
573 Unicode string is checked to verify that all characters that appear
574 in it are valid as input to IDNA lookup processing. As discussed
575 above and in the Rationale document [RFC5894], the lookup check is
576 more liberal than the registration one. Labels that have not been
577 fully evaluated for conformance to the applicable rules are referred
578 to as "putative" labels as discussed in Section 2.3.2.1 of the
579 Definitions document [RFC5890]. Putative U-labels with any of the
580 following characteristics MUST be rejected prior to DNS lookup:
581
582 o Labels that are not in NFC [Unicode-UAX15].
583
584 o Labels containing "--" (two consecutive hyphens) in the third and
585 fourth character positions.
586
587 o Labels whose first character is a combining mark (see The Unicode
588 Standard, Section 2.11 [Unicode]).
589
590 o Labels containing prohibited code points, i.e., those that are
591 assigned to the "DISALLOWED" category of the Tables document
592 [RFC5892].
593
594 o Labels containing code points that are identified in the Tables
595 document as "CONTEXTJ", i.e., requiring exceptional contextual
596 rule processing on lookup, but that do not conform to those rules.
597 Note that this implies that a rule must be defined, not null: a
606 character that requires a contextual rule but for which the rule
607 is null is treated in this step as having failed to conform to the
608 rule.
609
610 o Labels containing code points that are identified in the Tables
611 document as "CONTEXTO", but for which no such rule appears in the
612 table of rules. Applications resolving DNS names or carrying out
613 equivalent operations are not required to test contextual rules
614 for "CONTEXTO" characters, only to verify that a rule is defined
615 (although they MAY make such tests to provide better protection or
616 give better information to the user).
617
618 o Labels containing code points that are unassigned in the version
619 of Unicode being used by the application, i.e., in the UNASSIGNED
620 category of the Tables document.
621
622 This requirement means that the application must use a list of
623 unassigned characters that is matched to the version of Unicode
624 that is being used for the other requirements in this section. It
625 is not required that the application know which version of Unicode
626 is being used; that information might be part of the operating
627 environment in which the application is running.
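
Several of the tests in the list above can be sketched in Python as
follows. This is an illustration under stated assumptions, not a
complete implementation: the contextual-rule tests are left out, the
DISALLOWED test is delegated to a caller-supplied predicate
(hypothetical, since the category data lives in the Tables document
and not in the Python standard library), and General Category "Cn" in
the running Python's Unicode data stands in for the UNASSIGNED
category.

```python
import unicodedata


def lookup_rejection_reason(label, is_disallowed=lambda ch: False):
    """Return why a putative U-label must be rejected, or None."""
    if label != unicodedata.normalize("NFC", label):
        return "not in NFC"
    if label[2:4] == "--":
        return "hyphens in third and fourth positions"
    if label and unicodedata.category(label[0]).startswith("M"):
        return "leading combining mark"
    for ch in label:
        # "Cn" is judged against the same Unicode version that the
        # normalization call above uses.
        if unicodedata.category(ch) == "Cn":
            return "unassigned code point"
        if is_disallowed(ch):
            return "DISALLOWED code point"
    return None
```

Unlike the registration tests, this list does not reject labels that
merely start or end with a hyphen.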
628
629 In addition, the application SHOULD apply the following test.
630
631 o Verification that the string is compliant with the requirements
632 for right-to-left characters specified in the Bidi document
633 [RFC5893].
634
635 This test may be omitted in special circumstances, such as when the
636 lookup application knows that the conditions are enforced elsewhere,
637 because an attempt to look up and resolve such strings will almost
638 certainly lead to a DNS lookup failure except when wildcards are
639 present in the zone. However, applying the test is likely to give
640 much better information about the reason for a lookup failure --
641 information that may be usefully passed to the user when that is
642 feasible -- than DNS resolution failure information alone.
643
644 For all other strings, the lookup application MUST rely on the
645 presence or absence of labels in the DNS to determine the validity of
646 those labels and the validity of the characters they contain. If
647 they are registered, they are presumed to be valid; if they are not,
648 their possible validity is not relevant. While a lookup application
649 may reasonably issue warnings about strings it believes may be
650 problematic, applications that decline to process a string that
651 conforms to the rules above (i.e., does not look it up in the DNS)
652 are not in conformance with this protocol.
653
661 5.5. Punycode Conversion
662
663 The string that has now been validated for lookup is converted to ACE
664 form by applying the Punycode algorithm to the string and then adding
665 the ACE prefix ("xn--").
666
667 5.6. DNS Name Resolution
668
669 The A-label resulting from the conversion in Section 5.5 or supplied
670 directly (see Section 5.3) is combined with other labels as needed to
671 form a fully-qualified domain name that is then looked up in the DNS,
672 using normal DNS resolver procedures. The lookup can obviously
673 either succeed (returning information) or fail.
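
Assembling the name to be queried can be sketched as below, assuming
Python's built-in "punycode" codec, a name whose labels have already
passed the tests of Section 5.4, and U+002E as the only label
separator. The function name is hypothetical.

```python
def domain_to_ascii(domain: str) -> str:
    """Build the ASCII form of a name from already validated labels."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            # NR-LDH label, or an A-label supplied directly.
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)
```

The resulting string contains only ASCII and can be passed to an
ordinary resolver interface without any IDNA-specific handling.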
674
675 6. Security Considerations
676
677 Security Considerations for this version of IDNA are described in the
678 Definitions document [RFC5890], except for the special issues
679 associated with right-to-left scripts and characters. The latter are
680 discussed in the Bidi document [RFC5893].
681
682 In order to avoid intentional or accidental attacks from labels that
683 might be confused with others, special problems in rendering, and so
684 on, the IDNA model requires that registries exercise care and
685 thoughtfulness about what labels they choose to permit. That issue
686 is discussed in Section 4.3 of this document which, in turn, points
687 to a somewhat more extensive discussion in the Rationale document
688 [RFC5894].
689
690 7. IANA Considerations
691
692 IANA actions for this version of IDNA are specified in the Tables
693 document [RFC5892] and discussed informally in the Rationale document
694 [RFC5894]. The components of IDNA described in this document do not
695 require any IANA actions.
696
697 8. Contributors
698
699 While the listed editor held the pen, the original versions of this
700 document represent the joint work and conclusions of an ad hoc design
701 team consisting of the editor and, in alphabetic order, Harald
702 Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. This document
703 draws significantly on the original version of IDNA [RFC3490] both
704 conceptually and for specific text. This second-generation version
705 would not have been possible without the work that went into that
706 first version and especially the contributions of its authors Patrik
707 Faltstrom, Paul Hoffman, and Adam Costello. While Faltstrom was
716 actively involved in the creation of this version, Hoffman and
717 Costello were not and should not be held responsible for any errors
718 or omissions.
719
720 9. Acknowledgments
721
722 This revision to IDNA would have been impossible without the
723 accumulated experience since RFC 3490 was published and resulting
724 comments and complaints of many people in the IETF, ICANN, and other
725 communities (too many people to list here). Nor would it have been
726 possible without RFC 3490 itself and the efforts of the Working Group
727 that defined it. Those people whose contributions are acknowledged
728 in RFC 3490, RFC 4690 [RFC4690], and the Rationale document [RFC5894]
729 were particularly important.
730
731 Specific textual changes were incorporated into this document after
732 suggestions from the other contributors, Stephane Bortzmeyer, Vint
733 Cerf, Lisa Dusseault, Paul Hoffman, Kent Karlsson, James Mitchell,
734 Erik van der Poel, Marcos Sanz, Andrew Sullivan, Wil Tan, Ken
735 Whistler, Chris Wright, and other WG participants and reviewers
736 including Martin Duerst, James Mitchell, Subramanian Moonesamy, Peter
737 Saint-Andre, Margaret Wasserman, and Dan Winship who caught specific
738 errors and recommended corrections. Special thanks are due to Paul
739 Hoffman for permission to extract material to form the basis for
740 Appendix A from a draft document that he prepared.
741
742 10. References
743
744 10.1. Normative References
745
746 [RFC1034] Mockapetris, P., "Domain names - concepts and
747 facilities", STD 13, RFC 1034, November 1987.
748
749 [RFC1035] Mockapetris, P., "Domain names - implementation and
750 specification", STD 13, RFC 1035, November 1987.
751
752 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
753 Requirement Levels", BCP 14, RFC 2119, March 1997.
754
755 [RFC3492] Costello, A., "Punycode: A Bootstring encoding of
756 Unicode for Internationalized Domain Names in
757 Applications (IDNA)", RFC 3492, March 2003.
758
759 [RFC5890] Klensin, J., "Internationalized Domain Names for
760 Applications (IDNA): Definitions and Document
761 Framework", RFC 5890, August 2010.
762
771 [RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and
772 Internationalized Domain Names for Applications (IDNA)",
773 RFC 5892, August 2010.
774
775 [RFC5893] Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
776 for Internationalized Domain Names for Applications
777 (IDNA)", RFC 5893, August 2010.
778
779 [Unicode-UAX15]
780 The Unicode Consortium, "Unicode Standard Annex #15:
781 Unicode Normalization Forms", September 2009,
782 <http://www.unicode.org/reports/tr15/>.
783
784 10.2. Informative References
785
786 [ASCII] American National Standards Institute (formerly United
787 States of America Standards Institute), "USA Code for
788 Information Interchange", ANSI X3.4-1968, 1968. ANSI
789 X3.4-1968 has been replaced by newer versions with
790 slight modifications, but the 1968 version remains
791 definitive for the Internet.
792
793 [IDNA2008-Mapping]
794 Resnick, P. and P. Hoffman, "Mapping Characters in
795 Internationalized Domain Names for Applications (IDNA)",
796 Work in Progress, April 2010.
797
798 [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
799 RFC 2671, August 1999.
800
801 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
802 "Internationalizing Domain Names in Applications
803 (IDNA)", RFC 3490, March 2003.
804
805 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
806 Profile for Internationalized Domain Names (IDN)",
807 RFC 3491, March 2003.
808
809 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
810 Resource Identifier (URI): Generic Syntax", STD 66,
811 RFC 3986, January 2005.
812
813 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
814 Identifiers (IRIs)", RFC 3987, January 2005.
815
816 [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
817 and Recommendations for Internationalized Domain Names
818 (IDNs)", RFC 4690, September 2006.
819
826 [RFC4952] Klensin, J. and Y. Ko, "Overview and Framework for
827 Internationalized Email", RFC 4952, July 2007.
828
829 [RFC5894] Klensin, J., "Internationalized Domain Names for
830 Applications (IDNA): Background, Explanation, and
831 Rationale", RFC 5894, August 2010.
832
833 [Unicode] The Unicode Consortium, "The Unicode Standard, Version
834 5.0", 2007. Boston, MA, USA: Addison-Wesley. ISBN
835 0-321-48091-0. This printed reference has now been
836 updated online to reflect additional code points. For
837 code points, the reference at the time this document was
838 published is to Unicode 5.2.
839
881 Appendix A. Summary of Major Changes from IDNA2003
882
883 1. Make the base character set Unicode-version-agnostic rather
884 than tied to Unicode 3.2.
885
886 2. Separate the definitions for the "registration" and "lookup"
887 activities.
888
889 3. Disallow symbol and punctuation characters except where special
890 exceptions are necessary.
891
892 4. Remove the mapping and normalization steps from the protocol and
893 have them, instead, done by the applications themselves,
894 possibly in a local fashion, before invoking the protocol.
895
896 5. Change the way that the protocol specifies which characters are
897 allowed in labels from "humans decide what the table of code
898 points contains" to "decisions about code points are based on
899 Unicode properties plus a small exclusion list created by
900 humans".
901
902 6. Introduce the new concept of characters that can be used only in
903 specific contexts.
904
905 7. Allow typical words and names in languages such as Dhivehi and
906 Yiddish to be expressed.
907
908 8. Make bidirectional domain names (delimited strings of labels,
909 not just labels standing on their own) display in a less
910 surprising fashion, whether they appear in obvious domain name
911 contexts or as part of running text in paragraphs.
912
913 9. Remove the dot separator from the mandatory part of the
914 protocol.
915
916 10. Make some currently valid labels that are not actually IDNA
917 labels invalid.
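
As an illustration of item 4, the following is a minimal sketch, not a conforming implementation, of an application that performs its own mapping before handing a label to the protocol. The function names local_map and to_a_label are hypothetical, and the mapping chosen here (lowercasing followed by NFC) is only an assumed local policy. The sketch relies on Python's standard "punycode" codec for the encoding defined in [RFC3492] and performs none of the validity checks required by [RFC5892] or [RFC5893].

```python
import unicodedata

ACE_PREFIX = "xn--"


def local_map(label: str) -> str:
    """Hypothetical application-side mapping: lowercase, then NFC.

    This is an assumed local policy applied before the protocol is
    invoked; it is not part of the protocol itself.
    """
    return unicodedata.normalize("NFC", label.lower())


def to_a_label(u_label: str) -> str:
    """Punycode-encode a label and add the ACE prefix.

    All-ASCII labels pass through unchanged. No IDNA2008 validity
    checks (permitted code points, contextual rules, Bidi) are made.
    """
    if u_label.isascii():
        return u_label
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


print(to_a_label(local_map("Bücher")))  # prints xn--bcher-kva
```

Keeping the mapping in a separate function mirrors the separation described in item 4: the application decides how user input is mapped, and only the mapped result is passed on for conversion.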
918
919 Author's Address
920
921 John C Klensin
922 1770 Massachusetts Ave, Ste 322
923 Cambridge, MA 02140
924 USA
925
926 Phone: +1 617 245 1457
927 EMail: john+ietf@jck.com
928
The Unicode string MUST NOT begin with a combining mark or combining character (as defined in The Unicode Standard, Section 3.6 [Unicode], definition D52).
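
The rule on leading combining marks can be sketched as a simple check. This is an illustration under stated assumptions, not normative text: the function name begins_with_combining_mark is hypothetical, and the test treats any code point whose General_Category is Mn, Mc, or Me as a combining mark, using the Unicode data bundled with the Python interpreter (which may differ in version from the one an implementation is required to use).

```python
import unicodedata


def begins_with_combining_mark(label: str) -> bool:
    """Return True if the first code point of label is a combining mark.

    General_Category values Mn, Mc, and Me all begin with "M".
    An empty string has no first code point and yields False.
    """
    return bool(label) and unicodedata.category(label[0]).startswith("M")


print(begins_with_combining_mark("\u0301a"))  # True: starts with U+0301
print(begins_with_combining_mark("a\u0301"))  # False: starts with "a"
```

A label for which this check returns True would be rejected before any further processing.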