1 Internet Engineering Task Force (IETF) J. Klensin
2 Request for Comments: 5890 August 2010
3 Obsoletes: 3490
4 Category: Standards Track
5 ISSN: 2070-1721
6
7
8 Internationalized Domain Names for Applications (IDNA):
9 Definitions and Document Framework
10
11 Abstract
12
13 This document is one of a collection that, together, describe the
14 protocol and usage context for a revision of Internationalized Domain
15 Names for Applications (IDNA), superseding the earlier version. It
16 describes the document collection and provides definitions and other
17 material that are common to the set.
18
19 Status of This Memo
20
21 This is an Internet Standards Track document.
22
23 This document is a product of the Internet Engineering Task Force
24 (IETF). It represents the consensus of the IETF community. It has
25 received public review and has been approved for publication by the
26 Internet Engineering Steering Group (IESG). Further information on
27 Internet Standards is available in Section 2 of RFC 5741.
28
29 Information about the current status of this document, any errata,
30 and how to provide feedback on it may be obtained at
31 http://www.rfc-editor.org/info/rfc5890.
32
52 Klensin Standards Track [Page 1]
53 RFC 5890 IDNA Definitions August 2010
54
55
56 Copyright Notice
57
58 Copyright (c) 2010 IETF Trust and the persons identified as the
59 document authors. All rights reserved.
60
61 This document is subject to BCP 78 and the IETF Trust's Legal
62 Provisions Relating to IETF Documents
63 (http://trustee.ietf.org/license-info) in effect on the date of
64 publication of this document. Please review these documents
65 carefully, as they describe your rights and restrictions with respect
66 to this document. Code Components extracted from this document must
67 include Simplified BSD License text as described in Section 4.e of
68 the Trust Legal Provisions and are provided without warranty as
69 described in the Simplified BSD License.
70
71 This document may contain material from IETF Documents or IETF
72 Contributions published or made publicly available before November
73 10, 2008. The person(s) controlling the copyright in some of this
74 material may not have granted the IETF Trust the right to allow
75 modifications of such material outside the IETF Standards Process.
76 Without obtaining an adequate license from the person(s) controlling
77 the copyright in such materials, this document may not be modified
78 outside the IETF Standards Process, and derivative works of it may
79 not be created outside the IETF Standards Process, except to format
80 it for publication as an RFC or to translate it into languages other
81 than English.
82
111 Table of Contents
112
113 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
114 1.1. IDNA2008 . . . . . . . . . . . . . . . . . . . . . . . . . 4
115 1.1.1. Audiences . . . . . . . . . . . . . . . . . . . . . . 4
116 1.1.2. Normative Language . . . . . . . . . . . . . . . . . . 5
117 1.2. Road Map of IDNA2008 Documents . . . . . . . . . . . . . . 5
118 2. Definitions and Terminology . . . . . . . . . . . . . . . . . 6
119 2.1. Characters and Character Sets . . . . . . . . . . . . . . 6
120 2.2. DNS-Related Terminology . . . . . . . . . . . . . . . . . 6
121 2.3. Terminology Specific to IDNA . . . . . . . . . . . . . . . 7
122 2.3.1. LDH Label . . . . . . . . . . . . . . . . . . . . . . 7
123 2.3.2. Terms for IDN Label Codings . . . . . . . . . . . . . 11
124 2.3.2.1. IDNA-valid strings, A-label, and U-label . . . . . 11
125 2.3.2.2. NR-LDH Label . . . . . . . . . . . . . . . . . . . 13
126 2.3.2.3. Internationalized Domain Name and
127 Internationalized Label . . . . . . . . . . . . . 13
128 2.3.2.4. Label Equivalence . . . . . . . . . . . . . . . . 14
129 2.3.2.5. ACE Prefix . . . . . . . . . . . . . . . . . . . . 14
130 2.3.2.6. Domain Name Slot . . . . . . . . . . . . . . . . . 14
131 2.3.3. Order of Characters in Labels . . . . . . . . . . . . 15
132 2.3.4. Punycode is an Algorithm, Not a Name or Adjective . . 15
133 3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16
134 4. Security Considerations . . . . . . . . . . . . . . . . . . . 16
135 4.1. General Issues . . . . . . . . . . . . . . . . . . . . . . 16
136 4.2. U-label Lengths . . . . . . . . . . . . . . . . . . . . . 16
137 4.3. Local Character Set Issues . . . . . . . . . . . . . . . . 17
138 4.4. Visually Similar Characters . . . . . . . . . . . . . . . 17
139 4.5. IDNA Lookup, Registration, and the Base DNS
140 Specifications . . . . . . . . . . . . . . . . . . . . . . 18
141 4.6. Legacy IDN Label Strings . . . . . . . . . . . . . . . . . 18
142 4.7. Security Differences from IDNA2003 . . . . . . . . . . . . 19
143 4.8. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 20
144 5. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 20
145 6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
146 6.1. Normative References . . . . . . . . . . . . . . . . . . . 20
147 6.2. Informative References . . . . . . . . . . . . . . . . . . 21
148
166 1. Introduction
167
168 1.1. IDNA2008
169
170 This document is one of a collection that, together, describe the
171 protocol and usage context for a revision of Internationalized Domain
172 Names for Applications (IDNA) that was largely completed in 2008,
173 known within the series and elsewhere as "IDNA2008". The series
174 replaces an earlier version of IDNA [RFC3490] [RFC3491]. For
175 convenience, that version of IDNA is referred to in these documents
176 as "IDNA2003". The newer version continues to use the Punycode
177 algorithm [RFC3492] and ACE (ASCII-compatible encoding) prefix from
178 that earlier version. The document collection is described in
179 Section 1.2. As indicated there, this document provides definitions
180 and other material that are common to the set.
181
182 1.1.1. Audiences
183
184 While many IETF specifications are directed exclusively to protocol
185 implementers, the character of IDNA requires that it be understood
186 and properly used by those whose responsibilities include making
187 decisions about:
188
189 o what names are permitted in DNS zone files,
190
191 o policies related to names and naming, and
192
193 o the handling of domain name strings in files and systems, even
194 with no immediate intention of looking them up.
195
196 This document and those documents concerned with the protocol
197 definition, rules for handling strings that include characters
198 written right to left, and the actual list of characters and
199 categories will be of primary interest to protocol implementers.
200 This document and the one containing explanatory material will be of
201 primary interest to others, although they may have to fill in some
202 details by reference to other documents in the set.
203
204 This document and the associated ones are written from the
205 perspective of an IDNA-aware user, application, or implementation.
206 While they may reiterate fundamental DNS rules and requirements for
207 the convenience of the reader, they make no attempt to be
208 comprehensive about DNS principles and should not be considered as a
209 substitute for a thorough understanding of the DNS protocols and
210 specifications.
211
221 1.1.2. Normative Language
222
223 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
224 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
225 document are to be interpreted as described in RFC 2119 [RFC2119].
226
227 1.2. Road Map of IDNA2008 Documents
228
229 IDNA2008 consists of the following documents:
230
231 o This document, containing definitions and other material that are
232 needed for understanding other documents in the set. It is
233 referred to informally in other documents in the set as "Defs" or
234 "Definitions".
235
236 o A document, RFC 5894 [RFC5894], that provides an overview of the
237 protocol and associated tables together with explanatory material
238 and some rationale for the decisions that led to IDNA2008. That
239 document also contains advice for registry operations and those
240 who use Internationalized Domain Names (IDNs). It is referred to
241 informally in other documents in the set as "Rationale". It is
242 not normative.
243
244 o A document, RFC 5891 [RFC5891], that describes the core IDNA2008
245 protocol and its operations. In combination with the Bidi
246 document, described immediately below, it explicitly updates and
247 replaces RFC 3490. It is referred to informally in other
248 documents in the set as "Protocol".
249
250 o A document, RFC 5893 [RFC5893], that specifies special rules
251 (Bidi) for labels that contain characters that are written from
252 right to left.
253
254 o A specification, RFC 5892 [RFC5892], of the categories and rules
255 that identify the code points allowed in a label written in native
256 character form (defined more specifically as a "U-label" in
257 Section 2.3.2.1 below), based on Unicode 5.2 [Unicode52] code
258 point assignments and additional rules unique to IDNA2008. The
259 Unicode-based rules are expected to be stable across Unicode
260 updates and hence independent of Unicode versions. That
261 specification obsoletes RFC 3491 and IDN use of the tables to
262 which it refers. It is referred to informally in other documents
263 in the set as "Tables".
264
276 o A document [IDNA2008-Mapping] that discusses the issue of mapping
277 characters into other characters and that provides guidance for
278 doing so when that is appropriate. That document, referred to
279 informally as "Mapping", provides advice; it is not a required
280 part of IDNA.
281
282 2. Definitions and Terminology
283
284 2.1. Characters and Character Sets
285
286 A code point is an integer value in the codespace of a coded
287 character set. In Unicode, these are integers from 0 to 0x10FFFF.
288
289 Unicode [Unicode52] is a coded character set containing somewhat over
290 100,000 characters assigned to code points as of version 5.2. A
291 single Unicode code point is denoted in these documents by "U+"
292 followed by four to six hexadecimal digits, while a range of Unicode
293 code points is denoted by two four- to six-digit hexadecimal numbers
294 separated by "..", with no prefixes.
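
The notation just described can be sketched in a few lines of Python. This is only an illustration of the convention; the function names format_code_point and format_range are assumptions made for this example and do not appear in the IDNA2008 documents.

```python
def format_code_point(cp: int) -> str:
    """Render one code point as "U+" followed by four to six hex digits."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    # Zero-pad to at least four digits; larger values use five or six.
    return f"U+{cp:04X}"


def format_range(first: int, last: int) -> str:
    """Render a range as two hex numbers separated by "..", with no prefix."""
    if not 0 <= first <= last <= 0x10FFFF:
        raise ValueError("invalid code point range")
    return f"{first:04X}..{last:04X}"


if __name__ == "__main__":
    print(format_code_point(0x2D))    # hyphen-minus
    print(format_range(0x41, 0x5A))   # ASCII uppercase letters
```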
295
296 ASCII means US-ASCII [ASCII], a coded character set containing 128
297 characters associated with code points in the range 0000..007F.
298 Unicode is a superset of ASCII and may be thought of as a
299 generalization of it; it includes all the ASCII characters and
300 associates them with the equivalent code points.
301
302 "Letters" are, informally, generalizations from the ASCII and
303 common-sense understanding of that term, i.e., characters that are
304 used to write text and that are not digits, symbols, or punctuation.
305 Formally, they are characters with a Unicode General Category value
306 starting in "L" (see Section 4.5 of The Unicode Standard
307 [Unicode52]).
308
309 2.2. DNS-Related Terminology
310
311 When discussing the DNS, this document generally assumes the
312 terminology used in the DNS specifications [RFC1034] [RFC1035] as
313 subsequently modified [RFC1123] [RFC2181]. The term "lookup" is used
314 to describe the combination of operations performed by the IDNA2008
315 protocol and those actually performed by a DNS resolver. The process
316 of placing an entry into the DNS is referred to as "registration".
317 This is similar to common contemporary usage of that term in other
318 contexts. Consequently, any DNS zone administration is described as
319 a "registry", and the terms "registry" and "zone administrator" are
320 used interchangeably, regardless of the actual administrative
321 arrangements or level in the DNS tree. More details about that
322 relationship are included in the Rationale document.
323
331 The term "LDH code point" is defined in this document to refer to the
332 code points associated with ASCII letters (Unicode code points
333 0041..005A and 0061..007A), digits (0030..0039), and the hyphen-minus
334 (U+002D). "LDH" is an abbreviation for "letters, digits, hyphen" but
335 is used specifically in this document to refer to the set of naming
336 rules described in Section 2.3.1 below.
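
As an informal illustration, the LDH code point set defined above can be tested as follows. The function name is_ldh_code_point is an assumption for this sketch, not a term from the specifications.

```python
def is_ldh_code_point(cp: int) -> bool:
    """True for ASCII letters, digits, and the hyphen-minus only."""
    return (
        0x41 <= cp <= 0x5A      # 0041..005A, uppercase letters
        or 0x61 <= cp <= 0x7A   # 0061..007A, lowercase letters
        or 0x30 <= cp <= 0x39   # 0030..0039, digits
        or cp == 0x2D           # U+002D, hyphen-minus
    )


if __name__ == "__main__":
    for ch in "aZ5-_":
        print(repr(ch), is_ldh_code_point(ord(ch)))
```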
337
338 The base DNS specifications [RFC1034] [RFC1035] discuss "domain
339 names" and "hostnames", but many people use the terms
340 interchangeably, as do sections of these specifications. Lack of
341 clarity about that terminology has contributed to confusion about
342 intent in some cases. These documents generally use the term "domain
343 name". When they refer to, e.g., hostname syntax restrictions, they
344 explicitly cite the relevant defining documents. The remaining
345 definitions in this subsection are essentially a review: if there is
346 any perceived difference between those definitions and the
347 definitions in the base DNS documents or those cited below, the
348 definitions in the other documents take precedence.
349
350 A label is an individual component of a domain name. Labels are
351 usually shown separated by dots; for example, the domain name
352 "www.example.com" is composed of three labels: "www", "example", and
353 "com". (The complete name convention using a trailing dot described
354 in RFC 1123 [RFC1123], which can be explicit as in "www.example.com."
355 or implicit as in "www.example.com", is not considered in this
356 specification.) IDNA extends the set of usable characters in labels
357 that are treated as text (as distinct from the binary string labels
358 discussed in RFC 1035 and RFC 2181 [RFC2181] and bitstring ones
359 [RFC2673]), but only in certain contexts. The different contexts for
360 different sets of usable characters are outlined in the next section.
361 For the rest of this document and in the related ones, the term
362 "label" is shorthand for "text label", and "every label" means "every
363 text label", including the expanded context.
364
365 2.3. Terminology Specific to IDNA
366
367 This section defines some terminology to reduce dependence on terms
368 and definitions that have been problematic in the past. The
369 relationships among these definitions are illustrated in Figure 1 and
370 Figure 2. In the first of those figures, the parenthesized numbers
371 refer to the notes below the figure.
372
373 2.3.1. LDH Label
374
375 This is the classical label form used, albeit with some additional
376 restrictions, in hostnames [RFC0952]. Its syntax is identical to
377 that described as the "preferred name syntax" in Section 3.5 of RFC
378 1034 [RFC1034] as modified by RFC 1123 [RFC1123]. Briefly, it is a
386 string consisting of ASCII letters, digits, and the hyphen with the
387 further restriction that the hyphen cannot appear at the beginning or
388 end of the string. Like all DNS labels, its total length must not
389 exceed 63 octets.
390
391 LDH labels include the specialized labels used by IDNA (described as
392 "A-labels" below) and some additional restricted forms (also
393 described below).
394
395 To facilitate clear description, two new subsets of LDH labels are
396 created by the introduction of IDNA. These are called Reserved LDH
397 labels (R-LDH labels) and Non-Reserved LDH labels (NR-LDH labels).
398 Reserved LDH labels, known as "tagged domain names" in some other
399 contexts, contain "--" in the third and fourth character positions
400 but otherwise conform to LDH label rules.
401 Only a subset of the R-LDH labels can be used in IDNA-aware
402 applications. That subset consists of the class of labels that begin
403 with the prefix "xn--" (case independent), but otherwise conform to
404 the rules for LDH labels. That subset is called "XN-labels" in this
405 set of documents. XN-labels are further divided into those whose
406 remaining characters (after the "xn--") are valid output of the
407 Punycode algorithm [RFC3492] and those that are not (see below). The
408 XN-labels that are valid Punycode output are known as "A-labels" if
409 they also meet the other criteria for IDNA-validity described below.
410 Because LDH labels (and, indeed, any DNS label) must not be more than
411 63 octets in length, the portion of an XN-label derived from the
412 Punycode algorithm is limited to no more than 59 ASCII characters.
413 Non-Reserved LDH labels are the set of valid LDH labels that do not
414 have "--" in the third and fourth positions.
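
The syntactic distinctions in this paragraph can be sketched as a small classifier. It is a minimal sketch that looks only at the ASCII syntax: it does not decide whether an XN-label is an A-label or a Fake A-label, since that requires Punycode decoding and the other IDNA validity tests. The function name classify_ascii_label and the returned strings are assumptions made for this example.

```python
import re

# One or more LDH code points (ASCII letters, digits, hyphen-minus).
_LDH_CHARS = re.compile(r"[A-Za-z0-9-]+")


def classify_ascii_label(label: str) -> str:
    """Classify a label by syntax alone, following Section 2.3.1."""
    # LDH labels are ASCII, so character count equals octet count.
    if len(label) > 63 or not _LDH_CHARS.fullmatch(label):
        return "non-LDH"
    if label.startswith("-") or label.endswith("-"):
        return "non-LDH"
    # "--" in the third and fourth positions marks a Reserved LDH label.
    if label[2:4] == "--":
        if label[:4].lower() == "xn--":
            # At most 63 - 4 = 59 characters can follow the prefix.
            return "XN-label"
        return "R-LDH (not XN)"
    return "NR-LDH"


if __name__ == "__main__":
    for sample in ["example", "xn--bcher-kva", "ab--cd", "_tcp", "-abcd"]:
        print(sample, "->", classify_ascii_label(sample))
```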
415
416 A consequence of the restrictions on valid characters in the native
417 Unicode character form (see U-labels) turns out to be that mixed-case
418 annotation, of the sort outlined in Appendix A of RFC 3492 [RFC3492],
419 is never useful. Therefore, since a valid A-label is the result of
420 Punycode encoding of a U-label, A-labels should be produced only in
421 lowercase, despite matching other (mixed-case or uppercase) potential
422 labels in the DNS.
423
424 Some strings that are prefixed with "xn--" to form labels may not be
425 the output of the Punycode algorithm, may fail the other tests
426 outlined below, or may violate other IDNA restrictions and thus are
427 also not valid IDNA labels. They are called "Fake A-labels" for
428 convenience.
429
430 Labels within the class of R-LDH labels that are not prefixed with
431 "xn--" are also not valid IDNA labels. To allow for future use of
432 mechanisms similar to IDNA, those labels MUST NOT be processed as
441 ordinary LDH labels by IDNA-conforming programs and SHOULD NOT be
442 mixed with IDNA labels in the same zone.
443
444 These distinctions among possible LDH labels are only of significance
445 for software that is IDNA-aware or for future extensions based on
446 the same "prefix and encoding" model. For
447 IDNA-aware systems, the valid label types are: A-labels, U-labels,
448 and NR-LDH labels.
449
450 IDNA labels come in two flavors: an ACE-encoded form and a Unicode
451 (native character) form. These are referred to as A-labels and
452 U-labels, respectively, and are described in detail in the next
453 section.
454
496 ASCII Label
497 __________________________________________________________________
498 | |
499 | ____________________ LDH Label (1) (4) ________________ |
500 | | ___________________________________ | |
501 | | |IDN Reserved LDH Labels | | |
502 | | | ("??--") or R-LDH Labels | _______________ | |
503 | | | _______________________________ | |NON-RESERVED | | |
504 | | | | XN-labels | | | LDH Labels | | |
505 | | | | _____________ ___________ | | | (NR-LDH | | |
506 | | | | | A-labels | | Fake (3) || | | labels) | | |
507 | | | | | "xn--"(2) | | A-labels || | |_____________| | |
508 | | | | |___________| |__________|| | | |
509 | | | |_____________________________| | | |
510 | | |_________________________________| | |
511 | |_______________________________________________________| |
512 | |
513 | _____________NON-LDH label________ |
514 | | ______________________ | |
515 | | | Underscore labels | | |
516 | | | e.g., _tcp | | |
517 | | |____________________| | |
518 | | | Labels with leading| | |
519 | | | or trailing | | |
520 | | | hyphens "-abcd" | | |
521 | | | or "xyz-" | | |
522 | | | or "-uvw-" | | |
523 | | |____________________| | |
524 | | | Labels with other | | |
525 | | | non-LDH ASCII chars| | |
526 | | | e.g., #$%_ | | |
527 | | |____________________| | |
528 | |________________________________| |
529 |________________________________________________________________|
530
531 (1) ASCII letters (uppercase and lowercase), digits,
532 hyphen. Hyphen may not appear in first or last
533 position. No more than 63 octets.
534 (2) Note that the string following "xn--" must
535 be the valid output of the Punycode algorithm
536 and must be convertible into valid U-label form.
537 (3) Note that a Fake A-label has a prefix "xn--"
538 but the remainder of the label is NOT the valid
539 output of the Punycode algorithm.
540 (4) LDH label subtypes are indistinguishable to
541 applications that are not IDNA-aware.
542
543 Figure 1: IDNA and Related DNS Terminology Space -- ASCII Labels
544
551 __________________________
552 | Non-ASCII |
553 | |
554 | ___________________ |
555 | | U-label (5) | |
556 | |_________________| |
557 | | | |
558 | | Binary Label | |
559 | | (including | |
560 | | high bit on) | |
561 | |_________________| |
562 | | | |
563 | | Bit String | |
564 | | Label | |
565 | |_________________| |
566 |________________________|
567
568 (5) To applications that are not IDNA-aware, U-labels
569 are indistinguishable from Binary ones.
570
571 Figure 2: Non-ASCII Labels
572
573 2.3.2. Terms for IDN Label Codings
574
575 2.3.2.1. IDNA-valid strings, A-label, and U-label
576
577 For IDNA-aware applications, the three types of valid labels are
578 "A-labels", "U-labels", and "NR-LDH labels", each of which is defined
579 below and in Section 2.3.1. The relationships among them are
580 illustrated in Figure 1 and Figure 2.
581
582 o A string is "IDNA-valid" if it meets all of the requirements of
583 these specifications for an IDNA label. IDNA-valid strings may
584 appear in either of the two forms defined immediately below, or
585 may be drawn from the NR-LDH label subset. IDNA-valid strings
586 must also conform to all basic DNS requirements for labels. These
587 documents make specific reference to the form appropriate to any
588 context in which the distinction is important.
589
590 o An "A-label" is the ASCII-Compatible Encoding (ACE, see
591 Section 2.3.2.5) form of an IDNA-valid string. It must be a
592 complete label: IDNA is defined for labels, not for parts of them
593 and not for complete domain names. This means, by definition,
594 that every A-label will begin with the IDNA ACE prefix, "xn--"
595 (see Section 2.3.2.5), followed by a string that is a valid output
596 of the Punycode algorithm [RFC3492] and hence a maximum of 59
597 ASCII characters in length. The prefix and string together must
598 conform to all requirements for a label that can be stored in the
606 DNS including conformance to the rules for LDH labels
607 (Section 2.3.1). If and only if a string meeting the above
608 requirements can be decoded into a U-label is it an A-label.
609
610 o A "U-label" is an IDNA-valid string of Unicode characters, in
611 Normalization Form C (NFC) and including at least one non-ASCII
612 character, expressed in a standard Unicode Encoding Form (such as
613 UTF-8). It is also subject to the constraints about permitted
614 characters that are specified in Section 4.2 of the Protocol
615 document and the rules in Sections 2 and 3 of the Tables
616 document, the constraints in the Bidi document [RFC5893] if it contains any
617 character from scripts that are written right to left, and the
618 symmetry constraint described immediately below. Conversions
619 between U-labels and A-labels are performed according to the
620 "Punycode" specification [RFC3492], adding or removing the ACE
621 prefix as needed.
622
623 To be valid, U-labels and A-labels must obey an important symmetry
624 constraint. While that constraint may be tested in any of several
625 ways, an A-label A1 must be capable of being produced by conversion
626 from a U-label U1, and that U-label U1 must be capable of being
627 produced by conversion from A-label A1. Among other things, this
628 implies that both U-labels and A-labels must be strings in Unicode
629 NFC [Unicode-UAX15] normalized form. These strings MUST contain only
630 characters specified elsewhere in this document series, and only in
631 the contexts indicated as appropriate.
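
One way to test the symmetry constraint is a round trip through the Punycode algorithm. The sketch below uses the "punycode" codec that ships with Python and the standard unicodedata module. It checks only NFC form, the 63-octet limit, and the round trip; it does not check permitted code points, contextual rules, or Bidi constraints, so a True result does not by itself establish IDNA validity. The names to_ace, from_ace, and round_trips are assumptions made for this example.

```python
import unicodedata

ACE_PREFIX = "xn--"


def to_ace(u_label: str) -> str:
    """Punycode-encode a putative U-label and add the ACE prefix."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def from_ace(a_label: str) -> str:
    """Remove the ACE prefix and Punycode-decode the remainder."""
    if a_label[:4].lower() != ACE_PREFIX:
        raise ValueError("label does not start with the ACE prefix")
    return a_label[4:].encode("ascii").decode("punycode")


def round_trips(u_label: str) -> bool:
    """Check NFC form, the 63-octet limit, and U1 -> A1 -> U1 symmetry."""
    if unicodedata.normalize("NFC", u_label) != u_label:
        return False
    a_label = to_ace(u_label)
    if len(a_label) > 63:
        return False
    return from_ace(a_label) == u_label and to_ace(from_ace(a_label)) == a_label


if __name__ == "__main__":
    print(to_ace("b\u00fccher"))
    print(round_trips("b\u00fccher"))
    print(round_trips("bu\u0308cher"))  # decomposed form, not NFC
```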
632
633 Any rules or conventions that apply to DNS labels in general apply to
634 whichever of the U-label or A-label would be more restrictive. There
635 are two exceptions to this principle. First, the restriction to
636 ASCII characters does not apply to the U-label. Second, expansion of
637 the A-label form to a U-label may produce strings that are much
638 longer than the normal 63 octet DNS limit (potentially up to 252
639 characters) due to the compression efficiency of the Punycode
640 algorithm. Such extended-length U-labels are valid from the
641 standpoint of IDNA, but caution should be exercised as shorter limits
642 may be imposed by some applications.
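
The length asymmetry can be observed with a rough sketch based on Python's built-in "punycode" codec. The test string of forty U+00FC characters is a hypothetical label chosen only because it compresses well; it is not claimed to be registered anywhere, and the function name label_lengths is an assumption made for this example.

```python
def label_lengths(u_label: str) -> tuple[int, int]:
    """Return (UTF-8 octets of the U-label form, octets of the ACE form)."""
    ace_form = "xn--" + u_label.encode("punycode").decode("ascii")
    return len(u_label.encode("utf-8")), len(ace_form)


if __name__ == "__main__":
    # Forty repetitions of U+00FC: two octets each in UTF-8.
    utf8_octets, ace_octets = label_lengths("\u00fc" * 40)
    print("U-label form, UTF-8 octets:", utf8_octets)
    print("ACE form, octets:", ace_octets)
```

An application that stores the native-character form in a buffer sized for the 63-octet DNS limit would truncate or reject such a string, which is why the text advises caution.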
643
644 For context, applications that are not IDNA-aware treat all LDH
645 labels as valid for appearance in DNS zone files and queries and some
646 of them may permit additional types of labels (i.e., not impose the
647 LDH restriction). IDNA-aware applications permit only A-labels and
648 NR-LDH labels to appear in zone files and queries. U-labels can
649 appear, along with the other two, in presentation and user interface
650 forms, and in protocols that use IDNA forms but that do not involve
651 the DNS itself.
652
661 Specifically, for IDNA-aware applications and contexts, the three
662 allowed categories are A-label, U-label, and NR-LDH label. Of the
663 Reserved LDH labels (R-LDH labels) only A-labels are valid for IDNA
664 use.
665
666 Strings that appear to be A-labels or U-labels are processed in
667 various operations of the Protocol document [RFC5891]. Those strings
668 are not yet demonstrably conformant with the conditions outlined
669 above because they are in the process of validation. Such strings
670 may be referred to as "unvalidated", "putative", or "apparent", or as
671 being "in the form of" one of the label types to indicate that they
672 have not been verified to meet the specified conformance
673 requirements.
674
675 Unvalidated A-labels are known only to be XN-labels, while Fake
676 A-labels have been demonstrated to fail some of the A-label tests.
677 Similarly, unvalidated U-labels are simply non-ASCII labels that may
678 or may not meet the requirements for U-labels.
679
680 2.3.2.2. NR-LDH Label
681
682 These specifications use the term "NR-LDH label" strictly to refer to
683 an all-ASCII label that obeys the LDH label syntax discussed in
684 Section 2.3.1 and that is neither an IDN nor a label form reserved by
685 IDNA (R-LDH label). It should be stressed that all A-labels obey the
686 "hostname" [RFC0952] rules other than the length restriction in those
687 rules.
688
689 2.3.2.3. Internationalized Domain Name and Internationalized Label
690
691 An "internationalized domain name" (IDN) is a domain name that
692 contains at least one A-label or U-label, but that otherwise may
693 contain any mixture of NR-LDH labels, A-labels, or U-labels. Just as
694 has been the case with ASCII names, some DNS zone administrators may
695 impose restrictions, beyond those imposed by DNS or IDNA, on the
696 characters or strings that may be registered as labels in their
697 zones. Because of the diversity of characters that can be used in a
698 U-label and the confusion they might cause, such restrictions are
699 mandatory for IDN registries and zones even though the particular
700 restrictions are not part of these specifications (the issue is
701 discussed in more detail in Section 4.3 of the Protocol document
702 [RFC5891]). Because these restrictions, commonly known as "registry
703 restrictions", only affect what can be registered and not lookup
704 processing, they have no effect on the syntax or semantics of DNS
705 protocol messages; a query for a name that matches no records will
706 yield the same response regardless of the reason why it is not in the
707 zone. Clients issuing queries or interpreting responses cannot be
716 assumed to have any knowledge of zone-specific restrictions or
717 conventions. See the section on registration policy in the Rationale
718 document [RFC5894] for additional discussion.
719
720 "Internationalized label" is used when a term is needed to refer to a
721 single label of an IDN, i.e., one that might be any of an NR-LDH
722 label, A-label, or U-label. There are some standardized DNS label
723 formats, such as the "underscore labels" used for service location
724 (SRV) records [RFC2782], that do not fall into any of the three
725 categories and hence are not internationalized labels.
726
727 2.3.2.4. Label Equivalence
728
729 In IDNA, equivalence of labels is defined in terms of the A-labels.
730 If the A-labels are equal in a case-independent comparison, then the
731 labels are considered equivalent, no matter how they are represented.
732 Because of the isomorphism of A-labels and U-labels in IDNA2008, it
733 is possible to compare U-labels directly; see the Protocol document
734 [RFC5891] for details. Traditional LDH labels already have a notion
735 of equivalence: within that list of characters, uppercase and
736 lowercase are considered equivalent. The IDNA notion of equivalence
737 is an extension of that older notion but, because the protocol does
738 not specify any mandatory mapping and only those isomorphic forms are
739 considered, the only equivalents are:
740
741 o Exact (bit-string identity) matches between a pair of U-labels.
742
743 o Matches between a pair of A-labels, using normal DNS
744 case-insensitive matching rules.
745
746 o Equivalence between a U-label and an A-label determined by
747 translating the U-label form into an A-label form and then testing
748 for a match between the A-labels using normal DNS case-insensitive
749 matching rules.
750
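The three cases above can be illustrated with a minimal sketch.  It
assumes that both inputs have already been validated as labels (the
sketch performs no IDNA2008 validity tests), and it uses Python's
built-in "punycode" codec for the RFC 3492 encoding.  The function
names are hypothetical and are not defined by these specifications.

```python
ACE_PREFIX = "xn--"


def ascii_form(label: str) -> str:
    """Return an ASCII label unchanged; encode a putative U-label as an A-label."""
    if label.isascii():
        return label
    # Punycode-encode the non-ASCII label and prepend the ACE prefix.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def labels_equivalent(first: str, second: str) -> bool:
    """Compare two already-validated labels using the three rules above."""
    if not first.isascii() and not second.isascii():
        # Two U-labels: only an exact match counts.
        return first == second
    # Otherwise compare the ASCII forms with ASCII case-insensitive matching.
    return ascii_form(first).lower() == ascii_form(second).lower()


print(labels_equivalent("b\u00fccher", "XN--BCHER-KVA"))
print(labels_equivalent("b\u00fccher", "bucher"))
```

Because no mapping is applied, two non-ASCII strings that differ only
in case are not treated as equivalent by this sketch.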
751 2.3.2.5. ACE Prefix
752
753 The "ACE prefix" is defined in this document to be a string of ASCII
754 characters, "xn--", that appears at the beginning of every A-label.
755 "ACE" stands for "ASCII-Compatible Encoding".
756
757 2.3.2.6. Domain Name Slot
758
759 A "domain name slot" is defined in this document to be a protocol
760 element or a function argument or a return value (and so on)
761 explicitly designated for carrying a domain name. Examples of domain
762 name slots include the QNAME field of a DNS query; the name argument
763 of the gethostbyname() or getaddrinfo() standard C library functions;
771 the part of an email address following the at sign ("@") in the
772 parameter to the SMTP MAIL or RCPT commands or the "From:" field of
773 an email message header; and the host portion of the URI in the "src"
774 attribute of an HTML "<IMG>" tag. A string that has the syntax of a
775 domain name but that appears in general text is not in a domain name
776 slot. For example, a domain name appearing in the plain text body of
777 an email message is not occupying a domain name slot.
778
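As an informal illustration, the sketch below pulls the contents of
two of the slots mentioned above out of their containing strings: the
host portion of a URI and the part of an email address after the at
sign.  The helper names and the sample values are assumptions made
for this example only.

```python
from urllib.parse import urlsplit


def host_slot_of_uri(uri: str) -> str:
    """Return the host portion of a URI, which is a domain name slot."""
    return urlsplit(uri).hostname or ""


def domain_slot_of_address(address: str) -> str:
    """Return the part of an email address that follows the last at sign."""
    return address.rpartition("@")[2]


print(host_slot_of_uri("http://www.example.com/logo.png"))
print(domain_slot_of_address("user@example.org"))
```

A domain name that merely appears in running text has no such
designated position, which is why it is not in a domain name slot.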
779 An "IDNA-aware domain name slot" is defined for this set of documents
780 to be a domain name slot explicitly designated for carrying an
781 internationalized domain name as defined in this document. The
782 designation may be static (for example, in the specification of the
783 protocol or interface) or dynamic (for example, as a result of
784 negotiation in an interactive session).
785
786 Name slots that are not IDNA-aware obviously include any domain name
787 slot whose specification predates IDNA. Note that the requirements
788 of some protocols that use the DNS for data storage prevent the use
789 of IDNs. For example, the format required for the underscore labels
790 used by the service location protocol [RFC2782] precludes
791 representation of a non-ASCII label in the DNS using A-labels because
792 those SRV-related labels must start with underscores. Of course,
793 non-ASCII IDN labels may be part of a domain name that also includes
794 underscore labels.
795
796 2.3.3. Order of Characters in Labels
797
798 Because IDN labels may contain characters that are read, and
799 preferentially displayed, from right to left, there is a potential
800 ambiguity about which character in a label is "first". For the
801 purposes of these specifications, labels are considered, and
802 characters numbered, strictly in the order in which they appear "on
803 the wire". That order is equivalent to the leftmost character being
804 treated as first in a label that is read left to right and to the
805 rightmost character being first in a label that is read right to
left. The Bidi document [RFC5893] contains additional discussion of
the conditions that influence reading order.
808
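The numbering convention can be shown with a short sketch.  It
assumes that the order of code points in a Python string stands in
for the order "on the wire"; the sample label (four Hebrew letters)
and the function name are chosen for illustration only.

```python
import unicodedata


def numbered_characters(label: str) -> list[tuple[int, str]]:
    """Number the characters of a label in stored order, starting at 1."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(label, start=1)]


# A right-to-left label: the first stored character is displayed rightmost.
label = "\u05e9\u05dc\u05d5\u05dd"
for position, name in numbered_characters(label):
    print(position, name)
```

Character 1 is the one that a reader of a right-to-left script sees
at the right-hand end of the displayed label.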
809 2.3.4. Punycode is an Algorithm, Not a Name or Adjective
810
811 There has been some confusion about whether a "Punycode string" does
812 or does not include the ACE prefix and about whether it is required
813 that such strings could have been the output of the ToASCII operation
814 (see RFC 3490, Section 4 [RFC3490]). This specification discourages
815 the use of the term "Punycode" to describe anything but the encoding
method and algorithm of RFC 3492 [RFC3492]. The terms defined above
are preferred as much clearer than the term "Punycode string".
818
826 3. IANA Considerations
827
828 IANA actions for this version of IDNA (IDNA2008) are specified in the
829 Tables document [RFC5892]. An overview of the relationships among
830 the various IANA registries appears in the Rationale document
831 [RFC5894]. This document does not specify any actions for IANA.
832
833 4. Security Considerations
834
835 4.1. General Issues
836
837 Security on the Internet partly relies on the DNS. Thus, any change
838 to the characteristics of the DNS can change the security of much of
839 the Internet.
840
841 Domain names are used by users to identify and connect to Internet
842 hosts and other network resources. The security of the Internet is
843 compromised if a user entering a single internationalized name is
844 connected to different servers based on different interpretations of
845 the internationalized domain name. In addition to characters that
846 are permitted by IDNA2003 and its mapping conventions (see
847 Section 4.6), the current specification changes the interpretation of
848 a few characters that were mapped to others in the earlier version;
849 zone administrators should be aware of the problems that this might
850 raise and take appropriate measures. The context for this issue is
851 discussed in more detail in the Rationale document [RFC5894].
852
853 In addition to the Security Considerations material that appears in
854 this document, the Bidi document [RFC5893] contains a discussion of
855 security issues specific to labels containing characters from scripts
856 that are normally written right to left.
857
858 4.2. U-label Lengths
859
860 Labels associated with the DNS have traditionally been limited to 63
861 octets by the general restrictions in RFC 1035 and by the need to
862 treat them as a six-bit string length followed by the string in
863 actual calls to the DNS. That format is used in some other
applications and, in general, the representations of domain names as
865 dot-separated labels and as length-string pairs have been treated as
866 interchangeable. Because A-labels (the form actually used in the
867 DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
in general, more compressed than UTF-16 or UTF-32), U-labels that
869 obey all of the relevant symmetry (and other) constraints of these
870 documents may be quite a bit longer, potentially up to 252 characters
871 (Unicode code points). A fully-qualified domain name containing
872 several such labels can obviously also exceed the nominal 255 octet
881 limit for such names. Application authors using U-labels must exert
882 due caution to avoid buffer overflow and truncation errors and
883 attacks in contexts where shorter strings are expected.
884
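The following sketch compares the sizes of one label in several
forms.  It is illustrative only: the sample label (one Hiragana
character repeated 30 times) is artificial, the function name is
hypothetical, and no IDNA validity tests are performed.  Python's
built-in "punycode" codec is assumed for the A-label form.

```python
MAX_DNS_LABEL_OCTETS = 63


def label_sizes(u_label: str) -> dict[str, int]:
    """Report the size of a U-label in code points and in several encodings."""
    a_label = "xn--" + u_label.encode("punycode").decode("ascii")
    return {
        "code_points": len(u_label),
        "a_label_octets": len(a_label),
        "utf8_octets": len(u_label.encode("utf-8")),
        "utf16_octets": len(u_label.encode("utf-16-be")),
        "utf32_octets": len(u_label.encode("utf-32-be")),
    }


sizes = label_sizes("\u3042" * 30)
for form, size in sizes.items():
    print(form, size)
print("A-label fits in a DNS label:", sizes["a_label_octets"] <= MAX_DNS_LABEL_OCTETS)
```

An application that sizes its buffers for 63 octets would be safe
with the A-label form of this label but not with its UTF-8, UTF-16,
or UTF-32 forms.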
885 4.3. Local Character Set Issues
886
887 When systems use local character sets other than ASCII and Unicode,
888 these specifications leave the problem of converting between the
889 local character set and Unicode up to the application or local
890 system. If different applications (or different versions of one
891 application) implement different rules for conversions among coded
892 character sets, they could interpret the same name differently and
893 contact different servers. This problem is not solved by security
894 protocols, such as Transport Layer Security (TLS) [RFC5246], that do
895 not take local character sets into account.
896
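The risk can be made concrete with a small sketch in which the same
octets are interpreted under two different local character sets.  The
choice of Windows code pages 1252 and 1251 is an assumption made for
the example, as is the function name; the sketch does no IDNA
validation.

```python
def a_label_from_local(raw: bytes, charset: str) -> str:
    """Decode locally encoded octets, then produce the ASCII form of the label."""
    text = raw.decode(charset)
    if text.isascii():
        return text
    return "xn--" + text.encode("punycode").decode("ascii")


raw = b"b\xfccher"  # the same octets, as typed or stored on a local system
western = a_label_from_local(raw, "cp1252")
cyrillic = a_label_from_local(raw, "cp1251")
print(western)
print(cyrillic)
print("same name looked up:", western == cyrillic)
```

The two results differ, so two applications making different
assumptions about the local character set would query different
names.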
897 4.4. Visually Similar Characters
898
899 To help prevent confusion between characters that are visually
900 similar (sometimes called "confusables"), it is suggested that
901 implementations provide visual indications where a domain name
902 contains multiple scripts, especially when the scripts contain
903 characters that are easily confused visually, such as an omicron in
904 Greek mixed with Latin text. Such mechanisms can also be used to
905 show when a name contains a mixture of Simplified Chinese characters
906 with Traditional ones that have Simplified forms, or to distinguish
907 zero and one from uppercase "O" and lowercase "L". DNS zone
908 administrators may impose restrictions (subject to the limitations
909 identified elsewhere in these documents) that try to minimize
910 characters that have similar appearance or similar interpretations.
911
912 If multiple characters appear in a label and the label consists only
913 of characters in one script, individual characters that might be
914 confused with others if compared separately may be unambiguous and
915 non-confusing. On the other hand, that observation makes labels
916 containing characters from more than one script (often called "mixed-
917 script labels") even more risky -- users will tend to see what they
918 expect to see and context is a powerful reinforcement to perception.
919 At the same time, while the risks associated with mixed-script labels
920 are clear, simply prohibiting them will not eliminate problems,
921 especially where closely related scripts are involved. For example,
922 there are many strings that are entirely in Greek or Cyrillic scripts
923 that can be confused with each other or with Latin script strings.
924
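One possible basis for such a visual indication is a check for
labels that mix scripts.  The sketch below is a rough heuristic, not
a mechanism defined by these specifications: Python's standard
library does not expose the Unicode Script property, so the first
word of each character name is used as an approximation of the
script.  As noted above, a single-script result does not mean that a
label is free of confusable characters.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters of a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # Heuristic: "LATIN SMALL LETTER A" -> "LATIN", and so on.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True if the letters of the label appear to come from several scripts."""
    return len(scripts_in_label(label)) > 1


print(sorted(scripts_in_label("p\u0430ypal")))  # second letter is Cyrillic
print(is_mixed_script("paypal"))
```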
925 It is worth noting that there are no comprehensive technical
926 solutions to the problems of confusable characters. One can reduce
927 the extent of the problems in various ways, but probably never
eliminate them. Some specific suggestions about identification and
937 handling of confusable characters appear in a Unicode Consortium
938 publication [Unicode-UTR36].
939
940 4.5. IDNA Lookup, Registration, and the Base DNS Specifications
941
942 The Protocol specification [RFC5891] describes procedures for
943 registering and looking up labels that are not compatible with the
944 preferred syntax described in the base DNS specifications (see
945 Section 2.3.1) because they contain non-ASCII characters. These
946 procedures depend on the use of a special ASCII-compatible encoding
947 form that contains only characters permitted in hostnames by those
948 earlier specifications. The encoding used is Punycode [RFC3492]. No
949 security issues such as string length increases or new allowed values
950 are introduced by the encoding process or the use of these encoded
951 values, apart from those introduced by the ACE encoding itself.
952
953 Domain names (or portions of them) are sometimes compared against a
954 set of domains to be given special treatment if a match occurs, e.g.,
955 treated as more privileged than others or blocked in some way. In
956 such situations, it is especially important that the comparisons be
957 done properly, as specified in the "Requirements" section of the
958 Protocol document [RFC5891]. For labels already in ASCII form, the
959 proper comparison reduces to the same case-insensitive ASCII
comparison that has always been used for ASCII labels, although
961 IDNA-aware applications are expected to look up only A-labels and
962 NR-LDH labels, i.e., to avoid looking up R-LDH labels that are not
963 A-labels.
964
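A minimal sketch of the ASCII-label filtering described above
follows.  It uses the label categories of Section 2.3.1 as its return
values; the function names are hypothetical.  Note that an XN-label
is only a candidate A-label: the sketch does not test whether the
string is valid Punycode output or whether it meets the other
requirements of the Protocol document [RFC5891].

```python
import string

LDH_CHARACTERS = set(string.ascii_letters + string.digits + "-")


def classify_ascii_label(label: str) -> str:
    """Place a label into one of the ASCII label categories of Section 2.3.1."""
    if (not 1 <= len(label) <= 63
            or not set(label) <= LDH_CHARACTERS
            or label[0] == "-" or label[-1] == "-"):
        return "non-LDH"
    if label[2:4] != "--":
        return "NR-LDH"
    if label[:2].lower() == "xn":
        return "XN-label"
    return "other R-LDH"


def may_look_up(label: str) -> bool:
    """Allow NR-LDH labels and candidate A-labels; refuse other R-LDH labels."""
    return classify_ascii_label(label) in ("NR-LDH", "XN-label")


for sample in ("example", "xn--bcher-kva", "ab--cd", "-bad"):
    print(sample, classify_ascii_label(sample), may_look_up(sample))
```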
965 The introduction of IDNA meant that any existing labels that start
966 with the ACE prefix would be construed as A-labels, at least until
967 they failed one of the relevant tests, whether or not that was the
968 intent of the zone administrator or registrant. There is no evidence
969 that this has caused any practical problems since RFC 3490 was
970 adopted, but the risk still exists in principle.
971
972 4.6. Legacy IDN Label Strings
973
974 The URI Standard [RFC3986] and a number of application specifications
975 (e.g., SMTP [RFC5321] and HTTP [RFC2616]) do not permit non-ASCII
976 labels in DNS names used with those protocols, i.e., only the A-label
977 form of IDNs is permitted in those contexts. If only A-labels are
978 used, differences in interpretation between IDNA2003 and this version
arise only for characters whose interpretation has actually changed
980 (e.g., characters, such as ZWJ and ZWNJ, that were mapped to nothing
981 in IDNA2003 and that are considered legitimate in some contexts by
982 these specifications). Despite that prohibition, there are a
983 significant number of files and databases on the Internet in which
991 domain name strings appear in native-character form; a subset of
992 those strings use native-character labels that require IDNA2003
993 mapping to produce valid A-labels. The treatment of such labels will
994 vary by types of applications and application-designer preference: in
995 some situations, warnings to the user or outright rejection may be
996 appropriate; in others, it may be preferable to attempt to apply the
997 earlier mappings if lookup strictly conformant to these
998 specifications fails or even to do lookups under both sets of rules.
999 This general situation is discussed in more detail in the Rationale
1000 document [RFC5894]. However, in the absence of care by registries
1001 about how strings that could have different interpretations under
1002 IDNA2003 and the current specification are handled, it is possible
1003 that the differences could be used as a component of name-matching or
1004 name-confusion attacks. Such care is therefore appropriate.
1005
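The "lookups under both sets of rules" option can be sketched as
follows.  The sketch assumes Python's built-in "idna" codec, which
implements the IDNA2003 ToASCII operation with its mappings, and the
"punycode" codec for an unmapped encoding.  The unmapped branch does
not perform the validity tests required by these specifications, and
the function names are hypothetical.

```python
def idna2003_ascii(label: str) -> str:
    """ASCII form under IDNA2003, including its mapping step."""
    return label.encode("idna").decode("ascii")


def unmapped_ascii(label: str) -> str:
    """ASCII form with no mapping applied (no validity tests are done)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def lookup_candidates(label: str) -> list[str]:
    """List the distinct ASCII forms to try, unmapped form first."""
    candidates = [unmapped_ascii(label)]
    legacy = idna2003_ascii(label)
    if legacy not in candidates:
        candidates.append(legacy)
    return candidates


print(lookup_candidates("stra\u00dfe"))
print(lookup_candidates("b\u00fccher"))
```

A label containing U+00DF (LATIN SMALL LETTER SHARP S) yields two
different candidates, because IDNA2003 maps that character to "ss".
It is exactly for such labels that registry care is needed.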
1006 4.7. Security Differences from IDNA2003
1007
1008 The registration and lookup models described in this set of documents
1009 change the mechanisms available for lookup applications to determine
1010 the validity of labels they encounter. In some respects, the ability
1011 to test is strengthened. For example, putative labels that contain
1012 unassigned code points will now be rejected, while IDNA2003 permitted
1013 them (see the Rationale document [RFC5894] for a discussion of the
1014 reasons for this). On the other hand, the Protocol specification no
1015 longer assumes that the application that looks up a name will be able
1016 to determine, and apply, information about the protocol version used
1017 in registration. In theory, that may increase risk since the
1018 application will be able to do less pre-lookup validation. In
1019 practice, the protection afforded by that test has been largely
1020 illusory for reasons explained in RFC 4690 [RFC4690] and elsewhere in
1021 these documents.
1022
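A pre-lookup test for unassigned code points might be sketched as
shown below.  The sketch relies on the Unicode Character Database
bundled with the Python interpreter, which is an assumption of this
example; a conforming implementation would instead follow the rules
and tables of the Tables document [RFC5892].  The function name is
hypothetical.

```python
import unicodedata


def unassigned_code_points(label: str) -> list[str]:
    """List code points whose General Category is Cn (not assigned)."""
    return [f"U+{ord(ch):04X}" for ch in label
            if unicodedata.category(ch) == "Cn"]


def passes_unassigned_test(label: str) -> bool:
    """True if the putative label contains no unassigned code points."""
    return not unassigned_code_points(label)


print(unassigned_code_points("ab\u0378"))
print(passes_unassigned_test("b\u00fccher"))
```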
1023 Any change to the Stringprep [RFC3454] procedure that is profiled and
1024 used in IDNA2003, or, more broadly, the IETF's model of the use of
1025 internationalized character strings in different protocols, creates
1026 some risk of inadvertent changes to those protocols, invalidating
1027 deployed applications or databases, and so on. But these
1028 specifications do not change Stringprep at all; they merely bypass
1029 it. Because these documents do not depend on Stringprep, the
1030 question of upgrading other protocols that do have that dependency
1031 can be left to experts on those protocols: the IDNA changes and
1032 possible upgrades to security protocols or conventions are
1033 independent issues.
1034
1046 4.8. Summary
1047
1048 No mechanism involving names or identifiers alone can protect against
1049 a wide variety of security threats and attacks that are largely
1050 independent of the naming or identification system. These attacks
1051 include spoofed pages, DNS query trapping and diversion, and so on.
1052
1053 5. Acknowledgments
1054
1055 The initial version of this document was created largely by
1056 extracting text from early draft versions of the Rationale document
[RFC5894]. See the "Acknowledgments" and "Contributors" sections of
that document.
1059
1060 Specific textual suggestions after the extraction process came from
1061 Vint Cerf, Lisa Dusseault, Bill McQuillan, Andrew Sullivan, and Ken
1062 Whistler. Other changes were made in response to more general
1063 comments, lists of concerns or specific errors from participants in
1064 the Working Group and other observers, including Lyman Chapin, James
1065 Mitchell, Subramanian Moonesamy, and Dan Winship.
1066
1067 6. References
1068
1069 6.1. Normative References
1070
1071 [ASCII] American National Standards Institute (formerly United
1072 States of America Standards Institute), "USA Code for
1073 Information Interchange", ANSI X3.4-1968, 1968. ANSI
1074 X3.4-1968 has been replaced by newer versions with
1075 slight modifications, but the 1968 version remains
1076 definitive for the Internet.
1077
1078 [RFC1034] Mockapetris, P., "Domain names - concepts and
1079 facilities", STD 13, RFC 1034, November 1987.
1080
1081 [RFC1035] Mockapetris, P., "Domain names - implementation and
1082 specification", STD 13, RFC 1035, November 1987.
1083
1084 [RFC1123] Braden, R., "Requirements for Internet Hosts -
1085 Application and Support", STD 3, RFC 1123, October 1989.
1086
1087 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
1088 Requirement Levels", BCP 14, RFC 2119, March 1997.
1089
1101 [Unicode-UAX15]
1102 The Unicode Consortium, "Unicode Standard Annex #15:
1103 Unicode Normalization Forms, Revision 31",
1104 September 2009,
1105 <http://www.unicode.org/reports/tr15/tr15-31.html>.
1106
1107 [Unicode52] The Unicode Consortium. The Unicode Standard, Version
1108 5.2.0, defined by: "The Unicode Standard, Version
1109 5.2.0", (Mountain View, CA: The Unicode Consortium,
1110 2009. ISBN 978-1-936213-00-9).
1111 <http://www.unicode.org/versions/Unicode5.2.0/>.
1112
1113 6.2. Informative References
1114
1115 [IDNA2008-Mapping]
1116 Resnick, P. and P. Hoffman, "Mapping Characters in
1117 Internationalized Domain Names for Applications (IDNA)",
1118 Work in Progress, April 2010.
1119
1120 [RFC0952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD
1121 Internet host table specification", RFC 952,
1122 October 1985.
1123
1124 [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
1125 Specification", RFC 2181, July 1997.
1126
1127 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
1128 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
1129 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
1130
1131 [RFC2673] Crawford, M., "Binary Labels in the Domain Name System",
1132 RFC 2673, August 1999.
1133
1134 [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for
1135 specifying the location of services (DNS SRV)",
1136 RFC 2782, February 2000.
1137
1138 [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
1139 Internationalized Strings ("stringprep")", RFC 3454,
1140 December 2002.
1141
1142 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
1143 "Internationalizing Domain Names in Applications
1144 (IDNA)", RFC 3490, March 2003.
1145
1146 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
1147 Profile for Internationalized Domain Names (IDN)",
1148 RFC 3491, March 2003.
1149
1156 [RFC3492] Costello, A., "Punycode: A Bootstring encoding of
1157 Unicode for Internationalized Domain Names in
1158 Applications (IDNA)", RFC 3492, March 2003.
1159
1160 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
1161 Resource Identifier (URI): Generic Syntax", STD 66,
1162 RFC 3986, January 2005.
1163
1164 [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
1165 and Recommendations for Internationalized Domain Names
1166 (IDNs)", RFC 4690, September 2006.
1167
1168 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer
1169 Security (TLS) Protocol Version 1.2", RFC 5246,
1170 August 2008.
1171
1172 [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
1173 October 2008.
1174
1175 [RFC5891] Klensin, J., "Internationalized Domain Names in
1176 Applications (IDNA): Protocol", RFC 5891, August 2010.
1177
1178 [RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and
1179 Internationalized Domain Names for Applications (IDNA)",
1180 RFC 5892, August 2010.
1181
1182 [RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
1183 Internationalized Domain Names for Applications (IDNA)",
1184 RFC 5893, August 2010.
1185
1186 [RFC5894] Klensin, J., "Internationalized Domain Names for
1187 Applications (IDNA): Background, Explanation, and
1188 Rationale", RFC 5894, August 2010.
1189
1190 [Unicode-UTR36]
1191 The Unicode Consortium, "Unicode Technical Report #36:
1192 Unicode Security Considerations, Revision 7", July 2008,
1193 <http://www.unicode.org/reports/tr36/tr36-7.html>.
1194
1211 Author's Address
1212
1213 John C Klensin
1214 1770 Massachusetts Ave, Ste 322
1215 Cambridge, MA 02140
1216 USA
1217
1218 Phone: +1 617 245 1457
1219 EMail: john+ietf@jck.com
1220