Internet Engineering Task Force (IETF)                        J. Klensin
Request for Comments: 5890                                   August 2010
Obsoletes: 3490
Category: Standards Track
ISSN: 2070-1721


        Internationalized Domain Names for Applications (IDNA):
                   Definitions and Document Framework

Abstract

   This document is one of a collection that, together, describe the
   protocol and usage context for a revision of Internationalized Domain
   Names for Applications (IDNA), superseding the earlier version.  It
   describes the document collection and provides definitions and other
   material that are common to the set.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc5890.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
     1.1.  IDNA2008
       1.1.1.  Audiences
       1.1.2.  Normative Language
     1.2.  Road Map of IDNA2008 Documents
   2.  Definitions and Terminology
     2.1.  Characters and Character Sets
     2.2.  DNS-Related Terminology
     2.3.  Terminology Specific to IDNA
       2.3.1.  LDH Label
       2.3.2.  Terms for IDN Label Codings
         2.3.2.1.  IDNA-valid strings, A-label, and U-label
         2.3.2.2.  NR-LDH Label
         2.3.2.3.  Internationalized Domain Name and
                   Internationalized Label
         2.3.2.4.  Label Equivalence
         2.3.2.5.  ACE Prefix
         2.3.2.6.  Domain Name Slot
       2.3.3.  Order of Characters in Labels
       2.3.4.  Punycode is an Algorithm, Not a Name or Adjective
   3.  IANA Considerations
   4.  Security Considerations
     4.1.  General Issues
     4.2.  U-label Lengths
     4.3.  Local Character Set Issues
     4.4.  Visually Similar Characters
     4.5.  IDNA Lookup, Registration, and the Base DNS
           Specifications
     4.6.  Legacy IDN Label Strings
     4.7.  Security Differences from IDNA2003
     4.8.  Summary
   5.  Acknowledgments
   6.  References
     6.1.  Normative References
     6.2.  Informative References

1.  Introduction

1.1.  IDNA2008

   This document is one of a collection that, together, describe the
   protocol and usage context for a revision of Internationalized Domain
   Names for Applications (IDNA) that was largely completed in 2008,
   known within the series and elsewhere as "IDNA2008".  The series
   replaces an earlier version of IDNA [RFC3490] [RFC3491].  For
   convenience, that version of IDNA is referred to in these documents
   as "IDNA2003".  The newer version continues to use the Punycode
   algorithm [RFC3492] and ACE (ASCII-compatible encoding) prefix from
   that earlier version.  The document collection is described in
   Section 1.2.  As indicated there, this document provides definitions
   and other material that are common to the set.

1.1.1.  Audiences

   While many IETF specifications are directed exclusively to protocol
   implementers, the character of IDNA requires that it be understood
   and properly used by those whose responsibilities include making
   decisions about:

   o  what names are permitted in DNS zone files,

   o  policies related to names and naming, and

   o  the handling of domain name strings in files and systems, even
      with no immediate intention of looking them up.

   This document and those documents concerned with the protocol
   definition, rules for handling strings that include characters
   written right to left, and the actual list of characters and
   categories will be of primary interest to protocol implementers.
   This document and the one containing explanatory material will be of
   primary interest to others, although they may have to fill in some
   details by reference to other documents in the set.

   This document and the associated ones are written from the
   perspective of an IDNA-aware user, application, or implementation.
   While they may reiterate fundamental DNS rules and requirements for
   the convenience of the reader, they make no attempt to be
   comprehensive about DNS principles and should not be considered as a
   substitute for a thorough understanding of the DNS protocols and
   specifications.

1.1.2.  Normative Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  Road Map of IDNA2008 Documents

   IDNA2008 consists of the following documents:

   o  This document, containing definitions and other material that are
      needed for understanding other documents in the set.  It is
      referred to informally in other documents in the set as "Defs" or
      "Definitions".

   o  A document, RFC 5894 [RFC5894], that provides an overview of the
      protocol and associated tables together with explanatory material
      and some rationale for the decisions that led to IDNA2008.  That
      document also contains advice for registry operations and those
      who use Internationalized Domain Names (IDNs).  It is referred to
      informally in other documents in the set as "Rationale".  It is
      not normative.

   o  A document, RFC 5891 [RFC5891], that describes the core IDNA2008
      protocol and its operations.  In combination with the Bidi
      document, described immediately below, it explicitly updates and
      replaces RFC 3490.  It is referred to informally in other
      documents in the set as "Protocol".

   o  A document, RFC 5893 [RFC5893], that specifies special rules
      (Bidi) for labels that contain characters that are written from
      right to left.

   o  A specification, RFC 5892 [RFC5892], of the categories and rules
      that identify the code points allowed in a label written in native
      character form (defined more specifically as a "U-label" in
      Section 2.3.2.1 below), based on Unicode 5.2 [Unicode52] code
      point assignments and additional rules unique to IDNA2008.  The
      Unicode-based rules are expected to be stable across Unicode
      updates and hence independent of Unicode versions.  That
      specification obsoletes RFC 3491 and IDN use of the tables to
      which it refers.  It is referred to informally in other documents
      in the set as "Tables".

   o  A document [IDNA2008-Mapping] that discusses the issue of mapping
      characters into other characters and that provides guidance for
      doing so when that is appropriate.  That document, referred to
      informally as "Mapping", provides advice; it is not a required
      part of IDNA.

2.  Definitions and Terminology

2.1.  Characters and Character Sets

   A code point is an integer value in the codespace of a coded
   character set.  In Unicode, these are integers from 0 to 0x10FFFF.

   Unicode [Unicode52] is a coded character set containing somewhat over
   100,000 characters assigned to code points as of version 5.2.  A
   single Unicode code point is denoted in these documents by "U+"
   followed by four to six hexadecimal digits, while a range of Unicode
   code points is denoted by two four-to-six-digit hexadecimal numbers
   separated by "..", with no prefixes.

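   The notation just described can be illustrated with a short,
   non-normative Python sketch.  The helper names format_code_point and
   format_range are hypothetical and do not come from any IDNA
   document; the sketch only assumes the codespace bounds and the
   "U+" and ".." conventions stated above.

```python
MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode codespace


def format_code_point(cp: int) -> str:
    """Return the "U+" notation (four to six hex digits) for one code point."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {cp!r}")
    return f"U+{cp:04X}"


def format_range(first: int, last: int) -> str:
    """Return the "XXXX..YYYY" notation for a range, with no prefixes."""
    if not 0 <= first <= last <= MAX_CODE_POINT:
        raise ValueError("invalid code point range")
    return f"{first:04X}..{last:04X}"
```

   Zero-padding to four digits while letting larger values grow to six
   digits matches the "four to six hexadecimal digits" wording.
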
   ASCII means US-ASCII [ASCII], a coded character set containing 128
   characters associated with code points in the range 0000..007F.
   Unicode is a superset of ASCII and may be thought of as a
   generalization of it; it includes all the ASCII characters and
   associates them with the equivalent code points.

   "Letters" are, informally, generalizations from the ASCII and
   common-sense understanding of that term, i.e., characters that are
   used to write text and that are not digits, symbols, or punctuation.
   Formally, they are characters with a Unicode General Category value
   starting in "L" (see Section 4.5 of The Unicode Standard
   [Unicode52]).

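   As a non-normative illustration, the formal definition of "Letters"
   can be approximated with Python's standard unicodedata module.  The
   function name is_letter is hypothetical.  Note the assumption: the
   module reflects whatever Unicode version ships with the Python
   interpreter in use, which is generally newer than Unicode 5.2, so
   results for recently assigned code points may differ.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    """True if the single character's General Category starts with "L"."""
    # unicodedata.category returns a two-letter value such as "Lu" or "Nd".
    return unicodedata.category(ch).startswith("L")
```

   For example, is_letter("a") and is_letter("\u05D0") (HEBREW LETTER
   ALEF) are true, while is_letter("5") and is_letter("-") are false.
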
2.2.  DNS-Related Terminology

   When discussing the DNS, this document generally assumes the
   terminology used in the DNS specifications [RFC1034] [RFC1035] as
   subsequently modified [RFC1123] [RFC2181].  The term "lookup" is used
   to describe the combination of operations performed by the IDNA2008
   protocol and those actually performed by a DNS resolver.  The process
   of placing an entry into the DNS is referred to as "registration".
   This is similar to common contemporary usage of that term in other
   contexts.  Consequently, any DNS zone administration is described as
   a "registry", and the terms "registry" and "zone administrator" are
   used interchangeably, regardless of the actual administrative
   arrangements or level in the DNS tree.  More details about that
   relationship are included in the Rationale document.

   The term "LDH code point" is defined in this document to refer to the
   code points associated with ASCII letters (Unicode code points
   0041..005A and 0061..007A), digits (0030..0039), and the hyphen-minus
   (U+002D).  "LDH" is an abbreviation for "letters, digits, hyphen" but
   is used specifically in this document to refer to the set of naming
   rules described in Section 2.3.1 below.

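   The LDH code point set is small enough to state directly as a
   predicate.  The following non-normative Python sketch uses the
   ranges given above; the function name is_ldh_code_point is
   hypothetical.

```python
def is_ldh_code_point(cp: int) -> bool:
    """True if cp is an ASCII letter, an ASCII digit, or the hyphen-minus."""
    return (
        0x0041 <= cp <= 0x005A      # uppercase ASCII letters
        or 0x0061 <= cp <= 0x007A   # lowercase ASCII letters
        or 0x0030 <= cp <= 0x0039   # ASCII digits
        or cp == 0x002D             # hyphen-minus
    )
```

   Characters are tested by code point, e.g., is_ldh_code_point(ord("-")).
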
   The base DNS specifications [RFC1034] [RFC1035] discuss "domain
   names" and "hostnames", but many people use the terms
   interchangeably, as do sections of these specifications.  Lack of
   clarity about that terminology has contributed to confusion about
   intent in some cases.  These documents generally use the term "domain
   name".  When they refer to, e.g., hostname syntax restrictions, they
   explicitly cite the relevant defining documents.  The remaining
   definitions in this subsection are essentially a review: if there is
   any perceived difference between those definitions and the
   definitions in the base DNS documents or those cited below, the
   definitions in the other documents take precedence.

   A label is an individual component of a domain name.  Labels are
   usually shown separated by dots; for example, the domain name
   "www.example.com" is composed of three labels: "www", "example", and
   "com".  (The complete name convention using a trailing dot described
   in RFC 1123 [RFC1123], which can be explicit as in "www.example.com."
   or implicit as in "www.example.com", is not considered in this
   specification.)  IDNA extends the set of usable characters in labels
   that are treated as text (as distinct from the binary string labels
   discussed in RFC 1035 and RFC 2181 [RFC2181] and bitstring ones
   [RFC2673]), but only in certain contexts.  The different contexts for
   different sets of usable characters are outlined in the next section.
   For the rest of this document and in the related ones, the term
   "label" is shorthand for "text label", and "every label" means "every
   text label", including the expanded context.

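   The decomposition of a domain name into labels can be sketched as
   follows.  This is a non-normative illustration; the function name
   split_labels is hypothetical, and the sketch assumes a name written
   in dotted text form without the trailing dot, since that convention
   is not considered in this specification.

```python
from typing import List


def split_labels(domain_name: str) -> List[str]:
    """Split a dotted domain name (no trailing dot) into its text labels."""
    return domain_name.split(".")
```

   Applied to the example in the text, split_labels("www.example.com")
   yields the three labels "www", "example", and "com".
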
2.3.  Terminology Specific to IDNA

   This section defines some terminology to reduce dependence on terms
   and definitions that have been problematic in the past.  The
   relationships among these definitions are illustrated in Figure 1 and
   Figure 2.  In the first of those figures, the parenthesized numbers
   refer to the notes below the figure.

2.3.1.  LDH Label

   This is the classical label form used, albeit with some additional
   restrictions, in hostnames [RFC0952].  Its syntax is identical to
   that described as the "preferred name syntax" in Section 3.5 of RFC
   1034 [RFC1034] as modified by RFC 1123 [RFC1123].  Briefly, it is a
   string consisting of ASCII letters, digits, and the hyphen with the
   further restriction that the hyphen cannot appear at the beginning or
   end of the string.  Like all DNS labels, its total length must not
   exceed 63 octets.

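   The brief description of an LDH label above translates into a small
   check.  The following Python sketch is non-normative; the names
   is_ldh_label, LDH_CHARS, and MAX_LABEL_OCTETS are hypothetical.  It
   assumes that an empty string is not treated as an LDH label, and it
   relies on the fact that every permitted character is ASCII, so the
   character count equals the octet count.

```python
import string

LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")
MAX_LABEL_OCTETS = 63  # limit that applies to all DNS labels


def is_ldh_label(label: str) -> bool:
    """True if label follows the LDH label rules summarized in the text."""
    if not 1 <= len(label) <= MAX_LABEL_OCTETS:
        return False  # assumption: the empty string is rejected
    if label.startswith("-") or label.endswith("-"):
        return False  # hyphen may not begin or end the label
    return all(ch in LDH_CHARS for ch in label)
```

   A label containing any non-ASCII character fails the final test, so
   the length comparison never needs to consider multi-octet encodings.
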
  391    LDH labels include the specialized labels used by IDNA (described as    
  392    "A-labels" below) and some additional restricted forms (also            
  393    described below).                                                       
  394                                                                            
  395    To facilitate clear description, two new subsets of LDH labels are      
  396    created by the introduction of IDNA.  These are called Reserved LDH     
  397    labels (R-LDH labels) and Non-Reserved LDH labels (NR-LDH labels).      
  398    Reserved LDH labels, known as "tagged domain names" in some other
  399    contexts, have the property that they contain "--" in the third and
  400    fourth character positions but otherwise conform to LDH label rules.
  401    Only a subset of the R-LDH labels can be used in IDNA-aware             
  402    applications.  That subset consists of the class of labels that begin   
  403    with the prefix "xn--" (case independent), but otherwise conform to     
  404    the rules for LDH labels.  That subset is called "XN-labels" in this    
  405    set of documents.  XN-labels are further divided into those whose       
  406    remaining characters (after the "xn--") are valid output of the         
  407    Punycode algorithm [RFC3492] and those that are not (see below).  The   
  408    XN-labels that are valid Punycode output are known as "A-labels" if     
  409    they also meet the other criteria for IDNA-validity described below.    
  410    Because LDH labels (and, indeed, any DNS label) must not be more than   
  411    63 octets in length, the portion of an XN-label derived from the        
  412    Punycode algorithm is limited to no more than 59 ASCII characters.      
  413    Non-Reserved LDH labels are the set of valid LDH labels that do not     
  414    have "--" in the third and fourth positions.                            
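
   The subsets described above can be told apart with purely
   syntactic tests.  The following is a minimal sketch, not part of
   the specification: the function name classify_ascii_label and the
   returned category strings are assumptions made for illustration.
   It does not attempt to separate A-labels from Fake A-labels, which
   requires Punycode decoding and the other IDNA validity tests.

```python
import re

# LDH syntax: ASCII letters, digits, hyphen; no leading or trailing hyphen;
# total length between 1 and 63 octets.
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify_ascii_label(label: str) -> str:
    """Return 'non-LDH', 'NR-LDH', 'XN-label', or 'other R-LDH'.

    The category names are illustrative only (hypothetical helper).
    """
    if not _LDH.fullmatch(label):
        return "non-LDH"
    # R-LDH labels have "--" in the third and fourth character positions.
    if label[2:4] != "--":
        return "NR-LDH"
    # The "xn--" prefix is matched case-independently.
    if label[:2].lower() == "xn":
        return "XN-label"  # either an A-label or a Fake A-label
    return "other R-LDH"


if __name__ == "__main__":
    for sample in ("example", "xn--bcher-kva", "ab--cd", "_tcp", "-abcd"):
        print(sample, "->", classify_ascii_label(sample))
```

   Only the "NR-LDH" result and those "XN-label" results that later
   prove to be A-labels are usable by IDNA-aware applications.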
  415                                                                            
  416    A consequence of the restrictions on valid characters in the native     
  417    Unicode character form (see U-labels) turns out to be that mixed-case   
  418    annotation, of the sort outlined in Appendix A of RFC 3492 [RFC3492],   
  419    is never useful.  Therefore, since a valid A-label is the result of     
  420    Punycode encoding of a U-label, A-labels should be produced only in     
  421    lowercase, despite matching other (mixed-case or uppercase) potential   
  422    labels in the DNS.                                                      
  423                                                                            
  424    Some strings that are prefixed with "xn--" to form labels may not be    
  425    the output of the Punycode algorithm, may fail the other tests          
  426    outlined below, or may violate other IDNA restrictions and thus are     
  427    also not valid IDNA labels.  They are called "Fake A-labels" for        
  428    convenience.                                                            
  429                                                                            
  430    Labels within the class of R-LDH labels that are not prefixed with      
  431    "xn--" are also not valid IDNA labels.  To allow for future use of      
  432    mechanisms similar to IDNA, those labels MUST NOT be processed as       
  441    ordinary LDH labels by IDNA-conforming programs and SHOULD NOT be       
  442    mixed with IDNA labels in the same zone.                                
  443                                                                            
  444    These distinctions among possible LDH labels are only of significance
  445    for software that is IDNA-aware or for future extensions based on
  446    the same "prefix and encoding" model.  For
  447    IDNA-aware systems, the valid label types are: A-labels, U-labels,      
  448    and NR-LDH labels.                                                      
  449                                                                            
  450    IDNA labels come in two flavors: an ACE-encoded form and a Unicode      
  451    (native character) form.  These are referred to as A-labels and         
  452    U-labels, respectively, and are described in detail in the next         
  453    section.                                                                
  454                                                                            
  496                                     ASCII Label                            
  497       __________________________________________________________________   
  498       |                                                                |   
  499       |     ____________________ LDH Label (1) (4) ________________    |   
  500       |    |  ___________________________________                  |   |   
  501       |    |  |IDN Reserved LDH Labels          |                  |   |   
  502       |    |  | ("??--") or R-LDH Labels        | _______________  |   |   
  503       |    |  | _______________________________ | |NON-RESERVED |  |   |   
  504       |    |  | |       XN-labels             | | | LDH Labels  |  |   |   
  505       |    |  | | _____________   ___________ | | | (NR-LDH     |  |   |   
  506       |    |  | | | A-labels  |   | Fake (3) || | |   labels)   |  |   |   
  507       |    |  | | | "xn--"(2) |   | A-labels || | |_____________|  |   |   
  508       |    |  | | |___________|   |__________|| |                  |   |   
  509       |    |  | |_____________________________| |                  |   |   
  510       |    |  |_________________________________|                  |   |   
  511       |    |_______________________________________________________|   |   
  512       |                                                                |   
  513       |       _____________NON-LDH label________                       |   
  514       |       |      ______________________    |                       |   
  515       |       |      | Underscore labels  |    |                       |   
  516       |       |      |  e.g., _tcp        |    |                       |   
  517       |       |      |____________________|    |                       |   
  518       |       |      | Labels with leading|    |                       |   
  519       |       |      | or trailing        |    |                       |   
  520       |       |      | hyphens "-abcd"    |    |                       |   
  521       |       |      | or "xyz-"          |    |                       |   
  522       |       |      | or "-uvw-"         |    |                       |   
  523       |       |      |____________________|    |                       |   
  524       |       |      | Labels with other  |    |                       |   
  525       |       |      | non-LDH ASCII chars|    |                       |   
  526       |       |      | e.g., #$%_         |    |                       |   
  527       |       |      |____________________|    |                       |   
  528       |       |________________________________|                       |   
  529       |________________________________________________________________|   
  530                                                                            
  531              (1) ASCII letters (uppercase and lowercase), digits,          
  532                     hyphen.  Hyphen may not appear in first or last        
  533                     position.  No more than 63 octets.                     
  534              (2) Note that the string following "xn--" must                
  535                     be the valid output of the Punycode algorithm          
  536                     and must be convertible into valid U-label form.       
  537              (3) Note that a Fake A-label has a prefix "xn--"              
  538                     but the remainder of the label is NOT the valid        
  539                     output of the Punycode algorithm.                      
  540              (4) LDH label subtypes are indistinguishable to               
  541                     applications that are not IDNA-aware.                  
  542                                                                            
  543     Figure 1: IDNA and Related DNS Terminology Space -- ASCII Labels       
  544                                                                            
  551                         __________________________                         
  552                         |  Non-ASCII             |                         
  553                         |                        |                         
  554                         |    ___________________ |                         
  555                         |    | U-label (5)     | |                         
  556                         |    |_________________| |                         
  557                         |    |                 | |                         
  558                         |    |  Binary Label   | |                         
  559                         |    | (including      | |                         
  560                         |    |  high bit on)   | |                         
  561                         |    |_________________| |                         
  562                         |    |                 | |                         
  563                         |    | Bit String      | |                         
  564                         |    |   Label         | |                         
  565                         |    |_________________| |                         
  566                         |________________________|                         
  567                                                                            
  568              (5) To applications that are not IDNA-aware, U-labels         
  569                     are indistinguishable from Binary ones.                
  570                                                                            
  571                         Figure 2: Non-ASCII Labels                         
  572                                                                            
  573 2.3.2.  Terms for IDN Label Codings                                        
  574                                                                            
  575 2.3.2.1.  IDNA-valid strings, A-label, and U-label                         
  576                                                                            

Erratum #7291 (Editorial, Verified), reported by John Klensin.  Scope: GLOBAL.
Original text:
Request for Comments: 5890
Obsoletes: 3490
Category: Standards Track

It should say:
Request for Comments: 5890
Obsoletes: 3490
Updates: 4343
Category: Standards Track


I have no idea whether this correction is Editorial or Technical , nor
what to use as a Section indication.  However...

RFC 5890 (or IDNA2008 more generally), should have updated RFC 4343 and
the IDN discussion in its Section 5.  The latter references the IDNA2003
documents and makes some statements that are, at best, confusing in the
context of IDNA2008.

See the extended notes for RFC 4343 in
https://www.rfc-editor.org/errata/eid7290 for more discussion and details.

Recommendation: Hold for document update unless this appears to anyone
to be a serious problem, in which case a separate RFC, using the notes
on Errata ID 7290 as a starting point, may be in order.

[AD Note:] Marking this as Verified, and will direct the RFC Editor to
update the metadata about both documents.
  577    For IDNA-aware applications, the three types of valid labels are        
  578    "A-labels", "U-labels", and "NR-LDH labels", each of which is defined   
  579    below.  The relationships among them are illustrated in Figure 1 and    
  580    Figure 2.                                                               
  581                                                                            
  582    o  A string is "IDNA-valid" if it meets all of the requirements of      
  583       these specifications for an IDNA label.  IDNA-valid strings may      
  584       appear in either of the two forms defined immediately below, or      
  585       may be drawn from the NR-LDH label subset.  IDNA-valid strings       
  586       must also conform to all basic DNS requirements for labels.  These   
  587       documents make specific reference to the form appropriate to any     
  588       context in which the distinction is important.                       
  589                                                                            
  590    o  An "A-label" is the ASCII-Compatible Encoding (ACE, see              
  591       Section 2.3.2.5) form of an IDNA-valid string.  It must be a         
  592       complete label: IDNA is defined for labels, not for parts of them    
  593       and not for complete domain names.  This means, by definition,       
  594       that every A-label will begin with the IDNA ACE prefix, "xn--"       
  595       (see Section 2.3.2.5), followed by a string that is a valid output   
  596       of the Punycode algorithm [RFC3492] and hence a maximum of 59        
  597       ASCII characters in length.  The prefix and string together must     
  598       conform to all requirements for a label that can be stored in the    
  606       DNS including conformance to the rules for LDH labels                
  607       (Section 2.3.1).  If and only if a string meeting the above          
  608       requirements can be decoded into a U-label is it an A-label.         
  609                                                                            
  610    o  A "U-label" is an IDNA-valid string of Unicode characters, in        
  611       Normalization Form C (NFC) and including at least one non-ASCII      
  612       character, expressed in a standard Unicode Encoding Form (such as    
  613       UTF-8).  It is also subject to the constraints about permitted       
  614       characters that are specified in Section 4.2 of the Protocol         
  615       document and the rules in Sections 2 and 3 of the Tables
  616       document, the Bidi constraints in that document if it contains any   
  617       character from scripts that are written right to left, and the       
  618       symmetry constraint described immediately below.  Conversions        
  619       between U-labels and A-labels are performed according to the         
  620       "Punycode" specification [RFC3492], adding or removing the ACE       
  621       prefix as needed.                                                    
  622                                                                            
  623    To be valid, U-labels and A-labels must obey an important symmetry      
  624    constraint.  While that constraint may be tested in any of several      
  625    ways, an A-label A1 must be capable of being produced by conversion     
  626    from a U-label U1, and that U-label U1 must be capable of being         
  627    produced by conversion from A-label A1.  Among other things, this       
  628    implies that both U-labels and A-labels must be strings in Unicode      
  629    NFC [Unicode-UAX15] normalized form.  These strings MUST contain only   
  630    characters specified elsewhere in this document series, and only in     
  631    the contexts indicated as appropriate.                                  
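
   One way to test the symmetry constraint is a round trip through the
   Punycode conversion.  The sketch below is illustrative only and
   relies on Python's built-in "punycode" codec; the helper names
   to_a_label, to_u_label, and passes_symmetry_check are assumptions,
   not names defined by these specifications.  It checks the prefix,
   the NFC requirement, and the round trip, but it does NOT check the
   permitted-character, contextual, or Bidi rules, so a True result
   does not by itself establish that a label is IDNA-valid.

```python
import unicodedata

ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Punycode-encode a putative U-label and prepend the ACE prefix."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Strip the ACE prefix (case-independently) and Punycode-decode the rest."""
    lowered = a_label.lower()
    if not lowered.startswith(ACE_PREFIX):
        raise ValueError("not an XN-label")
    return lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def passes_symmetry_check(a_label: str) -> bool:
    """Partial test: A1 -> U1 -> A1 must round-trip and U1 must be NFC."""
    try:
        u_label = to_u_label(a_label)
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False
    # A U-label contains at least one non-ASCII character and is in NFC.
    if u_label.isascii() or unicodedata.normalize("NFC", u_label) != u_label:
        return False
    # Lenient about input case; A-labels should be produced in lowercase.
    return to_a_label(u_label) == a_label.lower()


if __name__ == "__main__":
    print(to_a_label("b\u00fccher"))
    print(passes_symmetry_check("xn--bcher-kva"))
```

   A label whose decoded form is not in NFC (for example, one encoding
   "u" followed by a combining diaeresis) fails this check even though
   it decodes without error.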
  632                                                                            
Erratum #5484 (Editorial, Reported), reported by Peter Occil.  Applies to line 577.
Original text:
   For IDNA-aware applications, the three types of valid labels are
   "A-labels", "U-labels", and "NR-LDH labels", each of which is defined
   below.
It should say:
   For IDNA-aware applications, the three types of valid labels are
   "A-labels", "U-labels", and "NR-LDH labels", each of which is defined
   below and in section 2.3.1.

The term NR-LDH label is actually defined in section 2.3.1, not later in this section.

  633    Any rules or conventions that apply to DNS labels in general apply to
  634    whichever of the U-label or A-label would be more restrictive.  There
  635    are two exceptions to this principle.  First, the restriction to
  636    ASCII characters does not apply to the U-label.  Second, expansion of
  637    the A-label form to a U-label may produce strings that are much         
  638    longer than the normal 63-octet DNS limit (potentially up to 252
  639    characters) due to the compression efficiency of the Punycode           
  640    algorithm.  Such extended-length U-labels are valid from the            
  641    standpoint of IDNA, but caution should be exercised as shorter limits   
  642    may be imposed by some applications.                                    
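
   The length asymmetry can be observed directly.  The following
   sketch, again using Python's built-in "punycode" codec, builds a
   hypothetical label of 40 repetitions of U+00FC purely to show the
   effect; the label was chosen for illustration and is not claimed
   to be a sensible registration.

```python
# A repetitive, purely illustrative non-ASCII label: 40 copies of U+00FC.
u_label = "\u00fc" * 40

# ACE form: the "xn--" prefix plus the Punycode output.
a_label = "xn--" + u_label.encode("punycode").decode("ascii")

utf8_octets = len(u_label.encode("utf-8"))  # 2 octets per character in UTF-8
punycode_part = len(a_label) - len("xn--")  # must not exceed 59 characters

print("A-label length:", len(a_label))
print("Punycode portion:", punycode_part)
print("U-label length in UTF-8 octets:", utf8_octets)
```

   Here the A-label fits within the 63-octet DNS limit while the same
   label in UTF-8 does not, which is why an application that stores
   U-labels should not assume the DNS limit applies to them.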
  643                                                                            
  644    For context, applications that are not IDNA-aware treat all LDH         
  645    labels as valid for appearance in DNS zone files and queries and some   
  646    of them may permit additional types of labels (i.e., not impose the     
  647    LDH restriction).  IDNA-aware applications permit only A-labels and     
  648    NR-LDH labels to appear in zone files and queries.  U-labels can        
  649    appear, along with the other two, in presentation and user interface    
  650    forms, and in protocols that use IDNA forms but that do not involve     
  651    the DNS itself.                                                         
  652                                                                            
  661    Specifically, for IDNA-aware applications and contexts, the three       
  662    allowed categories are A-label, U-label, and NR-LDH label.  Of the      
  663    Reserved LDH labels (R-LDH labels) only A-labels are valid for IDNA     
  664    use.                                                                    
  665                                                                            
  666    Strings that appear to be A-labels or U-labels are processed in         
  667    various operations of the Protocol document [RFC5891].  Those strings   
  668    are not yet demonstrably conformant with the conditions outlined        
  669    above because they are in the process of validation.  Such strings      
  670    may be referred to as "unvalidated", "putative", or "apparent", or as   
  671    being "in the form of" one of the label types to indicate that they     
  672    have not been verified to meet the specified conformance                
  673    requirements.                                                           
  674                                                                            
  675    Unvalidated A-labels are known only to be XN-labels, while Fake         
  676    A-labels have been demonstrated to fail some of the A-label tests.      
  677    Similarly, unvalidated U-labels are simply non-ASCII labels that may    
  678    or may not meet the requirements for U-labels.                          
  679                                                                            
  680 2.3.2.2.  NR-LDH Label                                                     
  681                                                                            
  682    These specifications use the term "NR-LDH label" strictly to refer to   
  683    an all-ASCII label that obeys the LDH label syntax discussed in         
  684    Section 2.3.1 and that is neither an IDN nor a label form reserved by   
  685    IDNA (R-LDH label).  It should be stressed that all A-labels obey the   
  686    "hostname" [RFC0952] rules other than the length restriction in those   
  687    rules.                                                                  
  688                                                                            
  689 2.3.2.3.  Internationalized Domain Name and Internationalized Label        
  690                                                                            
  691    An "internationalized domain name" (IDN) is a domain name that          
  692    contains at least one A-label or U-label, but that otherwise may        
  693    contain any mixture of NR-LDH labels, A-labels, or U-labels.  Just as   
  694    has been the case with ASCII names, some DNS zone administrators may    
  695    impose restrictions, beyond those imposed by DNS or IDNA, on the        
  696    characters or strings that may be registered as labels in their         
  697    zones.  Because of the diversity of characters that can be used in a    
  698    U-label and the confusion they might cause, such restrictions are       
  699    mandatory for IDN registries and zones even though the particular       
  700    restrictions are not part of these specifications (the issue is         
  701    discussed in more detail in Section 4.3 of the Protocol document        
  702    [RFC5891]).  Because these restrictions, commonly known as "registry
  703    restrictions", only affect what can be registered and not lookup        
  704    processing, they have no effect on the syntax or semantics of DNS       
  705    protocol messages; a query for a name that matches no records will      
  706    yield the same response regardless of the reason why it is not in the   
  707    zone.  Clients issuing queries or interpreting responses cannot be      
  716    assumed to have any knowledge of zone-specific restrictions or          
  717    conventions.  See the section on registration policy in the Rationale   
  718    document [RFC5894] for additional discussion.                           
  719                                                                            
  720    "Internationalized label" is used when a term is needed to refer to a   
  721    single label of an IDN, i.e., one that might be any of an NR-LDH        
  722    label, A-label, or U-label.  There are some standardized DNS label      
  723    formats, such as the "underscore labels" used for service location      
  724    (SRV) records [RFC2782], that do not fall into any of the three         
  725    categories and hence are not internationalized labels.                  
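
   As a rough illustration of the IDN definition above, the sketch
   below tests whether a domain name contains at least one label in
   the form of an A-label or U-label.  The function name
   is_putative_idn is an assumption made for this example, and the
   test is syntactic only: it recognizes putative labels and performs
   no IDNA validation, so Fake A-labels are not excluded.

```python
def is_putative_idn(domain: str) -> bool:
    """True if any label is non-ASCII or carries the ACE prefix.

    Hypothetical helper: identifies labels "in the form of" a U-label
    or an A-label without validating them.
    """
    labels = domain.rstrip(".").split(".")
    return any(
        (not label.isascii()) or label[:4].lower() == "xn--"
        for label in labels
    )


if __name__ == "__main__":
    for name in ("www.example.com", "xn--bcher-kva.example",
                 "b\u00fccher.example", "_sip._tcp.example.com"):
        print(name, "->", is_putative_idn(name))
```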
  726                                                                            
  727 2.3.2.4.  Label Equivalence                                                
  728                                                                            
  729    In IDNA, equivalence of labels is defined in terms of the A-labels.     
  730    If the A-labels are equal in a case-independent comparison, then the    
  731    labels are considered equivalent, no matter how they are represented.   
  732    Because of the isomorphism of A-labels and U-labels in IDNA2008, it     
  733    is possible to compare U-labels directly; see the Protocol document     
  734    [RFC5891] for details.  Traditional LDH labels already have a notion    
  735    of equivalence: within that list of characters, uppercase and           
  736    lowercase are considered equivalent.  The IDNA notion of equivalence    
  737    is an extension of that older notion but, because the protocol does     
  738    not specify any mandatory mapping and only those isomorphic forms are   
  739    considered, the only equivalents are:                                   
  740                                                                            
  741    o  Exact (bit-string identity) matches between a pair of U-labels.      
  742                                                                            
  743    o  Matches between a pair of A-labels, using normal DNS                 
  744       case-insensitive matching rules.                                     
  745                                                                            
  746    o  Equivalence between a U-label and an A-label determined by           
  747       translating the U-label form into an A-label form and then testing   
  748       for a match between the A-labels using normal DNS case-insensitive   
  749       matching rules.                                                      
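
   The three cases above can be sketched as follows.  This is an
   illustrative outline only: the name labels_equivalent is an
   assumption, the inputs are presumed to be already-validated
   A-labels, U-labels, or NR-LDH labels, and conversion uses Python's
   built-in "punycode" codec.

```python
def _to_ascii_form(label: str) -> str:
    """Return the label unchanged if ASCII, else its ACE (A-label) form."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def labels_equivalent(x: str, y: str) -> bool:
    """Compare two labels that are assumed to be valid (hypothetical helper)."""
    # Two U-labels: exact match only; no case folding or mapping is applied.
    if not x.isascii() and not y.isascii():
        return x == y
    # Otherwise compare in ASCII form using case-insensitive matching,
    # converting a U-label to its A-label form first.
    return _to_ascii_form(x).lower() == _to_ascii_form(y).lower()


if __name__ == "__main__":
    print(labels_equivalent("b\u00fccher", "XN--BCHER-KVA"))
    print(labels_equivalent("b\u00fccher", "B\u00fccher"))
```

   The second call compares two non-ASCII strings that differ only in
   case; because no mapping is applied between U-labels, they are not
   treated as equivalent.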
  750                                                                            
  751 2.3.2.5.  ACE Prefix                                                       
  752                                                                            
  753    The "ACE prefix" is defined in this document to be a string of ASCII    
  754    characters, "xn--", that appears at the beginning of every A-label.     
  755    "ACE" stands for "ASCII-Compatible Encoding".                           
  756                                                                            
  757 2.3.2.6.  Domain Name Slot                                                 
  758                                                                            
  759    A "domain name slot" is defined in this document to be a protocol       
  760    element or a function argument or a return value (and so on)            
  761    explicitly designated for carrying a domain name.  Examples of domain   
  762    name slots include the QNAME field of a DNS query; the name argument    
  763    of the gethostbyname() or getaddrinfo() standard C library functions;   
  771    the part of an email address following the at sign ("@") in the         
  772    parameter to the SMTP MAIL or RCPT commands or the "From:" field of     
  773    an email message header; and the host portion of the URI in the "src"   
  774    attribute of an HTML "<IMG>" tag.  A string that has the syntax of a    
  775    domain name but that appears in general text is not in a domain name    
  776    slot.  For example, a domain name appearing in the plain text body of   
  777    an email message is not occupying a domain name slot.                   
  778                                                                            
  779    An "IDNA-aware domain name slot" is defined for this set of documents   
  780    to be a domain name slot explicitly designated for carrying an          
  781    internationalized domain name as defined in this document.  The         
  782    designation may be static (for example, in the specification of the     
  783    protocol or interface) or dynamic (for example, as a result of          
  784    negotiation in an interactive session).                                 
  785                                                                            
  786    Name slots that are not IDNA-aware obviously include any domain name    
  787    slot whose specification predates IDNA.  Note that the requirements     
  788    of some protocols that use the DNS for data storage prevent the use     
  789    of IDNs.  For example, the format required for the underscore labels    
  790    used by the service location protocol [RFC2782] precludes               
  791    representation of a non-ASCII label in the DNS using A-labels because   
  792    those SRV-related labels must start with underscores.  Of course,       
  793    non-ASCII IDN labels may be part of a domain name that also includes    
  794    underscore labels.                                                      
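
   As an illustration of the last point, the following Python sketch
   prepares a name that mixes underscore labels with a non-ASCII label
   for a slot that accepts only ASCII.  The function name is
   hypothetical, the non-ASCII labels are assumed to already be valid
   U-labels, and the built-in "punycode" codec is used as a stand-in
   for the conversion in [RFC5891].

```python
ACE_PREFIX = "xn--"


def to_ascii_name(name: str) -> str:
    """Convert non-ASCII labels to A-label form; leave ASCII labels alone."""
    out = []
    for label in name.split("."):
        if label.isascii():
            # Includes underscore labels such as "_sip", which stay as they are.
            out.append(label)
        else:
            out.append(ACE_PREFIX + label.encode("punycode").decode("ascii"))
    return ".".join(out)


print(to_ascii_name("_sip._tcp.bücher.example"))
```

   The underscore labels are passed through untouched; only the
   non-ASCII label is replaced by its A-label form.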
  795                                                                            
  796 2.3.3.  Order of Characters in Labels                                      
  797                                                                            
  798    Because IDN labels may contain characters that are read, and            
  799    preferentially displayed, from right to left, there is a potential      
  800    ambiguity about which character in a label is "first".  For the         
  801    purposes of these specifications, labels are considered, and            
  802    characters numbered, strictly in the order in which they appear "on     
  803    the wire".  That order is equivalent to the leftmost character being    
  804    treated as first in a label that is read left to right and to the       
  805    rightmost character being first in a label that is read right to        
  806    left.  The Bidi specification contains additional discussion of the     
  807    conditions that influence reading order.                                
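
   A short Python sketch may make the numbering convention concrete.
   The three-letter Hebrew string below is a hypothetical label chosen
   only for illustration.  Indexing the string gives characters in
   "on the wire" order, regardless of how the label is displayed.

```python
import unicodedata

# Hypothetical label, stored in "on the wire" order: ALEF, BET, GIMEL.
label = "\u05d0\u05d1\u05d2"

first = label[0]
print(unicodedata.name(first))           # name of the first character
print(unicodedata.bidirectional(first))  # its Bidi class
```

   When rendered right to left, the character at index 0 is the
   rightmost one on the screen, yet it is still "first" for the
   purposes of these specifications.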
  808                                                                            
  809 2.3.4.  Punycode is an Algorithm, Not a Name or Adjective                  
  810                                                                            
  811    There has been some confusion about whether a "Punycode string" does    
  812    or does not include the ACE prefix and about whether it is required     
  813    that such strings could have been the output of the ToASCII operation   
  814    (see RFC 3490, Section 4 [RFC3490]).  This specification discourages    
  815    the use of the term "Punycode" to describe anything but the encoding    
  816    method and algorithm of RFC 3492 [RFC3492].  The terms defined above    
  817    are preferred as much more clear than the term "Punycode string".       
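
   The distinction can be seen in a few lines of Python, using the
   built-in "punycode" codec as an implementation of the RFC 3492
   algorithm.  The variable names are illustrative only.

```python
u_label = "bücher"

# Output of the Punycode algorithm: no ACE prefix.
punycode_output = u_label.encode("punycode").decode("ascii")

# The A-label is the ACE prefix followed by that output.
a_label = "xn--" + punycode_output

print(punycode_output)
print(a_label)

# The decoder is applied to the part after the prefix.
assert a_label[4:].encode("ascii").decode("punycode") == u_label

# The algorithm will also encode strings that are not valid U-labels
# (here, one containing an uppercase letter), so prefixing its output
# with "xn--" does not by itself yield an A-label.
not_an_a_label = "xn--" + "Bücher".encode("punycode").decode("ascii")
```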
  818                                                                            
  826 3.  IANA Considerations                                                    
  827                                                                            
  828    IANA actions for this version of IDNA (IDNA2008) are specified in the   
  829    Tables document [RFC5892].  An overview of the relationships among      
  830    the various IANA registries appears in the Rationale document           
  831    [RFC5894].  This document does not specify any actions for IANA.        
  832                                                                            
  833 4.  Security Considerations                                                
  834                                                                            
  835 4.1.  General Issues                                                       
  836                                                                            
  837    Security on the Internet partly relies on the DNS.  Thus, any change    
  838    to the characteristics of the DNS can change the security of much of    
  839    the Internet.                                                           
  840                                                                            
  841    Domain names are used by users to identify and connect to Internet      
  842    hosts and other network resources.  The security of the Internet is     
  843    compromised if a user entering a single internationalized name is       
  844    connected to different servers based on different interpretations of    
  845    the internationalized domain name.  In addition to characters that      
  846    are permitted by IDNA2003 and its mapping conventions (see              
  847    Section 4.6), the current specification changes the interpretation of   
  848    a few characters that were mapped to others in the earlier version;     
  849    zone administrators should be aware of the problems that this might     
  850    raise and take appropriate measures.  The context for this issue is     
  851    discussed in more detail in the Rationale document [RFC5894].           
  852                                                                            
  853    In addition to the Security Considerations material that appears in     
  854    this document, the Bidi document [RFC5893] contains a discussion of     
  855    security issues specific to labels containing characters from scripts   
  856    that are normally written right to left.                                
  857                                                                            
  858 4.2.  U-label Lengths                                                      
  859                                                                            
  860    Labels associated with the DNS have traditionally been limited to 63    
  861    octets by the general restrictions in RFC 1035 and by the need to       
  862    treat them as a six-bit string length followed by the string in         
  863    actual calls to the DNS.  That format is used in some other             
  864    applications and, in general, representations of domain names as
  865    dot-separated labels and as length-string pairs have been treated as    
  866    interchangeable.  Because A-labels (the form actually used in the       
  867    DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,     
  868    in general, more compressed than UTF-16 or UTF-32), U-labels that
  869    obey all of the relevant symmetry (and other) constraints of these      
  870    documents may be quite a bit longer, potentially up to 252 characters   
  871    (Unicode code points).  A fully-qualified domain name containing        
  872    several such labels can obviously also exceed the nominal 255 octet     
  881    limit for such names.  Application authors using U-labels must exert    
  882    due caution to avoid buffer overflow and truncation errors and          
  883    attacks in contexts where shorter strings are expected.                 
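
   The size difference can be observed with a small Python sketch.
   The function name is hypothetical, the label is artificial and
   chosen only to show the effect, and the built-in "punycode" codec
   is assumed as the encoder.  No claim is made here about the exact
   upper bound.

```python
def label_sizes(u_label: str) -> dict:
    """Report the size of one label in several representations."""
    a_label = "xn--" + u_label.encode("punycode").decode("ascii")
    return {
        "a_label_octets": len(a_label),
        "code_points": len(u_label),
        "utf8_octets": len(u_label.encode("utf-8")),
        "utf16_octets": len(u_label.encode("utf-16-le")),
        "utf32_octets": len(u_label.encode("utf-32-le")),
    }


sizes = label_sizes("ü" * 40)  # artificial label, for illustration only
for key, value in sizes.items():
    print(key, value)
```

   In this example the A-label fits within the 63-octet DNS limit
   while every Unicode encoding form of the same label exceeds it, so
   a buffer sized for 63 octets would be too small for the U-label.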
  884                                                                            
  885 4.3.  Local Character Set Issues                                           
  886                                                                            
  887    When systems use local character sets other than ASCII and Unicode,     
  888    these specifications leave the problem of converting between the        
  889    local character set and Unicode up to the application or local          
  890    system.  If different applications (or different versions of one        
  891    application) implement different rules for conversions among coded      
  892    character sets, they could interpret the same name differently and      
  893    contact different servers.  This problem is not solved by security      
  894    protocols, such as Transport Layer Security (TLS) [RFC5246], that do    
  895    not take local character sets into account.                             
  896                                                                            
  897 4.4.  Visually Similar Characters                                          
  898                                                                            
  899    To help prevent confusion between characters that are visually          
  900    similar (sometimes called "confusables"), it is suggested that          
  901    implementations provide visual indications where a domain name          
  902    contains multiple scripts, especially when the scripts contain          
  903    characters that are easily confused visually, such as an omicron in     
  904    Greek mixed with Latin text.  Such mechanisms can also be used to       
  905    show when a name contains a mixture of Simplified Chinese characters    
  906    with Traditional ones that have Simplified forms, or to distinguish     
  907    zero and one from uppercase "O" and lowercase "L".  DNS zone            
  908    administrators may impose restrictions (subject to the limitations      
  909    identified elsewhere in these documents) that try to minimize           
  910    characters that have similar appearance or similar interpretations.     
  911                                                                            
  912    If multiple characters appear in a label and the label consists only    
  913    of characters in one script, individual characters that might be        
  914    confused with others if compared separately may be unambiguous and      
  915    non-confusing.  On the other hand, that observation makes labels        
  916    containing characters from more than one script (often called "mixed-   
  917    script labels") even more risky -- users will tend to see what they     
  918    expect to see and context is a powerful reinforcement to perception.    
  919    At the same time, while the risks associated with mixed-script labels   
  920    are clear, simply prohibiting them will not eliminate problems,         
  921    especially where closely related scripts are involved.  For example,    
  922    there are many strings that are entirely in Greek or Cyrillic scripts   
  923    that can be confused with each other or with Latin script strings.      
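
   A rough Python sketch of a mixed-script indication follows.  It is
   a heuristic under stated assumptions: the Python standard library
   does not expose the Unicode Script property, so the first word of
   each letter's character name is used as a stand-in, and the
   function names are hypothetical.

```python
import unicodedata


def script_hints(label: str) -> set:
    """Heuristic: first word of each letter's Unicode name (not the Script property)."""
    hints = set()
    for ch in label:
        # Only letters are considered; digits and hyphens are ignored.
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")
            hints.add(name.split(" ")[0] if name else "UNKNOWN")
    return hints


def is_mixed_script(label: str) -> bool:
    return len(script_hints(label)) > 1


print(is_mixed_script("g\u03bfogle"))       # Latin with a Greek omicron
print(is_mixed_script("\u0441\u043e\u0441"))  # entirely Cyrillic
```

   The second string is reported as single-script even though it
   resembles a Latin string, which illustrates why prohibiting
   mixed-script labels does not eliminate the problem.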
  924                                                                            
  925    It is worth noting that there are no comprehensive technical            
  926    solutions to the problems of confusable characters.  One can reduce     
  927    the extent of the problems in various ways, but probably never          
  936    eliminate it.  Some specific suggestions about identification and       
  937    handling of confusable characters appear in a Unicode Consortium        
  938    publication [Unicode-UTR36].                                            
  939                                                                            
  940 4.5.  IDNA Lookup, Registration, and the Base DNS Specifications           
  941                                                                            
  942    The Protocol specification [RFC5891] describes procedures for           
  943    registering and looking up labels that are not compatible with the      
  944    preferred syntax described in the base DNS specifications (see          
  945    Section 2.3.1) because they contain non-ASCII characters.  These        
  946    procedures depend on the use of a special ASCII-compatible encoding     
  947    form that contains only characters permitted in hostnames by those      
  948    earlier specifications.  The encoding used is Punycode [RFC3492].  No   
  949    security issues such as string length increases or new allowed values   
  950    are introduced by the encoding process or the use of these encoded      
  951    values, apart from those introduced by the ACE encoding itself.         
  952                                                                            
  953    Domain names (or portions of them) are sometimes compared against a     
  954    set of domains to be given special treatment if a match occurs, e.g.,   
  955    treated as more privileged than others or blocked in some way.  In      
  956    such situations, it is especially important that the comparisons be     
  957    done properly, as specified in the "Requirements" section of the        
  958    Protocol document [RFC5891].  For labels already in ASCII form, the     
  959    proper comparison reduces to the same case-insensitive ASCII            
  960    comparison that has always been used for ASCII labels, although
  961    IDNA-aware applications are expected to look up only A-labels and       
  962    NR-LDH labels, i.e., to avoid looking up R-LDH labels that are not      
  963    A-labels.                                                               
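
   The lookup restriction in the last sentence can be sketched in
   Python as follows, using the LDH, R-LDH, and NR-LDH definitions of
   Section 2.3.1.  The function names and category strings are
   hypothetical.  A label reported as "xn-candidate" merely carries
   the ACE prefix; whether it is actually an A-label still has to be
   determined by the tests in [RFC5891], which this sketch does not
   perform.

```python
import string

LDH = set(string.ascii_letters + string.digits + "-")


def classify_ascii_label(label: str) -> str:
    # LDH labels: letters, digits, hyphen; 1 to 63 octets;
    # no hyphen at the beginning or end.
    if not (1 <= len(label) <= 63) or not set(label) <= LDH:
        return "not-LDH"
    if label.startswith("-") or label.endswith("-"):
        return "not-LDH"
    # R-LDH labels have "--" in the third and fourth positions.
    if label[2:4] == "--":
        return "xn-candidate" if label[:2].lower() == "xn" else "other-R-LDH"
    return "NR-LDH"


def should_look_up(label: str) -> bool:
    return classify_ascii_label(label) in ("NR-LDH", "xn-candidate")


for candidate in ("example", "xn--bcher-kva", "ab--cd"):
    print(candidate, classify_ascii_label(candidate), should_look_up(candidate))
```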
  964                                                                            
  965    The introduction of IDNA meant that any existing labels that start      
  966    with the ACE prefix would be construed as A-labels, at least until      
  967    they failed one of the relevant tests, whether or not that was the      
  968    intent of the zone administrator or registrant.  There is no evidence   
  969    that this has caused any practical problems since RFC 3490 was          
  970    adopted, but the risk still exists in principle.                        
  971                                                                            
  972 4.6.  Legacy IDN Label Strings                                             
  973                                                                            
  974    The URI Standard [RFC3986] and a number of application specifications   
  975    (e.g., SMTP [RFC5321] and HTTP [RFC2616]) do not permit non-ASCII       
  976    labels in DNS names used with those protocols, i.e., only the A-label   
  977    form of IDNs is permitted in those contexts.  If only A-labels are      
  978    used, differences in interpretation between IDNA2003 and this version   
  979    arise only for characters whose interpretation has actually changed
  980    (e.g., characters, such as ZWJ and ZWNJ, that were mapped to nothing    
  981    in IDNA2003 and that are considered legitimate in some contexts by      
  982    these specifications).  Despite that prohibition, there are a           
  983    significant number of files and databases on the Internet in which      
  991    domain name strings appear in native-character form; a subset of        
  992    those strings use native-character labels that require IDNA2003         
  993    mapping to produce valid A-labels.  The treatment of such labels will   
  994    vary by types of applications and application-designer preference: in   
  995    some situations, warnings to the user or outright rejection may be      
  996    appropriate; in others, it may be preferable to attempt to apply the    
  997    earlier mappings if lookup strictly conformant to these                 
  998    specifications fails or even to do lookups under both sets of rules.    
  999    This general situation is discussed in more detail in the Rationale     
 1000    document [RFC5894].  However, in the absence of care by registries      
 1001    about how strings that could have different interpretations under       
 1002    IDNA2003 and the current specification are handled, it is possible      
 1003    that the differences could be used as a component of name-matching or   
 1004    name-confusion attacks.  Such care is therefore appropriate.            
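
   One way an application or registry might detect strings with two
   interpretations is sketched below in Python.  The function names
   are hypothetical.  Python's built-in "idna" codec implements the
   IDNA2003 procedure (including its mappings), while the "unmapped"
   form simply Punycode-encodes the label as given and performs no
   IDNA2008 validation.  The example uses U+00DF (LATIN SMALL LETTER
   SHARP S), which IDNA2003 mapped to "ss".

```python
def idna2003_form(label: str) -> str:
    # Python's built-in "idna" codec applies IDNA2003 (RFC 3490 with Nameprep).
    return label.encode("idna").decode("ascii")


def unmapped_form(label: str) -> str:
    # No mapping: ASCII labels pass through, others are encoded as given.
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def interpretation_differs(label: str) -> bool:
    # ASCII case is ignored, as in DNS matching.
    return idna2003_form(label).lower() != unmapped_form(label).lower()


print(idna2003_form("straße"), unmapped_form("straße"))
print(interpretation_differs("straße"), interpretation_differs("bücher"))
```

   A label for which the two forms differ is a candidate for the
   warnings, rejection, or dual lookups discussed above.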
 1005                                                                            
 1006 4.7.  Security Differences from IDNA2003                                   
 1007                                                                            
 1008    The registration and lookup models described in this set of documents   
 1009    change the mechanisms available for lookup applications to determine    
 1010    the validity of labels they encounter.  In some respects, the ability   
 1011    to test is strengthened.  For example, putative labels that contain     
 1012    unassigned code points will now be rejected, while IDNA2003 permitted   
 1013    them (see the Rationale document [RFC5894] for a discussion of the      
 1014    reasons for this).  On the other hand, the Protocol specification no    
 1015    longer assumes that the application that looks up a name will be able   
 1016    to determine, and apply, information about the protocol version used    
 1017    in registration.  In theory, that may increase risk since the           
 1018    application will be able to do less pre-lookup validation.  In          
 1019    practice, the protection afforded by that test has been largely         
 1020    illusory for reasons explained in RFC 4690 [RFC4690] and elsewhere in   
 1021    these documents.                                                        
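
   The unassigned code point test can be approximated in Python as
   shown below.  This is only a sketch: the function name is
   hypothetical, and the check uses whatever Unicode version the
   Python runtime ships, whereas the authoritative rules are those
   derived in the Tables document [RFC5892].

```python
import unicodedata


def has_unassigned(label: str) -> bool:
    """True if any code point has General_Category Cn in this runtime's Unicode data."""
    return any(unicodedata.category(ch) == "Cn" for ch in label)


print(has_unassigned("a\u0378"))  # U+0378 is an unassigned code point
print(has_unassigned("bücher"))
```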
 1022                                                                            
 1023    Any change to the Stringprep [RFC3454] procedure that is profiled and   
 1024    used in IDNA2003, or, more broadly, the IETF's model of the use of      
 1025    internationalized character strings in different protocols, creates     
 1026    some risk of inadvertent changes to those protocols, invalidating       
 1027    deployed applications or databases, and so on.  But these               
 1028    specifications do not change Stringprep at all; they merely bypass      
 1029    it.  Because these documents do not depend on Stringprep, the           
 1030    question of upgrading other protocols that do have that dependency      
 1031    can be left to experts on those protocols: the IDNA changes and         
 1032    possible upgrades to security protocols or conventions are              
 1033    independent issues.                                                     
 1034                                                                            
 1046 4.8.  Summary                                                              
 1047                                                                            
 1048    No mechanism involving names or identifiers alone can protect against   
 1049    a wide variety of security threats and attacks that are largely         
 1050    independent of the naming or identification system.  These attacks      
 1051    include spoofed pages, DNS query trapping and diversion, and so on.     
 1052                                                                            
 1053 5.  Acknowledgments                                                        
 1054                                                                            
 1055    The initial version of this document was created largely by             
 1056    extracting text from early draft versions of the Rationale document     
 1057    [RFC5894].  See the section of the same name, and the one entitled
 1058    "Contributors", in that document.
 1059                                                                            
 1060    Specific textual suggestions after the extraction process came from     
 1061    Vint Cerf, Lisa Dusseault, Bill McQuillan, Andrew Sullivan, and Ken     
 1062    Whistler.  Other changes were made in response to more general          
 1063    comments, lists of concerns or specific errors from participants in     
 1064    the Working Group and other observers, including Lyman Chapin, James    
 1065    Mitchell, Subramanian Moonesamy, and Dan Winship.                       
 1066                                                                            
 1067 6.  References                                                             
 1068                                                                            
 1069 6.1.  Normative References                                                 
 1070                                                                            
 1071    [ASCII]      American National Standards Institute (formerly United     
 1072                 States of America Standards Institute), "USA Code for      
 1073                 Information Interchange", ANSI X3.4-1968, 1968.  ANSI      
 1074                 X3.4-1968 has been replaced by newer versions with         
 1075                 slight modifications, but the 1968 version remains         
 1076                 definitive for the Internet.                               
 1077                                                                            
 1078    [RFC1034]    Mockapetris, P., "Domain names - concepts and              
 1079                 facilities", STD 13, RFC 1034, November 1987.              
 1080                                                                            
 1081    [RFC1035]    Mockapetris, P., "Domain names - implementation and        
 1082                 specification", STD 13, RFC 1035, November 1987.           
 1083                                                                            
 1084    [RFC1123]    Braden, R., "Requirements for Internet Hosts -             
 1085                 Application and Support", STD 3, RFC 1123, October 1989.   
 1086                                                                            
 1087    [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate        
 1088                 Requirement Levels", BCP 14, RFC 2119, March 1997.         
 1089                                                                            
 1101    [Unicode-UAX15]                                                         
 1102                 The Unicode Consortium, "Unicode Standard Annex #15:       
 1103                 Unicode Normalization Forms, Revision 31",                 
 1104                 September 2009,                                            
 1105                 <http://www.unicode.org/reports/tr15/tr15-31.html>.        
 1106                                                                            
 1107    [Unicode52]  The Unicode Consortium.  The Unicode Standard, Version     
 1108                 5.2.0, defined by: "The Unicode Standard, Version          
 1109                 5.2.0", (Mountain View, CA: The Unicode Consortium,        
 1110                 2009. ISBN 978-1-936213-00-9).                             
 1111                 <http://www.unicode.org/versions/Unicode5.2.0/>.           
 1112                                                                            
 1113 6.2.  Informative References                                               
 1114                                                                            
 1115    [IDNA2008-Mapping]                                                      
 1116                 Resnick, P. and P. Hoffman, "Mapping Characters in         
 1117                 Internationalized Domain Names for Applications (IDNA)",   
 1118                 Work in Progress, April 2010.                              
 1119                                                                            
 1120    [RFC0952]    Harrenstien, K., Stahl, M., and E. Feinler, "DoD           
 1121                 Internet host table specification", RFC 952,               
 1122                 October 1985.                                              
 1123                                                                            
 1124    [RFC2181]    Elz, R. and R. Bush, "Clarifications to the DNS            
 1125                 Specification", RFC 2181, July 1997.                       
 1126                                                                            
 1127    [RFC2616]    Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,          
 1128                 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext    
 1129                 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.       
 1130                                                                            
 1131    [RFC2673]    Crawford, M., "Binary Labels in the Domain Name System",   
 1132                 RFC 2673, August 1999.                                     
 1133                                                                            
 1134    [RFC2782]    Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for   
 1135                 specifying the location of services (DNS SRV)",            
 1136                 RFC 2782, February 2000.                                   
 1137                                                                            
 1138    [RFC3454]    Hoffman, P. and M. Blanchet, "Preparation of               
 1139                 Internationalized Strings ("stringprep")", RFC 3454,       
 1140                 December 2002.                                             
 1141                                                                            
 1142    [RFC3490]    Faltstrom, P., Hoffman, P., and A. Costello,               
 1143                 "Internationalizing Domain Names in Applications           
 1144                 (IDNA)", RFC 3490, March 2003.                             
 1145                                                                            
 1146    [RFC3491]    Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep       
 1147                 Profile for Internationalized Domain Names (IDN)",         
 1148                 RFC 3491, March 2003.                                      
 1149                                                                            
 1156    [RFC3492]    Costello, A., "Punycode: A Bootstring encoding of          
 1157                 Unicode for Internationalized Domain Names in              
 1158                 Applications (IDNA)", RFC 3492, March 2003.                
 1159                                                                            
 1160    [RFC3986]    Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform   
 1161                 Resource Identifier (URI): Generic Syntax", STD 66,        
 1162                 RFC 3986, January 2005.                                    
 1163                                                                            
 1164    [RFC4690]    Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review     
 1165                 and Recommendations for Internationalized Domain Names     
 1166                 (IDNs)", RFC 4690, September 2006.                         
 1167                                                                            
 1168    [RFC5246]    Dierks, T. and E. Rescorla, "The Transport Layer           
 1169                 Security (TLS) Protocol Version 1.2", RFC 5246,            
 1170                 August 2008.                                               
 1171                                                                            
 1172    [RFC5321]    Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,    
 1173                 October 2008.                                              
 1174                                                                            
 1175    [RFC5891]    Klensin, J., "Internationalized Domain Names in            
 1176                 Applications (IDNA): Protocol", RFC 5891, August 2010.     
 1177                                                                            
 1178    [RFC5892]    Faltstrom, P., Ed., "The Unicode Code Points and           
 1179                 Internationalized Domain Names for Applications (IDNA)",   
 1180                 RFC 5892, August 2010.                                     
 1181                                                                            
 1182    [RFC5893]    Alvestrand, H. and C. Karp, "Right-to-Left Scripts for     
 1183                 Internationalized Domain Names for Applications (IDNA)",   
 1184                 RFC 5893, August 2010.                                     
 1185                                                                            
 1186    [RFC5894]    Klensin, J., "Internationalized Domain Names for           
 1187                 Applications (IDNA): Background, Explanation, and          
 1188                 Rationale", RFC 5894, August 2010.                         
 1189                                                                            
 1190    [Unicode-UTR36]                                                         
 1191                 The Unicode Consortium, "Unicode Technical Report #36:     
 1192                 Unicode Security Considerations, Revision 7", July 2008,   
 1193                 <http://www.unicode.org/reports/tr36/tr36-7.html>.         
 1194                                                                            
 1211 Author's Address                                                           
 1212                                                                            
 1213    John C Klensin                                                          
 1214    1770 Massachusetts Ave, Ste 322                                         
 1215    Cambridge, MA  02140                                                    
 1216    USA                                                                     
 1217                                                                            
 1218    Phone: +1 617 245 1457                                                  
 1219    EMail: john+ietf@jck.com                                                
 1220                                                                            
Editorial Erratum #4696 [Verified], line 866, reported by Juan Altmayer Pizzorno

The text says:
Because A-labels (the form actually used in the
DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
in general, more compressed that UTF-16 or UTF-32), U-labels that
obey all of the relevant symmetry (and other) constraints of these
documents may be quite a bit longer, potentially up to 252 characters
(Unicode code points).

It should say:
Because A-labels (the form actually used in the
DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
in general, more compressed that UTF-16 or UTF-32), U-labels that
obey all of the relevant symmetry (and other) constraints of these
documents may be quite a bit longer, potentially up to 252 octets.

Notes:
Similar to Erratum 4695.

Technical Erratum #4824 [Held for Document Update], line 866, reported by Juan Altmayer Pizzorno

The text says:
Because A-labels (the form actually used in the
DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
in general, more compressed that UTF-16 or UTF-32), U-labels that
obey all of the relevant symmetry (and other) constraints of these
documents may be quite a bit longer, potentially up to 252 characters
(Unicode code points).

It should say:
Because A-labels (the form actually used in the
DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
in general, more compressed that UTF-16 or UTF-32), U-labels that
obey all of the relevant symmetry (and other) constraints of these
documents may be quite a bit longer, potentially up to 59 Unicode
code points, or up to 236 octets.

Notes:
(The same rationale as my report for 2.3.2.1 applies:)
The contents of a U-label are encoded in the up to 59 ASCII characters (see 2.3.2.1) that the Punycode algorithm outputs for the corresponding A-label. The Punycode decoder (https://tools.ietf.org/html/rfc3492#section-6.2) consumes at least one of those ASCII characters for each code point inserted into the U-label. A U-label, thus, can contain at most 59 Unicode code points.
Since U-labels are defined (in 2.3.2.1) to be expressed in a standard Unicode Encoding Form, and UTF-32, UTF-16, and UTF-8 (as revised by RFC 3629) can all encode a code point in at most 4 octets, 236 octets is an upper bound for a U-label's length.
I think it should be possible to derive a tighter bound, but its rationale would likely be less straightforward.
I imagine the number 252 was originally derived by multiplying 63, the maximum length of an A-label (including the "xn--" prefix), by 4, the maximum number of octets needed to represent a code point.
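
The arithmetic in the note above can be checked with a short sketch. It is illustrative only and is not part of the erratum or of the IDNA specifications: the constant and function names are assumptions introduced here, and the "punycode" codec from the Python standard library performs raw Punycode encoding only, with none of the IDNA2008 validity checks a real U-label would have to pass.

```python
# Illustrative sketch; all names below are assumptions, not taken from the RFC.
ACE_PREFIX = "xn--"
MAX_A_LABEL_OCTETS = 63                                      # DNS label limit
MAX_PUNYCODE_CHARS = MAX_A_LABEL_OCTETS - len(ACE_PREFIX)    # 63 - 4 = 59
# The Punycode decoder consumes at least one ASCII character per code point,
# so the Punycode length also bounds the number of code points.
MAX_U_LABEL_CODE_POINTS = MAX_PUNYCODE_CHARS                 # 59
MAX_OCTETS_PER_CODE_POINT = 4                                # UTF-8, UTF-16, UTF-32
MAX_U_LABEL_OCTETS = MAX_U_LABEL_CODE_POINTS * MAX_OCTETS_PER_CODE_POINT  # 236


def to_a_label(u_label: str) -> str:
    """Prefix the raw Punycode form; no IDNA2008 validation is performed."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def fits_in_dns_label(u_label: str) -> bool:
    """True if the Punycode form, with its prefix, fits in 63 octets."""
    return len(to_a_label(u_label)) <= MAX_A_LABEL_OCTETS


def encoded_sizes(u_label: str) -> dict:
    """Octet lengths of the label in the three Unicode encoding forms."""
    return {
        "utf-8": len(u_label.encode("utf-8")),
        "utf-16": len(u_label.encode("utf-16-be")),
        "utf-32": len(u_label.encode("utf-32-be")),
    }
```

For example, to_a_label("bücher") returns "xn--bcher-kva", and encoded_sizes("bücher") reports 7, 12, and 24 octets, all well under the 236-octet bound. The bound is an upper limit rather than a tight one, as the note itself says.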