Internet Engineering Task Force (IETF)                        J. Klensin
Request for Comments: 5891                                   August 2010
Obsoletes: 3490, 3491
Updates: 3492
Category: Standards Track
ISSN: 2070-1721


    Internationalized Domain Names in Applications (IDNA): Protocol

Abstract

   This document is the revised protocol definition for
   Internationalized Domain Names (IDNs).  The rationale for changes,
   the relationship to the older specification, and important
   terminology are provided in other documents.  This document specifies
   the protocol mechanism, called Internationalized Domain Names in
   Applications (IDNA), for registering and looking up IDNs in a way
   that does not require changes to the DNS itself.  IDNA is only meant
   for processing domain names, not free text.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc5891.

Klensin                      Standards Track                    [Page 1]

RFC 5891                    IDNA2008 Protocol                August 2010


Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  4
   3.  Requirements and Applicability . . . . . . . . . . . . . . . .  5
     3.1.  Requirements . . . . . . . . . . . . . . . . . . . . . . .  5
     3.2.  Applicability  . . . . . . . . . . . . . . . . . . . . . .  5
       3.2.1.  DNS Resource Records . . . . . . . . . . . . . . . . .  6
       3.2.2.  Non-Domain-Name Data Types Stored in the DNS . . . . .  6
   4.  Registration Protocol  . . . . . . . . . . . . . . . . . . . .  6
     4.1.  Input to IDNA Registration . . . . . . . . . . . . . . . .  7
     4.2.  Permitted Character and Label Validation . . . . . . . . .  7
       4.2.1.  Input Format . . . . . . . . . . . . . . . . . . . . .  7
       4.2.2.  Rejection of Characters That Are Not Permitted . . . .  8
       4.2.3.  Label Validation . . . . . . . . . . . . . . . . . . .  8
       4.2.4.  Registration Validation Requirements . . . . . . . . .  9
     4.3.  Registry Restrictions  . . . . . . . . . . . . . . . . . .  9
     4.4.  Punycode Conversion  . . . . . . . . . . . . . . . . . . .  9
     4.5.  Insertion in the Zone  . . . . . . . . . . . . . . . . . . 10
   5.  Domain Name Lookup Protocol  . . . . . . . . . . . . . . . . . 10
     5.1.  Label String Input . . . . . . . . . . . . . . . . . . . . 10
     5.2.  Conversion to Unicode  . . . . . . . . . . . . . . . . . . 10
     5.3.  A-label Input  . . . . . . . . . . . . . . . . . . . . . . 10
     5.4.  Validation and Character List Testing  . . . . . . . . . . 11
     5.5.  Punycode Conversion  . . . . . . . . . . . . . . . . . . . 13
     5.6.  DNS Name Resolution  . . . . . . . . . . . . . . . . . . . 13
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 13
   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 13
   8.  Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 13
   9.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 14
   10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14
     10.1. Normative References . . . . . . . . . . . . . . . . . . . 14
     10.2. Informative References . . . . . . . . . . . . . . . . . . 15
   Appendix A.  Summary of Major Changes from IDNA2003  . . . . . . . 17

1.  Introduction

   This document supplies the protocol definition for Internationalized
   Domain Names in Applications (IDNA), with the version specified here
   known as IDNA2008.  Essential definitions and terminology for
   understanding this document and a road map of the collection of
   documents that make up IDNA2008 appear in a separate Definitions
   document [RFC5890].  Appendix A discusses the relationship between
   this specification and the earlier version of IDNA (referred to here
   as "IDNA2003").  The rationale for these changes, along with
   considerable explanatory material and advice to zone administrators
   who support IDNs, is provided in another document, known informally
   in this series as the "Rationale document" [RFC5894].

   IDNA works by allowing applications to use certain ASCII [ASCII]
   string labels (beginning with a special prefix) to represent
   non-ASCII name labels.  Lower-layer protocols need not be aware of
   this; therefore, IDNA does not change any infrastructure.  In
   particular, IDNA does not depend on any changes to DNS servers,
   resolvers, or DNS protocol elements, because the ASCII name service
   provided by the existing DNS can be used for IDNA.
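
   As an informal illustration of the ASCII representation described
   above, the following Python sketch pairs the "xn--" prefix with the
   Punycode codec from the Python standard library.  The function
   names are hypothetical, and the sketch performs none of the
   validation that this specification requires; it only shows the
   shape of the two forms.

```python
ACE_PREFIX = "xn--"  # the special prefix that marks an A-label


def to_a_label(u_label: str) -> str:
    """Encode a non-ASCII label as prefix + Punycode (no validation)."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Reverse the encoding above (again, no validation)."""
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


print(to_a_label("b\u00fccher"))  # xn--bcher-kva
```

   Only the ASCII form reaches the DNS, which is why servers and
   resolvers need no changes.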

   IDNA applies only to a specific subset of DNS labels.  The base DNS
   standards [RFC1034] [RFC1035] and their various updates specify how
   to combine labels into fully-qualified domain names and parse labels
   out of those names.

   This document describes two separate protocols, one for IDN
   registration (Section 4) and one for IDN lookup (Section 5).  These
   two protocols share some terminology, reference data, and operations.

2.  Terminology

   As mentioned above, terminology used as part of the definition of
   IDNA appears in the Definitions document [RFC5890].  It is worth
   noting that some of this terminology overlaps with, and is consistent
   with, that used in Unicode or other character set standards and the
   DNS.  Readers of this document are assumed to be familiar with the
   associated Definitions document and with the DNS-specific terminology
   in RFC 1034 [RFC1034].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

3.  Requirements and Applicability

3.1.  Requirements

   IDNA makes the following requirements:

   1.  Whenever a domain name is put into a domain name slot that is not
       IDNA-aware (see Section 2.3.2.6 of the Definitions document
       [RFC5890]), it MUST contain only ASCII characters (i.e., its
       labels must be either A-labels or NR-LDH labels), unless the DNS
       application is not subject to historical recommendations for
       "hostname"-style names (see RFC 1034 [RFC1034] and
       Section 3.2.1).

   2.  Labels MUST be compared using equivalent forms: either both
       A-label forms or both U-label forms.  Because A-labels and
       U-labels can be transformed into each other without loss of
       information, these comparisons are equivalent (however, in
       practice, comparison of U-labels requires first verifying that
       they actually are U-labels and not just Unicode strings).  A pair
       of A-labels MUST be compared as case-insensitive ASCII (as with
       all comparisons of ASCII DNS labels).  U-labels MUST be compared
       as-is, without case folding or other intermediate steps.  While
       it is not necessary to validate labels in order to compare them,
       successful comparison does not imply validity.  In many cases,
       not limited to comparison, validation may be important for other
       reasons and SHOULD be performed.

   3.  Labels being registered MUST conform to the requirements of
       Section 4.  Labels being looked up and the lookup process MUST
       conform to the requirements of Section 5.
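
   The comparison rules in requirement 2 can be sketched in Python as
   follows.  The helper names are hypothetical, and the sketch assumes
   the caller has already established that both arguments are in the
   same form; it does not check that they are valid labels.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only A-Z, leaving every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def a_labels_equal(a: str, b: str) -> bool:
    """A-labels compare as case-insensitive ASCII."""
    return ascii_lower(a) == ascii_lower(b)


def u_labels_equal(a: str, b: str) -> bool:
    """U-labels compare as-is: no case folding, no normalization."""
    return a == b
```

   Note that a successful comparison says nothing about validity; two
   equal strings may both fail the tests of Section 4 or Section 5.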

3.2.  Applicability

   IDNA applies to all domain names in all domain name slots in
   protocols except where it is explicitly excluded.  It does not apply
   to domain name slots that do not use the LDH syntax rules as
   described in the Definitions document [RFC5890].

   Because it uses the DNS, IDNA applies to many protocols that were
   specified before it was designed.  IDNs occupying domain name slots
   in those older protocols MUST be in A-label form until and unless
   those protocols and their implementations are explicitly upgraded to
   be aware of IDNs and to accept the U-label form.  IDNs actually
   appearing in DNS queries or responses MUST be A-labels.

   IDNA-aware protocols and implementations MAY accept U-labels,
   A-labels, or both as those particular protocols specify.  IDNA is not
   defined for extended label types (see RFC 2671 [RFC2671], Section 3).

3.2.1.  DNS Resource Records

   IDNA applies only to domain names in the NAME and RDATA fields of DNS
   resource records whose CLASS is IN.  See the DNS specification
   [RFC1035] for precise definitions of these terms.

   The application of IDNA to DNS resource records depends entirely on
   the CLASS of the record, and not on the TYPE except as noted below.
   This will remain true, even as new TYPEs are defined, unless a new
   TYPE defines TYPE-specific rules.  Special naming conventions for SRV
   records (and "underscore labels" more generally) are incompatible
   with IDNA coding as discussed in the Definitions document [RFC5890],
   especially Section 2.3.2.3.  Of course, underscore labels may be part
   of a domain that uses IDN labels at higher levels in the tree.

3.2.2.  Non-Domain-Name Data Types Stored in the DNS

   Although IDNA enables the representation of non-ASCII characters in
   domain names, that does not imply that IDNA enables the
   representation of non-ASCII characters in other data types that are
   stored in domain names, specifically in the RDATA field for types
   that have structured RDATA format.  For example, an email address
   local part is stored in a domain name in the RNAME field as part of
   the RDATA of an SOA record (e.g., hostmaster@example.com would be
   represented as hostmaster.example.com).  IDNA does not update the
   existing email standards, which allow only ASCII characters in local
   parts.  Even though work is in progress to define
   internationalization for email addresses [RFC4952], changes to the
   email address part of the SOA RDATA would require action in, or
   updates to, other standards, specifically those that specify the
   format of the SOA RR.
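
   The SOA RNAME example above can be sketched in Python as follows.
   The function name is hypothetical.  Escaping dots in the local part
   with a backslash follows the usual zone-file convention and is an
   assumption of this sketch, not something this document specifies.
   The domain part is passed through unchanged.

```python
def mailbox_to_rname(mailbox: str) -> str:
    """Turn local-part@domain into the textual RNAME form."""
    local, sep, domain = mailbox.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected local-part@domain")
    if not local.isascii():
        # IDNA does not extend local parts beyond ASCII.
        raise ValueError("local part must be ASCII")
    # A dot inside the local part is escaped so that it is not read
    # as a label separator.
    return local.replace(".", "\\.") + "." + domain


print(mailbox_to_rname("hostmaster@example.com"))  # hostmaster.example.com
```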

4.  Registration Protocol

   This section defines the model for registering an IDN.  The model is
   implementation independent; any sequence of steps that produces
   exactly the same result for all labels is considered a valid
   implementation.

   Note that, while the registration (this section) and lookup protocols
   (Section 5) are very similar in most respects, they are not
   identical, and implementers should carefully follow the steps
   described in this specification.

4.1.  Input to IDNA Registration

   Registration processes, especially processing by entities (often
   called "registrars") who deal with registrants before the request
   actually reaches the zone manager ("registry"), are outside the scope
   of this definition and may differ significantly depending on local
   needs.  By the time a string enters the IDNA registration process as
   described in this specification, it MUST be in Unicode and in
   Normalization Form C (NFC [Unicode-UAX15]).  Entities responsible for
   zone files ("registries") MUST accept only the exact string for which
   registration is requested, free of any mappings or local adjustments.
   They MAY accept that input in any of three forms:

   1.  As a pair of A-label and U-label.

   2.  As an A-label only.

   3.  As a U-label only.

   The first two of these forms are RECOMMENDED because the use of
   A-labels avoids any possibility of ambiguity.  The first is normally
   preferred over the second because it permits further verification of
   user intent (see Section 4.2.1).
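
   The NFC precondition stated above can be checked with the Python
   standard library, as in the following sketch.  The function name is
   hypothetical, and the result depends on the Unicode version bundled
   with the Python interpreter, which may differ from the version a
   registry is required to use.

```python
import unicodedata


def is_nfc(candidate: str) -> bool:
    """True if the string is already in Normalization Form C."""
    return unicodedata.normalize("NFC", candidate) == candidate
```

   A registry following this section would reject a string for which
   this check fails rather than normalize it silently, since it MUST
   accept only the exact string requested.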

4.2.  Permitted Character and Label Validation

4.2.1.  Input Format

   If both the U-label and A-label forms are available, the registry
   MUST ensure that the A-label form is in lowercase, perform a
   conversion to a U-label, perform the steps and tests described below
   on that U-label, and then verify that the A-label produced by the
   step in Section 4.4 matches the one provided as input.  In addition,
   the U-label that was provided as input and the one obtained by
   conversion of the A-label MUST match exactly.  If, for some reason,
   these tests fail, the registration MUST be rejected.
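
   The cross-check described above might be sketched in Python as
   follows.  The function name is hypothetical.  The sketch assumes
   that an A-label containing uppercase characters is rejected rather
   than lowercased, and it omits the character and label tests of the
   following sections, which a registry would apply to the decoded
   string.

```python
ACE_PREFIX = "xn--"


def check_label_pair(a_label: str, u_label: str) -> bool:
    """Check that a supplied A-label and U-label describe the same label."""
    if a_label != a_label.lower() or not a_label.startswith(ACE_PREFIX):
        return False
    try:
        # Convert the A-label to a U-label candidate.
        decoded = a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # Convert back, standing in for the step in Section 4.4.
        reencoded = ACE_PREFIX + decoded.encode("punycode").decode("ascii")
    except UnicodeError:
        return False
    # Both directions must match the input exactly.
    return decoded == u_label and reencoded == a_label
```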

   If only an A-label was provided and the conversion to a U-label is
   not performed, the registry MUST still verify that the A-label is
   superficially valid, i.e., that it does not violate any of the rules
   of Punycode encoding [RFC3492] such as the prohibition on trailing
   hyphen-minus, the requirement that all characters be ASCII, and so
   on.  Strings that appear to be A-labels (e.g., they start with
   "xn--") and strings that are supplied to the registry in a context
   reserved for A-labels (such as a field in a form to be filled out),
   but that are not valid A-labels as described in this paragraph, MUST
   NOT be placed in DNS zones that support IDNA.
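
   A superficial check of the kind described above could look like the
   following Python sketch.  The function name is hypothetical.  The
   paragraph names only the trailing hyphen-minus and ASCII rules
   explicitly; the 63-character limit and the restriction to letters,
   digits, and hyphen are assumptions drawn from general DNS label
   rules and are not a complete reading of "and so on".

```python
import string

LDH = set(string.ascii_lowercase + string.digits + "-")


def superficially_valid_a_label(label: str) -> bool:
    """Cheap syntactic screening of a string offered as an A-label."""
    if not label.isascii():
        return False
    lowered = label.lower()
    if not lowered.startswith("xn--") or len(lowered) > 63:
        return False
    if lowered.endswith("-"):
        # Covers both a trailing hyphen-minus and a bare "xn--".
        return False
    return all(ch in LDH for ch in lowered)
```

   Passing this screening does not make a string a valid A-label; only
   the full conversion and the tests on the resulting U-label can
   establish that.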
  385                                                                            
  386    If only an A-label is provided, the conversion to a U-label is not      
  387    performed, but the superficial tests described in the previous          
  388    paragraph are performed, registration procedures MAY, and usually       
  389    will, bypass the tests and actions in the balance of Section 4.2 and    
  390    in Sections 4.3 and 4.4.                                                
  391                                                                            
  392 4.2.2.  Rejection of Characters That Are Not Permitted                     
  393                                                                            
  394    The candidate Unicode string MUST NOT contain characters that appear
  395    in the "DISALLOWED" or "UNASSIGNED" lists specified in the Tables
  396    document [RFC5892].
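
A per-character rejection pass might look like the sketch below. The set SAMPLE_DISALLOWED is a hypothetical three-entry excerpt (a real implementation would load the full derived table from RFC 5892), and treating general category "Cn" in the local Unicode database as UNASSIGNED is an approximation assumed for this example.

```python
import unicodedata
from typing import Optional, Set

# Hypothetical excerpt only: 'A', SPACE, and FULL STOP are DISALLOWED in
# the RFC 5892 tables; the real table is far larger.
SAMPLE_DISALLOWED: Set[int] = {0x0041, 0x0020, 0x002E}


def first_rejected_code_point(
    label: str, disallowed: Set[int] = SAMPLE_DISALLOWED
) -> Optional[int]:
    """Return the first code point that must be rejected, or None."""
    for ch in label:
        cp = ord(ch)
        if cp in disallowed:
            return cp
        # Assumption: "Cn" (not assigned) in the local database
        # approximates the UNASSIGNED category.
        if unicodedata.category(ch) == "Cn":
            return cp
    return None
```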
  397                                                                            
  398 4.2.3.  Label Validation                                                   
  399                                                                            
  400    The proposed label (in the form of a Unicode string, i.e., a string     
  401    that at least superficially appears to be a U-label) is then examined   
  402    using tests that require examination of more than one character.        
  403    Character order is considered to be the on-the-wire order.  That        
  404    order may not be the same as the display order.                         
  405                                                                            
  406 4.2.3.1.  Hyphen Restrictions                                              
  407                                                                            
  408    The Unicode string MUST NOT contain "--" (two consecutive hyphens) in   
  409    the third and fourth character positions and MUST NOT start or end      
  410    with a "-" (hyphen).                                                    
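
As an illustration, the three hyphen conditions can be tested with simple string operations. The function name is hypothetical; character positions are counted from one, so the third and fourth positions are indices 2 and 3.

```python
def violates_hyphen_restrictions(label: str) -> bool:
    """True if the label breaks any of the hyphen restrictions."""
    return (
        label[2:4] == "--"        # "--" in the third and fourth positions
        or label.startswith("-")  # leading hyphen
        or label.endswith("-")    # trailing hyphen
    )
```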
  411                                                                            

  412 4.2.3.2.  Leading Combining Marks                                          
  413                                                                            
  414    The Unicode string MUST NOT begin with a combining mark or combining    
  415    character (see The Unicode Standard, Section 2.11 [Unicode] for an      
  416    exact definition).                                                      
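
One way to approximate this test is to inspect the general category of the first code point. The sketch assumes that the combining marks are the characters whose general category begins with "M" (Mn, Mc, Me); the function name is hypothetical.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """True if the first code point has general category Mn, Mc, or Me."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")
```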
  417                                                                            
  418 4.2.3.3.  Contextual Rules                                                 
  419                                                                            
  420    The Unicode string MUST NOT contain any characters whose validity is    
  421    context-dependent, unless the validity is positively confirmed by a     
  422    contextual rule.  To check this, each code point identified as          
  423    CONTEXTJ or CONTEXTO in the Tables document [RFC5892] MUST have a       
  424    non-null rule.  If such a code point is missing a rule, the label is    
  425    invalid.  If the rule exists but the result of applying the rule is     
  426    negative or inconclusive, the proposed label is invalid.                
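
The decision logic of this paragraph can be sketched with a small rule table. The table below is hypothetical and deliberately incomplete: it implements only the MIDDLE DOT rule from RFC 5892, Appendix A.3, and it leaves the ZERO WIDTH JOINER entry without a rule purely to exercise the missing-rule path (RFC 5892 does define a rule for that code point).

```python
from typing import Callable, Dict, Optional

Rule = Callable[[str, int], Optional[bool]]


def middle_dot_rule(label: str, pos: int) -> Optional[bool]:
    # RFC 5892, Appendix A.3: allowed only between two 'l' characters.
    return (
        0 < pos < len(label) - 1
        and label[pos - 1] == "l"
        and label[pos + 1] == "l"
    )


# Hypothetical sample table: code point -> rule, None meaning "no rule".
CONTEXT_RULES: Dict[int, Optional[Rule]] = {
    0x00B7: middle_dot_rule,  # MIDDLE DOT (CONTEXTO)
    0x200D: None,             # ZERO WIDTH JOINER; rule omitted on purpose
}


def passes_contextual_rules(
    label: str, rules: Dict[int, Optional[Rule]] = CONTEXT_RULES
) -> bool:
    for pos, ch in enumerate(label):
        cp = ord(ch)
        if cp not in rules:
            continue  # not a context-dependent code point in this table
        rule = rules[cp]
        if rule is None:
            return False  # a rule is required but missing
        if rule(label, pos) is not True:
            return False  # negative or inconclusive result
    return True
```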
  427                                                                            
  428 4.2.3.4.  Labels Containing Characters Written Right to Left               
  429                                                                            
  430    If the proposed label contains any characters from scripts that are     
  431    written from right to left, it MUST meet the Bidi criteria [RFC5893].   
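
Deciding whether the Bidi criteria need to be applied at all can be sketched as below. This covers only the trigger, not the Bidi rule itself, and it assumes the RFC 5893 notion that a right-to-left label is one containing a character of Bidi class R, AL, or AN. The function name is hypothetical.

```python
import unicodedata

# Bidi classes that, per the assumption above, mark a label as right-to-left.
RTL_BIDI_CLASSES = {"R", "AL", "AN"}


def needs_bidi_check(label: str) -> bool:
    """True if any character has Bidi class R, AL, or AN."""
    return any(
        unicodedata.bidirectional(ch) in RTL_BIDI_CLASSES for ch in label
    )
```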
  432                                                                            
  441 4.2.4.  Registration Validation Requirements                               
  442                                                                            
  443    Strings that contain at least one non-ASCII character, that have been
  444    produced by the steps above, whose contents pass all of the tests in
  445    Section 4.2.3, and that are 63 or fewer characters long in
  446    ASCII-compatible encoding (ACE) form (see Section 4.4) are U-labels.
  447                                                                            
  448    To summarize, the tests in Section 4.2 check for invalid characters,
  449    for invalid combinations of characters, for labels that are invalid
  450    even if the characters they contain are valid individually, and for
  451    labels that do not conform to the restrictions for strings containing
  452    right-to-left characters.
  453                                                                            
  454 4.3.  Registry Restrictions                                                
  455                                                                            
  456    In addition to the rules and tests above, there are many reasons why    
  457    a registry could reject a label.  Registries at all levels of the       
  458    DNS, not just the top level, are expected to establish policies about   
  459    label registrations.  Policies are likely to be informed by the local   
  460    languages and the scripts that are used to write them and may depend    
  461    on many factors including what characters are in the label (for         
  462    example, a label may be rejected based on other labels already          
  463    registered).  See the Rationale document [RFC5894], Section 3.2, for    
  464    further discussion and recommendations about registry policies.         
  465                                                                            
  466    The string produced by the steps in Section 4.2 is checked and          
  467    processed as appropriate to local registry restrictions.  Application   
  468    of those registry restrictions may result in the rejection of some      
  469    labels or the application of special restrictions to others.            
  470                                                                            
  471 4.4.  Punycode Conversion                                                  
  472                                                                            
  473    The resulting U-label is converted to an A-label (defined in Section    
  474    2.3.2.1 of the Definitions document [RFC5890]).  The A-label is the     
  475    encoding of the U-label according to the Punycode algorithm [RFC3492]   
  476    with the ACE prefix "xn--" added at the beginning of the string.  The   
  477    resulting string must, of course, conform to the length limits          
  478    imposed by the DNS.  This document does not update or alter the         
  479    Punycode algorithm specified in RFC 3492 in any way.  RFC 3492 does     
  480    make a non-normative reference to the information about the value and   
  481    construction of the ACE prefix that appears in RFC 3490 or Nameprep     
  482    [RFC3491].  For consistency and reader convenience, IDNA2008            
  483    effectively updates that reference to point to this document.  That     
  484    change does not alter the prefix itself.  The prefix, "xn--", is the    
  485    same in both sets of documents.                                         
  486                                                                            
  496    With the exception of the maximum string length test on Punycode        
  497    output, the failure conditions identified in the Punycode encoding      
  498    procedure cannot occur if the input is a U-label as determined by the   
  499    steps in Sections 4.1 through 4.3 above.                                
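
The conversion step can be illustrated with Python's built-in "punycode" codec. The sketch assumes its input has already been validated as a U-label; the function and constant names are hypothetical. The standard library's "idna" codec is avoided here because it implements the older IDNA2003 processing.

```python
ACE_PREFIX = "xn--"
MAX_LABEL_LENGTH = 63  # DNS limit on a single label


def u_label_to_a_label(u_label: str) -> str:
    """Punycode-encode a validated U-label and add the ACE prefix."""
    encoded = u_label.encode("punycode").decode("ascii")
    a_label = ACE_PREFIX + encoded
    # The length test is the one failure that can still occur here.
    if len(a_label) > MAX_LABEL_LENGTH:
        raise ValueError("A-label exceeds the 63-character DNS label limit")
    return a_label
```

For example, the U-label "bücher" yields "xn--bcher-kva".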
  500                                                                            
  501 4.5.  Insertion in the Zone                                                
  502                                                                            
  503    The label is registered in the DNS by inserting the A-label into a      
  504    zone.                                                                   
  505                                                                            
  506 5.  Domain Name Lookup Protocol                                            
  507                                                                            
  508    Lookup is different from registration and different tests are applied   
  509    on the client.  Although some validity checks are necessary to avoid    
  510    serious problems with the protocol, the lookup-side tests are more      
  511    permissive and rely on the assumption that names that are present in    
  512    the DNS are valid.  That assumption is, however, a weak one because     
  513    the presence of wildcards in the DNS might cause a string that is not   
  514    actually registered in the DNS to be successfully looked up.            
  515                                                                            
  516 5.1.  Label String Input                                                   
  517                                                                            
  518    The user supplies a string in the local character set, for example,     
  519    by typing it, clicking on it, or copying and pasting it from a          
  520    resource identifier, e.g., a Uniform Resource Identifier (URI)          
  521    [RFC3986] or an Internationalized Resource Identifier (IRI)             
  522    [RFC3987], from which the domain name is extracted.  Alternatively,
  523    some process not directly involving the user may read the string from   
  524    a file or obtain it in some other way.  Processing in this step and     
  525    the one specified in Section 5.2 are local matters, to be               
  526    accomplished prior to actual invocation of IDNA.                        
  527                                                                            
  528 5.2.  Conversion to Unicode                                                
  529                                                                            
  530    The string is converted from the local character set into Unicode, if   
  531    it is not already in Unicode.  Depending on local needs, this           
  532    conversion may involve mapping some characters into other characters    
  533    as well as coding conversions.  Those issues are discussed in the       
  534    mapping-related sections (Sections 4.2, 4.4, 6, and 7.3) of the         
  535    Rationale document [RFC5894] and in the separate Mapping document       
  536    [IDNA2008-Mapping].  The result MUST be a Unicode string in NFC form.   
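
A minimal sketch of this step follows. It decodes bytes from an assumed local character set and normalizes the result to NFC. Any character mapping beyond normalization is a local matter and is not shown. The function name and the default charset are assumptions made for the example.

```python
import unicodedata


def to_unicode_nfc(raw: bytes, local_charset: str = "latin-1") -> str:
    """Decode from the local character set and normalize to NFC."""
    text = raw.decode(local_charset)
    return unicodedata.normalize("NFC", text)
```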
  537                                                                            
  538 5.3.  A-label Input                                                        
  539                                                                            
  540    If the input to this procedure appears to be an A-label (i.e., it
  541    starts with "xn--", interpreted case-insensitively), the lookup
  542    application MAY attempt to convert it to a U-label, first ensuring      
  543    that the A-label is entirely in lowercase (converting it to lowercase   
  551    if necessary), and apply the tests of Section 5.4 and the conversion    
  552    of Section 5.5 to that form.  If the label is converted to Unicode      
  553    (i.e., to U-label form) using the Punycode decoding algorithm, then     
  554    the processing specified in those two sections MUST be performed, and   
  555    the label MUST be rejected if the resulting label is not identical to   
  556    the original.  See Section 8.1 of the Rationale document [RFC5894]      
  557    for additional discussion on this topic.                                
  558                                                                            
  559    Conversion from the A-label and testing that the result is a U-label    
  560    SHOULD be performed if the domain name will later be presented to the   
  561    user in native character form (this requires that the lookup            
  562    application be IDNA-aware).  If those steps are not performed, the      
  563    lookup process SHOULD at least test to determine that the string is     
  564    actually an A-label, examining it for the invalid formats specified     
  565    in the Punycode decoding specification.  Applications that are not      
  566    IDNA-aware will obviously omit that testing; others MAY treat the       
  567    string as opaque to avoid the additional processing at the expense of   
  568    providing less protection and information to users.                     
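
The round-trip requirement for A-label input can be sketched as follows. The Section 5.4 tests on the decoded string are omitted, so this shows only the lowercase, decode, re-encode, and compare steps; the function name is hypothetical and Python's "punycode" codec is assumed as the decoder.

```python
def a_label_round_trips(label: str) -> bool:
    """Lowercase, decode, re-encode, and compare with the original."""
    lowered = label.lower()
    if not lowered.startswith("xn--"):
        return False
    try:
        u_label = lowered[4:].encode("ascii").decode("punycode")
        # The Section 5.4 tests on u_label would run here (omitted).
        re_encoded = "xn--" + u_label.encode("punycode").decode("ascii")
    except UnicodeError:
        return False
    return re_encoded == lowered
```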
  569                                                                            
  570 5.4.  Validation and Character List Testing                                
  571                                                                            
  572    As with the registration procedure described in Section 4, the          
  573    Unicode string is checked to verify that all characters that appear     
  574    in it are valid as input to IDNA lookup processing.  As discussed       
  575    above and in the Rationale document [RFC5894], the lookup check is      
  576    more liberal than the registration one.  Labels that have not been      
  577    fully evaluated for conformance to the applicable rules are referred    
  578    to as "putative" labels as discussed in Section 2.3.2.1 of the          
  579    Definitions document [RFC5890].  Putative U-labels with any of the      
  580    following characteristics MUST be rejected prior to DNS lookup:         
  581                                                                            
  582    o  Labels that are not in NFC [Unicode-UAX15].                          
  583                                                                            
  584    o  Labels containing "--" (two consecutive hyphens) in the third and    
  585       fourth character positions.                                          
  586                                                                            
  587    o  Labels whose first character is a combining mark (see The Unicode    
  588       Standard, Section 2.11 [Unicode]).                                   
  589                                                                            
  590    o  Labels containing prohibited code points, i.e., those that are       
  591       assigned to the "DISALLOWED" category of the Tables document         
  592       [RFC5892].                                                           
  593                                                                            
  594    o  Labels containing code points that are identified in the Tables      
  595       document as "CONTEXTJ", i.e., requiring exceptional contextual       
  596       rule processing on lookup, but that do not conform to those rules.   
  597       Note that this implies that a rule must be defined, not null: a      
  606       character that requires a contextual rule but for which the rule     
  607       is null is treated in this step as having failed to conform to the   
  608       rule.                                                                
  609                                                                            
  610    o  Labels containing code points that are identified in the Tables      
  611       document as "CONTEXTO", but for which no such rule appears in the    
  612       table of rules.  Applications resolving DNS names or carrying out    
  613       equivalent operations are not required to test contextual rules      
  614       for "CONTEXTO" characters, only to verify that a rule is defined     
  615       (although they MAY make such tests to provide better protection or   
  616       give better information to the user).                                
  617                                                                            
  618    o  Labels containing code points that are unassigned in the version     
  619       of Unicode being used by the application, i.e., in the UNASSIGNED    
  620       category of the Tables document.                                     
  621                                                                            
  622       This requirement means that the application must use a list of       
  623       unassigned characters that is matched to the version of Unicode      
  624       that is being used for the other requirements in this section.  It   
  625       is not required that the application know which version of Unicode   
  626       is being used; that information might be part of the operating       
  627       environment in which the application is running.                     
  628                                                                            
  629    In addition, the application SHOULD apply the following test.           
  630                                                                            
  631    o  Verification that the string is compliant with the requirements      
  632       for right-to-left characters specified in the Bidi document          
  633       [RFC5893].                                                           
  634                                                                            
  635    This test may be omitted in special circumstances, such as when the     
  636    lookup application knows that the conditions are enforced elsewhere,    
  637    because an attempt to look up and resolve such strings will almost      
  638    certainly lead to a DNS lookup failure except when wildcards are        
  639    present in the zone.  However, applying the test is likely to give      
  640    much better information about the reason for a lookup failure --        
  641    information that may be usefully passed to the user when that is        
  642    feasible -- than DNS resolution failure information alone.              
  643                                                                            
  644    For all other strings, the lookup application MUST rely on the          
  645    presence or absence of labels in the DNS to determine the validity of   
  646    those labels and the validity of the characters they contain.  If       
  647    they are registered, they are presumed to be valid; if they are not,    
  648    their possible validity is not relevant.  While a lookup application    
  649    may reasonably issue warnings about strings it believes may be          
  650    problematic, applications that decline to process a string that
  651    conforms to the rules above (i.e., do not look it up in the DNS)
  652    are not in conformance with this protocol.                              
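
Several of the mandatory lookup-side tests can be combined as in the sketch below. It is partial by design: the contextual-rule and Bidi tests are left out, the DISALLOWED test is delegated to a caller-supplied predicate (a hypothetical stand-in for the RFC 5892 tables), and general category "Cn" is assumed to approximate UNASSIGNED. Unlike registration, no test for a leading or trailing hyphen appears in the lookup list, so none is made here.

```python
import unicodedata
from typing import Callable, List


def lookup_rejection_reasons(
    label: str,
    is_disallowed: Callable[[int], bool] = lambda cp: False,
) -> List[str]:
    """Collect reasons a putative U-label must be rejected before lookup."""
    reasons: List[str] = []
    if unicodedata.normalize("NFC", label) != label:
        reasons.append("not in NFC")
    if label[2:4] == "--":
        reasons.append("hyphens in positions 3 and 4")
    if label and unicodedata.category(label[0]).startswith("M"):
        reasons.append("leading combining mark")
    for ch in label:
        if is_disallowed(ord(ch)):
            reasons.append("disallowed code point U+%04X" % ord(ch))
        elif unicodedata.category(ch) == "Cn":
            reasons.append("unassigned code point U+%04X" % ord(ch))
    return reasons
```

An empty list means that none of the tests sketched here calls for rejection; it does not by itself establish that the label is valid.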
  653                                                                            
  661 5.5.  Punycode Conversion                                                  
  662                                                                            
  663    The string that has now been validated for lookup is converted to ACE   
  664    form by applying the Punycode algorithm to the string and then adding   
  665    the ACE prefix ("xn--").                                                
  666                                                                            
  667 5.6.  DNS Name Resolution                                                  
  668                                                                            
  669    The A-label resulting from the conversion in Section 5.5 or supplied    
  670    directly (see Section 5.3) is combined with other labels as needed to   
  671    form a fully-qualified domain name that is then looked up in the DNS,   
  672    using normal DNS resolver procedures.  The lookup can obviously         
  673    either succeed (returning information) or fail.                         
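
Assembling the name to be queried might be sketched as below. The function name is hypothetical, labels are assumed to have been validated already, and ASCII labels are passed through unchanged. The actual query (for instance through the platform resolver) is not shown.

```python
from typing import Iterable


def to_lookup_name(labels: Iterable[str]) -> str:
    """Join labels into a fully-qualified name, encoding non-ASCII ones."""
    ace_labels = []
    for label in labels:
        if label.isascii():
            ace_labels.append(label)  # already an A-label or plain ASCII
        else:
            encoded = label.encode("punycode").decode("ascii")
            ace_labels.append("xn--" + encoded)
    # The trailing dot marks the name as fully qualified.
    return ".".join(ace_labels) + "."
```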
  674                                                                            
  675 6.  Security Considerations                                                
  676                                                                            
  677    Security Considerations for this version of IDNA are described in the   
  678    Definitions document [RFC5890], except for the special issues           
  679    associated with right-to-left scripts and characters.  The latter are   
  680    discussed in the Bidi document [RFC5893].                               
  681                                                                            
  682    In order to avoid intentional or accidental attacks from labels that    
  683    might be confused with others, special problems in rendering, and so    
  684    on, the IDNA model requires that registries exercise care and           
  685    thoughtfulness about what labels they choose to permit.  That issue     
  686    is discussed in Section 4.3 of this document which, in turn, points     
  687    to a somewhat more extensive discussion in the Rationale document       
  688    [RFC5894].                                                              
  689                                                                            
  690 7.  IANA Considerations                                                    
  691                                                                            
  692    IANA actions for this version of IDNA are specified in the Tables       
  693    document [RFC5892] and discussed informally in the Rationale document   
  694    [RFC5894].  The components of IDNA described in this document do not    
  695    require any IANA actions.                                               
  696                                                                            
  697 8.  Contributors                                                           
  698                                                                            
  699    While the listed editor held the pen, the original versions of this     
  700    document represent the joint work and conclusions of an ad hoc design   
  701    team consisting of the editor and, in alphabetic order, Harald          
  702    Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp.  This document   
  703    draws significantly on the original version of IDNA [RFC3490] both      
  704    conceptually and for specific text.  This second-generation version     
  705    would not have been possible without the work that went into that       
  706    first version and especially the contributions of its authors Patrik    
  707    Faltstrom, Paul Hoffman, and Adam Costello.  While Faltstrom was        
  716    actively involved in the creation of this version, Hoffman and          
  717    Costello were not and should not be held responsible for any errors     
  718    or omissions.                                                           
  719                                                                            
  720 9.  Acknowledgments                                                        
  721                                                                            
  722    This revision to IDNA would have been impossible without the            
  723    accumulated experience since RFC 3490 was published and resulting       
  724    comments and complaints of many people in the IETF, ICANN, and other    
  725    communities (too many people to list here).  Nor would it have been     
  726    possible without RFC 3490 itself and the efforts of the Working Group   
  727    that defined it.  Those people whose contributions are acknowledged     
  728    in RFC 3490, RFC 4690 [RFC4690], and the Rationale document [RFC5894]   
  729    were particularly important.                                            
  730                                                                            
  731    Specific textual changes were incorporated into this document after     
  732    suggestions from the other contributors, Stephane Bortzmeyer, Vint      
  733    Cerf, Lisa Dusseault, Paul Hoffman, Kent Karlsson, James Mitchell,      
  734    Erik van der Poel, Marcos Sanz, Andrew Sullivan, Wil Tan, Ken           
  735    Whistler, Chris Wright, and other WG participants and reviewers         
  736    including Martin Duerst, James Mitchell, Subramanian Moonesamy, Peter   
  737    Saint-Andre, Margaret Wasserman, and Dan Winship who caught specific    
  738    errors and recommended corrections.  Special thanks are due to Paul     
  739    Hoffman for permission to extract material to form the basis for        
  740    Appendix A from a draft document that he prepared.                      
  741                                                                            
  742 10.  References                                                            
  743                                                                            
  744 10.1.  Normative References                                                
  745                                                                            
  746    [RFC1034]    Mockapetris, P., "Domain names - concepts and              
  747                 facilities", STD 13, RFC 1034, November 1987.              
  748                                                                            
  749    [RFC1035]    Mockapetris, P., "Domain names - implementation and        
  750                 specification", STD 13, RFC 1035, November 1987.           
  751                                                                            
  752    [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate        
  753                 Requirement Levels", BCP 14, RFC 2119, March 1997.         
  754                                                                            
  755    [RFC3492]    Costello, A., "Punycode: A Bootstring encoding of          
  756                 Unicode for Internationalized Domain Names in              
  757                 Applications (IDNA)", RFC 3492, March 2003.                
  758                                                                            
  759    [RFC5890]    Klensin, J., "Internationalized Domain Names for           
  760                 Applications (IDNA): Definitions and Document              
  761                 Framework", RFC 5890, August 2010.                         
  762                                                                            
  763                                                                            
  764                                                                            
  765                                                                            
  766                                                                            
  767 Klensin                      Standards Track                   [Page 14]   

  768 RFC 5891                    IDNA2008 Protocol                August 2010   
  769                                                                            
  770                                                                            
   [RFC5892]    Faltstrom, P., Ed., "The Unicode Code Points and
                Internationalized Domain Names for Applications (IDNA)",
                RFC 5892, August 2010.

   [RFC5893]    Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
                for Internationalized Domain Names for Applications
                (IDNA)", RFC 5893, August 2010.

   [Unicode-UAX15]
                The Unicode Consortium, "Unicode Standard Annex #15:
                Unicode Normalization Forms", September 2009,
                <http://www.unicode.org/reports/tr15/>.

10.2.  Informative References

   [ASCII]      American National Standards Institute (formerly United
                States of America Standards Institute), "USA Code for
                Information Interchange", ANSI X3.4-1968, 1968.  ANSI
                X3.4-1968 has been replaced by newer versions with
                slight modifications, but the 1968 version remains
                definitive for the Internet.

   [IDNA2008-Mapping]
                Resnick, P. and P. Hoffman, "Mapping Characters in
                Internationalized Domain Names for Applications (IDNA)",
                Work in Progress, April 2010.

   [RFC2671]    Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
                RFC 2671, August 1999.

   [RFC3490]    Faltstrom, P., Hoffman, P., and A. Costello,
                "Internationalizing Domain Names in Applications
                (IDNA)", RFC 3490, March 2003.

   [RFC3491]    Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
                Profile for Internationalized Domain Names (IDN)",
                RFC 3491, March 2003.

   [RFC3986]    Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
                Resource Identifier (URI): Generic Syntax", STD 66,
                RFC 3986, January 2005.

   [RFC3987]    Duerst, M. and M. Suignard, "Internationalized Resource
                Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC4690]    Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
                and Recommendations for Internationalized Domain Names
                (IDNs)", RFC 4690, September 2006.

   [RFC4952]    Klensin, J. and Y. Ko, "Overview and Framework for
                Internationalized Email", RFC 4952, July 2007.

   [RFC5894]    Klensin, J., "Internationalized Domain Names for
                Applications (IDNA): Background, Explanation, and
                Rationale", RFC 5894, August 2010.

   [Unicode]    The Unicode Consortium, "The Unicode Standard, Version
                5.0", 2007.  Boston, MA, USA: Addison-Wesley.  ISBN
                0-321-48091-0.  This printed reference has now been
                updated online to reflect additional code points.  For
                code points, the reference at the time this document was
                published is to Unicode 5.2.

Appendix A.  Summary of Major Changes from IDNA2003

   1.   Update the base character set from Unicode 3.2 to be
        Unicode-version-agnostic.

   2.   Separate the definitions for the "registration" and "lookup"
        activities.

   3.   Disallow symbol and punctuation characters except where special
        exceptions are necessary.

   4.   Remove the mapping and normalization steps from the protocol and
        have them, instead, done by the applications themselves,
        possibly in a local fashion, before invoking the protocol.

   5.   Change the way that the protocol specifies which characters are
        allowed in labels from "humans decide what the table of code
        points contains" to "decisions about code points are based on
        Unicode properties plus a small exclusion list created by
        humans".

   6.   Introduce the new concept of characters that can be used only in
        specific contexts.

   7.   Allow typical words and names in languages such as Dhivehi and
        Yiddish to be expressed.

   8.   Make bidirectional domain names (delimited strings of labels,
        not just labels standing on their own) display in a less
        surprising fashion, whether they appear in obvious domain name
        contexts or as part of running text in paragraphs.

   9.   Remove the dot separator from the mandatory part of the
        protocol.

   10.  Make invalid some currently valid labels that are not actually
        IDNA labels.

Author's Address

   John C Klensin
   1770 Massachusetts Ave, Ste 322
   Cambridge, MA  02140
   USA

   Phone: +1 617 245 1457
   EMail: john+ietf@jck.com

Section 4.2.3.2: Peter Occil (Editorial Erratum #3969) [Held for Document Update]
It says:
The Unicode string MUST NOT begin with a combining mark or combining
character (see The Unicode Standard, Section 2.11 [Unicode] for an
exact definition).

It should say:
The Unicode string MUST NOT begin with a combining mark or combining
character (as defined in The Unicode Standard, Section 3.6 [Unicode],
definition D52).

Section 2.11 of the Unicode Standard explains what combining characters are only in general terms. Section 3.6 contains the actual definition.

----- Verifier Notes -----
The actual fix is probably closer to changing "exact definition" to "explanation of combining characters," and leaving the reference alone. But discussion indicates that more clarification is probably good, and that clarification needs to be in the broader context of a document update.
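
The leading-combining-mark rule discussed in this erratum can be illustrated with a short sketch. It is not part of the specification. It assumes that "combining mark" can be approximated as any code point whose General_Category is Mn, Mc, or Me, as reported by Python's standard unicodedata module, and the function name begins_with_combining_mark is hypothetical. Results depend on the Unicode version bundled with the Python runtime.

```python
import unicodedata


def begins_with_combining_mark(label: str) -> bool:
    """Return True if the first code point of `label` is a combining mark.

    Assumption: a combining mark is any code point whose General_Category
    starts with "M" (Mn, Mc, or Me). The function name is hypothetical and
    is not taken from the RFC.
    """
    if not label:
        # An empty string has no first code point to test.
        return False
    return unicodedata.category(label[0]).startswith("M")


if __name__ == "__main__":
    # U+0301 COMBINING ACUTE ACCENT at the start violates the rule.
    print(begins_with_combining_mark("\u0301abc"))
    # The same mark after a base letter does not.
    print(begins_with_combining_mark("e\u0301"))
```

A caller would reject any U-label for which this check returns True before applying the remaining validity tests.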