RFC 3490

    1 Network Working Group                                       P. Faltstrom   
    2 Request for Comments: 3490                                         Cisco   
    3 Category: Standards Track                                     P. Hoffman   
    4                                                               IMC & VPNC   
    5                                                              A. Costello   
    6                                                              UC Berkeley   
    7                                                               March 2003   
    8                                                                            
    9                                                                            
   10          Internationalizing Domain Names in Applications (IDNA)            
   11                                                                            
   12 Status of this Memo                                                        
   13                                                                            
   14    This document specifies an Internet standards track protocol for the    
   15    Internet community, and requests discussion and suggestions for         
   16    improvements.  Please refer to the current edition of the "Internet     
   17    Official Protocol Standards" (STD 1) for the standardization state      
   18    and status of this protocol.  Distribution of this memo is unlimited.   
   19                                                                            
   20 Copyright Notice                                                           
   21                                                                            
   22    Copyright (C) The Internet Society (2003).  All Rights Reserved.        
   23                                                                            
   24 Abstract                                                                   
   25                                                                            
   26    Until now, there has been no standard method for domain names to use    
   27    characters outside the ASCII repertoire.  This document defines         
   28    internationalized domain names (IDNs) and a mechanism called            
   29    Internationalizing Domain Names in Applications (IDNA) for handling     
   30    them in a standard fashion.  IDNs use characters drawn from a large     
   31    repertoire (Unicode), but IDNA allows the non-ASCII characters to be    
   32    represented using only the ASCII characters already allowed in so-      
   33    called host names today.  This backward-compatible representation is    
   34    required in existing protocols like DNS, so that IDNs can be            
   35    introduced with no changes to the existing infrastructure.  IDNA is     
   36    only meant for processing domain names, not free text.                  
   37                                                                            
   38 Table of Contents                                                          
   39                                                                            
   40    1. Introduction..................................................  2    
   41       1.1 Problem Statement.........................................  3    
   42       1.2 Limitations of IDNA.......................................  3    
   43       1.3 Brief overview for application developers.................  4    
   44    2. Terminology...................................................  5    
   45    3. Requirements and applicability................................  7    
   46       3.1 Requirements..............................................  7    
   47       3.2 Applicability.............................................  8    
   48          3.2.1. DNS resource records................................  8    
   49                                                                            
   50                                                                            
   51                                                                            
   52 Faltstrom, et al.           Standards Track                     [Page 1]   

   53 RFC 3490                          IDNA                        March 2003   
   54                                                                            
   55                                                                            
   56          3.2.2. Non-domain-name data types stored in domain names...  9    
   57    4. Conversion operations.........................................  9    
   58       4.1 ToASCII................................................... 10    
   59       4.2 ToUnicode................................................. 11    
   60    5. ACE prefix.................................................... 12    
   61    6. Implications for typical applications using DNS............... 13    
   62       6.1 Entry and display in applications......................... 14    
   63       6.2 Applications and resolver libraries....................... 15    
   64       6.3 DNS servers............................................... 15    
   65       6.4 Avoiding exposing users to the raw ACE encoding........... 16    
   66       6.5  DNSSEC authentication of IDN domain names................ 16    
   67    7. Name server considerations.................................... 17    
   68    8. Root server considerations.................................... 17    
   69    9. References.................................................... 18    
   70       9.1 Normative References...................................... 18    
   71       9.2 Informative References.................................... 18    
   72    10. Security Considerations...................................... 19    
   73    11. IANA Considerations.......................................... 20    
   74    12. Authors' Addresses........................................... 21    
   75    13. Full Copyright Statement..................................... 22    
   76                                                                            
   77 1. Introduction                                                            
   78                                                                            
   79    IDNA works by allowing applications to use certain ASCII name labels    
   80    (beginning with a special prefix) to represent non-ASCII name labels.   
   81    Lower-layer protocols need not be aware of this; therefore IDNA does    
   82    not depend on changes to any infrastructure.  In particular, IDNA       
   83    does not depend on any changes to DNS servers, resolvers, or protocol   
   84    elements, because the ASCII name service provided by the existing DNS   
   85    is entirely sufficient for IDNA.                                        
   86                                                                            
   87    This document does not require any applications to conform to IDNA,     
   88    but applications can elect to use IDNA in order to support IDN while    
   89    maintaining interoperability with existing infrastructure.  If an       
   90    application wants to use non-ASCII characters in domain names, IDNA     
   91    is the only currently-defined option.  Adding IDNA support to an        
   92    existing application entails changes to the application only, and       
   93    leaves room for flexibility in the user interface.                      
   94                                                                            
   95    A great deal of the discussion of IDN solutions has focused on          
   96    transition issues and how IDN will work in a world where not all of     
   97    the components have been updated.  Proposals that were not chosen by    
   98    the IDN Working Group would depend on user applications, resolvers,     
   99    and DNS servers being updated in order for a user to use an             
  100    internationalized domain name.  Rather than rely on widespread          
  101    updating of all components, IDNA depends on updates to user             
  102    applications only; no changes are needed to the DNS protocol or any     
  103    DNS servers or the resolvers on users' computers.
  104                                                                            
  111 1.1 Problem Statement                                                      
  112                                                                            
  113    The IDNA specification solves the problem of extending the repertoire   
  114    of characters that can be used in domain names to include the Unicode   
  115    repertoire (with some restrictions).                                    
  116                                                                            
  117    IDNA does not extend the service offered by DNS to the applications.    
  118    Instead, the applications (and, by implication, the users) continue     
  119    to see an exact-match lookup service.  Either there is a single         
  120    exactly-matching name or there is no match.  This model has served      
  121    the existing applications well, but it requires, with or without        
  122    internationalized domain names, that users know the exact spelling of   
  123    the domain names that the users type into applications such as web      
  124    browsers and mail user agents.  The introduction of the larger          
  125    repertoire of characters potentially makes the set of misspellings      
  126    larger, especially given that in some cases the same appearance, for    
  127    example on a business card, might visually match several Unicode code   
  128    points or several sequences of code points.                             
  129                                                                            
  130    IDNA allows the graceful introduction of IDNs not only by avoiding      
  131    upgrades to existing infrastructure (such as DNS servers and mail       
  132    transport agents), but also by allowing some rudimentary use of IDNs    
  133    in applications by using the ASCII representation of the non-ASCII      
  134    name labels.  While such names are very user-unfriendly to read and     
  135    type, and hence are not suitable for user input, they allow (for        
  136    instance) replying to email and clicking on URLs even though the        
  137    domain name displayed is incomprehensible to the user.  In order to     
  138    allow user-friendly input and output of the IDNs, the applications      
  139    need to be modified to conform to this specification.                   
  140                                                                            
  141    IDNA uses the Unicode character repertoire, which avoids the            
  142    significant delays that would be inherent in waiting for a different    
  143    and specific character set to be defined for IDN purposes by some
  144    other standards developing organization.
  145                                                                            
  146 1.2 Limitations of IDNA                                                    
  147                                                                            
  148    The IDNA protocol does not solve all linguistic issues with users       
  149    inputting names in different scripts.  Many important language-based    
  150    and script-based mappings are not covered in IDNA and need to be        
  151    handled outside the protocol.  For example, names that are entered in   
  152    a mix of traditional and simplified Chinese characters will not be      
  153    mapped to a single canonical name.  Another example is that
  154    Scandinavian names entered with U+00F6 (LATIN SMALL LETTER O WITH
  155    DIAERESIS) will not be mapped to U+00F8 (LATIN SMALL LETTER O WITH
  156    STROKE).
  157                                                                            
  166    An example of an important issue that is not considered in detail in    
  167    IDNA is how to provide a high probability that a user who is entering   
  168    a domain name based on visual information (such as from a business      
  169    card or billboard) or aural information (such as from a telephone or    
  170    radio) would correctly enter the IDN.  Similar issues exist for ASCII   
  171    domain names, for example the possible visual confusion between the     
  172    letter 'O' and the digit zero, but the introduction of the larger       
  173    repertoire of characters creates more opportunities for
  174    similar-looking and similar-sounding names.  Note that this is a
  175    complex issue relating to languages, input methods on computers, and
  176    so on.  Furthermore, the kind of matching and searching necessary for
  177    a high probability of success would not fit the role of the DNS and
  178    its exact matching function.
  179                                                                            
  180 1.3 Brief overview for application developers                              
  181                                                                            
  182    Applications can use IDNA to support internationalized domain names     
  183    anywhere that ASCII domain names are already supported, including DNS   
  184    master files and resolver interfaces.  (Applications can also define    
  185    protocols and interfaces that support IDNs directly using non-ASCII     
  186    representations.  IDNA does not prescribe any particular                
  187    representation for new protocols, but it still defines which names      
  188    are valid and how they are compared.)                                   
  189                                                                            
  190    The IDNA protocol is contained completely within applications.  It is   
  191    not a client-server or peer-to-peer protocol: everything is done        
  192    inside the application itself.  When used with a DNS resolver           
  193    library, IDNA is inserted as a "shim" between the application and the   
  194    resolver library.  When used for writing names into a DNS zone, IDNA    
  195    is used just before the name is committed to the zone.                  
  196                                                                            
  197    There are two operations described in section 4 of this document:       
  198                                                                            
  199    -  The ToASCII operation is used before sending an IDN to something     
  200       that expects ASCII names (such as a resolver) or writing an IDN      
  201       into a place that expects ASCII names (such as a DNS master file).   
  202                                                                            
  203    -  The ToUnicode operation is used when displaying names to users,      
  204       for example names obtained from a DNS zone.                          
  205                                                                            
  206    It is important to note that the ToASCII operation can fail.  If it     
  207    fails when processing a domain name, that domain name cannot be used    
  208    as an internationalized domain name and the application has to have     
  209    some method of dealing with this failure.                               
  210                                                                            
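         As an illustrative sketch (not part of this specification), the
         two operations can be tried with the "idna" codec of the Python
         standard library, which works on whole domain names.  The helper
         names to_ascii_name and to_unicode_name, the sample names, and
         the choice to signal a ToASCII failure by returning None are
         assumptions made only for this example.

```python
def to_ascii_name(name):
    """Apply ToASCII to every label of `name`; return None on failure."""
    try:
        # The stdlib "idna" codec splits the name into labels and
        # converts each one (Nameprep, then Punycode where needed).
        return name.encode("idna").decode("ascii")
    except UnicodeError:
        # ToASCII can fail; the application has to handle that case.
        return None


def to_unicode_name(name):
    """Apply ToUnicode to every label of an ASCII `name` for display."""
    return name.encode("ascii").decode("idna")


print(to_ascii_name("www.bücher.example"))
# www.xn--bcher-kva.example
print(to_unicode_name("www.xn--bcher-kva.example"))
# www.bücher.example
print(to_ascii_name("a" * 64 + ".example"))
# None (the first label is longer than 63 characters)
```

         An application would typically call the first helper just before
         handing a name to a resolver, and the second just before showing
         a name to the user.
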
  211    IDNA requires that implementations process input strings with           
  212    Nameprep [NAMEPREP], which is a profile of Stringprep [STRINGPREP],     
  213    and then with Punycode [PUNYCODE].  Implementations of IDNA MUST        
  221    fully implement Nameprep and Punycode; neither Nameprep nor Punycode    
  222    is optional.
  223                                                                            
  224 2. Terminology                                                             
  225                                                                            
  226    The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",     
  227    and "MAY" in this document are to be interpreted as described in BCP    
  228    14, RFC 2119 [RFC2119].                                                 
  229                                                                            
  230    A code point is an integer value associated with a character in a       
  231    coded character set.                                                    
  232                                                                            
  233    Unicode [UNICODE] is a coded character set containing tens of           
  234    thousands of characters.  A single Unicode code point is denoted by     
  235    "U+" followed by four to six hexadecimal digits, while a range of       
  236    Unicode code points is denoted by two hexadecimal numbers separated     
  237    by "..", with no prefixes.                                              
  238                                                                            
  239    ASCII means US-ASCII [USASCII], a coded character set containing 128    
  240    characters associated with code points in the range 0..7F.  Unicode     
  241    is an extension of ASCII: it includes all the ASCII characters and      
  242    associates them with the same code points.                              
  243                                                                            
  244    The term "LDH code points" is defined in this document to mean the      
  245    code points associated with ASCII letters, digits, and the hyphen-      
  246    minus; that is, U+002D, 30..39, 41..5A, and 61..7A. "LDH" is an         
  247    abbreviation for "letters, digits, hyphen".                             
  248                                                                            
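         The ranges in the LDH definition are written in the prefix-free
         hexadecimal notation introduced above.  A minimal Python sketch
         of the same set follows; the names LDH_CODE_POINTS and is_ldh
         are hypothetical and chosen only for this illustration.

```python
# LDH code points: U+002D, 30..39, 41..5A, and 61..7A (hexadecimal).
LDH_CODE_POINTS = frozenset(
    [0x2D]                      # hyphen-minus
    + list(range(0x30, 0x3A))   # digits 0-9
    + list(range(0x41, 0x5B))   # letters A-Z
    + list(range(0x61, 0x7B))   # letters a-z
)


def is_ldh(text):
    """Return True if every code point in `text` is an LDH code point."""
    return all(ord(ch) in LDH_CODE_POINTS for ch in text)


print(len(LDH_CODE_POINTS))     # 63
print(is_ldh("xn--bcher-kva"))  # True
print(is_ldh("foo_bar"))        # False: U+005F is not an LDH code point
```
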
  249    [STD13] talks about "domain names" and "host names", but many people    
  250    use the terms interchangeably.  Further, because [STD13] was not        
  251    terribly clear, many people who are sure they know the exact            
  252    definitions of each of these terms disagree on the definitions.  In     
  253    this document the term "domain name" is used in general.  This          
  254    document explicitly cites [STD3] whenever referring to the host name    
  255    syntax restrictions defined therein.                                    
  256                                                                            
  257    A label is an individual part of a domain name.  Labels are usually     
  258    shown separated by dots; for example, the domain name                   
  259    "www.example.com" is composed of three labels: "www", "example", and    
  260    "com".  (The zero-length root label described in [STD13], which can     
  261    be explicit as in "www.example.com." or implicit as in                  
  262    "www.example.com", is not considered a label in this specification.)    
  263    IDNA extends the set of usable characters in labels that are text.      
  264    For the rest of this document, the term "label" is shorthand for        
  265    "text label", and "every label" means "every text label".               
  266                                                                            
  276    An "internationalized label" is a label to which the ToASCII            
  277    operation (see section 4) can be applied without failing (with the      
  278    UseSTD3ASCIIRules flag unset).  This implies that every ASCII label     
  279    that satisfies the [STD13] length restriction is an internationalized   
  280    label.  Therefore the term "internationalized label" is a               
  281    generalization, embracing both old ASCII labels and new non-ASCII       
  282    labels.  Although most Unicode characters can appear in                 
  283    internationalized labels, ToASCII will fail for some input strings,     
  284    and such strings are not valid internationalized labels.                
  285                                                                            
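         The definition above can be read as a simple predicate.  The
         sketch below assumes the ToASCII function exposed by the Python
         standard library module encodings.idna, which to the best of our
         knowledge runs with UseSTD3ASCIIRules unset; the helper name
         is_internationalized_label is hypothetical.

```python
import encodings.idna


def is_internationalized_label(label):
    """Return True if ToASCII can be applied to `label` without failing."""
    try:
        encodings.idna.ToASCII(label)
    except UnicodeError:
        return False
    return True


print(is_internationalized_label("example"))    # True: plain ASCII label
print(is_internationalized_label("bücher"))     # True: non-ASCII label
print(is_internationalized_label("a" * 64))     # False: exceeds 63 octets
print(is_internationalized_label("a\u200eb"))   # False: U+200E is prohibited
```
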
  286    An "internationalized domain name" (IDN) is a domain name in which      
  287    every label is an internationalized label.  This implies that every     
  288    ASCII domain name is an IDN (which implies that it is possible for a    
  289    name to be an IDN without it containing any non-ASCII characters).      
  290    This document does not attempt to define an "internationalized host     
  291    name".  Just as has been the case with ASCII names, some DNS zone       
  292    administrators may impose restrictions, beyond those imposed by DNS     
  293    or IDNA, on the characters or strings that may be registered as         
  294    labels in their zones.  Such restrictions have no impact on the         
  295    syntax or semantics of DNS protocol messages; a query for a name that   
  296    matches no records will yield the same response regardless of the       
  297    reason why it is not in the zone.  Clients issuing queries or           
  298    interpreting responses cannot be assumed to have any knowledge of       
  299    zone-specific restrictions or conventions.                              
  300                                                                            
  301    In IDNA, equivalence of labels is defined in terms of the ToASCII       
  302    operation, which constructs an ASCII form for a given label, whether    
  303    or not the label was already an ASCII label.  Labels are defined to     
  304    be equivalent if and only if their ASCII forms produced by ToASCII      
  305    match using a case-insensitive ASCII comparison.  ASCII labels          
  306    already have a notion of equivalence: upper case and lower case are     
  307    considered equivalent.  The IDNA notion of equivalence is an            
  308    extension of that older notion.  Equivalent labels in IDNA are          
  309    treated as alternate forms of the same label, just as "foo" and "Foo"   
  310    are treated as alternate forms of the same label.                       
  311                                                                            
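         The equivalence rule can be sketched in a few lines of Python.
         The function name labels_equivalent is an assumption, and
         ToASCII is taken from the standard library module
         encodings.idna; this is an illustration under those assumptions
         rather than a normative algorithm.

```python
import encodings.idna


def labels_equivalent(a, b):
    """Compare two labels by their ASCII forms, ignoring ASCII case."""
    # ToASCII returns bytes; bytes.lower() folds only ASCII letters,
    # which gives the case-insensitive ASCII comparison.
    ascii_a = encodings.idna.ToASCII(a)
    ascii_b = encodings.idna.ToASCII(b)
    return ascii_a.lower() == ascii_b.lower()


print(labels_equivalent("foo", "Foo"))               # True
print(labels_equivalent("bücher", "BÜCHER"))         # True
print(labels_equivalent("bücher", "xn--bcher-kva"))  # True
print(labels_equivalent("bücher", "bucher"))         # False
```
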
  312    To allow internationalized labels to be handled by existing             
  313    applications, IDNA uses an "ACE label" (ACE stands for ASCII            
  314    Compatible Encoding).  An ACE label is an internationalized label       
  315    that can be rendered in ASCII and is equivalent to an                   
  316    internationalized label that cannot be rendered in ASCII.  Given any    
  317    internationalized label that cannot be rendered in ASCII, the ToASCII   
  318    operation will convert it to an equivalent ACE label (whereas an        
  319    ASCII label will be left unaltered by ToASCII).  ACE labels are         
  320    unsuitable for display to users.  The ToUnicode operation will          
  321    convert any label to an equivalent non-ACE label.  In fact, an ACE      
  322    label is formally defined to be any label that the ToUnicode            
  323    operation would alter (whereas non-ACE labels are left unaltered by     
  331    ToUnicode).  Every ACE label begins with the ACE prefix specified in    
  332    section 5.  The ToASCII and ToUnicode operations are specified in       
  333    section 4.                                                              
  334                                                                            
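         The formal definition of an ACE label lends itself to a direct
         test.  In the sketch below, the names to_unicode_label and
         is_ace_label are hypothetical.  The ToUnicode function of the
         Python standard library raises an exception on some inputs, so
         the wrapper returns the input unchanged in that case, which is
         the behavior section 4.2 describes for ToUnicode.

```python
import encodings.idna


def to_unicode_label(label):
    """ToUnicode for one label; on any failure return the input unchanged."""
    try:
        return encodings.idna.ToUnicode(label)
    except UnicodeError:
        return label


def is_ace_label(label):
    """An ACE label is any label that ToUnicode would alter."""
    return to_unicode_label(label) != label


print(to_unicode_label("xn--bcher-kva"))  # bücher
print(is_ace_label("xn--bcher-kva"))      # True
print(is_ace_label("example"))            # False
print(is_ace_label("bücher"))             # False
```
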
  335    The "ACE prefix" is defined in this document to be a string of ASCII    
  336    characters that appears at the beginning of every ACE label.  It is     
  337    specified in section 5.                                                 
  338                                                                            
  339    A "domain name slot" is defined in this document to be a protocol       
  340    element or a function argument or a return value (and so on)            
  341    explicitly designated for carrying a domain name.  Examples of domain   
  342    name slots include: the QNAME field of a DNS query; the name argument   
  343    of the gethostbyname() library function; the part of an email address   
  344    following the at-sign (@) in the From: field of an email message        
  345    header; and the host portion of the URI in the src attribute of an      
  346    HTML <IMG> tag.  General text that just happens to contain a domain     
  347    name is not a domain name slot; for example, a domain name appearing    
  348    in the plain text body of an email message is not occupying a domain    
  349    name slot.                                                              
  350                                                                            
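         To make the notion of a slot concrete, the sketch below converts
         only the domain name slot of an email address, leaving the rest
         of the string alone.  The function name ascii_email_domain and
         the sample address are assumptions; real address parsing is more
         involved than splitting at the last at-sign.

```python
def ascii_email_domain(address):
    """Convert only the domain name slot of `address` with ToASCII."""
    # The slot is the part after the last at-sign; the local part is
    # not a domain name and is left untouched.
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("no at-sign in address")
    return local + at + domain.encode("idna").decode("ascii")


print(ascii_email_domain("user@bücher.example"))
# user@xn--bcher-kva.example
```
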
  351    An "IDN-aware domain name slot" is defined in this document to be a     
  352    domain name slot explicitly designated for carrying an                  
  353    internationalized domain name as defined in this document.  The         
  354    designation may be static (for example, in the specification of the     
  355    protocol or interface) or dynamic (for example, as a result of          
  356    negotiation in an interactive session).                                 
  357                                                                            
  358    An "IDN-unaware domain name slot" is defined in this document to be     
  359    any domain name slot that is not an IDN-aware domain name slot.         
  360    Obviously, this includes any domain name slot whose specification       
  361    predates IDNA.                                                          
  362                                                                            
  363 3. Requirements and applicability                                          
  364                                                                            
  365 3.1 Requirements                                                           
  366                                                                            
  367    IDNA conformance means adherence to the following four requirements:    
  368                                                                            
  369    1) Whenever dots are used as label separators, the following            
  370       characters MUST be recognized as dots: U+002E (full stop), U+3002    
  371       (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61        
  372       (halfwidth ideographic full stop).                                   
  373                                                                            
  374    2) Whenever a domain name is put into an IDN-unaware domain name slot   
  375       (see section 2), it MUST contain only ASCII characters.  Given an    
  376       internationalized domain name (IDN), an equivalent domain name       
  377       satisfying this requirement can be obtained by applying the          
  386       ToASCII operation (see section 4) to each label and, if dots are     
  387       used as label separators, changing all the label separators to       
  388       U+002E.                                                              
  389                                                                            
  390    3) ACE labels obtained from domain name slots SHOULD be hidden from     
  391       users when it is known that the environment can handle the non-ACE   
  392       form, except when the ACE form is explicitly requested.  When it     
  393       is not known whether or not the environment can handle the non-ACE   
  394       form, the application MAY use the non-ACE form (which might fail,    
  395       such as by not being displayed properly), or it MAY use the ACE      
  396       form (which will look unintelligible to the user).  Given an
  397       internationalized domain name, an equivalent domain name             
  398       containing no ACE labels can be obtained by applying the ToUnicode   
  399       operation (see section 4) to each label.  When requirements 2 and    
  400       3 both apply, requirement 2 takes precedence.                        
  401                                                                            
  402    4) Whenever two labels are compared, they MUST be considered to match   
  403       if and only if they are equivalent, that is, their ASCII forms       
  404       (obtained by applying ToASCII) match using a case-insensitive        
  405       ASCII comparison.  Whenever two names are compared, they MUST be     
  406       considered to match if and only if their corresponding labels        
  407       match, regardless of whether the names use the same forms of label   
  408       separators.                                                          
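
Requirements 1 and 4 can be sketched as follows.  This is a minimal
illustration, not a normative implementation: it assumes Python's
standard-library encodings.idna module for the per-label ToASCII
operation, and the helper names split_labels, labels_match and
names_match are hypothetical.

```python
import re
from encodings import idna  # standard-library RFC 3490 helpers

# Requirement 1: the four code points that MUST be recognized as dots.
DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(name: str) -> list[str]:
    """Split a domain name on any recognized label separator."""
    return DOTS.split(name)


def labels_match(a: str, b: str) -> bool:
    """Requirement 4: compare the ToASCII forms case-insensitively."""
    if a == "" or b == "":
        # Assumption of this sketch: an empty label (for example the one
        # after a trailing dot) matches only another empty label.
        return a == b
    try:
        return idna.ToASCII(a).lower() == idna.ToASCII(b).lower()
    except UnicodeError:
        # ToASCII failed, so there is no ASCII form to compare.
        return False


def names_match(a: str, b: str) -> bool:
    """Names match when their corresponding labels match."""
    la, lb = split_labels(a), split_labels(b)
    return len(la) == len(lb) and all(map(labels_match, la, lb))
```

Because the comparison works label by label, two names that differ
only in which of the four separators they use are still reported as
matching.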
  409                                                                            
  410 3.2 Applicability                                                          
  411                                                                            
  412    IDNA is applicable to all domain names in all domain name slots         
  413    except where it is explicitly excluded.                                 
  414                                                                            
  415    This implies that IDNA is applicable to many protocols that predate     
  416    IDNA.  Note that IDNs occupying domain name slots in those protocols    
  417    MUST be in ASCII form (see section 3.1, requirement 2).                 
  418                                                                            
  419 3.2.1. DNS resource records                                                
  420                                                                            
  421    IDNA does not apply to domain names in the NAME and RDATA fields of     
  422    DNS resource records whose CLASS is not IN.  This exclusion applies     
  423    to every non-IN class, present and future, except where future          
  424    standards override this exclusion by explicitly inviting the use of     
  425    IDNA.                                                                   
  426                                                                            
  427    There are currently no other exclusions on the applicability of IDNA    
  428    to DNS resource records; it depends entirely on the CLASS, and not on   
  429    the TYPE.  This will remain true, even as new types are defined,        
  430    unless there is a compelling reason for a new type to complicate        
  431    matters by imposing type-specific rules.                                
  432                                                                            
  441 3.2.2. Non-domain-name data types stored in domain names                   
  442                                                                            
  443    Although IDNA enables the representation of non-ASCII characters in     
  444    domain names, that does not imply that IDNA enables the                 
  445    representation of non-ASCII characters in other data types that are     
  446    stored in domain names.  For example, an email address local part is    
  447    sometimes stored in a domain label (hostmaster@example.com would be     
  448    represented as hostmaster.example.com in the RDATA field of an SOA      
  449    record).  IDNA does not update the existing email standards, which      
  450    allow only ASCII characters in local parts.  Therefore, unless the      
  451    email standards are revised to invite the use of IDNA for local         
  452    parts, a domain label that holds the local part of an email address     
  453    SHOULD NOT begin with the ACE prefix, and even if it does, it is to     
  454    be interpreted literally as a local part that happens to begin with     
  455    the ACE prefix.                                                         
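
The SOA example above can be sketched as follows.  This is only an
illustration under stated assumptions: mailbox_to_rname is a
hypothetical helper, the domain part is split on U+002E only, and the
backslash escaping of dots in the local part follows the master-file
convention of RFC 1035.  The point it shows is that the local part is
kept as a literal label and only the domain labels go through ToASCII.

```python
from encodings import idna  # standard-library RFC 3490 helpers


def mailbox_to_rname(mailbox: str) -> str:
    """Build an SOA RNAME (presentation form) from local-part@domain."""
    local, _, domain = mailbox.rpartition("@")
    if not local or not domain:
        raise ValueError("expected local-part@domain")
    if not local.isascii():
        # The existing email standards allow only ASCII in local parts.
        raise ValueError("local part must be ASCII")
    # The local part is one literal label: it is never passed to ToASCII,
    # even if it happens to begin with the ACE prefix.
    first_label = local.replace("\\", "\\\\").replace(".", "\\.")
    # Only the labels of the domain part are converted.
    rest = [idna.ToASCII(label).decode("ascii") for label in domain.split(".")]
    return ".".join([first_label] + rest)
```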
  456                                                                            
  457 4. Conversion operations                                                   
  458                                                                            
  459    An application converts a domain name when the name is put into an
  460    IDN-unaware slot or displayed to a user.  This section specifies the
  461    conversion steps and the ToASCII and ToUnicode operations.
  462                                                                            
  463    The input to ToASCII or ToUnicode is a single label that is a           
  464    sequence of Unicode code points (remember that all ASCII code points    
  465    are also Unicode code points).  If a domain name is represented using   
  466    a character set other than Unicode or US-ASCII, it will first need to   
  467    be transcoded to Unicode.                                               
  468                                                                            
  469    Starting from a whole domain name, the steps that an application        
  470    takes to do the conversions are:                                        
  471                                                                            
  472    1) Decide whether the domain name is a "stored string" or a "query      
  473       string" as described in [STRINGPREP].  If this conversion follows    
  474       the "queries" rule from [STRINGPREP], set the flag called            
  475       "AllowUnassigned".                                                   
  476                                                                            
  477    2) Split the domain name into individual labels as described in         
  478       section 3.1.  The labels do not include the separator.               
  479                                                                            
  480    3) For each label, decide whether or not to enforce the restrictions    
  481       on ASCII characters in host names [STD3].  (Applications already     
  482       faced this choice before the introduction of IDNA, and can           
  483       continue to make the decision the same way they always have; IDNA    
  484       makes no new recommendations regarding this choice.)  If the         
  485       restrictions are to be enforced, set the flag called                 
  486       "UseSTD3ASCIIRules" for that label.                                  
  487                                                                            
  496    4) Process each label with either the ToASCII or the ToUnicode          
  497       operation as appropriate.  Typically, you use the ToASCII            
  498       operation if you are about to put the name into an IDN-unaware       
  499       slot, and you use the ToUnicode operation if you are displaying      
  500       the name to a user; section 3.1 gives greater detail on the          
  501       applicable requirements.                                             
  502                                                                            
  503    5) If ToASCII was applied in step 4 and dots are used as label          
  504       separators, change all the label separators to U+002E (full stop).   
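
Steps 2, 4 and 5 above can be sketched as follows.  This is a minimal
sketch that assumes Python's standard-library encodings.idna module
for the per-label operations.  Its ToASCII and ToUnicode functions
take only the label, so the flags from steps 1 and 3 are not modelled
here.  The names domain_to_ascii and domain_to_unicode are
hypothetical.

```python
import re
from encodings import idna  # standard-library RFC 3490 helpers

# A capturing group makes re.split() return the separators as well.
SEPARATORS = re.compile("([\u002E\u3002\uFF0E\uFF61])")


def domain_to_ascii(name: str) -> str:
    """Convert a name that is about to go into an IDN-unaware slot."""
    labels = SEPARATORS.split(name)[::2]  # step 2: labels, no separators
    converted = [
        # Step 4.  Assumption: empty labels (trailing dot) pass through.
        idna.ToASCII(label).decode("ascii") if label else ""
        for label in labels
    ]
    return ".".join(converted)  # step 5: every separator becomes U+002E


def domain_to_unicode(name: str) -> str:
    """Convert a name for display, leaving the separators untouched."""
    parts = SEPARATORS.split(name)  # labels at even indexes
    for i in range(0, len(parts), 2):
        try:
            parts[i] = idna.ToUnicode(parts[i])  # step 4
        except UnicodeError:
            pass  # keep the label as it was
    return "".join(parts)
```

In domain_to_ascii a failure on any label propagates as UnicodeError,
so the caller never receives a partly converted name.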
  505                                                                            
  506    The following two subsections define the ToASCII and ToUnicode          
  507    operations that are used in step 4.                                     
  508                                                                            
  509    This description of the protocol uses specific procedure names, names   
  510    of flags, and so on, in order to facilitate the specification of the    
  511    protocol.  These names, as well as the actual steps of the              
  512    procedures, are not required of an implementation.  In fact, any        
  513    implementation which has the same external behavior as specified in     
  514    this document conforms to this specification.                           
  515                                                                            
  516 4.1 ToASCII                                                                
  517                                                                            
  518    The ToASCII operation takes a sequence of Unicode code points that      
  519    make up one label and transforms it into a sequence of code points in   
  520    the ASCII range (0..7F).  If ToASCII succeeds, the original sequence    
  521    and the resulting sequence are equivalent labels.                       
  522                                                                            
  523    It is important to note that the ToASCII operation can fail.  ToASCII   
  524    fails if any step of it fails.  If any step of the ToASCII operation    
  525    fails on any label in a domain name, that domain name MUST NOT be       
  526    used as an internationalized domain name.  The method for dealing       
  527    with this failure is application-specific.                              
  528                                                                            
  529    The inputs to ToASCII are a sequence of code points, the                
  530    AllowUnassigned flag, and the UseSTD3ASCIIRules flag.  The output of    
  531    ToASCII is either a sequence of ASCII code points or a failure          
  532    condition.                                                              
  533                                                                            
  534    ToASCII never alters a sequence of code points that are all in the      
  535    ASCII range to begin with (although it could fail).  Applying the       
  536    ToASCII operation multiple times has exactly the same effect as         
  537    applying it just once.                                                  
  538                                                                            
  539    ToASCII consists of the following steps:                                
  540                                                                            
  541    1. If the sequence contains any code points outside the ASCII range     
  542       (0..7F) then proceed to step 2, otherwise skip to step 3.            
  543                                                                            
  551    2. Perform the steps specified in [NAMEPREP] and fail if there is an    
  552       error.  The AllowUnassigned flag is used in [NAMEPREP].              
  553                                                                            
  554    3. If the UseSTD3ASCIIRules flag is set, then perform these checks:     
  555                                                                            
  556      (a) Verify the absence of non-LDH ASCII code points; that is, the     
  557          absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.             
  558                                                                            
  559      (b) Verify the absence of leading and trailing hyphen-minus; that     
  560          is, the absence of U+002D at the beginning and end of the         
  561          sequence.                                                         
  562                                                                            
  563    4. If the sequence contains any code points outside the ASCII range     
  564       (0..7F) then proceed to step 5, otherwise skip to step 8.            
  565                                                                            
  566    5. Verify that the sequence does NOT begin with the ACE prefix.         
  567                                                                            
  568    6. Encode the sequence using the encoding algorithm in [PUNYCODE] and   
  569       fail if there is an error.                                           
  570                                                                            
  571    7. Prepend the ACE prefix.                                              
  572                                                                            
  573    8. Verify that the number of code points is in the range 1 to 63        
  574       inclusive.                                                           
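
The eight steps can be sketched as follows.  This is an illustrative
sketch, not a reference implementation.  It assumes that Python's
standard-library encodings.idna.nameprep covers the [NAMEPREP] step
apart from the unassigned-code-point check, which is added here with
stringprep.in_table_a1, and that the "punycode" codec covers the
[PUNYCODE] step.  The function name to_ascii and its keyword
arguments are hypothetical.

```python
import stringprep
from encodings import idna

ACE_PREFIX = "xn--"


def _is_ascii(s: str) -> bool:
    return all(ord(c) < 0x80 for c in s)


def to_ascii(label: str, allow_unassigned: bool = False,
             use_std3_ascii_rules: bool = False) -> str:
    """Return the ASCII form of one label or raise UnicodeError."""
    # Steps 1 and 2: run Nameprep only when non-ASCII is present.
    if not _is_ascii(label):
        if not allow_unassigned and any(map(stringprep.in_table_a1, label)):
            raise UnicodeError("unassigned code point in a stored string")
        label = idna.nameprep(label)  # raises UnicodeError on failure

    # Step 3: optional host name restrictions.
    if use_std3_ascii_rules:
        for c in label:
            if ord(c) < 0x80 and not (c.isalnum() or c == "-"):
                raise UnicodeError("non-LDH ASCII code point")
        if label.startswith("-") or label.endswith("-"):
            raise UnicodeError("leading or trailing hyphen-minus")

    # Steps 4 to 7: encode only when non-ASCII remains.
    if not _is_ascii(label):
        if label.lower().startswith(ACE_PREFIX):  # step 5
            raise UnicodeError("label already begins with the ACE prefix")
        encoded = label.encode("punycode").decode("ascii")  # step 6
        label = ACE_PREFIX + encoded  # step 7

    # Step 8: length check.
    if not 1 <= len(label) <= 63:
        raise UnicodeError("label must be 1 to 63 code points long")
    return label
```

Raising UnicodeError is only one way to signal the failure condition;
how the caller reacts remains application-specific.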
  575                                                                            
  576 4.2 ToUnicode                                                              
  577                                                                            
  578    The ToUnicode operation takes a sequence of Unicode code points that    
  579    make up one label and returns a sequence of Unicode code points.  If    
  580    the input sequence is a label in ACE form, then the result is an        
  581    equivalent internationalized label that is not in ACE form, otherwise   
  582    the original sequence is returned unaltered.                            
  583                                                                            
  584    ToUnicode never fails.  If any step fails, then the original input      
  585    sequence is returned immediately in that step.                          
  586

         Note: Obsoleted by RFC 5890 and RFC 5891.  Has errata: #266.

  587    The ToUnicode output never contains more code points than its input.    
  588    Note that the number of octets needed to represent a sequence of code   
  589    points depends on the particular character encoding used.               
  590                                                                            
  591    The inputs to ToUnicode are a sequence of code points, the              
  592    AllowUnassigned flag, and the UseSTD3ASCIIRules flag.  The output of    
  593    ToUnicode is always a sequence of Unicode code points.                  
  594                                                                            
  595    1. If all code points in the sequence are in the ASCII range (0..7F)    
  596       then skip to step 3.                                                 
  597                                                                            
  606    2. Perform the steps specified in [NAMEPREP] and fail if there is an    
  607       error.  (If step 3 of ToASCII is also performed here, it will not    
  608       affect the overall behavior of ToUnicode, but it is not              
  609       necessary.)  The AllowUnassigned flag is used in [NAMEPREP].         
  610                                                                            
  611    3. Verify that the sequence begins with the ACE prefix, and save a      
  612       copy of the sequence.                                                
  613                                                                            
  614    4. Remove the ACE prefix.                                               
  615                                                                            
  616    5. Decode the sequence using the decoding algorithm in [PUNYCODE] and   
  617       fail if there is an error.  Save a copy of the result of this        
  618       step.                                                                
  619                                                                            
  620    6. Apply ToASCII.                                                       
  621                                                                            
  622    7. Verify that the result of step 6 matches the saved copy from step    
  623       3, using a case-insensitive ASCII comparison.                        
  624                                                                            
  625    8. Return the saved copy from step 5.                                   
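
The steps above, together with the rule that ToUnicode never fails,
can be sketched as follows.  This is an illustration under stated
assumptions: it relies on Python's standard-library encodings.idna
module for Nameprep and for the ToASCII call in step 6, and on the
"punycode" codec for step 5.  The two flags are not modelled, and the
name to_unicode is hypothetical.

```python
from encodings import idna

ACE_PREFIX = "xn--"


def to_unicode(label: str) -> str:
    """Return the decoded label, or the input unchanged on any failure."""
    original = label
    try:
        # Steps 1 and 2: Nameprep only when non-ASCII is present.
        if any(ord(c) >= 0x80 for c in label):
            label = idna.nameprep(label)
        # Step 3: the sequence must begin with the ACE prefix.
        if not label.lower().startswith(ACE_PREFIX):
            return original
        saved = label
        # Step 4: remove the prefix.
        tail = label[len(ACE_PREFIX):]
        # Step 5: Punycode decoding.
        decoded = tail.encode("ascii").decode("punycode")
        # Step 6: apply ToASCII to the decoded label.
        round_trip = idna.ToASCII(decoded).decode("ascii")
        # Step 7: case-insensitive comparison with the copy from step 3.
        if round_trip.lower() != saved.lower():
            return original
        # Step 8: return the result of step 5.
        return decoded
    except UnicodeError:
        # Any failing step returns the original input sequence.
        return original
```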
  626                                                                            
  627 5. ACE prefix                                                              
  628                                                                            
  629    The ACE prefix, used in the conversion operations (section 4), is two   
  630    alphanumeric ASCII characters followed by two hyphen-minuses.  It       
  631    cannot be any of the prefixes already used in earlier documents,        
  632    which includes the following: "bl--", "bq--", "dq--", "lq--", "mq--",   
  633    "ra--", "wq--" and "zq--".  The ToASCII and ToUnicode operations MUST   
  634    recognize the ACE prefix in a case-insensitive manner.                  
  635                                                                            
  636    The ACE prefix for IDNA is "xn--" or any capitalization thereof.        
  637                                                                            
  638    This means that an ACE label might be "xn--de-jg4avhby1noc0d", where    
  639    "de-jg4avhby1noc0d" is the part of the ACE label that is generated by   
  640    the encoding steps in [PUNYCODE].                                       
  641                                                                            
  642    While all ACE labels begin with the ACE prefix, not all labels          
  643    beginning with the ACE prefix are necessarily ACE labels.  Non-ACE      
  644    labels that begin with the ACE prefix will confuse users and SHOULD     
  645    NOT be allowed in DNS zones.                                            
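
The difference between a label that merely begins with the ACE prefix
and an actual ACE label can be sketched as follows.  This is an
approximation, assuming that Python's standard-library
encodings.idna.ToUnicode raises UnicodeError when decoding or the
round-trip check fails.  The helper names has_ace_prefix and
is_ace_label are hypothetical.

```python
from encodings import idna

ACE_PREFIX = "xn--"


def has_ace_prefix(label: str) -> bool:
    """Recognize the ACE prefix in a case-insensitive manner."""
    return label[:len(ACE_PREFIX)].lower() == ACE_PREFIX


def is_ace_label(label: str) -> bool:
    """True only if the prefixed label also decodes and round-trips."""
    if not has_ace_prefix(label):
        return False
    try:
        idna.ToUnicode(label)  # raises when the label is not valid ACE
    except UnicodeError:
        return False
    return True
```

A zone administration tool could use a check of this kind to reject
labels for which has_ace_prefix is true but is_ace_label is false.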
  646                                                                            
  661 6. Implications for typical applications using DNS                         
  662                                                                            
  663    In IDNA, applications perform the processing needed to input            
  664    internationalized domain names from users, display internationalized    
  665    domain names to users, and process the inputs and outputs from DNS      
  666    and other protocols that carry domain names.                            
  667                                                                            
  668    The components and interfaces between them can be represented           
  669    pictorially as:                                                         
  670                                                                            
  671                     +------+                                               
  672                     | User |                                               
  673                     +------+                                               
  674                        ^                                                   
  675                        | Input and display: local interface methods        
  676                        | (pen, keyboard, glowing phosphorus, ...)          
  677    +-------------------|-------------------------------+                   
  678    |                   v                               |                   
  679    |          +-----------------------------+          |                   
  680    |          |        Application          |          |                   
  681    |          |   (ToASCII and ToUnicode    |          |                   
  682    |          |      operations may be      |          |                   
  683    |          |        called here)         |          |                   
  684    |          +-----------------------------+          |                   
  685    |                   ^        ^                      | End system        
  686    |                   |        |                      |                   
  687    | Call to resolver: |        | Application-specific |                   
  688    |              ACE  |        | protocol:            |                   
  689    |                   v        | ACE unless the       |                   
  690    |           +----------+     | protocol is updated  |                   
  691    |           | Resolver |     | to handle other      |                   
  692    |           +----------+     | encodings            |                   
  693    |                 ^          |                      |                   
  694    +-----------------|----------|----------------------+                   
  695        DNS protocol: |          |                                          
  696                  ACE |          |                                          
  697                      v          v                                          
  698           +-------------+    +---------------------+                       
  699           | DNS servers |    | Application servers |                       
  700           +-------------+    +---------------------+                       
  701                                                                            
  702    The box labeled "Application" is where the application splits a         
  703    domain name into labels, sets the appropriate flags, and performs the   
  704    ToASCII and ToUnicode operations.  This is described in section 4.      
  705                                                                            
  716 6.1 Entry and display in applications                                      
  717                                                                            
  718    Applications can accept domain names using any character set or sets    
  719    desired by the application developer, and can display domain names in   
  720    any charset.  That is, the IDNA protocol does not affect the            
  721    interface between users and applications.                               
  722                                                                            
  723    An IDNA-aware application can accept and display internationalized      
  724    domain names in two formats: the internationalized character set(s)     
  725    supported by the application, and as an ACE label.  ACE labels that     
  726    are displayed or input MUST always include the ACE prefix.              
  727    Applications MAY allow input and display of ACE labels, but are not     
  728    encouraged to do so except as an interface for special purposes,        
  729    possibly for debugging, or to cope with display limitations as          
  730    described in section 6.4.  ACE encoding is opaque and ugly, and
  731    should thus only be exposed to users who absolutely need it.  Because   
  732    name labels encoded as ACE name labels can be rendered either as the    
  733    encoded ASCII characters or the proper decoded characters, the          
  734    application MAY have an option for the user to select the preferred     
  735    method of display; if it does, rendering the ACE SHOULD NOT be the      
  736    default.                                                                
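
A user-selectable display method of this kind might look like the
following sketch.  It assumes Python's standard-library
encodings.idna module, and display_label and its prefer_ace option
are hypothetical names.  The decoded form is the default, and the ACE
form is shown only on request.

```python
from encodings import idna


def display_label(label: str, prefer_ace: bool = False) -> str:
    """Pick the form of a label to show to the user."""
    if prefer_ace:
        try:
            return idna.ToASCII(label).decode("ascii")
        except UnicodeError:
            return label  # no ACE form exists, show the label as given
    try:
        return idna.ToUnicode(label)
    except UnicodeError:
        return label  # not a valid ACE label, show it unchanged
```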
  737                                                                            
  738    Domain names are often stored and transported in many places.  For      
  739    example, they are part of documents such as mail messages and web       
  740    pages.  They are transported in many parts of many protocols, such as   
  741    both the control commands and the RFC 2822 body parts of SMTP, and      
  742    the headers and the body content in HTTP.  It is important to           
  743    remember that domain names appear both in domain name slots and in      
  744    the content that is passed over protocols.                              
  745                                                                            
  746    In protocols and document formats that define how to handle             
  747    specification or negotiation of charsets, labels can be encoded in      
  748    any charset allowed by the protocol or document format.  If a           
  749    protocol or document format only allows one charset, the labels MUST    
  750    be given in that charset.                                               
  751                                                                            
  752    In any place where a protocol or document format allows transmission    
  753    of the characters in internationalized labels, internationalized        
  754    labels SHOULD be transmitted using whatever character encoding and      
  755    escape mechanism the protocol or document format uses at that
  756    place.                                                                  
  757                                                                            
  758    All protocols that use domain name slots already have the capacity      
  759    for handling domain names in the ASCII charset.  Thus, ACE labels       
  760    (internationalized labels that have been processed with the ToASCII     
  761    operation) can inherently be handled by those protocols.                
  762                                                                            
  771 6.2 Applications and resolver libraries                                    
  772                                                                            
  773    Applications normally use functions in the operating system when they   
  774    resolve DNS queries.  Those functions in the operating system are       
  775    often called "the resolver library", and the applications communicate   
  776    with the resolver libraries through a programming interface (API).      
  777                                                                            
  778    Because these resolver libraries today expect only domain names in      
  779    ASCII, applications MUST prepare labels that are passed to the          
  780    resolver library using the ToASCII operation.  Labels received from     
  781    the resolver library contain only ASCII characters; internationalized   
  782    labels that cannot be represented directly in ASCII use the ACE form.   
  783    ACE labels always include the ACE prefix.                               
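
   As a minimal sketch of this requirement, the following Python
   fragment converts a host name to its ASCII form before handing it to
   the resolver library.  It assumes that Python's built-in "idna" codec
   applies the ToASCII operation to each label; the function names
   to_ace_hostname and resolve are illustrative and not part of this
   specification.

```python
import socket


def to_ace_hostname(name: str) -> str:
    """Apply ToASCII to every label of `name` and return an ASCII string."""
    # The built-in "idna" codec splits the name into labels and applies
    # ToASCII to each one; it raises UnicodeError if a label is invalid.
    return name.encode("idna").decode("ascii")


def resolve(name: str, port: int = 80):
    """Resolve `name`, passing only the ASCII form to the resolver library."""
    return socket.getaddrinfo(to_ace_hostname(name), port)


if __name__ == "__main__":
    print(to_ace_hostname("bücher.example"))   # xn--bcher-kva.example
    print(to_ace_hostname("www.example.com"))  # www.example.com
```

   Labels that are already ASCII pass through unchanged, so the same
   code path serves both internationalized and non-internationalized
   names.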
  784                                                                            
  785    An operating system might have a set of libraries for performing the    
  786    ToASCII operation.  The input to such a library might be in one or      
  787    more charsets that are used in applications (UTF-8 and UTF-16 are       
  788    likely candidates for almost any operating system, and script-          
  789    specific charsets are likely for localized operating systems).          
  790                                                                            
  791    IDNA-aware applications MUST be able to work with both non-             
  792    internationalized labels (those that conform to [STD13] and [STD3])     
  793    and internationalized labels.                                           
  794                                                                            
  795    Future versions of the resolver libraries are expected to accept
  796    domain names in charsets other than ASCII.  Application developers
  797    might then pass domain names not only in Unicode but also in a
  798    local script to a new API for the resolver libraries in the
  799    operating system.  In that case, the ToASCII and ToUnicode
  800    operations might be performed inside these new versions of the
  801    resolver libraries.
  802                                                                            
  803    Domain names passed to resolvers or put into the question section of    
  804    DNS requests follow the rules for "queries" from [STRINGPREP].          
  805                                                                            
  806 6.3 DNS servers                                                            
  807                                                                            
  808    Domain names stored in zones follow the rules for "stored strings"      
  809    from [STRINGPREP].                                                      
  810                                                                            
  811    For internationalized labels that cannot be represented directly in     
  812    ASCII, DNS servers MUST use the ACE form produced by the ToASCII        
  813    operation.  All IDNs served by DNS servers MUST contain only ASCII      
  814    characters.                                                             
  815                                                                            
  816    If a signaling system which makes negotiation possible between old      
  817    and new DNS clients and servers is standardized in the future, the      
  818    encoding of the query in the DNS protocol itself can be changed from    
  826    ACE to something else, such as UTF-8.  The question whether or not      
  827    this should be used is, however, a separate problem and is not          
  828    discussed in this memo.                                                 
  829                                                                            
  830 6.4 Avoiding exposing users to the raw ACE encoding                        
  831                                                                            
  832    Any application that might show the user a domain name obtained from    
  833    a domain name slot, such as from gethostbyaddr or part of a mail        
  834    header, will need to be updated if it is to prevent users from seeing   
  835    the ACE.                                                                
  836                                                                            
  837    If an application decodes an ACE name using ToUnicode but cannot show   
  838    all of the characters in the decoded name, such as if the name          
  839    contains characters that the output system cannot display, the          
  840    application SHOULD show the name in ACE format (which always includes   
  841    the ACE prefix) instead of displaying the name with the replacement     
  842    character (U+FFFD).  This is to make it easier for the user to          
  843    transfer the name correctly to other programs.  Programs that by        
  844    default show the ACE form when they cannot show all the characters in   
  845    a name label SHOULD also have a mechanism to show the name that is      
  846    produced by the ToUnicode operation with as many characters as          
  847    possible and replacement characters in the positions where characters   
  848    cannot be displayed.                                                    
  849                                                                            
  850    The ToUnicode operation does not alter labels that are not valid ACE    
  851    labels, even if they begin with the ACE prefix.  After ToUnicode has    
  852    been applied, if a label still begins with the ACE prefix, then it is   
  853    not a valid ACE label, and is not equivalent to any of the              
  854    intermediate Unicode strings constructed by ToUnicode.                  
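
   The display rule in this section could be sketched in Python as
   follows.  The sketch assumes the ToUnicode helper from the standard
   encodings.idna module; unlike the operation defined in this
   document, that helper raises an exception on failure, so the sketch
   catches the exception and returns the input unchanged.  The
   can_display predicate is hypothetical and stands in for whatever
   test the output system provides.

```python
from encodings.idna import ToUnicode
from typing import Callable


def display_label(label: str, can_display: Callable[[str], bool]) -> str:
    """Return the form of `label` that should be shown to the user."""
    try:
        decoded = ToUnicode(label)
    except UnicodeError:
        # Python's helper raises where this specification says ToUnicode
        # returns its input unchanged, so fall back to the original label.
        return label
    # Show the ACE form rather than replacement characters (U+FFFD) when
    # the output system cannot render every character of the decoded label.
    if all(can_display(ch) for ch in decoded):
        return decoded
    return label
```

   With a predicate that accepts every character, the label
   "xn--bcher-kva" is shown in its decoded form; with a predicate that
   accepts only ASCII, the ACE form is shown instead.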
  855                                                                            
  856 6.5 DNSSEC authentication of IDN domain names
  857                                                                            
  858    DNS Security [RFC2535] is a method for supplying cryptographic          
  859    verification information along with DNS messages.  Public Key           
  860    Cryptography is used in conjunction with digital signatures to          
  861    provide a means for a requester of domain information to authenticate   
  862    the source of the data.  This ensures that it can be traced back to a   
  863    trusted source, either directly, or via a chain of trust linking the    
  864    source of the information to the top of the DNS hierarchy.              
  865                                                                            
  866    IDNA specifies that all internationalized domain names served by DNS    
  867    servers that cannot be represented directly in ASCII must use the ACE   
  868    form produced by the ToASCII operation.  This operation must be         
  869    performed prior to a zone being signed by the private key for that      
  870    zone.  Because of this ordering, it is important to recognize that      
  871    DNSSEC authenticates the ASCII domain name, not the Unicode form or     
  881    the mapping between the Unicode form and the ASCII form.  In the
  882    presence of DNSSEC, the ASCII form is the name that MUST be signed
  883    in the zone and MUST be validated against.
  884                                                                            
  885    One consequence of this for sites deploying IDNA in the presence of     
  886    DNSSEC is that any special purpose proxies or forwarders used to        
  887    transform user input into IDNs must be earlier in the resolution flow   
  888    than DNSSEC authenticating nameservers for DNSSEC to work.              
  889                                                                            
  890 7. Name server considerations                                              
  891                                                                            
  892    Existing DNS servers do not know the IDNA rules for handling non-       
  893    ASCII forms of IDNs, and therefore need to be shielded from them.       
  894    All existing channels through which names can enter a DNS server        
  895    database (for example, master files [STD13] and DNS update messages     
  896    [RFC2136]) are IDN-unaware because they predate IDNA, and therefore     
  897    requirement 2 of section 3.1 of this document provides the needed       
  898    shielding, by ensuring that internationalized domain names entering     
  899    DNS server databases through such channels have already been            
  900    converted to their equivalent ASCII forms.                              
  901                                                                            
  902    It is imperative that there be only one ASCII encoding for a            
  903    particular domain name.  Because of the design of the ToASCII and       
  904    ToUnicode operations, there are no ACE labels that decode to ASCII      
  905    labels, and therefore name servers cannot contain multiple ASCII        
  906    encodings of the same domain name.                                      
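
   The following sketch illustrates this property.  It assumes that
   Python's standard "punycode" codec and the ToASCII and ToUnicode
   helpers in encodings.idna follow [PUNYCODE] and this document; the
   helper is_valid_ace is hypothetical.  Prefixing the Punycode
   encoding of an ASCII label with the ACE prefix does not yield a
   valid ACE label, because the round-trip check in ToUnicode rejects
   it.

```python
from encodings.idna import ToASCII, ToUnicode


def is_valid_ace(label: str) -> bool:
    """Return True if `label` is an ACE label that ToUnicode would decode."""
    try:
        return ToUnicode(label) != label
    except UnicodeError:
        # Python signals the failure that the specification maps to
        # "return the input unchanged".
        return False


# Punycode of a pure-ASCII label is the label plus a trailing delimiter.
candidate = "xn--" + "abc".encode("punycode").decode("ascii")  # "xn--abc-"

print(is_valid_ace(candidate))        # False: would decode to ASCII "abc"
print(is_valid_ace("xn--bcher-kva"))  # True: decodes to a non-ASCII label
print(ToASCII("abc"))                 # b'abc': ASCII labels encode to themselves
```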
  907                                                                            
  908    [RFC2181] explicitly allows domain labels to contain octets beyond      
  909    the ASCII range (0..7F), and this document does not change that.        
  910    Note, however, that there is no defined interpretation of octets        
  911    80..FF as characters.  If labels containing these octets are returned   
  912    to applications, unpredictable behavior could result.  The ASCII form   
  913    defined by ToASCII is the only standard representation for              
  914    internationalized labels in the current DNS protocol.                   
  915                                                                            
  916 8. Root server considerations                                              
  917                                                                            
  918    IDNs are likely to be somewhat longer than current domain names, so     
  919    the bandwidth needed by the root servers is likely to go up by a        
  920    small amount.  Also, queries and responses for IDNs will probably be    
  921    somewhat longer than typical queries today, so more queries and         
  922    responses may be forced to go to TCP instead of UDP.                    
  923                                                                            
  936 9. References                                                              
  937                                                                            
  938 9.1 Normative References                                                   
  939                                                                            
  940    [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate        
  941                 Requirement Levels", BCP 14, RFC 2119, March 1997.         
  942                                                                            
  943    [STRINGPREP] Hoffman, P. and M. Blanchet, "Preparation of               
  944                 Internationalized Strings ("stringprep")", RFC 3454,       
  945                 December 2002.                                             
  946                                                                            
  947    [NAMEPREP]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep       
  948                 Profile for Internationalized Domain Names (IDN)", RFC     
  949                 3491, March 2003.                                          
  950                                                                            
  951    [PUNYCODE]   Costello, A., "Punycode: A Bootstring encoding of          
  952                 Unicode for use with Internationalized Domain Names in     
  953                 Applications (IDNA)", RFC 3492, March 2003.                
  954                                                                            
  955    [STD3]       Braden, R., "Requirements for Internet Hosts --            
  956                 Communication Layers", STD 3, RFC 1122, and                
  957                 "Requirements for Internet Hosts -- Application and        
  958                 Support", STD 3, RFC 1123, October 1989.                   
  959                                                                            
  960    [STD13]      Mockapetris, P., "Domain names - concepts and              
  961                 facilities", STD 13, RFC 1034 and "Domain names -          
  962                 implementation and specification", STD 13, RFC 1035,       
  963                 November 1987.                                             
  964                                                                            
  965 9.2 Informative References                                                 
  966                                                                            
  967    [RFC2535]    Eastlake, D., "Domain Name System Security Extensions",    
  968                 RFC 2535, March 1999.                                      
  969                                                                            
  970    [RFC2181]    Elz, R. and R. Bush, "Clarifications to the DNS            
  971                 Specification", RFC 2181, July 1997.                       
  972                                                                            
  973    [UAX9]       Unicode Standard Annex #9, The Bidirectional Algorithm,    
  974                 <http://www.unicode.org/unicode/reports/tr9/>.             
  975                                                                            
  976    [UNICODE]    The Unicode Consortium. The Unicode Standard, Version      
  977                 3.2.0 is defined by The Unicode Standard, Version 3.0      
  978                 (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5),   
  979                 as amended by the Unicode Standard Annex #27: Unicode      
  980                 3.1 (http://www.unicode.org/reports/tr27/) and by the      
  981                 Unicode Standard Annex #28: Unicode 3.2                    
  982                 (http://www.unicode.org/reports/tr28/).                    
  983                                                                            
  991    [USASCII]    Cerf, V., "ASCII format for Network Interchange", RFC      
  992                 20, October 1969.                                          
  993                                                                            
  994 10. Security Considerations                                                
  995                                                                            
  996    Security on the Internet partly relies on the DNS.  Thus, any change    
  997    to the characteristics of the DNS can change the security of much of    
  998    the Internet.                                                           
  999                                                                            
 1000    This memo describes an algorithm which encodes characters that are      
 1001    not valid according to STD3 and STD13 into octet values that are        
 1002    valid.  No security issues such as string length increases or new       
 1003    allowed values are introduced by the encoding process or the use of     
 1004    these encoded values, apart from those introduced by the ACE encoding   
 1005    itself.                                                                 
 1006                                                                            
 1007    Domain names are used by users to identify and connect to Internet      
 1008    servers.  The security of the Internet is compromised if a user         
 1009    entering a single internationalized name is connected to different      
 1010    servers based on different interpretations of the internationalized     
 1011    domain name.                                                            
 1012                                                                            
 1013    When systems use local character sets other than ASCII and Unicode,     
 1014    this specification leaves the problem of transcoding between the
 1015    local character set and Unicode up to the application.  If different    
 1016    applications (or different versions of one application) implement       
 1017    different transcoding rules, they could interpret the same name         
 1018    differently and contact different servers.  This problem is not         
 1019    solved by security protocols like TLS that do not take local            
 1020    character sets into account.                                            
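
   A small sketch of this risk, assuming Python's built-in charset
   codecs and its "idna" codec: the same octets, transcoded under two
   different local charsets, yield two different ASCII names.  The
   function name ace_from_local and the sample octets are illustrative
   only.

```python
def ace_from_local(raw: bytes, charset: str) -> str:
    """Transcode `raw` from `charset` to Unicode, then apply ToASCII."""
    return raw.decode(charset).encode("idna").decode("ascii")


raw_name = b"b\xfccher.example"  # the same octets typed in a local charset

as_latin1 = ace_from_local(raw_name, "latin-1")  # 0xFC is U+00FC
as_cp1251 = ace_from_local(raw_name, "cp1251")   # 0xFC is U+044C

print(as_latin1)               # xn--bcher-kva.example
print(as_latin1 == as_cp1251)  # False
```

   Two applications that disagree about the local charset would
   therefore query for different names.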
 1021                                                                            
 1022    Because this document normatively refers to [NAMEPREP], [PUNYCODE],     
 1023    and [STRINGPREP], it includes the security considerations from those    
 1024    documents as well.                                                      
 1025                                                                            
 1026    If or when this specification is updated to use a more recent Unicode   
 1027    normalization table, the new normalization table will need to be        
 1028    compared with the old to spot backwards incompatible changes.  If       
 1029    there are such changes, they will need to be handled somehow, or        
 1030    there will be security as well as operational implications.  Methods    
 1031    to handle the conflicts could include keeping the old normalization,    
 1032    or taking care of the conflicting characters by operational means, or   
 1033    some other method.                                                      
 1034                                                                            
 1035    Implementations MUST NOT use more recent normalization tables than      
 1036    the one referenced from this document, even though more recent tables   
 1037    may be provided by operating systems.  If an application is unsure of   
 1038    which version of the normalization tables is in the operating
 1046    system, the application needs to include the normalization tables       
 1047    itself.  Using normalization tables other than the one referenced       
 1048    from this specification could have security and operational             
 1049    implications.                                                           
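
   One way an application might pin its normalization tables is
   sketched below.  It assumes Python's unicodedata module, which
   exposes a database object fixed at Unicode 3.2.0 (the version cited
   in [UNICODE]) alongside the newer tables; the function name nfkc_3_2
   is illustrative.

```python
import unicodedata

# Database object frozen at Unicode 3.2.0, independent of the version
# that the rest of the unicodedata module tracks.
UCD_3_2 = unicodedata.ucd_3_2_0


def nfkc_3_2(text: str) -> str:
    """Normalize `text` to NFKC using the Unicode 3.2.0 tables only."""
    return UCD_3_2.normalize("NFKC", text)


print(UCD_3_2.unidata_version)   # 3.2.0
print(nfkc_3_2("\uff21"))        # A (FULLWIDTH LATIN CAPITAL LETTER A)
```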
 1050                                                                            
 1051    To help prevent confusion between characters that are visually          
 1052    similar, it is suggested that implementations provide visual            
 1053    indications where a domain name contains multiple scripts.  Such        
 1054    mechanisms can also be used to show when a name contains a mixture of   
 1055    simplified and traditional Chinese characters, or to distinguish zero   
 1056    and one from O and l.  DNS zone administrators may impose restrictions
 1057    (subject to the limitations in section 2) that try to minimize          
 1058    homographs.                                                             
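
   A rough sketch of how an implementation might detect that a label
   mixes scripts, so that it can give a visual indication.  Python's
   standard library has no script property, so this sketch uses the
   first word of each character's Unicode name as a stand-in for its
   script.  That heuristic, and the function names, are assumptions
   made for illustration.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Return the approximate set of scripts used by letters in `label`."""
    scripts = set()
    for ch in label:
        # Only letters carry a script; digits and hyphens are shared.
        if unicodedata.category(ch).startswith("L"):
            # Heuristic: the first word of the character name, such as
            # "LATIN", "CYRILLIC" or "GREEK", stands in for the script.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """Return True if letters from more than one script appear in `label`."""
    return len(scripts_in(label)) > 1
```

   For example, a label in which U+0430 (CYRILLIC SMALL LETTER A)
   replaces a Latin "a" is reported as mixing LATIN and CYRILLIC.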
 1059                                                                            
 1060    Domain names (or portions of them) are sometimes compared against a     
 1061    set of privileged or anti-privileged domains.  In such situations it    
 1062    is especially important that the comparisons be done properly, as       
 1063    specified in section 3.1 requirement 4.  For labels already in ASCII    
 1064    form, the proper comparison reduces to the same case-insensitive        
 1065    ASCII comparison that has always been used for ASCII labels.            
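
   The comparison described above can be sketched as follows, assuming
   the ToASCII helper from Python's standard encodings.idna module.
   The function name labels_equivalent is illustrative.

```python
from encodings.idna import ToASCII


def labels_equivalent(a: str, b: str) -> bool:
    """Compare two labels by their ToASCII forms, ignoring ASCII case."""
    # ToASCII raises UnicodeError for labels it cannot convert; this
    # sketch lets that exception propagate to the caller.
    return ToASCII(a).lower() == ToASCII(b).lower()
```

   Under this sketch, "Bücher" and "xn--bcher-kva" are equivalent, as
   are "EXAMPLE" and "example", while "bücher" and "bucher" are not.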
 1066                                                                            
 1067    The introduction of IDNA means that any existing labels that start      
 1068    with the ACE prefix and would be altered by ToUnicode will              
 1069    automatically be ACE labels, and will be considered equivalent to       
 1070    non-ASCII labels, whether or not that was the intent of the zone        
 1071    administrator or registrant.
 1072                                                                            
 1073 11. IANA Considerations                                                    
 1074                                                                            
 1075    IANA has assigned the ACE prefix in consultation with the IESG.         
 1076                                                                            
 1101 12. Authors' Addresses                                                     
 1102                                                                            
 1103    Patrik Faltstrom                                                        
 1104    Cisco Systems                                                           
 1105    Arstaangsvagen 31 J                                                     
 1106    S-117 43 Stockholm  Sweden                                              
 1107                                                                            
 1108    EMail: paf@cisco.com                                                    
 1109                                                                            
 1110                                                                            
 1111    Paul Hoffman                                                            
 1112    Internet Mail Consortium and VPN Consortium                             
 1113    127 Segre Place                                                         
 1114    Santa Cruz, CA  95060  USA                                              
 1115                                                                            
 1116    EMail: phoffman@imc.org                                                 
 1117                                                                            
 1118                                                                            
 1119    Adam M. Costello                                                        
 1120    University of California, Berkeley                                      
 1121                                                                            
 1122    URL: http://www.nicemice.net/amc/                                       
 1123                                                                            
 1156 13. Full Copyright Statement                                               
 1157                                                                            
 1158    Copyright (C) The Internet Society (2003).  All Rights Reserved.        
 1159                                                                            
 1160    This document and translations of it may be copied and furnished to     
 1161    others, and derivative works that comment on or otherwise explain it    
 1162    or assist in its implementation may be prepared, copied, published      
 1163    and distributed, in whole or in part, without restriction of any        
 1164    kind, provided that the above copyright notice and this paragraph are   
 1165    included on all such copies and derivative works.  However, this        
 1166    document itself may not be modified in any way, such as by removing     
 1167    the copyright notice or references to the Internet Society or other     
 1168    Internet organizations, except as needed for the purpose of             
 1169    developing Internet standards in which case the procedures for          
 1170    copyrights defined in the Internet Standards process must be            
 1171    followed, or as required to translate it into languages other than      
 1172    English.                                                                
 1173                                                                            
 1174    The limited permissions granted above are perpetual and will not be     
 1175    revoked by the Internet Society or its successors or assigns.           
 1176                                                                            
 1177    This document and the information contained herein is provided on an    
 1178    "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING     
 1179    TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING      
 1180    BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION         
 1181    HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF        
 1182    MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.                    
 1183                                                                            
 1184 Acknowledgement                                                            
 1185                                                                            
 1186    Funding for the RFC Editor function is currently provided by the        
 1187    Internet Society.                                                       
 1188                                                                            

line-587 Adam Costello (Technical Erratum #266) [Verified]

based on outdated version

   The ToUnicode output never contains more code points than its input.

This is not true; I have constructed a counterexample.

It should say:

    The Punycode decoder can never output more code points than it
    inputs, but Nameprep can, and therefore ToUnicode can.
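
The expansion described in this erratum can be illustrated with a short sketch. It is not necessarily the counterexample the erratum's author constructed; it only assumes that Nameprep applies NFKC normalization, and it uses the `nameprep` helper from CPython's `encodings.idna` module as a stand-in for a Nameprep implementation. The choice of U+FB01 as input is an assumption made for illustration.

```python
import unicodedata
from encodings.idna import nameprep  # CPython's stdlib Nameprep helper

# U+FB01 LATIN SMALL LIGATURE FI: a single code point whose
# compatibility decomposition is the two letters "f" and "i".
label = "\ufb01"

# NFKC, the normalization form used by Nameprep, splits the ligature.
nfkc = unicodedata.normalize("NFKC", label)

# The full Nameprep pass (map, normalize, prohibit, bidi check)
# therefore returns more code points than it was given.
prepared = nameprep(label)

print(len(label), len(nfkc), len(prepared))
```

Because ToUnicode runs Nameprep on any input that is not already ASCII, a label like this one leaves ToUnicode longer than it entered, while the Punycode decoding step on its own never adds code points.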