RFC 8264

    1 Internet Engineering Task Force (IETF)                    P. Saint-Andre   
    2 Request for Comments: 8264                                    Jabber.org   
    3 Obsoletes: 7564                                              M. Blanchet   
    4 Category: Standards Track                                       Viagenie   
    5 ISSN: 2070-1721                                             October 2017   
    6                                                                            
    7                                                                            
    8      PRECIS Framework: Preparation, Enforcement, and Comparison of         
    9            Internationalized Strings in Application Protocols              
   10                                                                            
   11 Abstract                                                                   
   12                                                                            
   13    Application protocols using Unicode code points in protocol strings     
   14    need to properly handle such strings in order to enforce                
   15    internationalization rules for strings placed in various protocol       
   16    slots (such as addresses and identifiers) and to perform valid          
   17    comparison operations (e.g., for purposes of authentication or          
   18    authorization).  This document defines a framework enabling             
   19    application protocols to perform the preparation, enforcement, and      
   20    comparison of internationalized strings ("PRECIS") in a way that        
   21    depends on the properties of Unicode code points and thus is more       
   22    agile with respect to versions of Unicode.  As a result, this           
   23    framework provides a more sustainable approach to the handling of       
   24    internationalized strings than the previous framework, known as         
   25    Stringprep (RFC 3454).  This document obsoletes RFC 7564.               
   26                                                                            
   27 Status of This Memo                                                        
   28                                                                            
   29    This is an Internet Standards Track document.                           
   30                                                                            
   31    This document is a product of the Internet Engineering Task Force       
   32    (IETF).  It represents the consensus of the IETF community.  It has     
   33    received public review and has been approved for publication by the     
   34    Internet Engineering Steering Group (IESG).  Further information on     
   35    Internet Standards is available in Section 2 of RFC 7841.               
   36                                                                            
   37    Information about the current status of this document, any errata,      
   38    and how to provide feedback on it may be obtained at                    
   39    https://www.rfc-editor.org/info/rfc8264.                                
   40                                                                            
   41 Copyright Notice                                                           
   42                                                                            
   43    Copyright (c) 2017 IETF Trust and the persons identified as the         
   44    document authors.  All rights reserved.                                 
   45                                                                            
   46    This document is subject to BCP 78 and the IETF Trust's Legal           
   47    Provisions Relating to IETF Documents                                   
   48    (https://trustee.ietf.org/license-info) in effect on the date of        
   49                                                                            
   50                                                                            
   51                                                                            
   52 Saint-Andre & Blanchet       Standards Track                    [Page 1]   

   53 RFC 8264                    PRECIS Framework                October 2017   
   54                                                                            
   55                                                                            
   56    publication of this document.  Please review these documents            
   57    carefully, as they describe your rights and restrictions with respect   
   58    to this document.  Code Components extracted from this document must    
   59    include Simplified BSD License text as described in Section 4.e of      
   60    the Trust Legal Provisions and are provided without warranty as         
   61    described in the Simplified BSD License.                                
   62                                                                            
   63 Table of Contents                                                          
   64                                                                            
   65    1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3   
   66    2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   6   
   67    3.  Preparation, Enforcement, and Comparison  . . . . . . . . . .   6   
   68    4.  String Classes  . . . . . . . . . . . . . . . . . . . . . . .   8   
   69      4.1.  Overview  . . . . . . . . . . . . . . . . . . . . . . . .   8   
   70      4.2.  IdentifierClass . . . . . . . . . . . . . . . . . . . . .   9   
   71        4.2.1.  Valid . . . . . . . . . . . . . . . . . . . . . . . .   9   
   72        4.2.2.  Contextual Rule Required  . . . . . . . . . . . . . .  10   
   73        4.2.3.  Disallowed  . . . . . . . . . . . . . . . . . . . . .  10   
   74        4.2.4.  Unassigned  . . . . . . . . . . . . . . . . . . . . .  10   
   75        4.2.5.  Examples  . . . . . . . . . . . . . . . . . . . . . .  11   
   76      4.3.  FreeformClass . . . . . . . . . . . . . . . . . . . . . .  11   
   77        4.3.1.  Valid . . . . . . . . . . . . . . . . . . . . . . . .  11   
   78        4.3.2.  Contextual Rule Required  . . . . . . . . . . . . . .  12   
   79        4.3.3.  Disallowed  . . . . . . . . . . . . . . . . . . . . .  12   
   80        4.3.4.  Unassigned  . . . . . . . . . . . . . . . . . . . . .  12   
   81        4.3.5.  Examples  . . . . . . . . . . . . . . . . . . . . . .  12   
   82      4.4.  Summary . . . . . . . . . . . . . . . . . . . . . . . . .  12   
   83    5.  Profiles  . . . . . . . . . . . . . . . . . . . . . . . . . .  14   
   84      5.1.  Profiles Must Not Be Multiplied beyond Necessity  . . . .  14   
   85      5.2.  Rules . . . . . . . . . . . . . . . . . . . . . . . . . .  15   
   86        5.2.1.  Width Mapping Rule  . . . . . . . . . . . . . . . . .  15   
   87        5.2.2.  Additional Mapping Rule . . . . . . . . . . . . . . .  15   
   88        5.2.3.  Case Mapping Rule . . . . . . . . . . . . . . . . . .  16   
   89        5.2.4.  Normalization Rule  . . . . . . . . . . . . . . . . .  16   
   90        5.2.5.  Directionality Rule . . . . . . . . . . . . . . . . .  17   
   91      5.3.  A Note about Spaces . . . . . . . . . . . . . . . . . . .  18   
   92    6.  Applications  . . . . . . . . . . . . . . . . . . . . . . . .  18   
   93      6.1.  How to Use PRECIS in Applications . . . . . . . . . . . .  18   
   94      6.2.  Further Excluded Characters . . . . . . . . . . . . . . .  20   
   95      6.3.  Building Application-Layer Constructs . . . . . . . . . .  20   
   96    7.  Order of Operations . . . . . . . . . . . . . . . . . . . . .  21   
   97    8.  Code Point Properties . . . . . . . . . . . . . . . . . . . .  21   
   98    9.  Category Definitions Used to Calculate Derived Property . . .  24   
   99      9.1.  LetterDigits (A)  . . . . . . . . . . . . . . . . . . . .  25   
  100      9.2.  Unstable (B)  . . . . . . . . . . . . . . . . . . . . . .  25   
  101      9.3.  IgnorableProperties (C) . . . . . . . . . . . . . . . . .  25   
  102      9.4.  IgnorableBlocks (D) . . . . . . . . . . . . . . . . . . .  25   
  103      9.5.  LDH (E) . . . . . . . . . . . . . . . . . . . . . . . . .  25   
  111      9.6.  Exceptions (F)  . . . . . . . . . . . . . . . . . . . . .  25   
  112      9.7.  BackwardCompatible (G)  . . . . . . . . . . . . . . . . .  25   
  113      9.8.  JoinControl (H) . . . . . . . . . . . . . . . . . . . . .  26   
  114      9.9.  OldHangulJamo (I) . . . . . . . . . . . . . . . . . . . .  26   
  115      9.10. Unassigned (J)  . . . . . . . . . . . . . . . . . . . . .  26   
  116      9.11. ASCII7 (K)  . . . . . . . . . . . . . . . . . . . . . . .  26   
  117      9.12. Controls (L)  . . . . . . . . . . . . . . . . . . . . . .  27   
  118      9.13. PrecisIgnorableProperties (M) . . . . . . . . . . . . . .  27   
  119      9.14. Spaces (N)  . . . . . . . . . . . . . . . . . . . . . . .  27   
  120      9.15. Symbols (O) . . . . . . . . . . . . . . . . . . . . . . .  27   
  121      9.16. Punctuation (P) . . . . . . . . . . . . . . . . . . . . .  27   
  122      9.17. HasCompat (Q) . . . . . . . . . . . . . . . . . . . . . .  28   
  123      9.18. OtherLetterDigits (R) . . . . . . . . . . . . . . . . . .  28   
  124    10. Guidelines for Designated Experts . . . . . . . . . . . . . .  28   
  125    11. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  29   
  126      11.1.  PRECIS Derived Property Value Registry . . . . . . . . .  29   
  127      11.2.  PRECIS Base Classes Registry . . . . . . . . . . . . . .  29   
  128      11.3.  PRECIS Profiles Registry . . . . . . . . . . . . . . . .  30   
  129    12. Security Considerations . . . . . . . . . . . . . . . . . . .  32   
  130      12.1.  General Issues . . . . . . . . . . . . . . . . . . . . .  32   
  131      12.2.  Use of the IdentifierClass . . . . . . . . . . . . . . .  33   
  132      12.3.  Use of the FreeformClass . . . . . . . . . . . . . . . .  33   
  133      12.4.  Local Character Set Issues . . . . . . . . . . . . . . .  33   
  134      12.5.  Visually Similar Characters  . . . . . . . . . . . . . .  33   
  135      12.6.  Security of Passwords  . . . . . . . . . . . . . . . . .  35   
  136    13. Interoperability Considerations . . . . . . . . . . . . . . .  36   
  137      13.1.  Coded Character Sets . . . . . . . . . . . . . . . . . .  36   
  138      13.2.  Dependency on Unicode  . . . . . . . . . . . . . . . . .  37   
  139      13.3.  Encoding . . . . . . . . . . . . . . . . . . . . . . . .  37   
  140      13.4.  Unicode Versions . . . . . . . . . . . . . . . . . . . .  37   
  141      13.5.  Potential Changes to Handling of Certain Unicode Code          
  142             Points . . . . . . . . . . . . . . . . . . . . . . . . .  37   
  143    14. References  . . . . . . . . . . . . . . . . . . . . . . . . .  38   
  144      14.1.  Normative References . . . . . . . . . . . . . . . . . .  38   
  145      14.2.  Informative References . . . . . . . . . . . . . . . . .  39   
  146    Appendix A.  Changes from RFC 7564  . . . . . . . . . . . . . . .  43   
  147    Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . .  43   
  148    Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  43   
  149                                                                            
  150 1.  Introduction                                                           
  151                                                                            
  152    Application protocols using Unicode code points [Unicode] in protocol   
  153    strings need to properly handle such strings in order to enforce        
  154    internationalization rules for strings placed in various protocol       
  155    slots (such as addresses and identifiers) and to perform valid          
  156    comparison operations (e.g., for purposes of authentication or          
  157    authorization).  This document defines a framework enabling             
  158    application protocols to perform the preparation, enforcement, and      
  166    comparison of internationalized strings ("PRECIS") in a way that        
  167    depends on the properties of Unicode code points and thus is more       
  168    agile with respect to versions of Unicode.  (Note: PRECIS is            
  169    restricted to Unicode and does not support any other coded character    
  170    set [RFC6365].)                                                         
  171                                                                            
  172    As described in the PRECIS problem statement [RFC6885], many IETF       
  173    protocols have used the Stringprep framework [RFC3454] as the basis     
  174    for preparing, enforcing, and comparing protocol strings that contain   
  175    Unicode code points, especially code points outside the ASCII range     
  176    [RFC20].  The Stringprep framework was developed during work on the     
  177    original technology for internationalized domain names (IDNs), here     
  178    called "IDNA2003" [RFC3490], and Nameprep [RFC3491] was the             
  179    Stringprep profile for IDNs.  At the time, Stringprep was designed as   
  180    a general framework so that other application protocols could define    
  181    their own Stringprep profiles.  Indeed, a number of application         
  182    protocols defined such profiles.                                        
  183                                                                            
  184    After the publication of [RFC3454] in 2002, several significant         
  185    issues arose with the use of Stringprep in the IDN case, as             
  186    documented in the IAB's recommendations regarding IDNs [RFC4690]        
  187    (most significantly, Stringprep was tied to Unicode version 3.2).       
  188    Therefore, the newer IDNA specifications, here called "IDNA2008"        
  189    [RFC5890] [RFC5891] [RFC5892] [RFC5893] [RFC5894], no longer use        
  190    Stringprep and Nameprep.  This migration away from Stringprep for       
  191    IDNs prompted other "customers" of Stringprep to consider new           
  192    approaches to the preparation, enforcement, and comparison of           
  193    internationalized strings, as described in [RFC6885].                   
  194                                                                            
  195    This document defines a framework for a post-Stringprep approach to     
  196    the preparation, enforcement, and comparison of internationalized       
  197    strings in application protocols, based on several principles:          
  198                                                                            
  199    1.  Define a small set of string classes that specify the Unicode       
  200        code points appropriate for common application-protocol             
  201        constructs (where possible, maintaining compatibility with          
  202        IDNA2008 to help ensure a more consistent user experience).         
  203                                                                            
  204    2.  Define each PRECIS string class in terms of Unicode code points     
  205        and their properties so that an algorithm can be used to            
  206        determine whether each code point or character category is          
  207        (a) valid, (b) allowed in certain contexts, (c) disallowed, or      
  208        (d) unassigned.                                                     
  209                                                                            
  210    3.  Use an "inclusion model" such that a string class consists only     
  211        of code points that are explicitly allowed, with the result that    
  212        any code point not explicitly allowed is forbidden.                 
  213                                                                            
  221    4.  Enable application protocols to define profiles of the PRECIS       
  222        string classes if necessary (addressing matters such as width       
  223        mapping, case mapping, Unicode normalization, and                   
  224        directionality), but strongly discourage the multiplication of      
  225        profiles beyond necessity in order to avoid violations of the       
  226        "Principle of Least Astonishment".                                  
  227                                                                            
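         As a rough illustration of principles 2 and 3, the sketch below
         assigns each code point to one of the four outcomes and accepts a
         string only if every code point is explicitly valid.  The category
         set, the Status names, and the helper names are assumptions made
         for this example; they are not the calculation rules defined in
         Sections 8 and 9.

```python
import unicodedata
from enum import Enum


class Status(Enum):
    VALID = "valid"
    CONTEXT = "allowed in certain contexts"
    DISALLOWED = "disallowed"
    UNASSIGNED = "unassigned"


# Hypothetical allow-list for illustration: letters and decimal digits.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Lo", "Lm", "Nd"}
# ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER, treated here as code
# points that need a contextual rule (an assumption of this sketch).
JOIN_CONTROLS = {0x200C, 0x200D}


def classify(ch: str) -> Status:
    """Assign one code point to exactly one of the four outcomes."""
    category = unicodedata.category(ch)
    if category == "Cn":  # general category Cn (not assigned)
        return Status.UNASSIGNED
    if ord(ch) in JOIN_CONTROLS:
        return Status.CONTEXT
    if category in ALLOWED_CATEGORIES:
        return Status.VALID
    # Inclusion model: anything not explicitly allowed is forbidden.
    return Status.DISALLOWED


def is_allowed(s: str) -> bool:
    """True only if every code point in s is explicitly valid."""
    return all(classify(ch) is Status.VALID for ch in s)
```

         With this toy allow-list, is_allowed("stpeter42") succeeds, whereas
         is_allowed("st peter") fails because the space was never
         explicitly allowed.
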
  228    It is expected that this framework will yield the following benefits:   
  229                                                                            
  230    o  Application protocols will be more agile with regard to Unicode      
  231       versions (recognizing that complete agility cannot be realized in    
  232       practice).                                                           
  233                                                                            
  234    o  Implementers will be able to share code point tables and software    
  235       code across application protocols, most likely by means of           
  236       software libraries.                                                  
  237                                                                            
  238    o  End users will be able to form more accurate expectations about
  239       the code points that are acceptable in various contexts.  Given      
  240       this more uniform set of string classes, it is also expected that    
  241       copy/paste operations between software implementing different        
  242       application protocols will be more predictable and coherent.         
  243                                                                            
  244    Whereas the string classes define the "baseline" code points for a      
  245    range of applications, profiling enables application protocols to       
  246    apply the string classes in ways that are appropriate for common        
  247    constructs such as usernames [RFC8265], opaque strings such as          
  248    passwords [RFC8265], and nicknames [RFC8266].  Profiles are             
  249    responsible for defining the handling of right-to-left code points as   
  250    well as various mapping operations of the kind also discussed for       
  251    IDNs in [RFC5895], such as case preservation or lowercasing, Unicode    
  252    normalization, mapping of certain code points to other code points or   
  253    to nothing, and mapping of fullwidth and halfwidth code points.         
  254                                                                            
  255    When an application applies a profile of a PRECIS string class, it      
  256    transforms an input string (which might or might not be conforming)     
  257    into an output string that definitively conforms to the profile.  In    
  258    particular, this document focuses on the resulting ability to achieve   
  259    the following objectives:                                               
  260                                                                            
  261    a.  Enforcing all the rules of a profile for a single output string     
  262        to check whether the output string conforms to the rules of the     
  263        profile and thus determine if a string can be included in a         
  264        protocol slot, communicated to another entity within a protocol,    
  265        stored in a retrieval system, etc.                                  
  266                                                                            
  267    b.  Comparing two output strings to determine if they are equivalent,   
  268        typically through octet-for-octet matching to test for              
  276        "bit-string identity" (e.g., to make an access decision for         
  277        purposes of authentication or authorization as further described    
  278        in [RFC6943]).                                                      
  279                                                                            
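         Objectives (a) and (b) can be sketched as follows.  The enforce()
         function is a toy stand-in for a profile: it rejects empty strings
         and control characters, lowercases, and applies NFC.  These rules
         and both function names are assumptions of this example, not a
         registered PRECIS profile.

```python
import unicodedata


def enforce(s: str) -> str:
    """Return an output string, or raise ValueError if s is rejected."""
    if not s:
        raise ValueError("empty string")
    for ch in s:
        if unicodedata.category(ch) == "Cc":  # control characters
            raise ValueError(f"disallowed code point U+{ord(ch):04X}")
    # Placeholder mapping rules: lowercase first, then normalize to NFC.
    return unicodedata.normalize("NFC", s.lower())


def equivalent(a: str, b: str) -> bool:
    """Compare two enforced output strings octet for octet (as UTF-8)."""
    return enforce(a).encode("utf-8") == enforce(b).encode("utf-8")
```

         Note that the comparison is made on the output strings, never on
         the raw inputs, so two inputs that differ only in capitalization
         or in composed versus decomposed form are treated as equivalent.
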
  280    The opportunity to define profiles naturally introduces the             
  281    possibility of a proliferation of profiles, thus potentially            
  282    diminishing the benefits of common code and violating user
  283    expectations.  See Section 5 for a discussion of this important         
  284    topic.                                                                  
  285                                                                            
  286    In addition, it is extremely important for protocol designers and       
  287    application developers to understand that the transformation of an      
  288    input string to an output string is rarely reversible.  As one          
  289    relatively simple example, case mapping would transform an input        
  290    string of "StPeter" to an output string of "stpeter", thus leading to   
  291    a loss of information about the capitalization of the first and third   
  292    characters.  Similar considerations apply to other forms of mapping     
  293    and normalization.                                                      
  294                                                                            
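         The loss of information can be made visible by grouping several
         inputs by the output they produce.  In this sketch, Python's
         str.lower() and NFC normalization stand in for a profile's case
         mapping and normalization rules, and the helper name is
         hypothetical.

```python
import unicodedata
from collections import defaultdict


def group_by_output(inputs):
    """Map each output string to the list of inputs that produced it."""
    groups = defaultdict(list)
    for s in inputs:
        output = unicodedata.normalize("NFC", s.lower())
        groups[output].append(s)
    return dict(groups)


groups = group_by_output(["StPeter", "stpeter", "STPETER"])
# All three inputs share a single output, so the original
# capitalization cannot be recovered from the output alone.
```

         Because the mapping is many-to-one, no inverse function exists;
         an application that needs the original form has to store it
         separately.
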
  295    Although this framework is similar to IDNA2008 and includes by          
  296    reference some of the character categories defined in [RFC5892], it     
  297    defines additional character categories to meet the needs of common     
  298    application protocols other than DNS.                                   
  299                                                                            
  300    The character categories and calculation rules defined under            
  301    Sections 8 and 9 are normative and apply to all Unicode code points.    
  302    The code point table that results from applying the character           
  303    categories and calculation rules to the latest version of Unicode can   
  304    be found in an IANA registry (see Section 11).                          
  305                                                                            
  306 2.  Terminology                                                            
  307                                                                            
  308    Many important terms used in this document are defined in [RFC5890],    
  309    [RFC6365], [RFC6885], and [Unicode].  The terms "left-to-right" (LTR)   
  310    and "right-to-left" (RTL) are defined in Unicode Standard Annex #9      
  311    [UAX9].                                                                 
  312                                                                            
  313    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",     
  314    "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and    
  315    "OPTIONAL" in this document are to be interpreted as described in       
  316    BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all      
  317    capitals, as shown here.                                                
  318                                                                            
  319 3.  Preparation, Enforcement, and Comparison                               
  320                                                                            
  321    This document distinguishes between three different actions that an     
  322    entity can take with regard to a string:                                
  323                                                                            
  331    o  Enforcement entails applying all of the rules specified for a        
  332       particular string class, or profile thereof, to a single input       
  333       string, for the purpose of checking whether the string conforms to   
  334       all of the rules and thus determining if the string can be used in   
  335       a given protocol slot.                                               
  336                                                                            
  337    o  Comparison entails applying all of the rules specified for a         
  338       particular string class, or profile thereof, to two separate input   
  339       strings, for the purpose of determining if the two strings are       
  340       equivalent.                                                          
  341                                                                            
  342    o  Preparation primarily entails ensuring that the code points in a     
  343       single input string are allowed by the underlying PRECIS string      
  344       class, and sometimes also entails applying one or more of the        
  345       rules specified for a particular string class or profile thereof.    
  346       Preparation can be appropriate for constrained devices that can to   
  347       some extent restrict the code points in a string to a limited        
  348       repertoire of characters but that do not have the processing power   
  349       or onboard memory to perform operations such as Unicode              
  350       normalization.  However, preparation does not ensure that an input   
  351       string conforms to all of the rules for a string class or profile    
  352       thereof.                                                             
  353                                                                            
  354          Note: The term "preparation" as used in this specification and    
  355          related documents has a much more limited scope than it did in    
  356          Stringprep; it essentially refers to a kind of preprocessing of   
  357          an input string, not the actual operations that apply             
  358          internationalization rules to produce an output string (here      
  359          termed "enforcement") or to compare two output strings (here      
  360          termed "comparison").                                             
  361                                                                            
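         The difference between preparation and enforcement can be
         sketched as follows.  Here preparation checks only that each code
         point belongs to a permitted repertoire, while enforcement also
         applies a normalization rule.  The repertoire (chosen by Unicode
         general category) and the use of NFC are assumptions made for
         illustration, not the rules of any PRECIS string class or profile.

```python
import unicodedata

# Hypothetical repertoire: letters, decimal digits, combining marks.
REPERTOIRE = {"Ll", "Lu", "Lo", "Nd", "Mn", "Mc"}


def prepare(s: str) -> bool:
    """Cheap check: is every code point in the permitted repertoire?"""
    return all(unicodedata.category(ch) in REPERTOIRE for ch in s)


def enforce(s: str) -> str:
    """Full check: repertoire test plus normalization of the result."""
    if not s or not prepare(s):
        raise ValueError("string not allowed")
    return unicodedata.normalize("NFC", s)


def compare(a: str, b: str) -> bool:
    """Two inputs are equivalent if their enforced outputs match."""
    return enforce(a) == enforce(b)


decomposed = "re\u0301sume\u0301"  # "e" followed by COMBINING ACUTE ACCENT
composed = "r\u00e9sum\u00e9"      # precomposed LATIN SMALL LETTER E WITH ACUTE
```

         The decomposed string passes prepare() yet is changed by
         enforce(), which is why a prepared string is not guaranteed to
         conform to all of the rules.  A constrained client could run only
         prepare(), leaving enforce() and compare() to the server.
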
  362    In most cases, authoritative entities such as servers are responsible   
  363    for enforcement, whereas subsidiary entities such as clients are        
  364    responsible only for preparation.  The rationale for this distinction   
  365    is that clients might not have the facilities (in terms of device       
  366    memory and processing power) to enforce all the rules regarding         
  367    internationalized strings (such as width mapping and Unicode            
  368    normalization), although they can more easily limit the repertoire of   
  369    characters they offer to an end user.  By contrast, it is assumed       
  370    that a server would have more capacity to enforce the rules, and in     
  371    any case a server acts as an authority regarding allowable strings in   
  372    protocol slots such as addresses and endpoint identifiers.  In          
  373    addition, a client cannot necessarily be trusted to properly generate   
  374    such strings, especially for security-sensitive contexts such as        
  375    authentication and authorization.                                       
  376                                                                            
  386 4.  String Classes                                                         
  387                                                                            
  388 4.1.  Overview                                                             
  389                                                                            
  390    Starting in 2010, various "customers" of Stringprep began to discuss    
  391    the need to define a post-Stringprep approach to the preparation and    
  392    comparison of internationalized strings other than IDNs.  This          
  393    community analyzed the existing Stringprep profiles and also weighed    
  394    the costs and benefits of defining a relatively small set of Unicode    
  395    code points that would minimize the potential for user confusion        
  396    caused by visually similar code points (and thus be relatively          
  397    "safe") vs. defining a much larger set of Unicode code points that      
  398    would maximize the potential for user creativity (and thus be           
  399    relatively "expressive").  As a result, the community concluded that    
  400    most existing uses could be addressed by two string classes:            
  401                                                                            
  402    IdentifierClass:  a sequence of letters, numbers, and some symbols      
  403       that is used to identify or address a network entity such as a       
  404       user account, a venue (e.g., a chat room), an information source     
  405       (e.g., a data feed), or a collection of data (e.g., a file); the     
  406       intent is that this class will minimize user confusion in a wide     
  407       variety of application protocols, with the result that safety has    
  408       been prioritized over expressiveness for this class.                 
  409                                                                            
  410    FreeformClass:  a sequence of letters, numbers, symbols, spaces, and    
  411       other code points that is used for free-form strings, including      
  412       passwords as well as display elements such as human-friendly         
  413       nicknames for devices or for participants in a chat room; the        
  414       intent is that this class will allow nearly any Unicode code         
  415       point, with the result that expressiveness has been prioritized      
  416       over safety for this class.  Note well that protocol designers,      
  417       application developers, service providers, and end users might not   
  418       understand or be able to enter all of the code points that can be    
  419       included in the FreeformClass (see Section 12.3 for details).        
  420                                                                            
  421    Future specifications might define additional PRECIS string classes,    
  422    such as a class that falls somewhere between the IdentifierClass and    
  423    the FreeformClass.  At this time, it is not clear how useful such a     
  424    class would be.  In any case, because application developers are able   
  425    to define profiles of PRECIS string classes, a protocol needing a       
  426    construct between the IdentifierClass and the FreeformClass could       
  427    define a restricted profile of the FreeformClass if needed.             
  428                                                                            
  429    The following subsections discuss the IdentifierClass and               
  430    FreeformClass in more detail, with reference to the dimensions          
  431    described in Section 5 of [RFC6885].  Each string class is defined by   
  432    the following behavioral rules:                                         
  433                                                                            
  434                                                                            
  435                                                                            
  436                                                                            
  437 Saint-Andre & Blanchet       Standards Track                    [Page 8]   

  438 RFC 8264                    PRECIS Framework                October 2017   
  439                                                                            
  440                                                                            
  441    Valid:  Defines which code points are treated as valid for the          
  442       string.                                                              
  443                                                                            
  444    Contextual Rule Required:  Defines which code points are treated as     
  445       allowed only if the requirements of a contextual rule are met        
  446       (i.e., either CONTEXTJ or CONTEXTO as originally defined in the      
  447       IDNA2008 specifications).                                            
  448                                                                            
  449    Disallowed:  Defines which code points need to be excluded from the     
  450       string.                                                              
  451                                                                            
  452    Unassigned:  Defines application behavior in the presence of code       
  453       points that are unknown (i.e., not yet designated) for the version   
  454       of Unicode used by the application.                                  
  455                                                                            
  456    This document defines the valid, contextual rule required,              
  457    disallowed, and unassigned rules for the IdentifierClass and            
  458    FreeformClass.  As described under Section 5, profiles of these         
  459    string classes are responsible for defining the width mapping,          
  460    additional mapping, case mapping, normalization, and directionality     
  461    rules.                                                                  
  462                                                                            
  463 4.2.  IdentifierClass                                                      
  464                                                                            
  465    Most application technologies need strings that can be used to refer    
  466    to, include, or communicate protocol strings like usernames,            
  467    filenames, data feed identifiers, and chat room names.  We group such   
  468    strings into a class called "IdentifierClass" having the following      
  469    features.                                                               
  470                                                                            
  471 4.2.1.  Valid                                                              
  472                                                                            
  473    o  Code points traditionally used as letters and numbers in writing     
  474       systems, i.e., the LetterDigits ("A") category first defined in      
  475       [RFC5892] and listed here under Section 9.1.                         
  476                                                                            
  477    o  Code points in the range U+0021 through U+007E, i.e., the            
  478       (printable) ASCII7 ("K") category defined under Section 9.11.        
  479       These code points are "grandfathered" into PRECIS and thus are       
  480       valid even if they would otherwise be disallowed according to the    
  481       property-based rules specified in the next section.                  
  482                                                                            
  483       Note: Although the PRECIS IdentifierClass reuses the LetterDigits    
  484       category from IDNA2008, the range of code points allowed in the      
  485       IdentifierClass is wider than the range of code points allowed in    
  486       IDNA2008.  The main reason is that IDNA2008 applies the              
  487       Unstable ("B") category (Section 9.2) before the LetterDigits        
  496       category, thus disallowing uppercase code points, whereas the        
  497       IdentifierClass does not apply the Unstable category.                
  498                                                                            
  499 4.2.2.  Contextual Rule Required                                           
  500                                                                            
  501    o  A number of code points from the Exceptions ("F") category defined   
  502       under Section 9.6.                                                   
  503                                                                            
  504    o  Joining code points, i.e., the JoinControl ("H") category defined    
  505       under Section 9.8.                                                   
  506                                                                            
  507 4.2.3.  Disallowed                                                         
  508                                                                            
  509    o  Old Hangul Jamo code points, i.e., the OldHangulJamo ("I")           
  510       category defined under Section 9.9.                                  
  511                                                                            
  512    o  Control code points, i.e., the Controls ("L") category defined       
  513       under Section 9.12.                                                  
  514                                                                            
  515    o  Ignorable code points, i.e., the PrecisIgnorableProperties ("M")     
  516       category defined under Section 9.13.                                 
  517                                                                            
  518    o  Space code points, i.e., the Spaces ("N") category defined under     
  519       Section 9.14.                                                        
  520                                                                            
  521    o  Symbol code points, i.e., the Symbols ("O") category defined under   
  522       Section 9.15.                                                        
  523                                                                            
  524    o  Punctuation code points, i.e., the Punctuation ("P") category        
  525       defined under Section 9.16.                                          
  526                                                                            
  527    o  Any code point that is decomposed and recomposed into something      
  528       other than itself under Unicode Normalization Form KC, i.e., the     
  529       HasCompat ("Q") category defined under Section 9.17.  These code     
  530       points are disallowed even if they would otherwise be valid          
  531       according to the property-based rules specified in the previous      
  532       section.                                                             
  533                                                                            
  534    o  Letters and digits other than the "traditional" letters and digits   
  535       allowed in IDNs, i.e., the OtherLetterDigits ("R") category          
  536       defined under Section 9.18.                                          
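
         As an illustration only, the HasCompat condition in the list above
         (a code point that NFKC turns into something other than itself)
         can be sketched in Python with the standard unicodedata module.
         The function name has_compat is hypothetical, and the result
         depends on the Unicode version bundled with the interpreter.

```python
import unicodedata


def has_compat(ch: str) -> bool:
    """Return True if NFKC changes the single code point `ch`."""
    return unicodedata.normalize("NFKC", ch) != ch


# FULLWIDTH DIGIT ZERO (U+FF10) becomes DIGIT ZERO (U+0030) under NFKC,
# so it falls in HasCompat; plain "a" is left unchanged by NFKC.
print(has_compat("\uff10"))  # True
print(has_compat("a"))       # False
```

         Under the rules of this section, a code point for which such a
         check returns true is disallowed in the IdentifierClass.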
  537                                                                            
  538 4.2.4.  Unassigned                                                         
  539                                                                            
  540    Any code points that are not yet designated in the Unicode coded        
  541    character set are considered unassigned for purposes of the             
  542    IdentifierClass, and such code points are to be treated as              
  543    disallowed.  See Section 9.10.                                          
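
         A rough sketch of such a check follows.  It assumes that the
         Unassigned category of Section 9.10 corresponds to code points
         whose General_Category is Cn, excluding noncharacters; that
         reading comes from Section 9.10 rather than from this section,
         and the helper names are hypothetical.  The answer reflects the
         Unicode version bundled with the Python interpreter, which plays
         the role of "the version of Unicode used by the application".

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Noncharacters: U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_unassigned(ch: str) -> bool:
    """Assumed reading of Section 9.10: General_Category Cn, not a noncharacter."""
    return unicodedata.category(ch) == "Cn" and not is_noncharacter(ord(ch))


def identifier_class_rejects_as_unassigned(ch: str) -> bool:
    """Unassigned code points are treated as disallowed in the IdentifierClass."""
    return is_unassigned(ch)
```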
  544                                                                            
  551 4.2.5.  Examples                                                           
  552                                                                            
  553    As described in the Introduction to this document, the string classes   
  554    do not handle all issues related to string preparation and comparison   
  555    (such as case mapping); instead, such issues are handled at the level   
  556    of profiles.  Examples for profiles of the IdentifierClass can be       
  557    found in [RFC8265] (the UsernameCaseMapped and UsernameCasePreserved    
  558    profiles).                                                              
  559                                                                            
  560 4.3.  FreeformClass                                                        
  561                                                                            
  562    Some application technologies need strings that can be used in a        
  563    free-form way, e.g., as a password in an authentication exchange (see   
  564    [RFC8265]) or a nickname in a chat room (see [RFC8266]).  We group      
  565    such strings into a class called "FreeformClass" having the following
  566    features.                                                               
  567                                                                            
  568       Security Warning: As mentioned, the FreeformClass prioritizes        
  569       expressiveness over safety; Section 12.3 describes some of the       
  570       security hazards involved with using or profiling the                
  571       FreeformClass.                                                       
  572                                                                            
  573       Security Warning: Consult Section 12.6 for relevant security         
  574       considerations when strings conforming to the FreeformClass, or a    
  575       profile thereof, are used as passwords.                              
  576                                                                            
  577 4.3.1.  Valid                                                              
  578                                                                            
  579    o  Traditional letters and numbers, i.e., the LetterDigits ("A")        
  580       category first defined in [RFC5892] and listed here under            
  581       Section 9.1.                                                         
  582                                                                            
  583    o  Code points in the range U+0021 through U+007E, i.e., the            
  584       (printable) ASCII7 ("K") category defined under Section 9.11.        
  585                                                                            
  586    o  Space code points, i.e., the Spaces ("N") category defined under     
  587       Section 9.14.                                                        
  588                                                                            
  589    o  Symbol code points, i.e., the Symbols ("O") category defined under   
  590       Section 9.15.                                                        
  591                                                                            
  592    o  Punctuation code points, i.e., the Punctuation ("P") category        
  593       defined under Section 9.16.                                          
  594                                                                            
  595    o  Any code point that is decomposed and recomposed into something      
  596       other than itself under Unicode Normalization Form KC, i.e., the     
  597       HasCompat ("Q") category defined under Section 9.17.                 
  598                                                                            
  606    o  Letters and digits other than the "traditional" letters and digits   
  607       allowed in IDNs, i.e., the OtherLetterDigits ("R") category          
  608       defined under Section 9.18.                                          
  609                                                                            
  610 4.3.2.  Contextual Rule Required                                           
  611                                                                            
  612    o  A number of code points from the Exceptions ("F") category defined   
  613       under Section 9.6.                                                   
  614                                                                            
  615    o  Joining code points, i.e., the JoinControl ("H") category defined    
  616       under Section 9.8.                                                   
  617                                                                            
  618 4.3.3.  Disallowed                                                         
  619                                                                            
  620    o  Old Hangul Jamo code points, i.e., the OldHangulJamo ("I")           
  621       category defined under Section 9.9.                                  
  622                                                                            
  623    o  Control code points, i.e., the Controls ("L") category defined       
  624       under Section 9.12.                                                  
  625                                                                            
  626    o  Ignorable code points, i.e., the PrecisIgnorableProperties ("M")     
  627       category defined under Section 9.13.                                 
  628                                                                            
  629 4.3.4.  Unassigned                                                         
  630                                                                            
  631    Any code points that are not yet designated in the Unicode coded        
  632    character set are considered unassigned for purposes of the             
  633    FreeformClass, and such code points are to be treated as disallowed.    
  634                                                                            
  635 4.3.5.  Examples                                                           
  636                                                                            
  637    As described in the Introduction to this document, the string classes   
  638    do not handle all issues related to string preparation and comparison   
  639    (such as case mapping); instead, such issues are handled at the level   
  640    of profiles.  Examples for profiles of the FreeformClass can be found   
  641    in [RFC8265] (the OpaqueString profile) and [RFC8266] (the Nickname     
  642    profile).                                                               
  643                                                                            
  644 4.4.  Summary                                                              
  645                                                                            
  646    The following table summarizes the differences between the              
  647    IdentifierClass and the FreeformClass (i.e., the disposition of a       
  648    code point as valid, contextual rule required, disallowed, or           
  649    unassigned), depending on its PRECIS category.                          
  650                                                                            
  661     +===============================+=================+===============+    
  662     |        CATEGORY               | IDENTIFIERCLASS | FREEFORMCLASS |    
  663     +===============================+=================+===============+    
  664     | (A) LetterDigits              | Valid           | Valid         |    
  665     +-------------------------------+-----------------+---------------+    
  666     | (B) Unstable                  |          [N/A (unused)]         |    
  667     +-------------------------------+-----------------+---------------+    
  668     | (C) IgnorableProperties       |          [N/A (unused)]         |    
  669     +-------------------------------+-----------------+---------------+    
  670     | (D) IgnorableBlocks           |          [N/A (unused)]         |    
  671     +-------------------------------+-----------------+---------------+    
  672     | (E) LDH                       |          [N/A (unused)]         |    
  673     +-------------------------------+-----------------+---------------+    
  674     | (F) Exceptions                | Contextual      | Contextual    |    
  675     |                               | Rule Required   | Rule Required |    
  676     +-------------------------------+-----------------+---------------+    
  677     | (G) BackwardCompatible        |      [Handled by IDNA Rules]    |    
  678     +-------------------------------+-----------------+---------------+    
  679     | (H) JoinControl               | Contextual      | Contextual    |    
  680     |                               | Rule Required   | Rule Required |    
  681     +-------------------------------+-----------------+---------------+    
  682     | (I) OldHangulJamo             | Disallowed      | Disallowed    |    
  683     +-------------------------------+-----------------+---------------+    
  684     | (J) Unassigned                | Unassigned      | Unassigned    |    
  685     +-------------------------------+-----------------+---------------+    
  686     | (K) ASCII7                    | Valid           | Valid         |    
  687     +-------------------------------+-----------------+---------------+    
  688     | (L) Controls                  | Disallowed      | Disallowed    |    
  689     +-------------------------------+-----------------+---------------+    
  690     | (M) PrecisIgnorableProperties | Disallowed      | Disallowed    |    
  691     +-------------------------------+-----------------+---------------+    
  692     | (N) Spaces                    | Disallowed      | Valid         |    
  693     +-------------------------------+-----------------+---------------+    
  694     | (O) Symbols                   | Disallowed      | Valid         |    
  695     +-------------------------------+-----------------+---------------+    
  696     | (P) Punctuation               | Disallowed      | Valid         |    
  697     +-------------------------------+-----------------+---------------+    
  698     | (Q) HasCompat                 | Disallowed      | Valid         |    
  699     +-------------------------------+-----------------+---------------+    
  700     | (R) OtherLetterDigits         | Disallowed      | Valid         |    
  701     +-------------------------------+-----------------+---------------+    
  702                                                                            
  703               Table 1: Comparative Disposition of Code Points              
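
         For implementers, Table 1 can be transcribed directly into a
         lookup structure.  The following Python sketch is one possible
         encoding; the names Disposition, TABLE_1, and disposition are
         hypothetical.  It covers only the mapping from a PRECIS category
         to a disposition, not the assignment of a code point to a
         category.

```python
from enum import Enum


class Disposition(Enum):
    VALID = "Valid"
    CONTEXTUAL = "Contextual Rule Required"
    DISALLOWED = "Disallowed"
    UNASSIGNED = "Unassigned"


VALID = Disposition.VALID
CONTEXTUAL = Disposition.CONTEXTUAL
DISALLOWED = Disposition.DISALLOWED
UNASSIGNED = Disposition.UNASSIGNED

# Table 1, keyed by category letter: (IdentifierClass, FreeformClass).
# Categories B, C, D, and E (unused) and G (handled by IDNA rules) have
# no per-class disposition in the table and are left out.
TABLE_1 = {
    "A": (VALID, VALID),            # LetterDigits
    "F": (CONTEXTUAL, CONTEXTUAL),  # Exceptions
    "H": (CONTEXTUAL, CONTEXTUAL),  # JoinControl
    "I": (DISALLOWED, DISALLOWED),  # OldHangulJamo
    "J": (UNASSIGNED, UNASSIGNED),  # Unassigned
    "K": (VALID, VALID),            # ASCII7
    "L": (DISALLOWED, DISALLOWED),  # Controls
    "M": (DISALLOWED, DISALLOWED),  # PrecisIgnorableProperties
    "N": (DISALLOWED, VALID),       # Spaces
    "O": (DISALLOWED, VALID),       # Symbols
    "P": (DISALLOWED, VALID),       # Punctuation
    "Q": (DISALLOWED, VALID),       # HasCompat
    "R": (DISALLOWED, VALID),       # OtherLetterDigits
}

CLASS_COLUMN = {"IdentifierClass": 0, "FreeformClass": 1}


def disposition(category: str, string_class: str) -> Disposition:
    """Look up the Table 1 disposition of a category for a string class."""
    return TABLE_1[category][CLASS_COLUMN[string_class]]


# Categories on which the two string classes disagree.
DIFFERING = sorted(k for k, (ident, free) in TABLE_1.items() if ident != free)
print(DIFFERING)  # ['N', 'O', 'P', 'Q', 'R']
```

         The two classes differ only for categories N through R, which
         are disallowed in the IdentifierClass and valid in the
         FreeformClass.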
  704                                                                            
  705                                                                            
  706                                                                            
  707                                                                            
  708                                                                            
  709                                                                            
  710                                                                            
  711                                                                            
  712 Saint-Andre & Blanchet       Standards Track                   [Page 13]   

  713 RFC 8264                    PRECIS Framework                October 2017   
  714                                                                            
  715                                                                            
  716 5.  Profiles                                                               
  717                                                                            
  718    This framework document defines the valid, contextual rule required,    
  719    disallowed, and unassigned rules for the IdentifierClass and the        
  720    FreeformClass.  A profile of a PRECIS string class MUST define the      
  721    width mapping, additional mapping (if any), case mapping,               
  722    normalization, and directionality rules.  A profile MAY also restrict   
  723    the allowable code points above and beyond the definition of the        
  724    relevant PRECIS string class (but MUST NOT add as valid any code        
  725    points that are disallowed by the relevant PRECIS string class).        
  726    These matters are discussed in the following subsections.               
  727                                                                            
  728    Profiles of the PRECIS string classes are registered with the IANA as   
  729    described under Section 11.3.  Profile names use the following          
  730    convention: they are of the form "Profilename of BaseClass", where      
  731    the "Profilename" string is a differentiator and "BaseClass" is the     
  732    name of the PRECIS string class being profiled; for example, the        
  733    profile used for opaque strings such as passwords is the OpaqueString   
  734    profile of the FreeformClass [RFC8265].                                 
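
         The naming convention lends itself to a simple parser.  The
         following Python sketch is illustrative only: the names
         ProfileName and parse_profile_name are hypothetical, and the
         check against BASE_CLASSES assumes that only the two string
         classes defined in this document are in use.

```python
from typing import NamedTuple

# The two PRECIS string classes defined by this document.
BASE_CLASSES = ("IdentifierClass", "FreeformClass")


class ProfileName(NamedTuple):
    profilename: str
    base_class: str


def parse_profile_name(name: str) -> ProfileName:
    """Split a name of the form "Profilename of BaseClass"."""
    profilename, sep, base_class = name.partition(" of ")
    if not sep or not profilename or base_class not in BASE_CLASSES:
        raise ValueError(f"not of the form 'Profilename of BaseClass': {name!r}")
    return ProfileName(profilename, base_class)


print(parse_profile_name("OpaqueString of FreeformClass"))
# ProfileName(profilename='OpaqueString', base_class='FreeformClass')
```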
  735                                                                            
  736 5.1.  Profiles Must Not Be Multiplied beyond Necessity                     
  737                                                                            
  738    The risk of profile proliferation is significant because having too     
  739    many profiles will result in different behavior across various          
  740    applications, thus violating what is known in user interface design     
  741    as the "Principle of Least Astonishment".                               
  742                                                                            
  743    Indeed, we already have too many profiles.  Ideally, we would have at   
  744    most two or three profiles.  Unfortunately, numerous application        
  745    protocols exist with their own quirks regarding protocol strings.       
  746    Domain names, email addresses, instant messaging addresses, chat room   
  747    names, user nicknames or display names, filenames, authentication       
  748    identifiers, passwords, and other strings already exist in the wild     
  749    and need to be supported in existing application protocols such as      
  750    DNS, SMTP, the Extensible Messaging and Presence Protocol (XMPP),       
  751    Internet Relay Chat (IRC), NFS, the Internet Small Computer System      
  752    Interface (iSCSI), the Extensible Authentication Protocol (EAP), and    
  753    the Simple Authentication and Security Layer (SASL) [RFC4422], among    
  754    others.                                                                 
  755                                                                            
  756    Nevertheless, profiles must not be multiplied beyond necessity.         
  757                                                                            
  758    To help prevent profile proliferation, this document recommends         
  759    sensible defaults for the various options offered to profile creators   
  760    (such as width mapping and Unicode normalization).  In addition, the    
  761    guidelines for designated experts provided under Section 10 are meant   
  762    to encourage a high level of due diligence regarding new profiles.      
  763                                                                            
  771 5.2.  Rules                                                                
  772                                                                            
  773 5.2.1.  Width Mapping Rule                                                 
  774                                                                            
  775    The width mapping rule of a profile specifies whether width mapping     
  776    is performed on a string and how the mapping is done.  Typically,       
  777    such mapping consists of mapping fullwidth and halfwidth code points,   
  778    i.e., code points with a Decomposition Type of Wide or Narrow, to       
  779    their decomposition mappings; as an example, "０" (FULLWIDTH DIGIT       
  780    ZERO, U+FF10) would be mapped to "0" (DIGIT ZERO, U+0030).
  781                                                                            
  782    The normalization form specified by a profile (see below) has an        
  783    impact on the need for width mapping.  Because width mapping is         
  784    performed as a part of compatibility decomposition, a profile           
  785    employing either Normalization Form KD (NFKD) or Normalization          
  786    Form KC (NFKC) does not need to specify width mapping.  However, if     
  787    Unicode Normalization Form C (NFC) is used (as is recommended), then    
  788    the profile needs to specify whether to apply width mapping; in this    
  789    case, width mapping is in general RECOMMENDED because allowing          
  790    fullwidth and halfwidth code points to remain unmapped to their         
  791    compatibility variants would violate the "Principle of Least            
  792    Astonishment".  For more information about the concept of width in      
  793    East Asian scripts within Unicode, see Unicode Standard Annex #11       
  794    [UAX11].                                                                
  795                                                                            
  796       Note: Because the East Asian width property is not guaranteed to     
  797       be stable by the Unicode Standard (see                               
  798       <http://unicode.org/policies/stability_policy.html> for details),    
  799       the results of applying a given width mapping rule might not be      
  800       consistent across different versions of Unicode.                     
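
         As an illustration only, the typical width mapping described in
         this section might be sketched as follows.  The function name
         width_map is hypothetical, and the sketch assumes that the
         Decomposition Type can be read from the "<wide>" or "<narrow>"
         tag returned by Python's unicodedata.decomposition().  The
         results depend on the Unicode version bundled with the Python
         runtime.

```python
import unicodedata


def width_map(text: str) -> str:
    """Map fullwidth and halfwidth code points to their decomposition mappings.

    Hypothetical helper: a sketch of a typical width mapping rule, not a
    normative implementation.
    """
    mapped = []
    for ch in text:
        # decomposition() returns e.g. "<wide> 0030" for U+FF10, or an
        # empty string when the code point has no decomposition.
        fields = unicodedata.decomposition(ch).split()
        if fields and fields[0] in ("<wide>", "<narrow>"):
            mapped.append("".join(chr(int(cp, 16)) for cp in fields[1:]))
        else:
            # Everything else (including canonical decompositions) is kept.
            mapped.append(ch)
    return "".join(mapped)


print(width_map("\uff10"))   # FULLWIDTH DIGIT ZERO
print(width_map("\uff76"))   # HALFWIDTH KATAKANA LETTER KA
```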
  801                                                                            
  802 5.2.2.  Additional Mapping Rule                                            
  803                                                                            
  804    The additional mapping rule of a profile specifies whether additional   
  805    mappings are performed on a string, such as:                            
  806                                                                            
  807    o  Mapping of delimiter code points (such as '@', ':', '/', '+',        
  808       and '-').                                                            
  809                                                                            
  810    o  Mapping of special code points (e.g., non-ASCII space code points    
  811       to SPACE (U+0020) or control code points to nothing).                
  812                                                                            
  813    The PRECIS mappings document [RFC7790] describes such mappings in       
  814    more detail.                                                            
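
         A minimal sketch of the second kind of mapping listed above
         might look like the following.  The function name
         additional_map is hypothetical, and the sketch assumes that
         "space code points" means General_Category Zs and that "control
         code points" means General_Category Cc; a real profile would
         state exactly which code points it maps.

```python
import unicodedata


def additional_map(text: str) -> str:
    """Map space separators to SPACE (U+0020) and drop control code points.

    Hypothetical helper; the category choices (Zs, Cc) are assumptions
    made for this sketch.
    """
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Zs":
            out.append(" ")      # non-ASCII spaces become U+0020
        elif category == "Cc":
            continue             # control code points map to nothing
        else:
            out.append(ch)
    return "".join(out)


# NO-BREAK SPACE, IDEOGRAPHIC SPACE, and BEL are all mapped here.
print(repr(additional_map("a\u00a0b\u3000c\x07d")))
```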
  815                                                                            
  826 5.2.3.  Case Mapping Rule                                                  
  827                                                                            
  828    The case mapping rule of a profile specifies whether case mapping       
  829    (instead of case preservation) is performed on a string and how the     
  830    mapping is applied (e.g., mapping uppercase and titlecase code points   
  831    to their lowercase equivalents).                                        
  832                                                                            
  833    If case mapping is desired (instead of case preservation), it is        
  834    RECOMMENDED to use the Unicode toLowerCase() operation defined in the   
  835    Unicode Standard [Unicode].  In contrast to the Unicode toCaseFold()    
  836    operation, the toLowerCase() operation is less likely to violate the    
  837    "Principle of Least Astonishment", especially when an application       
  838    merely wishes to convert uppercase and titlecase code points to their   
  839    lowercase equivalents while preserving lowercase code points.           
  840    Although the toCaseFold() operation can be appropriate when an          
  841    application needs to compare two strings (such as in search             
  842    operations), in general few application developers and even fewer       
  843    users understand its implications, so toLowerCase() is almost always    
  844    the safer choice.                                                       
  845                                                                            
  846       Note: Neither toLowerCase() nor toCaseFold() is designed to handle   
  847       various language-specific issues, such as the character "ı" (LATIN   
  848       SMALL LETTER DOTLESS I, U+0131) in several Turkic languages.  The    
  849       reader is referred to the PRECIS mappings document [RFC7790],        
  850       which describes these issues in greater detail.                      
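
         The difference between the two operations can be seen in a
         short sketch.  It assumes that Python's str.lower() and
         str.casefold() are acceptable stand-ins for the Unicode
         toLowerCase() and toCaseFold() operations; the wrapper names
         are hypothetical.

```python
def to_lower_case(text: str) -> str:
    """Stand-in for Unicode toLowerCase() (assumption: str.lower())."""
    return text.lower()


def to_case_fold(text: str) -> str:
    """Stand-in for Unicode toCaseFold() (assumption: str.casefold())."""
    return text.casefold()


# "Stra\u00dfe" contains LATIN SMALL LETTER SHARP S (U+00DF), which
# lowercasing preserves but case folding expands to "ss".
for sample in ("PRECIS", "Stra\u00dfe", "\u01c5"):
    print(repr(sample), repr(to_lower_case(sample)), repr(to_case_fold(sample)))
```

         Neither wrapper is locale-aware, so the Turkic cases mentioned
         in the note above are not handled by this sketch.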
  851                                                                            
  852    In order to maximize entropy and minimize the potential for false       
  853    accepts, it is NOT RECOMMENDED for application protocols to map         
  854    uppercase and titlecase code points to their lowercase equivalents      
  855    when strings conforming to the FreeformClass, or a profile thereof,     
  856    are used in passwords; instead, it is RECOMMENDED to preserve the       
  857    case of all code points contained in such strings and then perform      
  858    case-sensitive comparison.  See also the related discussion in          
  859    Section 12.6 of this document and in [RFC8265].                         
  860                                                                            
  861 5.2.4.  Normalization Rule                                                 
  862                                                                            
  863    The normalization rule of a profile specifies which Unicode             
  864    Normalization Form (D, KD, C, or KC) is to be applied (see Unicode      
  865    Standard Annex #15 [UAX15] for background information).                 
  866                                                                            
  867    In accordance with [RFC5198], Normalization Form C (NFC) is             
  868    RECOMMENDED.                                                            
  869                                                                            
  870    Protocol designers and application developers need to understand that   
  871    certain Unicode normalization forms, especially NFKC and NFKD, can      
  872    result in significant loss of information in various circumstances      
  873    and that these circumstances can depend on the language and script of   
  881    the strings to which the normalization forms are applied.  Extreme      
  882    care should be taken when specifying the use of these normalization     
  883    forms.                                                                  
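
         The loss of information can be illustrated with a brief sketch
         that uses Python's unicodedata.normalize(); the wrapper name
         normalize is hypothetical, and the sample code points were
         chosen only for illustration.

```python
import unicodedata


def normalize(text: str, form: str = "NFC") -> str:
    """Apply a Unicode normalization form (NFC by default, as recommended)."""
    return unicodedata.normalize(form, text)


# NFC composes "e" + COMBINING ACUTE ACCENT without discarding anything.
print(repr(normalize("e\u0301")))

# NFKC replaces compatibility characters, so the distinction between
# CIRCLED DIGIT ONE (U+2460) and DIGIT ONE (U+0031) is lost.
print(repr(normalize("\u2460", "NFC")), repr(normalize("\u2460", "NFKC")))
```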
  884                                                                            
  885 5.2.5.  Directionality Rule                                                
  886                                                                            
  887    The directionality rule of a profile specifies how to treat strings     
  888    containing what are often called "right-to-left" (RTL) code points      
  889    (see Unicode Standard Annex #9 [UAX9]).  RTL code points come from      
  890    scripts that are normally written from right to left, and Unicode
  891    treats the code points themselves as having right-to-left
  892    directionality.  Some strings containing RTL code points also contain
  893    "left-to-right" (LTR) code points, such as ASCII numerals, as well as   
  894    code points without directional properties.  Consequently, such         
  895    strings are known as "bidirectional strings".                           
  896                                                                            
  897    Presenting bidirectional strings in different layout systems (e.g., a   
  898    user interface that is configured to handle primarily an RTL script     
  899    vs. an interface that is configured to handle primarily an LTR          
  900    script) can yield display results that, while predictable to those      
  901    who understand the display rules, are counterintuitive to casual        
  902    users.  In particular, the same bidirectional string (in PRECIS         
  903    terms) might not be presented in the same way to users of those         
  904    different layout systems, even though the presentation is consistent    
  905    within any particular layout system.  In some applications, these       
  906    presentation differences might be considered problematic and thus the   
  907    application designers might wish to restrict the use of bidirectional   
  908    strings by specifying a directionality rule.  In other applications,    
  909    these presentation differences might not be considered problematic      
  910    (this especially tends to be true of more "free-form" strings) and      
  911    thus no directionality rule is needed.                                  
  912                                                                            
  913    The PRECIS framework does not directly address how to deal with         
  914    bidirectional strings across all string classes and profiles nor does   
  915    it define any new directionality rules, because at present there is     
  916    no widely accepted and implemented solution for the safe display of     
  917    arbitrary bidirectional strings beyond the Unicode bidirectional        
  918    algorithm [UAX9].  Although rules for management and display of         
  919    bidirectional strings have been defined for domain name labels and      
  920    similar identifiers through the "Bidi Rule" specified in the IDNA2008   
  921    specification on right-to-left scripts [RFC5893], those rules are       
  922    quite restrictive and are not necessarily applicable to all             
  923    bidirectional strings.                                                  
  924                                                                            
  925    The authors of a PRECIS profile might believe that they need to         
  926    define a new directionality rule of their own.  Because of the          
  927    complexity of the issues involved, such a belief is almost always       
  928    misguided, even if the authors have done a great deal of careful        
  936    research into the challenges of displaying bidirectional strings.       
  937    This document strongly suggests that profile authors who are thinking   
  938    about defining a new directionality rule should think again and         
  939    instead consider using the "Bidi Rule" [RFC5893] (for profiles based    
  940    on the IdentifierClass) or following the Unicode bidirectional          
  941    algorithm [UAX9] (for profiles based on the FreeformClass or in         
  942    situations where the IdentifierClass is not appropriate).               
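
         Before any directionality rule can be applied, an
         implementation needs to know whether a string contains RTL code
         points at all.  A minimal sketch of that check follows.  It
         does not implement the "Bidi Rule" or the Unicode bidirectional
         algorithm; the function name is hypothetical, and the choice of
         the Bidi_Class values R, AL, and AN as the markers of an RTL
         string is an assumption modeled on the approach taken for
         domain name labels.

```python
import unicodedata

# Assumption for this sketch: these Bidi_Class values mark RTL content.
RTL_BIDI_CLASSES = frozenset({"R", "AL", "AN"})


def contains_rtl(text: str) -> bool:
    """Return True if any code point in text has an RTL Bidi_Class."""
    return any(unicodedata.bidirectional(ch) in RTL_BIDI_CLASSES for ch in text)


print(contains_rtl("abc123"))       # Latin letters and ASCII digits only
print(contains_rtl("\u05d0bc"))     # starts with HEBREW LETTER ALEF
```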
  943                                                                            
  944 5.3.  A Note about Spaces                                                  
  945                                                                            
  946    With regard to the IdentifierClass, the consensus of the PRECIS         
  947    Working Group was that spaces are problematic for many reasons,         
  948    including the following:                                                
  949                                                                            
  950    o  Many Unicode code points are confusable with SPACE (U+0020).         
  951                                                                            
  952    o  Even if non-ASCII space code points are mapped to SPACE (U+0020),    
  953       space code points are often not rendered in user interfaces,         
  954       leading to the possibility that a human user might consider a        
  955       string containing spaces to be equivalent to the same string         
  956       without spaces.                                                      
  957                                                                            
  958    o  In some locales, some devices are known to generate a code point     
  959       other than SPACE (U+0020), such as ZERO WIDTH JOINER (U+200D),       
  960       when a user performs an action like pressing the space bar on a      
  961       keyboard.                                                            
  962                                                                            
  963    One consequence of disallowing space code points in the                 
  964    IdentifierClass might be to effectively discourage their use within     
  965    identifiers created in newer application protocols; given the           
  966    challenges involved with properly handling space code points            
  967    (especially non-ASCII space code points) in identifiers and other       
  968    protocol strings, the PRECIS Working Group considered this to be a      
  969    feature, not a bug.                                                     
  970                                                                            
  971    However, the FreeformClass does allow spaces; this in turn enables      
  972    application protocols to define profiles of the FreeformClass that      
  973    are more flexible than any profiles of the IdentifierClass.  In         
  974    addition, as explained in Section 6.3, application protocols can also   
  975    define application-layer constructs containing spaces.                  
  976                                                                            
  977 6.  Applications                                                           
  978                                                                            
  979 6.1.  How to Use PRECIS in Applications                                    
  980                                                                            
  981    Although PRECIS has been designed with applications in mind,            
  982    internationalization is not suddenly made easy through the use of       
  983    PRECIS.  Indeed, because it is extremely difficult for protocol         
  991    designers and application developers to do the right thing for all      
  992    users when supporting internationalized strings, often the safest       
  993    option is to support only the ASCII range [RFC20] in various protocol   
  994    slots.  This state of affairs is unfortunate but is the direct result   
  995    of the complexities involved with human languages (e.g., the vast       
  996    number of code points, scripts, user communities, and rules with        
  997    their inevitable exceptions), the kinds of strings that application
  998    developers and their users wish to support, the wide range of devices   
  999    that users employ to access services enabled by various Internet        
 1000    protocols, and so on.                                                   
 1001                                                                            
 1002    Despite these significant challenges, application and protocol          
 1003    developers sometimes persevere in attempting to support                 
 1004    internationalized strings in their systems.  These developers need to   
 1005    think carefully about how they will use the PRECIS string classes, or   
 1006    profiles thereof, in their applications.  This section provides some    
 1007    guidelines to application developers (and to expert reviewers of        
 1008    application-protocol specifications).                                   
 1009                                                                            
 1010    o  Don't define your own profile unless absolutely necessary (see       
 1011       Section 5.1).  Existing profiles have been designed for wide         
 1012       reuse.  It is highly likely that an existing profile will meet       
 1013       your needs, especially given the ability to specify further          
 1014       excluded code points (Section 6.2) and to build application-layer    
 1015       constructs (see Section 6.3).                                        
 1016                                                                            
 1017    o  Do specify:                                                          
 1018                                                                            
 1019       *  Exactly which entities are responsible for preparation,           
 1020          enforcement, and comparison of internationalized strings (e.g.,   
 1021          servers or clients).                                              
 1022                                                                            
 1023       *  Exactly when those entities need to complete their tasks (e.g.,   
 1024          a server might need to enforce the rules of a profile before      
 1025          allowing a client to gain network access).                        
 1026                                                                            
 1027       *  Exactly which protocol slots need to be checked against which     
 1028          profiles (e.g., checking the address of a message's intended      
 1029          recipient against the UsernameCaseMapped profile [RFC8265] of     
 1030          the IdentifierClass or checking the password of a user against    
 1031          the OpaqueString profile [RFC8265] of the FreeformClass).         
 1032                                                                            
 1033       See [RFC8265] and [RFC7622] for definitions of these matters for     
 1034       several applications.                                                
 1035                                                                            
 1046 6.2.  Further Excluded Characters                                          
 1047                                                                            
 1048    An application protocol that uses a profile MAY specify particular      
 1049    code points that are not allowed in relevant slots within that          
 1050    application protocol, above and beyond those excluded by the string     
 1051    class or profile.                                                       
 1052                                                                            
 1053    That is, an application protocol MAY do either of the following:        
 1054                                                                            
 1055    1.  Exclude specific code points that are allowed by the relevant       
 1056        string class.                                                       
 1057                                                                            
 1058    2.  Exclude code points matching certain Unicode properties (e.g.,      
 1059        math symbols) that are included in the relevant PRECIS string       
 1060        class.                                                              
 1061                                                                            
 1062    As a result of such exclusions, code points that are defined as valid   
 1063    for the PRECIS string class or profile will be defined as disallowed    
 1064    for the relevant protocol slot.                                         
 1065                                                                            
 1066    Typically, such exclusions are defined for the purpose of backward      
 1067    compatibility with legacy formats within an application protocol.       
 1068    These are defined for application protocols, not profiles, in order     
 1069    to prevent multiplication of profiles beyond necessity (see             
 1070    Section 5.1).                                                           
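
         Both kinds of exclusion could be checked with a sketch along
         the following lines.  The function name, its parameters, and
         the sample exclusions are hypothetical; the sketch assumes that
         the string has already been accepted by the relevant string
         class or profile and that a Unicode property is expressed as a
         General_Category value (for instance, "Sm" for math symbols).

```python
import unicodedata


def find_excluded(text, excluded_points=frozenset(), excluded_categories=frozenset()):
    """Return the code points in text that the application protocol excludes.

    Hypothetical helper: the exclusions are layered on top of (not a
    replacement for) the string class or profile check.
    """
    return [
        ch
        for ch in text
        if ch in excluded_points or unicodedata.category(ch) in excluded_categories
    ]


# Example: a slot that additionally forbids "@" and all math symbols.
print(find_excluded("a+b@c", excluded_points={"@"}, excluded_categories={"Sm"}))
```

         An empty result means that the string contains nothing that the
         application protocol has excluded for the slot in question.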
 1071                                                                            
 1072 6.3.  Building Application-Layer Constructs                                
 1073                                                                            
 1074    Sometimes, an application-layer construct does not map in a             
 1075    straightforward manner to one of the PRECIS string classes or a         
 1076    profile thereof.  Consider, for example, the "simple username"          
 1077    construct in SASL [RFC4422].  Depending on the deployment, a simple     
 1078    username might take the form of a user's full name (e.g., the user's    
 1079    personal name followed by a space and then the user's family name).     
 1080    Such a simple username cannot be defined as an instance of the          
 1081    IdentifierClass or a profile thereof, because space code points are     
 1082    not allowed in the IdentifierClass; however, it could be defined        
 1083    using a space-separated sequence of IdentifierClass instances, as in    
 1084    the following ABNF [RFC5234] from [RFC8265]:                            
 1085                                                                            
 1086       username   = userpart *(1*SP userpart)                               
 1087       userpart   = 1*(idpoint)                                             
 1088                    ;                                                       
 1089                    ; an "idpoint" is a Unicode code point that             
 1090                    ; can be contained in a string conforming to            
 1091                    ; the PRECIS IdentifierClass                            
 1092                    ;                                                       
 1093                                                                            
 1101    Similar techniques could be used to define many application-layer       
 1102    constructs, say of the form "user@domain" or "/path/to/file".           
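
         A parser for the "username" production shown above might be
         sketched as follows.  The predicate is_idpoint is a
         hypothetical placeholder: it merely accepts letters and digits
         and is NOT the IdentifierClass, which a real implementation
         would have to evaluate per code point.

```python
import re


def is_idpoint(ch: str) -> bool:
    """HYPOTHETICAL placeholder for the IdentifierClass check."""
    return ch.isalnum()


def split_username(username: str) -> list:
    """Split a username into userparts per: userpart *(1*SP userpart)."""
    # One or more SPACE (U+0020) code points separate adjacent userparts.
    parts = re.split(r" +", username)
    for part in parts:
        # An empty part means a leading or trailing space or an empty
        # input; userpart = 1*(idpoint) requires at least one code point.
        if not part or not all(is_idpoint(ch) for ch in part):
            raise ValueError("not a valid username: %r" % username)
    return parts


print(split_username("Juliet Capulet"))
```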
 1103                                                                            
 1104 7.  Order of Operations                                                    
 1105                                                                            
 1106    To ensure proper comparison, the rules specified for a particular       
 1107    string class or profile MUST be applied in the following order:         
 1108                                                                            
 1109    1.  Width Mapping Rule                                                  
 1110                                                                            
 1111    2.  Additional Mapping Rule                                             
 1112                                                                            
 1113    3.  Case Mapping Rule                                                   
 1114                                                                            
 1115    4.  Normalization Rule                                                  
 1116                                                                            
 1117    5.  Directionality Rule                                                 
 1118                                                                            
 1119    6.  Behavioral rules for determining whether a code point is valid,     
 1120        allowed under a contextual rule, disallowed, or unassigned          
 1121                                                                            
 1122    As already described, the width mapping, additional mapping, case       
 1123    mapping, normalization, and directionality rules are specified for      
 1124    each profile, whereas the behavioral rules are specified for each       
 1125    string class.  Some of the logic behind this order is provided under    
 1126    Section 5.2.1 (see also the PRECIS mappings document [RFC7790]).  In    
 1127    addition, this order is consistent with IDNA2008, and with both         
 1128    IDNA2003 and Stringprep before then, for the purpose of enabling code   
 1129    reuse and of ensuring as much continuity as possible with the           
 1130    Stringprep profiles that are obsoleted by several PRECIS profiles.      
 1131                                                                            
 1132    Because of the order of operations specified here, applying the rules   
 1133    for any given PRECIS profile is not necessarily an idempotent           
 1134    procedure (e.g., under certain circumstances, such as when Unicode      
 1135    Normalization Form KC is used, performing Unicode normalization after   
 1136    case mapping can still yield uppercase characters for certain code      
 1137    points).  Therefore, an implementation SHOULD apply the rules           
 1138    repeatedly until the output string is stable; if the output string      
 1139    does not stabilize after reapplying the rules three (3) additional      
 1140    times after the first application, the implementation SHOULD            
 1141    terminate application of the rules and reject the input string as       
 1142    invalid.                                                                
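
         The reapplication procedure described above might be sketched
         as follows.  The function name enforce and its parameters are
         hypothetical, and the rule list in the example contains only a
         case mapping step and NFKC normalization in order to show the
         non-idempotent case; a real profile would supply all of its
         rules in the order given in this section.

```python
import unicodedata


def enforce(text, rules, max_reapply=3):
    """Apply rules in order, then reapply until the output is stable.

    Raises ValueError if the output has not stabilized after
    max_reapply additional passes following the first application.
    """
    def apply_once(value):
        for rule in rules:
            value = rule(value)
        return value

    current = apply_once(text)            # first application
    for _ in range(max_reapply):          # up to three additional passes
        again = apply_once(current)
        if again == current:
            return current                # stable output
        current = again
    raise ValueError("output did not stabilize; rejecting input")


# Case mapping followed by NFKC: DOUBLE-STRUCK CAPITAL C (U+2102) has no
# lowercase mapping, but NFKC turns it into "C", so a second pass is
# needed before the result settles.
rules = [str.lower, lambda s: unicodedata.normalize("NFKC", s)]
print(enforce("\u2102", rules))
```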
 1143                                                                            
 1144 8.  Code Point Properties                                                  
 1145                                                                            
 1146    In order to implement the string classes described above, this          
 1147    document does the following:                                            
 1148                                                                            
 1156    1.  Reviews and classifies the collections of code points in the        
 1157        Unicode coded character set by examining various code point         
 1158        properties.                                                         
 1159                                                                            
 1160    2.  Defines an algorithm for determining a derived property value,      
 1161        which can depend on the string class being used by the relevant     
 1162        application protocol.                                               
 1163                                                                            
 1164    This document is not intended to specify precisely how derived          
 1165    property values are to be applied in protocol strings.  That            
 1166    information is the responsibility of the protocol specification that    
 1167    uses or profiles a PRECIS string class from this document.  The value   
 1168    of the property is to be interpreted as follows.                        
 1169                                                                            
 1170    PROTOCOL VALID  Those code points that are allowed to be used in any    
 1171       PRECIS string class (currently, IdentifierClass and                  
 1172       FreeformClass).  The abbreviated term "PVALID" is used to refer to   
 1173       this value in the remainder of this document.                        
 1174                                                                            
 1175    SPECIFIC CLASS PROTOCOL VALID  Those code points that are allowed to    
 1176       be used in specific string classes.  In the remainder of this        
 1177       document, the abbreviated term *_PVAL is used, where * = (ID |       
 1178       FREE), i.e., either "FREE_PVAL" for the FreeformClass or "ID_PVAL"   
 1179       for the IdentifierClass.  In practice, the derived property          
 1180       ID_PVAL is not used in this specification, because every ID_PVAL     
 1181       code point is PVALID.                                                
 1182                                                                            
 1183    CONTEXTUAL RULE REQUIRED  Some characteristics of the code point,       
 1184       such as its being invisible in certain contexts or problematic in    
 1185       others, require that it not be used in a string unless specific      
 1186       other code points or properties are present in the string.  As in    
 1187       IDNA2008, there are two subdivisions of CONTEXTUAL RULE REQUIRED:    
 1188       the first for Join_controls (called "CONTEXTJ") and the second for   
 1189       other code points (called "CONTEXTO").  A string MUST NOT contain    
 1190       any characters whose validity is context-dependent, unless the       
 1191       validity is positively confirmed by a contextual rule.  To check     
 1192       this, each code point identified as CONTEXTJ or CONTEXTO in the      
 1193       "PRECIS Derived Property Value" registry (Section 11.1) MUST have    
 1194       a non-null rule.  If such a code point is missing a rule, the        
 1195       string is invalid.  If the rule exists but the result of applying    
 1196       the rule is negative or inconclusive, the proposed string is         
 1197       invalid.  The most notable of the CONTEXTUAL RULE REQUIRED code      
 1198       points are the Join Control code points ZERO WIDTH JOINER (U+200D)   
 1199       and ZERO WIDTH NON-JOINER (U+200C), which have a derived property    
 1200       value of CONTEXTJ.  See Appendix A of [RFC5892] for more             
 1201       information.                                                         
 1202                                                                            
 1211    DISALLOWED  Those code points that are not permitted in any PRECIS      
 1212       string class.                                                        
 1213                                                                            
 1214    SPECIFIC CLASS DISALLOWED  Those code points that are not to be         
 1215       included in one of the string classes but that might be permitted    
 1216       in others.  In the remainder of this document, the abbreviated       
 1217       term *_DIS is used, where * = (ID | FREE), i.e., either "FREE_DIS"   
 1218       for the FreeformClass or "ID_DIS" for the IdentifierClass.  In       
 1219       practice, the derived property FREE_DIS is not used in this          
 1220       specification, because every FREE_DIS code point is DISALLOWED.      
 1221                                                                            
 1222    UNASSIGNED  Those code points that are not designated (i.e., are        
 1223       unassigned) in the Unicode Standard.                                 
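
The CONTEXTUAL RULE REQUIRED handling described above can be sketched
as follows.  This is a minimal illustration, not a complete
implementation: the names CONTEXTUAL, RULES, zwj_rule, and
contextual_rules_satisfied are hypothetical, and only the ZERO WIDTH
JOINER rule (RFC 5892, Appendix A.2) is included.  The ZERO WIDTH
NON-JOINER rule (Appendix A.1) is deliberately left out because it
needs Joining_Type data that the Python standard library does not
expose, so the sketch treats U+200C as "missing a rule".

```python
import unicodedata
from typing import Callable, Dict, Optional

VIRAMA_CCC = 9  # Canonical_Combining_Class value named "Virama"

# Code points assumed to have a derived property of CONTEXTJ here.
CONTEXTUAL = {0x200C, 0x200D}


def zwj_rule(s: str, i: int) -> Optional[bool]:
    """ZERO WIDTH JOINER rule: True only if the preceding code point
    has Canonical_Combining_Class Virama."""
    if i == 0:
        return False
    return unicodedata.combining(s[i - 1]) == VIRAMA_CCC


# Hypothetical rule table; U+200C is intentionally absent (see above).
RULES: Dict[int, Callable[[str, int], Optional[bool]]] = {
    0x200D: zwj_rule,
}


def contextual_rules_satisfied(s: str) -> bool:
    """Return False if any contextual code point lacks a rule or if
    its rule does not positively confirm validity."""
    for i, ch in enumerate(s):
        cp = ord(ch)
        if cp not in CONTEXTUAL:
            continue
        rule = RULES.get(cp)
        if rule is None:
            return False  # missing rule: the string is invalid
        if rule(s, i) is not True:
            return False  # negative or inconclusive: invalid
    return True
```

A rule may return None to signal an inconclusive result; the check
treats that the same as a negative result, as the text requires.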
 1224                                                                            
 1225    The algorithm to calculate the value of the derived property is as      
 1226    follows (implementations MUST NOT modify the order of operations        
 1227    within this algorithm, because doing so would cause inconsistent        
 1228    results across implementations):                                        
 1229                                                                            
 1230    If .cp. .in. Exceptions Then Exceptions(cp);                            
 1231    Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp);       
 1232    Else If .cp. .in. Unassigned Then UNASSIGNED;                           
 1233    Else If .cp. .in. ASCII7 Then PVALID;                                   
 1234    Else If .cp. .in. JoinControl Then CONTEXTJ;                            
 1235    Else If .cp. .in. OldHangulJamo Then DISALLOWED;                        
 1236    Else If .cp. .in. PrecisIgnorableProperties Then DISALLOWED;            
 1237    Else If .cp. .in. Controls Then DISALLOWED;                             
 1238    Else If .cp. .in. HasCompat Then ID_DIS or FREE_PVAL;                   
 1239    Else If .cp. .in. LetterDigits Then PVALID;                             
 1240    Else If .cp. .in. OtherLetterDigits Then ID_DIS or FREE_PVAL;           
 1241    Else If .cp. .in. Spaces Then ID_DIS or FREE_PVAL;                      
 1242    Else If .cp. .in. Symbols Then ID_DIS or FREE_PVAL;                     
 1243    Else If .cp. .in. Punctuation Then ID_DIS or FREE_PVAL;                 
 1244    Else DISALLOWED;                                                        
 1245                                                                            
 1246    The value of the derived property calculated can depend on the string   
 1247    class; for example, if an identifier used in an application protocol    
 1248    is defined as profiling the PRECIS IdentifierClass then a space         
 1249    character such as SPACE (U+0020) would be assigned to ID_DIS, whereas   
 1250    if an identifier is defined as profiling the PRECIS FreeformClass       
 1251    then the character would be assigned to FREE_PVAL.  For the sake of     
 1252    brevity, the designation "FREE_PVAL" is used herein, instead of the     
 1253    longer designation "ID_DIS or FREE_PVAL".  In practice, the derived     
 1254    properties ID_PVAL and FREE_DIS are not used in this specification,     
 1255    because every ID_PVAL code point is PVALID and every FREE_DIS code      
 1256    point is DISALLOWED.                                                    
 1257                                                                            
 1266    Use of the name of a rule (such as "Exceptions") implies the set of     
 1267    code points that the rule defines, whereas the same name as a           
 1268    function call (such as "Exceptions(cp)") implies the value that the     
 1269    code point has in the Exceptions table.                                 
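
The ordered algorithm above can be sketched in Python as a chain of
tests evaluated strictly in the order given.  This is an illustrative
sketch under stated assumptions, not a conforming implementation: the
Exceptions table is only a three-entry excerpt of RFC 5892,
Section 2.6; the OLD_HANGUL_JAMO and DEFAULT_IGNORABLE sets are small
placeholder samples, because the underlying Unicode properties are not
exposed by the standard library; Controls is assumed to mean
General_Category Cc; and all function and variable names are
hypothetical.  Results also depend on the Unicode version bundled with
the Python runtime.

```python
import unicodedata

# Partial excerpt of the Exceptions table (RFC 5892, Section 2.6).
EXCEPTIONS = {0x00DF: "PVALID", 0x00B7: "CONTEXTO", 0x0640: "DISALLOWED"}
# The text notes this category was the empty set at the time of writing.
BACKWARD_COMPATIBLE = {}
# ASSUMPTION: placeholder samples only; real data comes from the
# Unicode Character Database (Hangul_Syllable_Type in {L, V, T} and
# Default_Ignorable_Code_Point).
OLD_HANGUL_JAMO = {0x1100, 0x1161, 0x11A8}
DEFAULT_IGNORABLE = {0x00AD, 0x200B, 0xFE0F}

LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}  # RFC 5892, 2.1
OTHER_LETTER_DIGITS = {"Lt", "Nl", "No", "Me"}
SYMBOLS = {"Sm", "Sc", "Sk", "So"}
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}


def is_noncharacter(cp: int) -> bool:
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def derived_property(cp: int) -> str:
    """Evaluate the rules in the fixed order given in the text."""
    ch = chr(cp)
    gc = unicodedata.category(ch)
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]           # Exceptions(cp)
    if cp in BACKWARD_COMPATIBLE:
        return BACKWARD_COMPATIBLE[cp]  # BackwardCompatible(cp)
    if gc == "Cn" and not is_noncharacter(cp):
        return "UNASSIGNED"
    if 0x21 <= cp <= 0x7E:
        return "PVALID"                 # ASCII7
    if cp in (0x200C, 0x200D):
        return "CONTEXTJ"               # JoinControl
    if cp in OLD_HANGUL_JAMO:
        return "DISALLOWED"
    if cp in DEFAULT_IGNORABLE or is_noncharacter(cp):
        return "DISALLOWED"             # PrecisIgnorableProperties
    if gc == "Cc":
        return "DISALLOWED"             # Controls (assumed to be gc Cc)
    if unicodedata.normalize("NFKC", ch) != ch:
        return "FREE_PVAL"              # HasCompat
    if gc in LETTER_DIGITS:
        return "PVALID"
    if gc in OTHER_LETTER_DIGITS or gc == "Zs":
        return "FREE_PVAL"              # OtherLetterDigits, Spaces
    if gc in SYMBOLS or gc in PUNCTUATION:
        return "FREE_PVAL"
    return "DISALLOWED"


def for_class(value: str, string_class: str) -> str:
    """Resolve the shorthand FREE_PVAL ("ID_DIS or FREE_PVAL")."""
    if value == "FREE_PVAL" and string_class == "IdentifierClass":
        return "ID_DIS"
    return value
```

For example, under these assumptions derived_property(0x20) yields
"FREE_PVAL", which for_class() maps to "ID_DIS" when the string class
is "IdentifierClass".  The table lookups mirror the distinction drawn
above between a rule name used as a set ("cp in EXCEPTIONS") and used
as a function call ("EXCEPTIONS[cp]").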
 1270                                                                            
 1271    The mechanisms described here allow determination of the value of the   
 1272    property for future versions of Unicode (including code points added    
 1273    after Unicode 5.2 or 7.0, depending on the category, because some       
 1274    categories mentioned in this document are simply pointers to IDNA2008   
 1275    and therefore were defined at the time of Unicode 5.2).  Changes in     
 1276    Unicode properties that do not affect the outcome of this process       
 1277    therefore do not affect this framework.  For example, a code point      
 1278    can have its Unicode General_Category value change from So to Sm, or    
 1279    from Lo to Ll, without affecting the algorithm results.  Moreover,      
 1280    even if such changes were to result, the BackwardCompatible list        
 1281    (Section 9.7) can be adjusted to ensure the stability of the results.   
 1282                                                                            
 1283 9.  Category Definitions Used to Calculate Derived Property                
 1284                                                                            
 1285    The derived property obtains its value based on a two-step procedure:   
 1286                                                                            
 1287    1.  Code points are placed in one or more character categories either   
 1288        (1) based on core properties defined by the Unicode Standard or     
 1289        (2) by treating the code point as an exception and addressing the   
 1290        code point based on its code point value.  These categories are     
 1291        not mutually exclusive.                                             
 1292                                                                            
 1293    2.  Set operations are used with these categories to determine the      
 1294        values for a property specific to a given string class.  These      
 1295        operations are specified under Section 8.                           
 1296                                                                            
 1297       Note: Unicode property names and property value names might have     
 1298       short abbreviations, such as "gc" for the General_Category           
 1299       property and "Ll" for the Lowercase_Letter property value of the     
 1300       gc property.                                                         
 1301                                                                            
 1302    In the following specification of character categories, the operation   
 1303    that returns the value of a particular Unicode code point property      
 1304    for a code point is designated by using the formal name of that         
 1305    property (from the Unicode PropertyAliases.txt file [PropertyAliases])
 1306    followed by "(cp)" for "code point".  For example, the value of the     
 1307    General_Category property for a code point is indicated by              
 1308    General_Category(cp).                                                   
 1309                                                                            
 1310    The first ten categories (A-J) shown below were previously defined      
 1311    for IDNA2008 and are referenced from [RFC5892] to ease the              
 1312    understanding of how PRECIS handles various code points.  Some of       
 1313    these categories are reused in PRECIS, and some of them are not;        
 1314                                                                            
 1321    however, the lettering of categories is retained to prevent overlap     
 1322    and to ease implementation of both IDNA2008 and PRECIS in a single      
 1323    software application.  The next eight categories (K-R) are specific     
 1324    to PRECIS.                                                              
 1325                                                                            
 1326 9.1.  LetterDigits (A)                                                     
 1327                                                                            
 1328    This category is defined in Section 2.1 of [RFC5892] and is included    
 1329    by reference for use in PRECIS.                                         
 1330                                                                            
 1331 9.2.  Unstable (B)                                                         
 1332                                                                            
 1333    This category is defined in Section 2.2 of [RFC5892].  However, it is   
 1334    not used in PRECIS.                                                     
 1335                                                                            
 1336 9.3.  IgnorableProperties (C)                                              
 1337                                                                            
 1338    This category is defined in Section 2.3 of [RFC5892].  However, it is   
 1339    not used in PRECIS.                                                     
 1340                                                                            
 1341    Note: See the PrecisIgnorableProperties ("M") category below for a      
 1342    more inclusive category used in PRECIS identifiers.                     
 1343                                                                            
 1344 9.4.  IgnorableBlocks (D)                                                  
 1345                                                                            
 1346    This category is defined in Section 2.4 of [RFC5892].  However, it is   
 1347    not used in PRECIS.                                                     
 1348                                                                            
 1349 9.5.  LDH (E)                                                              
 1350                                                                            
 1351    This category is defined in Section 2.5 of [RFC5892].  However, it is   
 1352    not used in PRECIS.                                                     
 1353                                                                            
 1354    Note: See the ASCII7 ("K") category below for a more inclusive          
 1355    category used in PRECIS identifiers.                                    
 1356                                                                            
 1357 9.6.  Exceptions (F)                                                       
 1358                                                                            
 1359    This category is defined in Section 2.6 of [RFC5892] and is included    
 1360    by reference for use in PRECIS.                                         
 1361                                                                            
 1362 9.7.  BackwardCompatible (G)                                               
 1363                                                                            
 1364    This category is defined in Section 2.7 of [RFC5892] and is included    
 1365    by reference for use in PRECIS.                                         
 1366                                                                            
 1367    Note: Management of this category is handled via the processes          
 1368    specified in [RFC5892].  At the time of this writing (and also at the   
 1376    time that RFC 5892 was published), this category consisted of the       
 1377    empty set; however, that is subject to change as described in           
 1378    RFC 5892.                                                               
 1379                                                                            
 1380 9.8.  JoinControl (H)                                                      
 1381                                                                            
 1382    This category is defined in Section 2.8 of [RFC5892] and is included    
 1383    by reference for use in PRECIS.                                         
 1384                                                                            
 1385    Note: In particular, the code points ZERO WIDTH JOINER (U+200D) and     
 1386    ZERO WIDTH NON-JOINER (U+200C) are necessary to produce certain         
 1387    combinations of characters in certain scripts (e.g., Arabic, Persian,   
 1388    and Indic scripts), but if used in other contexts, they can have        
 1389    consequences that violate the "Principle of Least Astonishment".        
 1390    Therefore, these code points are allowed only in contexts where they    
 1391    are appropriate, specifically where the relevant rule (CONTEXTJ or      
 1392    CONTEXTO) has been defined.  See [RFC5892] and [RFC5894] for further    
 1393    discussion.                                                             
 1394                                                                            
 1395 9.9.  OldHangulJamo (I)                                                    
 1396                                                                            
 1397    This category is defined in Section 2.9 of [RFC5892] and is included    
 1398    by reference for use in PRECIS.                                         
 1399                                                                            
 1400    Note: Exclusion of these code points results in disallowing certain     
 1401    archaic Korean syllables and in restricting supported Korean            
 1402    syllables to preformed, modern Hangul characters.                       
 1403                                                                            
 1404 9.10.  Unassigned (J)                                                      
 1405                                                                            
 1406    This category is defined in Section 2.10 of [RFC5892] and is included   
 1407    by reference for use in PRECIS.                                         
 1408                                                                            
 1409 9.11.  ASCII7 (K)                                                          
 1410                                                                            
 1411    This PRECIS-specific category consists of all printable, non-space      
 1412    code points from the 7-bit ASCII range.  By applying this category,     
 1413    the algorithm specified under Section 8 exempts these code points       
 1414    from other rules that might be applied during PRECIS processing, on     
 1415    the assumption that these code points are in such wide use that         
 1416    disallowing them would be counterproductive.                            
 1417                                                                            
 1418    K: cp is in {0021..007E}                                                
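
As a minimal sketch, the ASCII7 test is a single range check (the
function name is hypothetical):

```python
def is_ascii7(cp: int) -> bool:
    """K: printable, non-space code points U+0021 through U+007E."""
    return 0x21 <= cp <= 0x7E
```

SPACE (U+0020) and DELETE (U+007F) fall just outside this range, so
they are left to the later rules of the algorithm.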
 1419                                                                            
 1431 9.12.  Controls (L)                                                        
 1432                                                                            
 1433    This PRECIS-specific category consists of all control code points,      
 1434    such as LINE FEED (U+000A).                                             
 1435
 1436    L: Control(cp) = True                                                   
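
One way to evaluate this category with the Python standard library is
sketched below.  It rests on an assumption: the text writes
"Control(cp)", and the sketch interprets that as General_Category Cc
(the Unicode "Control" general category).  The function name is
hypothetical.

```python
import unicodedata


def is_control(cp: int) -> bool:
    """L, interpreted here as General_Category(cp) = Cc (assumption)."""
    return unicodedata.category(chr(cp)) == "Cc"
```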
 1437                                                                            
 1438 9.13.  PrecisIgnorableProperties (M)                                       
 1439                                                                            
 1440    This PRECIS-specific category is used to group code points that are     
 1441    discouraged from use in PRECIS string classes.                          
 1442                                                                            
 1443    M: Default_Ignorable_Code_Point(cp) = True or                           
 1444       Noncharacter_Code_Point(cp) = True                                   
 1445                                                                            
 1446    The definition for Default_Ignorable_Code_Point can be found in the     
 1447    DerivedCoreProperties.txt file [DerivedCoreProperties].                 
 1448                                                                            
 1449    Note: In general, these code points are constructs such as so-called    
 1450    "soft hyphens", certain joining code points, various specialized code   
 1451    points for use within Unicode itself (e.g., language tags and           
 1452    variation selectors), and so on.  Disallowing these code points in      
 1453    PRECIS reduces the potential for unexpected results in the use of       
 1454    internationalized strings.                                              
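
A hedged sketch of this category follows.  Noncharacter_Code_Point can
be computed directly (U+FDD0..U+FDEF plus the last two code points of
every plane), but Default_Ignorable_Code_Point has to be loaded from
DerivedCoreProperties.txt; the three-entry DEFAULT_IGNORABLE_SAMPLE
set (SOFT HYPHEN, ZERO WIDTH SPACE, VARIATION SELECTOR-16) is only a
placeholder for that data, and all names are hypothetical.

```python
from typing import AbstractSet

# ASSUMPTION: tiny sample standing in for the full
# Default_Ignorable_Code_Point set from DerivedCoreProperties.txt.
DEFAULT_IGNORABLE_SAMPLE = frozenset({0x00AD, 0x200B, 0xFE0F})


def is_noncharacter(cp: int) -> bool:
    """Noncharacter_Code_Point(cp) = True."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_precis_ignorable(cp: int, default_ignorable: AbstractSet[int]) -> bool:
    """M: Default_Ignorable_Code_Point or Noncharacter_Code_Point."""
    return cp in default_ignorable or is_noncharacter(cp)
```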
 1455                                                                            
 1456 9.14.  Spaces (N)                                                          
 1457                                                                            
 1458    This PRECIS-specific category is used to group code points that are     
 1459    spaces.                                                                 
 1460                                                                            
 1461    N: General_Category(cp) is in {Zs}                                      
 1462                                                                            
 1463 9.15.  Symbols (O)                                                         
 1464                                                                            
 1465    This PRECIS-specific category is used to group code points that are     
 1466    symbols.                                                                
 1467                                                                            
 1468    O: General_Category(cp) is in {Sm, Sc, Sk, So}                          
 1469                                                                            
 1470 9.16.  Punctuation (P)                                                     
 1471                                                                            
 1472    This PRECIS-specific category is used to group code points that are     
 1473    punctuation.                                                            
 1474                                                                            
 1475    P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po}              
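
The Spaces (N), Symbols (O), and Punctuation (P) categories are all
tests on General_Category, so they can share one helper.  The sketch
below uses Python's unicodedata module; the constant and function
names are hypothetical.

```python
import unicodedata
from typing import AbstractSet

SPACES = frozenset({"Zs"})                                         # N
SYMBOLS = frozenset({"Sm", "Sc", "Sk", "So"})                      # O
PUNCTUATION = frozenset({"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"})  # P


def gc_in(cp: int, values: AbstractSet[str]) -> bool:
    """True if General_Category(cp) is one of the given values."""
    return unicodedata.category(chr(cp)) in values
```

Note that CHARACTER TABULATION (U+0009) is not in Spaces: its
General_Category is Cc, so it belongs to Controls instead.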
 1476                                                                            
 1486 9.17.  HasCompat (Q)                                                       
 1487                                                                            
 1488    This PRECIS-specific category is used to group any code point that is   
 1489    decomposed and recomposed into something other than itself under        
 1490    Unicode Normalization Form KC.                                          
 1491                                                                            
 1492    Q: toNFKC(cp) != cp                                                     
 1493                                                                            
 1494    Typically, this category is true of code points that are                
 1495    "compatibility decomposable characters" as defined in the Unicode       
 1496    Standard.                                                               
 1497                                                                            
 1498    The toNFKC() operation returns the code point in Normalization          
 1499    Form KC.  For more information, see Unicode Standard Annex #15          
 1500    [UAX15].                                                                
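
The HasCompat test can be sketched with the standard library's NFKC
normalization (the function name is hypothetical; the result depends
on the Unicode version shipped with the Python runtime):

```python
import unicodedata


def has_compat(cp: int) -> bool:
    """Q: toNFKC(cp) != cp."""
    ch = chr(cp)
    return unicodedata.normalize("NFKC", ch) != ch
```

For instance, LATIN SMALL LIGATURE FI (U+FB01) and CIRCLED DIGIT ONE
(U+2460) are in this category, whereas LATIN SMALL LETTER E WITH ACUTE
(U+00E9) is not.  ANGSTROM SIGN (U+212B) is also matched even though
its decomposition is canonical rather than compatibility, which is
consistent with the word "Typically" above.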
 1501                                                                            
 1502 9.18.  OtherLetterDigits (R)                                               
 1503                                                                            
 1504    This PRECIS-specific category is used to group code points that are     
 1505    letters and digits other than the "traditional" letters and digits      
 1506    grouped under the LetterDigits ("A") category (see Section 9.1).        
 1507                                                                            
 1508    R: General_Category(cp) is in {Lt, Nl, No, Me}                          
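
A minimal sketch of the OtherLetterDigits test, again using Python's
unicodedata module (the function name is hypothetical):

```python
import unicodedata

OTHER_LETTER_DIGITS = frozenset({"Lt", "Nl", "No", "Me"})


def is_other_letter_digit(cp: int) -> bool:
    """R: General_Category(cp) is in {Lt, Nl, No, Me}."""
    return unicodedata.category(chr(cp)) in OTHER_LETTER_DIGITS
```

Some code points in these General_Category values, such as
SUPERSCRIPT TWO (U+00B2), also satisfy HasCompat; because the
algorithm in Section 8 tests HasCompat first, they never reach this
rule, although the resulting value is the same.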
 1509                                                                            
 1510 10.  Guidelines for Designated Experts                                     
 1511                                                                            
 1512    Experience with internationalization in application protocols has       
 1513    shown that protocol designers and application developers usually do     
 1514    not understand the subtleties and trade-offs involved with              
 1515    internationalization and that they need considerable guidance in        
 1516    making reasonable decisions with regard to the options before them.     
 1517                                                                            
 1518    Therefore:                                                              
 1519                                                                            
 1520    o  Protocol designers are strongly encouraged to question the           
 1521       assumption that they need to define new profiles, because existing   
 1522       profiles are designed for wide reuse (see Section 5 for further      
 1523       discussion).                                                         
 1524                                                                            
 1525    o  Those who persist in defining new profiles are strongly encouraged
 1526       to provide a clear, strong justification for doing so and to
 1527       publish a stable specification that provides all of the
 1528       information described under Section 11.3.
 1529                                                                            
 1530    o  The designated experts for profile registration requests ought to    
 1531       seek answers to all of the questions provided under Section 11.3     
 1532       and ought to encourage applicants to provide a stable                
 1533       specification documenting the profile (even though the               
 1541       registration policy for PRECIS profiles is "Expert Review" and a     
 1542       stable specification is not strictly required).                      
 1543                                                                            
 1544    o  Developers of applications that use PRECIS are strongly encouraged   
 1545       to apply the guidelines provided under Section 6 and to seek out     
 1546       the advice of the designated experts or other knowledgeable          
 1547       individuals in doing so.                                             
 1548                                                                            
 1549    o  All parties are strongly encouraged to help prevent the              
 1550       multiplication of profiles beyond necessity, as described under      
 1551       Section 5.1, and to use PRECIS in ways that will minimize user       
 1552       confusion and insecure application behavior.                         
 1553                                                                            
 1554    Internationalization can be difficult and contentious; designated       
 1555    experts, profile registrants, and application developers are strongly   
 1556    encouraged to work together in a spirit of good faith and mutual        
 1557    understanding to achieve rough consensus on profile registration        
 1558    requests and the use of PRECIS in particular applications.  They are    
 1559    also encouraged to bring additional expertise into the discussion if    
 1560    that would be helpful in adding perspective or otherwise resolving      
 1561    issues.                                                                 
 1562                                                                            
 1563 11.  IANA Considerations                                                   
 1564                                                                            
 1565 11.1.  PRECIS Derived Property Value Registry                              
 1566                                                                            
 1567    IANA has created and now maintains the "PRECIS Derived Property         
 1568    Value" registry (<https://www.iana.org/assignments/precis-tables/>),    
 1569    which records the derived properties for each version of Unicode        
 1570    released starting from version 6.3.  The derived property value is to   
 1571    be calculated in cooperation with a designated expert [RFC8126]         
 1572    according to the rules specified under Sections 8 and 9.                
 1573                                                                            
 1574    The IESG is to be notified if backward-incompatible changes to the      
 1575    table of derived properties are discovered or if other problems arise   
 1576    during the process of creating the table of derived property values     
 1577    or during Expert Review.  Changes to the rules defined under            
 1578    Sections 8 and 9 require IETF Review.                                   
 1579                                                                            
 1580    Note: IANA is requested not to make further updates to this registry
 1581    until it receives notice from the IESG that the issues described in     
 1582    [IAB-Statement] and Section 13.5 of this document have been settled.    
 1583                                                                            
 1584 11.2.  PRECIS Base Classes Registry                                        
 1585                                                                            
 1586    IANA has created the "PRECIS Base Classes" registry                     
 1587    (<https://www.iana.org/assignments/precis-parameters/>).  In            
 1588    accordance with [RFC8126], the registration policy is "RFC Required".   
 1589                                                                            
 1596    The registration template is as follows:                                
 1597                                                                            
 1598    Base Class:  [the name of the PRECIS string class]                      
 1599                                                                            
 1600    Description:  [a brief description of the PRECIS string class and its   
 1601       intended use, e.g., "A sequence of letters, numbers, and symbols     
 1602       that is used to identify or address a network entity."]              
 1603                                                                            
 1604    Reference:  [the RFC number]                                            
 1605                                                                            
 1606    The initial registrations are as follows:                               
 1607                                                                            
 1608    Base Class: FreeformClass                                               
 1609    Description: A sequence of letters, numbers, symbols, spaces, and       
 1610          other code points that is used for free-form strings.             
 1611    Reference: Section 4.3 of RFC 8264
 1612
 1613    Base Class: IdentifierClass
 1614    Description: A sequence of letters, numbers, and symbols that is
 1615          used to identify or address a network entity.
 1616    Reference: Section 4.2 of RFC 8264
 1617                                                                            
 1618 11.3.  PRECIS Profiles Registry                                            
 1619                                                                            
 1620    IANA has created the "PRECIS Profiles" registry                         
 1621    (<https://www.iana.org/assignments/precis-parameters/>) to identify     
 1622    profiles that use the PRECIS string classes.  In accordance with        
 1623    [RFC8126], the registration policy is "Expert Review".  This policy     
 1624    was chosen in order to ease the burden of registration while ensuring   
 1625    that "customers" of PRECIS receive appropriate guidance regarding the   
 1626    sometimes complex and subtle internationalization issues related to     
 1627    profiles of PRECIS string classes.                                      
 1628                                                                            
 1629    The registration template is as follows:                                
 1630                                                                            
 1631    Name:  [the name of the profile]                                        
 1632                                                                            
 1633    Base Class:  [which PRECIS string class is being profiled]              
 1634                                                                            
 1635    Applicability:  [the specific protocol elements to which this profile   
 1636       applies, e.g., "Usernames in security and application protocols."]   
 1637                                                                            
 1638    Replaces:  [the Stringprep profile that this PRECIS profile replaces,   
 1639       if any]                                                              
 1640                                                                            
 1641    Width Mapping Rule:  [the behavioral rule for handling of width,        
 1642       e.g., "Map fullwidth and halfwidth code points to their              
 1643       compatibility variants."]                                            
 1644                                                                            
 1651    Additional Mapping Rule:  [any additional mappings that are required    
 1652       or recommended, e.g., "Map non-ASCII space code points to SPACE      
 1653       (U+0020)."]                                                          
 1654                                                                            
 1655    Case Mapping Rule:  [the behavioral rule for handling of case, e.g.,    
 1656       "Apply the Unicode toLowerCase() operation."]                        
 1657                                                                            
 1658    Normalization Rule:  [which Unicode normalization form is applied,      
 1659       e.g., "NFC"]                                                         
 1660                                                                            
 1661    Directionality Rule:  [the behavioral rule for handling of right-to-    
 1662       left code points, e.g., "The 'Bidi Rule' defined in RFC 5893         
 1663       applies."]                                                           
 1664                                                                            
 1665    Enforcement:  [which entities enforce the rules, and when that          
 1666       enforcement occurs during protocol operations]                       
 1667                                                                            
 1668    Specification:  [a pointer to relevant documentation, such as an RFC    
 1669       or Internet-Draft]                                                   
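
   As an illustration only, the fields of the template above could be
   held in a small record type and rendered as text before a request is
   sent for review.  The names ProfileRegistration, field_label, and
   render_template are hypothetical, and the sample values are either
   the "e.g." texts from the template or placeholders; they do not
   describe any registered profile.  The list of base classes reflects
   only the two initial registrations in Section 11.2.

```python
from dataclasses import dataclass, fields

# The two base classes listed as initial registrations in Section 11.2.
BASE_CLASSES = ("IdentifierClass", "FreeformClass")


@dataclass(frozen=True)
class ProfileRegistration:
    """Hypothetical record; one attribute per field of the template."""
    name: str
    base_class: str
    applicability: str
    replaces: str  # use "None" when no Stringprep profile is replaced
    width_mapping_rule: str
    additional_mapping_rule: str
    case_mapping_rule: str
    normalization_rule: str
    directionality_rule: str
    enforcement: str
    specification: str


def field_label(attr: str) -> str:
    """Turn an attribute name such as 'base_class' into 'Base Class'."""
    return " ".join(word.capitalize() for word in attr.split("_"))


def render_template(reg: ProfileRegistration) -> str:
    """Return the completed template as text, one field per paragraph."""
    if reg.base_class not in BASE_CLASSES:
        raise ValueError(f"unknown base class: {reg.base_class!r}")
    empty = [f.name for f in fields(reg) if not getattr(reg, f.name).strip()]
    if empty:
        raise ValueError(f"empty template fields: {empty}")
    return "\n\n".join(
        f"{field_label(f.name)}:  {getattr(reg, f.name)}" for f in fields(reg)
    )


# Placeholder values only; "ExampleProfile" is not a registered profile.
example = ProfileRegistration(
    name="ExampleProfile",
    base_class="IdentifierClass",
    applicability="Usernames in security and application protocols.",
    replaces="None",
    width_mapping_rule="Map fullwidth and halfwidth code points to their "
                       "compatibility variants.",
    additional_mapping_rule="None",
    case_mapping_rule="Apply the Unicode toLowerCase() operation.",
    normalization_rule="NFC",
    directionality_rule="The 'Bidi Rule' defined in RFC 5893 applies.",
    enforcement="[to be described by the registrant]",
    specification="[pointer to an RFC or Internet-Draft]",
)
print(render_template(example))
```

   Rejecting empty fields in such a sketch mirrors the expectation that
   a completed template is sent for review; it is a convenience check,
   not a rule stated by this document.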
 1670                                                                            
 1671    In order to request a review, the registrant shall send a completed     
 1672    template to the <precis@ietf.org> list or its designated successor.     
 1673                                                                            
 1674    Factors to focus on while defining profiles and reviewing profile       
 1675    registrations include the following:                                    
 1676                                                                            
 1677    o  Would an existing PRECIS string class or profile solve the           
 1678       problem?  If not, why not?  (See Section 5.1 for related             
 1679       considerations.)                                                     
 1680                                                                            
 1681    o  Is the problem being addressed by this profile well defined?         
 1682                                                                            
 1683    o  Does the specification define what kinds of applications are         
 1684       involved and the protocol elements to which this profile applies?    
 1685                                                                            
 1686    o  Is the profile clearly defined?                                      
 1687                                                                            
 1688    o  Is the profile based on an appropriate dividing line between user    
 1689       interface (culture, context, intent, locale, device limitations,     
 1690       etc.) and the use of conformant strings in protocol elements?        
 1691                                                                            
 1692    o  Are the width mapping, case mapping, additional mapping,             
 1693       normalization, and directionality rules appropriate for the          
 1694       intended use?                                                        
 1695                                                                            
 1696    o  Does the profile explain which entities enforce the rules and when   
 1697       such enforcement occurs during protocol operations?                  
 1698                                                                            
 1706    o  Does the profile reduce the degree to which human users could be     
 1707       surprised or confused by application behavior (the "Principle of     
 1708       Least Astonishment")?                                                
 1709                                                                            
 1710    o  Does the profile introduce any new security concerns such as those   
 1711       described under Section 12 of this document (e.g., false accepts     
 1712       for authentication or authorization)?                                
 1713                                                                            
 1714 12.  Security Considerations                                               
 1715                                                                            
 1716 12.1.  General Issues                                                      
 1717                                                                            
 1718    If input strings that appear "the same" to users are programmatically   
 1719    considered to be distinct in different systems or if input strings      
 1720    that appear distinct to users are programmatically considered to be     
 1721    "the same" in different systems, then users can be confused.  Such      
 1722    confusion can have security implications, such as the false accepts     
 1723    and false rejects discussed in [RFC6943] (the terms "false positives"   
 1724    and "false negatives" are used in that document).  One starting goal    
 1725    of work on the PRECIS framework was to limit the number of times that   
 1726    users are confused (consistent with the "Principle of Least             
 1727    Astonishment").  Unfortunately, this goal has been difficult to         
 1728    achieve given the large number of application protocols already in      
 1729    existence.  Despite these difficulties, profiles should not be          
 1730    multiplied beyond necessity (see Section 5.1).  In particular,          
 1731    designers of application protocols should think long and hard before    
 1732    defining a new profile instead of using one that has already been       
 1733    defined, and if they decide to define a new profile then they should    
 1734    clearly explain their reasons for doing so.                             
 1735                                                                            
 1736    The security of applications that use this framework can depend in      
 1737    part on the proper preparation, enforcement, and comparison of          
 1738    internationalized strings.  For example, such strings can be used to    
 1739    make authentication and authorization decisions, and the security of    
 1740    an application could be compromised if an entity providing a given      
 1741    string is connected to the wrong account or online resource based on    
 1742    different interpretations of the string (again, see [RFC6943]).         
 1743                                                                            
 1744    Specifications of application protocols that use this framework are     
 1745    strongly encouraged to describe how internationalized strings are       
 1746    used in the protocol, including the security implications of any        
 1747    false accepts and false rejects that might result from various          
 1748    enforcement and comparison operations.  For some helpful guidelines,    
 1749    refer to [RFC6943], [RFC5890], [UTR36], and [UTS39].                    
 1750                                                                            
 1761 12.2.  Use of the IdentifierClass                                          
 1762                                                                            
 1763    Strings that conform to the IdentifierClass, and any profile thereof,   
 1764    are intended to be relatively safe for use in a broad range of          
 1765    applications, primarily because they include only letters, digits,      
 1766    and "grandfathered" non-space code points from the ASCII range; thus,   
 1767    they exclude spaces, code points with compatibility equivalents, and    
 1768    almost all symbols and punctuation marks.  However, because such        
 1769    strings can still include so-called "confusable code points" (see       
 1770    Section 12.5), protocol designers and implementers are encouraged to    
 1771    pay close attention to the security considerations described            
 1772    elsewhere in this document.                                             
 1773                                                                            
 1774 12.3.  Use of the FreeformClass                                            
 1775                                                                            
 1776    Strings that conform to the FreeformClass, and many profiles thereof,   
 1777    can include virtually any Unicode code point.  This makes the           
 1778    FreeformClass quite expressive, but also problematic from the           
 1779    perspective of possible user confusion.  Protocol designers are         
 1780    hereby warned that the FreeformClass contains code points they might    
 1781    not understand, and they are encouraged to profile the                  
 1782    IdentifierClass wherever feasible; however, if an application           
 1783    protocol requires more code points than are allowed by the              
 1784    IdentifierClass, protocol designers are encouraged to define a          
 1785    profile of the FreeformClass that restricts the allowable code points   
 1786    as tightly as possible.  (The PRECIS Working Group considered the       
 1787    option of allowing "superclasses" as well as profiles of PRECIS         
 1788    string classes but decided against allowing superclasses to reduce      
 1789    the likelihood of security and interoperability problems.)              
 1790                                                                            
 1791 12.4.  Local Character Set Issues                                          
 1792                                                                            
 1793    When systems use local character sets other than ASCII and Unicode,     
 1794    this specification leaves the problem of converting between the local   
 1795    character set and Unicode up to the application or local system.  If    
 1796    different applications (or different versions of one application)       
 1797    implement different rules for conversions among coded character sets,   
 1798    they could interpret the same name differently and contact different    
 1799    application servers or other network entities.  This problem is not     
 1800    solved by security protocols, such as Transport Layer Security (TLS)    
 1801    [RFC5246] and SASL [RFC4422], that do not take local character sets     
 1802    into account.                                                           
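
   The effect of differing conversion rules can be sketched as follows.
   The byte string is a hypothetical sample, and the two codecs
   (ISO 8859-1 and code page 437) were chosen only because they assign
   different characters to the same byte value; nothing here is
   specific to PRECIS.

```python
# A name as stored by some legacy system in a local character set.
# The bytes are a hypothetical sample chosen for illustration.
raw_name = b"caf\xe9"

# Two applications that assume different local character sets.
as_latin1 = raw_name.decode("latin-1")  # 0xE9 becomes U+00E9
as_cp437 = raw_name.decode("cp437")     # 0xE9 becomes U+0398


def code_points(text: str) -> list[str]:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(as_latin1))
print(code_points(as_cp437))
print("same Unicode string:", as_latin1 == as_cp437)
```

   Because the two results are different Unicode strings before any
   PRECIS rule is applied, no later enforcement or comparison step can
   reconcile them.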
 1803                                                                            
 1804 12.5.  Visually Similar Characters                                         
 1805                                                                            
 1806    Some code points are visually similar and thus can cause confusion      
 1807    among humans.  Such characters are often called "confusable             
 1808    characters" or "confusables".                                           
 1809                                                                            
 1816    The problem of confusable characters is not necessarily caused by the   
 1817    use of Unicode code points outside the ASCII range.  For example, in    
 1818    some presentations and to some individuals the string "ju1iet"          
 1819    (spelled with DIGIT ONE (U+0031) as the third character) might appear   
 1820    to be the same as "juliet" (spelled with LATIN SMALL LETTER L           
 1821    (U+006C)), especially on casual visual inspection.  This phenomenon     
 1822    is sometimes called "typejacking".                                      
 1823                                                                            
 1824    However, the problem is made more serious by introducing the full       
 1825    range of Unicode code points into protocol strings.  A well-known       
 1826    example is confusion between "а" CYRILLIC SMALL LETTER A (U+0430) and   
 1827    "a" LATIN SMALL LETTER A (U+0061).  As another example, the             
 1828    characters "ᏚᎢᎵᏋᎢᏋᏒ" (U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC         
 1829    U+13D2) from the Cherokee block look similar to the ASCII code points   
 1830    representing "STPETER" as they might appear when presented using a      
 1831    "creative" font family.  Confusion among such characters is perhaps     
 1832    not unexpected, given that the alphabetic writing systems involved      
 1833    all bear a family resemblance or historical lineage.  Perhaps more      
 1834    surprising is confusion among characters from disparate writing         
 1835    systems, such as "O" (LATIN CAPITAL LETTER O, U+004F), "0" (DIGIT       
 1836    ZERO, U+0030), "໐" (LAO DIGIT ZERO, U+0ED0), "ዐ" (ETHIOPIC SYLLABLE     
 1837    PHARYNGEAL A, U+12D0), and other graphemes that have the appearance     
 1838    of open circles.  The reader needs to be aware that the foregoing
 1839    examples are merely a small sample of the characters that are
 1840    confusable in Unicode.
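
   A short sketch using Python's standard unicodedata module shows that
   the code points in these examples are distinct to software even when
   they look alike to a reader, and that Unicode normalization does not
   map one onto the other.  The helper name describe is hypothetical.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str]]:
    """Pair each code point (U+XXXX) with its Unicode character name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


latin_a = "\u0061"      # LATIN SMALL LETTER A
cyrillic_a = "\u0430"   # CYRILLIC SMALL LETTER A

print(describe(latin_a + cyrillic_a))
print("equal as strings:", latin_a == cyrillic_a)

# Compatibility normalization leaves the Cyrillic letter unchanged,
# so it cannot be relied on to merge the two look-alikes.
print("NFKC unchanged:", unicodedata.normalize("NFKC", cyrillic_a) == cyrillic_a)

# The "open circle" examples from the text are likewise all distinct.
circles = "\u004f\u0030\u0ed0\u12d0"
for code_point, name in describe(circles):
    print(code_point, name)
```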
 1841                                                                            
 1842    In some instances of confusable characters, it is unlikely that the     
 1843    average human could tell the difference between the real string and     
 1844    the fake string.  (Indeed, there is no programmatic way to              
 1845    distinguish with full certainty which is the fake string and which is   
 1846    the real string; in some contexts, the string formed of Cherokee code   
 1847    points might be the real string and the string formed of ASCII code     
 1848    points might be the fake string.)  Because PRECIS-compliant strings     
 1849    can contain almost any properly encoded Unicode code point, it can be   
 1850    relatively easy to fake or mimic some strings in systems that use the   
 1851    PRECIS framework.  The fact that some strings are easily confused       
 1852    introduces security vulnerabilities of the kind that have also          
 1853    plagued the World Wide Web, specifically the phenomenon known as        
 1854    phishing.                                                               
 1855                                                                            
 1856    Although some specific suggestions about identification
 1857    and handling of confusable characters appear in the Unicode Security    
 1858    Considerations [UTR36] and the Unicode Security Mechanisms [UTS39],     
 1859    it is also true (as noted in [RFC5890]) that "there are no              
 1860    comprehensive technical solutions to the problems of confusable         
 1861    characters."  Because it is impossible to map visually similar          
 1862    characters without a great deal of context (such as knowing the font    
 1863    families used), the PRECIS framework does nothing to map similar-       
 1871    looking characters together, nor does it prohibit some characters       
 1872    because they look like others.                                          
 1873                                                                            
 1874    Nevertheless, specifications for application protocols that use this    
 1875    framework are strongly encouraged to describe how confusable            
 1876    characters can be abused to compromise the security of systems that     
 1877    use the protocol in question, along with any protocol-specific          
 1878    suggestions for overcoming those threats.  In particular, software      
 1879    implementations and service deployments that use PRECIS-based           
 1880    technologies are strongly encouraged to define and implement            
 1881    consistent policies regarding the registration, storage, and            
 1882    presentation of visually similar characters.  The following             
 1883    recommendations are appropriate:                                        
 1884                                                                            
 1885    1.  An application service SHOULD define a policy that specifies the    
 1886        scripts or blocks of code points that the service will allow to     
 1887        be registered (e.g., in an account name) or stored (e.g., in a      
 1888        filename).  Such a policy SHOULD be informed by the languages and   
 1889        scripts that are used to write registered account names; in         
 1890        particular, to reduce confusion, the service SHOULD forbid          
 1891        registration or storage of strings that contain code points from    
 1892        more than one script and SHOULD restrict registrations to code      
 1893        points drawn from a very small number of scripts (e.g., scripts     
 1894        that are well understood by the administrators of the service, to   
 1895        improve manageability).                                             
 1896                                                                            
 1897    2.  User-oriented application software SHOULD define a policy that      
 1898        specifies how internationalized strings will be presented to a      
 1899        human user.  Because every human user of such software has a        
 1900        preferred language or a small set of preferred languages, the       
 1901        software SHOULD gather that information either explicitly from      
 1902        the user or implicitly via the operating system of the user's       
 1903        device.                                                             
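
   The single-script part of recommendation 1 can be sketched as shown
   below.  This is a rough approximation under stated assumptions:
   Python's standard library does not expose the Unicode Script
   property, so the sketch guesses a script from the first word of each
   character name and treats everything else (digits, punctuation) as
   script-neutral.  A real deployment would use the Script property
   data published by the Unicode Consortium.  The names
   SCRIPT_PREFIXES, rough_script, scripts_used, and
   registration_allowed are hypothetical.

```python
import unicodedata

# Assumption: a tiny, incomplete stand-in for the Unicode Script
# property, keyed on the first word of the character name.
SCRIPT_PREFIXES = ("LATIN", "CYRILLIC", "GREEK", "CHEROKEE", "LAO", "ETHIOPIC")


def rough_script(ch: str) -> str:
    """Guess a script label from the character name (heuristic only)."""
    name = unicodedata.name(ch, "")
    first_word = name.split(" ", 1)[0] if name else ""
    return first_word if first_word in SCRIPT_PREFIXES else "COMMON"


def scripts_used(text: str) -> set[str]:
    """Return the guessed scripts in text, ignoring script-neutral ones."""
    return {rough_script(ch) for ch in text} - {"COMMON"}


def registration_allowed(text: str,
                         allowed_scripts: frozenset[str] = frozenset({"LATIN"})) -> bool:
    """Accept only strings drawn from one script that the service allows."""
    used = scripts_used(text)
    return len(used) <= 1 and used <= allowed_scripts


print(registration_allowed("juliet"))        # Latin only
print(registration_allowed("juli\u0430t"))   # Latin mixed with Cyrillic
print(registration_allowed("ju1iet"))        # ASCII digit in place of "l"
```

   Note that such a policy does not catch "ju1iet", since the digit and
   the letters are not from conflicting scripts; it narrows the problem
   of confusable characters without removing it.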
 1904                                                                            
 1905    The challenges inherent in supporting the full range of Unicode code    
 1906    points have in the past led some to hope for a way to                   
 1907    programmatically negotiate more restrictive ranges based on locale,     
 1908    script, or other relevant factors; to tag the locale associated with    
 1909    a particular string; etc.  As a general-purpose internationalization    
 1910    technology, the PRECIS framework does not include such mechanisms.      
 1911                                                                            
 1912 12.6.  Security of Passwords                                               
 1913                                                                            
 1914    Two goals of passwords are to maximize the amount of entropy and to     
 1915    minimize the potential for false accepts.  These goals can be           
 1916    achieved in part by allowing a wide range of code points and by         
 1917    ensuring that passwords are handled in such a way that code points      
 1918    are not compared aggressively.  Therefore, it is NOT RECOMMENDED for    
 1926    application protocols to profile the FreeformClass for use in           
 1927    passwords in a way that removes entire categories (e.g., by             
 1928    disallowing symbols or punctuation).  Furthermore, it is                
 1929    NOT RECOMMENDED for application protocols to map uppercase and          
 1930    titlecase code points to their lowercase equivalents in such strings;   
 1931    instead, it is RECOMMENDED to preserve the case of all code points      
 1932    contained in such strings and to compare them in a case-sensitive       
 1933    manner.                                                                 
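
   A minimal sketch of case-preserving, case-sensitive password
   comparison follows.  It assumes NFC as the normalization form (an
   assumption of this example, not a rule stated in this section) and
   compares plaintext values only to keep the example short; a real
   system would compare stored verifiers rather than plaintext.  The
   function names are hypothetical.

```python
import hmac
import unicodedata


def prepare_password(password: str) -> str:
    """Normalize to NFC (assumed form); apply no case mapping and
    remove no categories of code points."""
    return unicodedata.normalize("NFC", password)


def passwords_match(expected: str, supplied: str) -> bool:
    """Case-sensitive comparison of prepared passwords in constant time."""
    a = prepare_password(expected).encode("utf-8")
    b = prepare_password(supplied).encode("utf-8")
    return hmac.compare_digest(a, b)


# Differences in case are preserved, so these do not match.
print(passwords_match("Secret", "secret"))

# Composed and decomposed forms of the same text do match.
print(passwords_match("Secr\u00e9t", "Secre\u0301t"))
```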
 1934                                                                            
 1935    That said, software implementers need to be aware that there exist      
 1936    trade-offs between entropy and usability.  For example, allowing a      
 1937    user to establish a password containing "uncommon" code points might    
 1938    make it difficult for the user to access a service when using an        
 1939    unfamiliar or constrained input device.                                 
 1940                                                                            
 1941    Some application protocols use passwords directly, whereas others       
 1942    reuse technologies that themselves process passwords (one example of    
 1943    such a technology is SASL [RFC4422]).  Moreover, passwords are often    
 1944    carried by a sequence of protocols with backend authentication          
 1945    systems or data storage systems such as RADIUS [RFC2865] and the        
 1946    Lightweight Directory Access Protocol (LDAP) [RFC4510].  Developers     
 1947    of application protocols are encouraged to look into reusing existing
 1948    password profiles (see [RFC8265]) instead of defining new ones, so
 1949    that end-user expectations about passwords are consistent no matter
 1950    which application protocol is used.
 1951                                                                            
 1952    In protocols that provide passwords as input to a cryptographic         
 1953    algorithm such as a hash function, the client will need to perform      
 1954    proper preparation of the password before applying the algorithm,       
 1955    because the password is not available to the server in plaintext        
 1956    form.                                                                   
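
A minimal sketch of client-side preparation before hashing follows. The preparation step shown (NFC normalization followed by UTF-8 encoding) and the choice of PBKDF2 with these parameters are assumptions made for illustration; a real deployment would apply whatever profile and algorithm its application protocol specifies.

```python
import hashlib
import unicodedata


def prepare(password: str) -> bytes:
    """Illustrative preparation: NFC-normalize, then encode as UTF-8."""
    return unicodedata.normalize("NFC", password).encode("utf-8")


def client_digest(password: str, salt: bytes) -> bytes:
    """Derive a digest from the prepared password (PBKDF2 chosen for illustration)."""
    return hashlib.pbkdf2_hmac("sha256", prepare(password), salt, 10_000)
```

Because the server only ever sees the digest, it cannot repair a password that was typed in a different but canonically equivalent form. Preparing on the client means that, for example, a precomposed U+00E9 and the sequence U+0065 U+0301 yield the same digest.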
 1957                                                                            
 1958    Further discussion of password handling can be found in [RFC8265].      
 1959                                                                            
 1960 13.  Interoperability Considerations                                       
 1961                                                                            
 1962 13.1.  Coded Character Sets                                                
 1963                                                                            
 1964    It is known that some existing applications and systems do not          
 1965    support the full Unicode coded character set, or even any characters    
 1966    outside the ASCII repertoire [RFC20].  If two (or more) applications    
 1967    or systems need to interoperate when exchanging data (e.g., for the     
 1968    purpose of authenticating the combination of a username and             
 1969    password), naturally they will need to have in common at least one      
 1970    coded character set and the repertoire of characters being exchanged    
 1971    (see [RFC6365] for definitions of these terms).  Establishing such a    
 1972    baseline is a matter for the application or system that uses PRECIS,    
 1973    not for the PRECIS framework.                                           
 1974                                                                            
 1981 13.2.  Dependency on Unicode                                               
 1982                                                                            
 1983    The only coded character set supported by PRECIS is Unicode.  If an     
 1984    application or system does not support Unicode or uses a different      
 1985    coded character set [RFC6365], then the PRECIS rules cannot be          
 1986    applied to that application or system.                                  
 1987                                                                            
 1988 13.3.  Encoding                                                            
 1989                                                                            
 1990    Although strings that are consumed in PRECIS-based application          
 1991    protocols are often encoded using UTF-8 [RFC3629], the exact encoding   
 1992    is a matter for the application protocol that uses PRECIS, not for      
 1993    the PRECIS framework or for specifications that define PRECIS string    
 1994    classes or profiles thereof.                                            
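
The separation between encoding and the PRECIS rules can be sketched as follows: the same string carried in two different encodings decodes to the same sequence of code points, and that sequence is what a PRECIS implementation would operate on. The helper name is hypothetical.

```python
from typing import List


def code_points(data: bytes, encoding: str) -> List[int]:
    """Decode wire bytes and return the Unicode code points they represent."""
    return [ord(ch) for ch in data.decode(encoding)]


text = "Stra\u00dfe"
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-be")

# The byte sequences differ, but the decoded code points are identical.
same_code_points = code_points(utf8_bytes, "utf-8") == code_points(utf16_bytes, "utf-16-be")
```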
 1995                                                                            
 1996 13.4.  Unicode Versions                                                    
 1997                                                                            
 1998    It is extremely important for protocol designers and application        
 1999    developers to understand that various changes can occur across          
 2000    versions of the Unicode Standard, and such changes can result in        
 2001    instability of PRECIS categories.  The following are merely a few       
 2002    examples:                                                               
 2003                                                                            
 2004    o  As described in [RFC6452], between Unicode 5.2 (current at the       
 2005       time IDNA2008 was originally published) and Unicode 6.0, three       
 2006       code points underwent changes in their GeneralCategory, resulting    
 2007       in modified handling, depending on which version of Unicode is       
 2008       available on the underlying system.                                  
 2009                                                                            
 2010    o  The HasCompat() categorization of a given input string could         
 2011       change if, for example, the string includes a precomposed            
 2012       character that was added in a recent version of Unicode.             
 2013                                                                            
 2014    o  The East Asian width property, which is used in many PRECIS width    
 2015       mapping rules, is not guaranteed to be stable across Unicode         
 2016       versions.                                                            
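
Because the properties listed above depend on the Unicode version in use, it can help to record that version alongside any derived values. The sketch below uses Python's standard unicodedata module, whose tables correspond to a single Unicode version per Python release; the function name and the dictionary layout are hypothetical. The has_compat entry approximates HasCompat() as "NFKC normalization changes the code point".

```python
import unicodedata


def unicode_snapshot(cp: int) -> dict:
    """Report version-dependent properties of one code point."""
    ch = chr(cp)
    return {
        # Version of the Unicode Character Database bundled with this runtime.
        "unicode_version": unicodedata.unidata_version,
        "general_category": unicodedata.category(ch),
        "east_asian_width": unicodedata.east_asian_width(ch),
        # True if the code point has a compatibility equivalent.
        "has_compat": unicodedata.normalize("NFKC", ch) != ch,
    }
```

Two systems that report different values of unicode_version might classify the same input string differently, which is the instability this section warns about.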
 2017                                                                            
 2018 13.5.  Potential Changes to Handling of Certain Unicode Code Points        
 2019                                                                            
 2020    As part of the review of Unicode 7.0 for IDNA, a question was raised    
 2021    about a newly added code point that led to a re-analysis of the         
 2022    normalization rules used by IDNA and inherited by this document         
 2023    (Section 5.2.4).  Some of the general issues are described in           
 2024    [IAB-Statement] and pursued in more detail in [IDNA-Unicode].           
 2025                                                                            
 2026    At the time of this writing, these issues have yet to be settled.       
 2027    However, implementers need to be aware that this specification is       
 2036    likely to be updated in the future to address these issues.  The        
 2037    potential changes include but might not be limited to the following:    
 2038                                                                            
 2039    o  The range of code points in the LetterDigits category                
 2040       (Sections 4.2.1 and 9.1) might be narrowed.                          
 2041                                                                            
 2042    o  Some code points with special properties that are now allowed        
 2043       might be excluded.                                                   
 2044                                                                            
 2045    o  More additional mapping rules (Section 5.2.2) might be defined.      
 2046                                                                            
 2047    o  Alternative normalization methods might be added.                    
 2048                                                                            
 2049    As described in Section 11.1, until these issues are settled, it is     
 2050    reasonable for the IANA to apply the same precautionary principle       
 2051    described in [IAB-Statement] to the "PRECIS Derived Property Value"     
 2052    registry as is applied to the "IDNA Parameters" registry                
 2053    <https://www.iana.org/assignments/idna-tables/>: that is, to not make   
 2054    further updates to the registry.                                        
 2055                                                                            
 2056    Nevertheless, implementations and deployments are unlikely to           
 2057    encounter significant problems as a consequence of these issues or      
 2058    potential changes if they follow the advice given in this               
 2059    specification to use the more restrictive IdentifierClass whenever      
 2060    possible or, if using the FreeformClass, to allow only a restricted     
 2061    set of code points, particularly avoiding code points whose             
 2062    implications they do not understand.                                    
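
One way to allow only a restricted set of code points is an explicit allowlist applied after the string class rules. The ranges below are a purely hypothetical policy chosen for illustration (ASCII letters and digits, space, and Latin-1 letters); they are not defined by this specification, and the sketch is not a substitute for the FreeformClass rules themselves.

```python
from typing import Optional

# Hypothetical allowlist of inclusive code point ranges.
ALLOWED_RANGES = [
    (0x0020, 0x0020),  # space
    (0x0030, 0x0039),  # ASCII digits
    (0x0041, 0x005A),  # ASCII uppercase letters
    (0x0061, 0x007A),  # ASCII lowercase letters
    (0x00C0, 0x00D6),  # Latin-1 letters (excluding U+00D7)
    (0x00D8, 0x00F6),  # Latin-1 letters (excluding U+00F7)
    (0x00F8, 0x00FF),  # Latin-1 letters
]


def first_disallowed(s: str) -> Optional[int]:
    """Return the first code point outside the allowlist, or None if all pass."""
    for ch in s:
        cp = ord(ch)
        if not any(low <= cp <= high for low, high in ALLOWED_RANGES):
            return cp
    return None
```

A deployment following this approach would reject any string for which first_disallowed() returns a code point, rather than silently removing it.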
 2063                                                                            
 2064 14.  References                                                            
 2065                                                                            
 2066 14.1.  Normative References                                                
 2067                                                                            
 2068    [RFC20]    Cerf, V., "ASCII format for network interchange", STD 80,    
 2069               RFC 20, DOI 10.17487/RFC0020, October 1969,                  
 2070               <https://www.rfc-editor.org/info/rfc20>.                     
 2071                                                                            
 2072    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate          
 2073               Requirement Levels", BCP 14, RFC 2119,                       
 2074               DOI 10.17487/RFC2119, March 1997,                            
 2075               <https://www.rfc-editor.org/info/rfc2119>.                   
 2076                                                                            
 2077    [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network    
 2078               Interchange", RFC 5198, DOI 10.17487/RFC5198, March 2008,    
 2079               <https://www.rfc-editor.org/info/rfc5198>.                   
 2080                                                                            
 2091    [RFC6365]  Hoffman, P. and J. Klensin, "Terminology Used in             
 2092               Internationalization in the IETF", BCP 166, RFC 6365,        
 2093               DOI 10.17487/RFC6365, September 2011,                        
 2094               <https://www.rfc-editor.org/info/rfc6365>.                   
 2095                                                                            
 2096    [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC       
 2097               2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,     
 2098               May 2017, <https://www.rfc-editor.org/info/rfc8174>.         
 2099                                                                            
 2100    [Unicode]  The Unicode Consortium, "The Unicode Standard",              
 2101               <http://www.unicode.org/versions/latest/>.                   
 2102                                                                            
 2103 14.2.  Informative References                                              
 2104                                                                            
 2105    [DerivedCoreProperties]                                                 
 2106               The Unicode Consortium, "DerivedCoreProperties-              
 2107               10.0.0.txt", Unicode Character Database, March 2017,         
 2108               <http://www.unicode.org/Public/UCD/latest/ucd/               
 2109               DerivedCoreProperties.txt>.                                  
 2110                                                                            
 2111    [Err4568]  RFC Errata, Erratum ID 4568, RFC 7564,                       
 2112               <https://www.rfc-editor.org/errata/eid4568>.                 
 2113                                                                            
 2114    [IAB-Statement]                                                         
 2115               Internet Architecture Board, "IAB Statement on Identifiers   
 2116               and Unicode 7.0.0", February 2015,                           
 2117               <https://www.iab.org/documents/                              
 2118               correspondence-reports-documents/2015-2/                     
 2119               iab-statement-on-identifiers-and-unicode-7-0-0/>.            
 2120                                                                            
 2121    [IDNA-Unicode]                                                          
 2122               Klensin, J. and P. Faltstrom, "IDNA Update for Unicode       
 2123               7.0.0", Work in Progress, draft-klensin-idna-5892upd-        
 2124               unicode70-04, March 2015.                                    
 2125                                                                            
 2126    [PropertyAliases]                                                       
 2127               The Unicode Consortium, "PropertyAliases-10.0.0.txt",        
 2128               Unicode Character Database, February 2017,                   
 2129               <http://www.unicode.org/Public/UCD/latest/ucd/               
 2130               PropertyAliases.txt>.                                        
 2131                                                                            
 2132    [RFC2865]  Rigney, C., Willens, S., Rubens, A., and W. Simpson,         
 2133               "Remote Authentication Dial In User Service (RADIUS)",       
 2134               RFC 2865, DOI 10.17487/RFC2865, June 2000,                   
 2135               <https://www.rfc-editor.org/info/rfc2865>.                   
 2136                                                                            
 2146    [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of                 
 2147               Internationalized Strings ("stringprep")", RFC 3454,         
 2148               DOI 10.17487/RFC3454, December 2002,                         
 2149               <https://www.rfc-editor.org/info/rfc3454>.                   
 2150                                                                            
 2151    [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,                 
 2152               "Internationalizing Domain Names in Applications (IDNA)",    
 2153               RFC 3490, DOI 10.17487/RFC3490, March 2003,                  
 2154               <https://www.rfc-editor.org/info/rfc3490>.                   
 2155                                                                            
 2156    [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep         
 2157               Profile for Internationalized Domain Names (IDN)",           
 2158               RFC 3491, DOI 10.17487/RFC3491, March 2003,                  
 2159               <https://www.rfc-editor.org/info/rfc3491>.                   
 2160                                                                            
 2161    [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO          
 2162               10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November     
 2163               2003, <https://www.rfc-editor.org/info/rfc3629>.             
 2164                                                                            
 2165    [RFC4422]  Melnikov, A., Ed. and K. Zeilenga, Ed., "Simple              
 2166               Authentication and Security Layer (SASL)", RFC 4422,         
 2167               DOI 10.17487/RFC4422, June 2006,                             
 2168               <https://www.rfc-editor.org/info/rfc4422>.                   
 2169                                                                            
 2170    [RFC4510]  Zeilenga, K., Ed., "Lightweight Directory Access Protocol    
 2171               (LDAP): Technical Specification Road Map", RFC 4510,         
 2172               DOI 10.17487/RFC4510, June 2006,                             
 2173               <https://www.rfc-editor.org/info/rfc4510>.                   
 2174                                                                            
 2175    [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and   
 2176               Recommendations for Internationalized Domain Names           
 2177               (IDNs)", RFC 4690, DOI 10.17487/RFC4690, September 2006,     
 2178               <https://www.rfc-editor.org/info/rfc4690>.                   
 2179                                                                            
 2180    [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax   
 2181               Specifications: ABNF", STD 68, RFC 5234,                     
 2182               DOI 10.17487/RFC5234, January 2008,                          
 2183               <https://www.rfc-editor.org/info/rfc5234>.                   
 2184                                                                            
 2185    [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security    
 2186               (TLS) Protocol Version 1.2", RFC 5246,                       
 2187               DOI 10.17487/RFC5246, August 2008,                           
 2188               <https://www.rfc-editor.org/info/rfc5246>.                   
 2189                                                                            
 2190    [RFC5890]  Klensin, J., "Internationalized Domain Names for             
 2191               Applications (IDNA): Definitions and Document Framework",    
 2192               RFC 5890, DOI 10.17487/RFC5890, August 2010,                 
 2193               <https://www.rfc-editor.org/info/rfc5890>.                   
 2194                                                                            
 2201    [RFC5891]  Klensin, J., "Internationalized Domain Names in              
 2202               Applications (IDNA): Protocol", RFC 5891,                    
 2203               DOI 10.17487/RFC5891, August 2010,                           
 2204               <https://www.rfc-editor.org/info/rfc5891>.                   
 2205                                                                            
 2206    [RFC5892]  Faltstrom, P., Ed., "The Unicode Code Points and             
 2207               Internationalized Domain Names for Applications (IDNA)",     
 2208               RFC 5892, DOI 10.17487/RFC5892, August 2010,                 
 2209               <https://www.rfc-editor.org/info/rfc5892>.                   
 2210                                                                            
 2211    [RFC5893]  Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts      
 2212               for Internationalized Domain Names for Applications          
 2213               (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010,        
 2214               <https://www.rfc-editor.org/info/rfc5893>.                   
 2215                                                                            
 2216    [RFC5894]  Klensin, J., "Internationalized Domain Names for             
 2217               Applications (IDNA): Background, Explanation, and            
 2218               Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010,     
 2219               <https://www.rfc-editor.org/info/rfc5894>.                   
 2220                                                                            
 2221    [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for          
 2222               Internationalized Domain Names in Applications (IDNA)        
 2223               2008", RFC 5895, DOI 10.17487/RFC5895, September 2010,       
 2224               <https://www.rfc-editor.org/info/rfc5895>.                   
 2225                                                                            
 2226    [RFC6452]  Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code    
 2227               Points and Internationalized Domain Names for Applications   
 2228               (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452,       
 2229               November 2011, <https://www.rfc-editor.org/info/rfc6452>.    
 2230                                                                            
 2231    [RFC6885]  Blanchet, M. and A. Sullivan, "Stringprep Revision and       
 2232               Problem Statement for the Preparation and Comparison of      
 2233               Internationalized Strings (PRECIS)", RFC 6885,               
 2234               DOI 10.17487/RFC6885, March 2013,                            
 2235               <https://www.rfc-editor.org/info/rfc6885>.                   
 2236                                                                            
 2237    [RFC6943]  Thaler, D., Ed., "Issues in Identifier Comparison for        
 2238               Security Purposes", RFC 6943, DOI 10.17487/RFC6943, May      
 2239               2013, <https://www.rfc-editor.org/info/rfc6943>.             
 2240                                                                            
 2241    [RFC7564]  Saint-Andre, P. and M. Blanchet, "PRECIS Framework:          
 2242               Preparation, Enforcement, and Comparison of                  
 2243               Internationalized Strings in Application Protocols",         
 2244               RFC 7564, DOI 10.17487/RFC7564, May 2015,                    
 2245               <https://www.rfc-editor.org/info/rfc7564>.                   
 2246                                                                            
 2256    [RFC7622]  Saint-Andre, P., "Extensible Messaging and Presence          
 2257               Protocol (XMPP): Address Format", RFC 7622,                  
 2258               DOI 10.17487/RFC7622, September 2015,                        
 2259               <https://www.rfc-editor.org/info/rfc7622>.                   
 2260                                                                            
 2261    [RFC7790]  Yoneya, Y. and T. Nemoto, "Mapping Characters for Classes    
 2262               of the Preparation, Enforcement, and Comparison of           
 2263               Internationalized Strings (PRECIS)", RFC 7790,               
 2264               DOI 10.17487/RFC7790, February 2016,                         
 2265               <https://www.rfc-editor.org/info/rfc7790>.                   
 2266                                                                            
 2267    [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for        
 2268               Writing an IANA Considerations Section in RFCs", BCP 26,     
 2269               RFC 8126, DOI 10.17487/RFC8126, June 2017,                   
 2270               <https://www.rfc-editor.org/info/rfc8126>.                   
 2271                                                                            
 2272    [RFC8265]  Saint-Andre, P. and A. Melnikov, "Preparation,               
 2273               Enforcement, and Comparison of Internationalized Strings     
 2274               Representing Usernames and Passwords", RFC 8265,             
 2275               DOI 10.17487/RFC8265, October 2017,                          
 2276               <https://www.rfc-editor.org/info/rfc8265>.                   
 2277                                                                            
 2278    [RFC8266]  Saint-Andre, P., "Preparation, Enforcement, and Comparison   
 2279               of Internationalized Strings Representing Nicknames",        
 2280               RFC 8266, DOI 10.17487/RFC8266, October 2017,                
 2281               <https://www.rfc-editor.org/info/rfc8266>.                   
 2282                                                                            
 2283    [UAX11]    Unicode Standard Annex #11, "East Asian Width", edited by    
 2284               Ken Lunde.  An integral part of The Unicode Standard,        
 2285               <http://unicode.org/reports/tr11/>.                          
 2286                                                                            
 2287    [UAX15]    Unicode Standard Annex #15, "Unicode Normalization Forms",   
 2288               edited by Mark Davis and Ken Whistler.  An integral part     
 2289               of The Unicode Standard,                                     
 2290               <http://unicode.org/reports/tr15/>.                          
 2291                                                                            
 2292    [UAX9]     Unicode Standard Annex #9, "Unicode Bidirectional            
 2293               Algorithm", edited by Mark Davis, Aharon Lanin, and Andrew   
 2294               Glass.  An integral part of The Unicode Standard,            
 2295               <http://unicode.org/reports/tr9/>.                           
 2296                                                                            
 2297    [UTR36]    Unicode Technical Report #36, "Unicode Security              
 2298               Considerations", edited by Mark Davis and Michel Suignard,   
 2299               <http://unicode.org/reports/tr36/>.                          
 2300                                                                            
 2301    [UTS39]    Unicode Technical Standard #39, "Unicode Security            
 2302               Mechanisms", edited by Mark Davis and Michel Suignard,       
 2303               <http://unicode.org/reports/tr39/>.                          
 2304                                                                            
 2311 Appendix A.  Changes from RFC 7564                                         
 2312                                                                            
 2313    The following changes were made from [RFC7564].                         
 2314                                                                            
 2315    o  Recommended the Unicode toLowerCase() operation over the Unicode     
 2316       toCaseFold() operation in most PRECIS applications.                  
 2317                                                                            
 2318    o  Clarified the meaning of "preparation", and described the            
 2319       motivation for including it in PRECIS.                               
 2320                                                                            
 2321    o  Updated references.                                                  
 2322                                                                            
 2323    See [RFC7564] for a description of the differences from [RFC3454].      
 2324                                                                            
 2325 Acknowledgements                                                           
 2326                                                                            
 2327    Thanks to Martin Duerst, William Fisher, John Klensin, Christian        
 2328    Schudt, and Sam Whited for their feedback.  Thanks to Sam Whited also   
 2329    for submitting [Err4568].                                               
 2330                                                                            
 2331    See [RFC7564] for acknowledgements related to the specification that    
 2332    this document supersedes.                                               
 2333                                                                            
 2334    Some algorithms and textual descriptions have been borrowed from        
 2335    [RFC5892].  Some text regarding security has been borrowed from         
 2336    [RFC5890], [RFC8265], and [RFC7622].                                    
 2337                                                                            
 2338 Authors' Addresses                                                         
 2339                                                                            
 2340    Peter Saint-Andre                                                       
 2341    Jabber.org                                                              
 2342    P.O. Box 787                                                            
 2343    Parker, CO  80134                                                       
 2344    United States of America                                                
 2345                                                                            
 2346    Phone: +1 720 256 6756                                                  
 2347    Email: stpeter@jabber.org                                               
 2348    URI:   https://www.jabber.org/                                          
 2349                                                                            
 2350                                                                            
 2351    Marc Blanchet                                                           
 2352    Viagenie                                                                
 2353    246 Aberdeen                                                            
 2354    Québec, QC  G1R 2E1                                                     
 2355    Canada                                                                  
 2356                                                                            
 2357    Email: Marc.Blanchet@viagenie.ca                                        
 2358    URI:   http://www.viagenie.ca/                                          
 2359                                                                            
 2360                                                                            
 2361                                                                            
 2363

line-1436 Peter Occil (Technical Erratum #5478) [Reported]

based on outdated version

   L: Control(cp) = True

It should say:

   L: General_Category(cp) = Cc

Nothing in the Unicode Standard (including UAX 44) defines a property or function named Control. What is probably meant is the General_Category property value Control, which is abbreviated Cc.
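
As a rough illustration of the corrected rule, the check can be sketched in Python using the standard `unicodedata` module, which exposes the General_Category value of a character. The function name `is_control` is hypothetical and not taken from the specification, and the result reflects whichever Unicode version the Python runtime ships with.

```python
import unicodedata


def is_control(cp: int) -> bool:
    """Return True if the code point's General_Category is Cc (Control).

    Sketch of the erratum's corrected test, General_Category(cp) = Cc.
    The name is_control is an illustrative assumption, not part of PRECIS.
    """
    return unicodedata.category(chr(cp)) == "Cc"


if __name__ == "__main__":
    # U+0000 NULL and U+007F DELETE are Cc; U+0041 "A" (Lu) and
    # U+00AD SOFT HYPHEN (Cf) are not.
    for cp in (0x0000, 0x007F, 0x0041, 0x00AD):
        print(f"U+{cp:04X}", is_control(cp))
```

Note that format characters such as U+00AD have General_Category Cf rather than Cc, so a test written this way does not treat them as controls.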