1 Internet Engineering Task Force (IETF) P. Saint-Andre
2 Request for Comments: 8264 Jabber.org
3 Obsoletes: 7564 M. Blanchet
4 Category: Standards Track Viagenie
5 ISSN: 2070-1721 October 2017
6
7
8 PRECIS Framework: Preparation, Enforcement, and Comparison of
9 Internationalized Strings in Application Protocols
10
11 Abstract
12
13 Application protocols using Unicode code points in protocol strings
14 need to properly handle such strings in order to enforce
15 internationalization rules for strings placed in various protocol
16 slots (such as addresses and identifiers) and to perform valid
17 comparison operations (e.g., for purposes of authentication or
18 authorization). This document defines a framework enabling
19 application protocols to perform the preparation, enforcement, and
20 comparison of internationalized strings ("PRECIS") in a way that
21 depends on the properties of Unicode code points and thus is more
22 agile with respect to versions of Unicode. As a result, this
23 framework provides a more sustainable approach to the handling of
24 internationalized strings than the previous framework, known as
25 Stringprep (RFC 3454). This document obsoletes RFC 7564.
26
27 Status of This Memo
28
29 This is an Internet Standards Track document.
30
31 This document is a product of the Internet Engineering Task Force
32 (IETF). It represents the consensus of the IETF community. It has
33 received public review and has been approved for publication by the
34 Internet Engineering Steering Group (IESG). Further information on
35 Internet Standards is available in Section 2 of RFC 7841.
36
37 Information about the current status of this document, any errata,
38 and how to provide feedback on it may be obtained at
39 https://www.rfc-editor.org/info/rfc8264.
40
41 Copyright Notice
42
43 Copyright (c) 2017 IETF Trust and the persons identified as the
44 document authors. All rights reserved.
45
46 This document is subject to BCP 78 and the IETF Trust's Legal
47 Provisions Relating to IETF Documents
48 (https://trustee.ietf.org/license-info) in effect on the date of
49
50
51
52 Saint-Andre & Blanchet Standards Track [Page 1]
53 RFC 8264 PRECIS Framework October 2017
54
55
56 publication of this document. Please review these documents
57 carefully, as they describe your rights and restrictions with respect
58 to this document. Code Components extracted from this document must
59 include Simplified BSD License text as described in Section 4.e of
60 the Trust Legal Provisions and are provided without warranty as
61 described in the Simplified BSD License.
62
63 Table of Contents
64
65 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
66 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6
67 3. Preparation, Enforcement, and Comparison . . . . . . . . . . 6
68 4. String Classes . . . . . . . . . . . . . . . . . . . . . . . 8
69 4.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 8
70 4.2. IdentifierClass . . . . . . . . . . . . . . . . . . . . . 9
71 4.2.1. Valid . . . . . . . . . . . . . . . . . . . . . . . . 9
72 4.2.2. Contextual Rule Required . . . . . . . . . . . . . . 10
73 4.2.3. Disallowed . . . . . . . . . . . . . . . . . . . . . 10
74 4.2.4. Unassigned . . . . . . . . . . . . . . . . . . . . . 10
75 4.2.5. Examples . . . . . . . . . . . . . . . . . . . . . . 11
76 4.3. FreeformClass . . . . . . . . . . . . . . . . . . . . . . 11
77 4.3.1. Valid . . . . . . . . . . . . . . . . . . . . . . . . 11
78 4.3.2. Contextual Rule Required . . . . . . . . . . . . . . 12
79 4.3.3. Disallowed . . . . . . . . . . . . . . . . . . . . . 12
80 4.3.4. Unassigned . . . . . . . . . . . . . . . . . . . . . 12
81 4.3.5. Examples . . . . . . . . . . . . . . . . . . . . . . 12
82 4.4. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 12
83 5. Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . 14
84 5.1. Profiles Must Not Be Multiplied beyond Necessity . . . . 14
85 5.2. Rules . . . . . . . . . . . . . . . . . . . . . . . . . . 15
86 5.2.1. Width Mapping Rule . . . . . . . . . . . . . . . . . 15
87 5.2.2. Additional Mapping Rule . . . . . . . . . . . . . . . 15
88 5.2.3. Case Mapping Rule . . . . . . . . . . . . . . . . . . 16
89 5.2.4. Normalization Rule . . . . . . . . . . . . . . . . . 16
90 5.2.5. Directionality Rule . . . . . . . . . . . . . . . . . 17
91 5.3. A Note about Spaces . . . . . . . . . . . . . . . . . . . 18
92 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . 18
93 6.1. How to Use PRECIS in Applications . . . . . . . . . . . . 18
94 6.2. Further Excluded Characters . . . . . . . . . . . . . . . 20
95 6.3. Building Application-Layer Constructs . . . . . . . . . . 20
96 7. Order of Operations . . . . . . . . . . . . . . . . . . . . . 21
97 8. Code Point Properties . . . . . . . . . . . . . . . . . . . . 21
98 9. Category Definitions Used to Calculate Derived Property . . . 24
99 9.1. LetterDigits (A) . . . . . . . . . . . . . . . . . . . . 25
100 9.2. Unstable (B) . . . . . . . . . . . . . . . . . . . . . . 25
101 9.3. IgnorableProperties (C) . . . . . . . . . . . . . . . . . 25
102 9.4. IgnorableBlocks (D) . . . . . . . . . . . . . . . . . . . 25
103 9.5. LDH (E) . . . . . . . . . . . . . . . . . . . . . . . . . 25
111 9.6. Exceptions (F) . . . . . . . . . . . . . . . . . . . . . 25
112 9.7. BackwardCompatible (G) . . . . . . . . . . . . . . . . . 25
113 9.8. JoinControl (H) . . . . . . . . . . . . . . . . . . . . . 26
114 9.9. OldHangulJamo (I) . . . . . . . . . . . . . . . . . . . . 26
115 9.10. Unassigned (J) . . . . . . . . . . . . . . . . . . . . . 26
116 9.11. ASCII7 (K) . . . . . . . . . . . . . . . . . . . . . . . 26
117 9.12. Controls (L) . . . . . . . . . . . . . . . . . . . . . . 27
118 9.13. PrecisIgnorableProperties (M) . . . . . . . . . . . . . . 27
119 9.14. Spaces (N) . . . . . . . . . . . . . . . . . . . . . . . 27
120 9.15. Symbols (O) . . . . . . . . . . . . . . . . . . . . . . . 27
121 9.16. Punctuation (P) . . . . . . . . . . . . . . . . . . . . . 27
122 9.17. HasCompat (Q) . . . . . . . . . . . . . . . . . . . . . . 28
123 9.18. OtherLetterDigits (R) . . . . . . . . . . . . . . . . . . 28
124 10. Guidelines for Designated Experts . . . . . . . . . . . . . . 28
125 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 29
126 11.1. PRECIS Derived Property Value Registry . . . . . . . . . 29
127 11.2. PRECIS Base Classes Registry . . . . . . . . . . . . . . 29
128 11.3. PRECIS Profiles Registry . . . . . . . . . . . . . . . . 30
129 12. Security Considerations . . . . . . . . . . . . . . . . . . . 32
130 12.1. General Issues . . . . . . . . . . . . . . . . . . . . . 32
131 12.2. Use of the IdentifierClass . . . . . . . . . . . . . . . 33
132 12.3. Use of the FreeformClass . . . . . . . . . . . . . . . . 33
133 12.4. Local Character Set Issues . . . . . . . . . . . . . . . 33
134 12.5. Visually Similar Characters . . . . . . . . . . . . . . 33
135 12.6. Security of Passwords . . . . . . . . . . . . . . . . . 35
136 13. Interoperability Considerations . . . . . . . . . . . . . . . 36
137 13.1. Coded Character Sets . . . . . . . . . . . . . . . . . . 36
138 13.2. Dependency on Unicode . . . . . . . . . . . . . . . . . 37
139 13.3. Encoding . . . . . . . . . . . . . . . . . . . . . . . . 37
140 13.4. Unicode Versions . . . . . . . . . . . . . . . . . . . . 37
141 13.5. Potential Changes to Handling of Certain Unicode Code
142 Points . . . . . . . . . . . . . . . . . . . . . . . . . 37
143 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 38
144 14.1. Normative References . . . . . . . . . . . . . . . . . . 38
145 14.2. Informative References . . . . . . . . . . . . . . . . . 39
146 Appendix A. Changes from RFC 7564 . . . . . . . . . . . . . . . 43
147 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 43
148 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 43
149
150 1. Introduction
151
152 Application protocols using Unicode code points [Unicode] in protocol
153 strings need to properly handle such strings in order to enforce
154 internationalization rules for strings placed in various protocol
155 slots (such as addresses and identifiers) and to perform valid
156 comparison operations (e.g., for purposes of authentication or
157 authorization). This document defines a framework enabling
158 application protocols to perform the preparation, enforcement, and
166 comparison of internationalized strings ("PRECIS") in a way that
167 depends on the properties of Unicode code points and thus is more
168 agile with respect to versions of Unicode. (Note: PRECIS is
169 restricted to Unicode and does not support any other coded character
170 set [RFC6365].)
171
172 As described in the PRECIS problem statement [RFC6885], many IETF
173 protocols have used the Stringprep framework [RFC3454] as the basis
174 for preparing, enforcing, and comparing protocol strings that contain
175 Unicode code points, especially code points outside the ASCII range
176 [RFC20]. The Stringprep framework was developed during work on the
177 original technology for internationalized domain names (IDNs), here
178 called "IDNA2003" [RFC3490], and Nameprep [RFC3491] was the
179 Stringprep profile for IDNs. At the time, Stringprep was designed as
180 a general framework so that other application protocols could define
181 their own Stringprep profiles. Indeed, a number of application
182 protocols defined such profiles.
183
184 After the publication of [RFC3454] in 2002, several significant
185 issues arose with the use of Stringprep in the IDN case, as
186 documented in the IAB's recommendations regarding IDNs [RFC4690]
187 (most significantly, Stringprep was tied to Unicode version 3.2).
188 Therefore, the newer IDNA specifications, here called "IDNA2008"
189 [RFC5890] [RFC5891] [RFC5892] [RFC5893] [RFC5894], no longer use
190 Stringprep and Nameprep. This migration away from Stringprep for
191 IDNs prompted other "customers" of Stringprep to consider new
192 approaches to the preparation, enforcement, and comparison of
193 internationalized strings, as described in [RFC6885].
194
195 This document defines a framework for a post-Stringprep approach to
196 the preparation, enforcement, and comparison of internationalized
197 strings in application protocols, based on several principles:
198
199 1. Define a small set of string classes that specify the Unicode
200 code points appropriate for common application-protocol
201 constructs (where possible, maintaining compatibility with
202 IDNA2008 to help ensure a more consistent user experience).
203
204 2. Define each PRECIS string class in terms of Unicode code points
205 and their properties so that an algorithm can be used to
206 determine whether each code point or character category is
207 (a) valid, (b) allowed in certain contexts, (c) disallowed, or
208 (d) unassigned.
209
210 3. Use an "inclusion model" such that a string class consists only
211 of code points that are explicitly allowed, with the result that
212 any code point not explicitly allowed is forbidden.
213
221 4. Enable application protocols to define profiles of the PRECIS
222 string classes if necessary (addressing matters such as width
223 mapping, case mapping, Unicode normalization, and
224 directionality), but strongly discourage the multiplication of
225 profiles beyond necessity in order to avoid violations of the
226 "Principle of Least Astonishment".
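
The "inclusion model" named in the third principle can be illustrated
with a minimal Python sketch. The repertoire below (ALLOWED) is a
made-up toy allow-list chosen only for illustration; it is not a
PRECIS string class, and the function name is hypothetical.

```python
# Toy illustration of an "inclusion model": only code points that are
# explicitly listed are accepted; everything else is rejected by default.
# ALLOWED is a hypothetical repertoire, not a PRECIS string class.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789")


def conforms(s: str) -> bool:
    """Return True only if every code point in s is explicitly allowed."""
    return all(ch in ALLOWED for ch in s)


if __name__ == "__main__":
    print(conforms("stpeter"))   # every code point is on the allow-list
    print(conforms("st peter"))  # the space was never listed, so it fails
```

Because nothing is accepted unless it is listed, a code point that is
added to Unicode later stays forbidden until the allow-list itself is
extended.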
227
228 It is expected that this framework will yield the following benefits:
229
230 o Application protocols will be more agile with regard to Unicode
231 versions (recognizing that complete agility cannot be realized in
232 practice).
233
234 o Implementers will be able to share code point tables and software
235 code across application protocols, most likely by means of
236 software libraries.
237
238 o End users will be able to acquire more accurate expectations about
239 the code points that are acceptable in various contexts. Given
240 this more uniform set of string classes, it is also expected that
241 copy/paste operations between software implementing different
242 application protocols will be more predictable and coherent.
243
244 Whereas the string classes define the "baseline" code points for a
245 range of applications, profiling enables application protocols to
246 apply the string classes in ways that are appropriate for common
247 constructs such as usernames [RFC8265], opaque strings such as
248 passwords [RFC8265], and nicknames [RFC8266]. Profiles are
249 responsible for defining the handling of right-to-left code points as
250 well as various mapping operations of the kind also discussed for
251 IDNs in [RFC5895], such as case preservation or lowercasing, Unicode
252 normalization, mapping of certain code points to other code points or
253 to nothing, and mapping of fullwidth and halfwidth code points.
254
255 When an application applies a profile of a PRECIS string class, it
256 transforms an input string (which might or might not be conforming)
257 into an output string that definitively conforms to the profile. In
258 particular, this document focuses on the resulting ability to achieve
259 the following objectives:
260
261 a. Enforcing all the rules of a profile for a single output string
262 to check whether the output string conforms to the rules of the
263 profile and thus determine if a string can be included in a
264 protocol slot, communicated to another entity within a protocol,
265 stored in a retrieval system, etc.
266
267 b. Comparing two output strings to determine if they are equivalent,
268 typically through octet-for-octet matching to test for
276 "bit-string identity" (e.g., to make an access decision for
277 purposes of authentication or authorization as further described
278 in [RFC6943]).
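
Objective (b) can be sketched in Python as follows. The sketch
assumes that output strings are encoded as UTF-8 before matching and
uses NFC purely as a stand-in for whatever normalization rule a
profile specifies; both choices are assumptions of this example.

```python
import unicodedata


def octets(s: str) -> bytes:
    """Encode a string as UTF-8 so it can be compared octet for octet."""
    return s.encode("utf-8")


# Two renderings of the same word: precomposed U+00E9 versus the letter
# "e" followed by combining acute accent U+0301.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# The raw input strings differ at the octet level ...
raw_equal = octets(composed) == octets(decomposed)

# ... but output strings produced by the same normalization step match.
out_a = unicodedata.normalize("NFC", composed)
out_b = unicodedata.normalize("NFC", decomposed)
normalized_equal = octets(out_a) == octets(out_b)

if __name__ == "__main__":
    print(raw_equal, normalized_equal)
```

The point of the example is that octet-for-octet matching is only
meaningful when both operands are output strings, that is, when both
have already been through the same rules.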
279
The opportunity to define profiles naturally introduces the
possibility of a proliferation of profiles, thus potentially
diminishing the benefits of common code and violating user
expectations. See Section 5 for a discussion of this important
topic.
285
286 In addition, it is extremely important for protocol designers and
287 application developers to understand that the transformation of an
288 input string to an output string is rarely reversible. As one
289 relatively simple example, case mapping would transform an input
290 string of "StPeter" to an output string of "stpeter", thus leading to
291 a loss of information about the capitalization of the first and third
292 characters. Similar considerations apply to other forms of mapping
293 and normalization.
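
The loss of information described above can be shown with a short
Python sketch. The function case_map is a hypothetical stand-in for a
profile's case mapping rule and simply lowercases its input.

```python
def case_map(s: str) -> str:
    """Stand-in for a profile's case mapping rule (simple lowercasing)."""
    return s.lower()


inputs = ["StPeter", "stpeter", "STPETER"]
outputs = {case_map(s) for s in inputs}

# All three inputs collapse to a single output string, so the original
# capitalization cannot be recovered from the output alone.
if __name__ == "__main__":
    print(outputs)
```

An application that needs the original form (for example, for
display) therefore has to keep the input string alongside the output
string.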
294
295 Although this framework is similar to IDNA2008 and includes by
296 reference some of the character categories defined in [RFC5892], it
297 defines additional character categories to meet the needs of common
298 application protocols other than DNS.
299
300 The character categories and calculation rules defined under
301 Sections 8 and 9 are normative and apply to all Unicode code points.
302 The code point table that results from applying the character
303 categories and calculation rules to the latest version of Unicode can
304 be found in an IANA registry (see Section 11).
305
306 2. Terminology
307
308 Many important terms used in this document are defined in [RFC5890],
309 [RFC6365], [RFC6885], and [Unicode]. The terms "left-to-right" (LTR)
310 and "right-to-left" (RTL) are defined in Unicode Standard Annex #9
311 [UAX9].
312
313 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
314 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
315 "OPTIONAL" in this document are to be interpreted as described in
316 BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
317 capitals, as shown here.
318
319 3. Preparation, Enforcement, and Comparison
320
321 This document distinguishes between three different actions that an
322 entity can take with regard to a string:
323
331 o Enforcement entails applying all of the rules specified for a
332 particular string class, or profile thereof, to a single input
333 string, for the purpose of checking whether the string conforms to
334 all of the rules and thus determining if the string can be used in
335 a given protocol slot.
336
337 o Comparison entails applying all of the rules specified for a
338 particular string class, or profile thereof, to two separate input
339 strings, for the purpose of determining if the two strings are
340 equivalent.
341
342 o Preparation primarily entails ensuring that the code points in a
343 single input string are allowed by the underlying PRECIS string
344 class, and sometimes also entails applying one or more of the
345 rules specified for a particular string class or profile thereof.
346 Preparation can be appropriate for constrained devices that can to
347 some extent restrict the code points in a string to a limited
348 repertoire of characters but that do not have the processing power
349 or onboard memory to perform operations such as Unicode
350 normalization. However, preparation does not ensure that an input
351 string conforms to all of the rules for a string class or profile
352 thereof.
353
354 Note: The term "preparation" as used in this specification and
355 related documents has a much more limited scope than it did in
356 Stringprep; it essentially refers to a kind of preprocessing of
357 an input string, not the actual operations that apply
358 internationalization rules to produce an output string (here
359 termed "enforcement") or to compare two output strings (here
360 termed "comparison").
361
362 In most cases, authoritative entities such as servers are responsible
363 for enforcement, whereas subsidiary entities such as clients are
364 responsible only for preparation. The rationale for this distinction
365 is that clients might not have the facilities (in terms of device
366 memory and processing power) to enforce all the rules regarding
367 internationalized strings (such as width mapping and Unicode
368 normalization), although they can more easily limit the repertoire of
369 characters they offer to an end user. By contrast, it is assumed
370 that a server would have more capacity to enforce the rules, and in
371 any case a server acts as an authority regarding allowable strings in
372 protocol slots such as addresses and endpoint identifiers. In
373 addition, a client cannot necessarily be trusted to properly generate
374 such strings, especially for security-sensitive contexts such as
375 authentication and authorization.
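
The division of labor among the three actions can be sketched in
Python. Everything here is a toy under stated assumptions: the
repertoire check, the lowercasing step, the use of NFC, and the UTF-8
encoding are hypothetical stand-ins for the rules of a real profile.

```python
import unicodedata

# Hypothetical toy repertoire: letters, decimal digits, nonspacing marks.
_TOY_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Mn"}


def code_points_allowed(s: str) -> bool:
    """Toy repertoire check based on Unicode general categories."""
    return all(unicodedata.category(ch) in _TOY_CATEGORIES for ch in s)


def prepare(s: str) -> str:
    """Preparation: check the repertoire only; no mapping, no normalization."""
    if not code_points_allowed(s):
        raise ValueError("string contains code points outside the repertoire")
    return s


def enforce(s: str) -> str:
    """Enforcement: apply every rule and return the conforming output."""
    out = unicodedata.normalize("NFC", s.lower())
    if not code_points_allowed(out):
        raise ValueError("string does not conform")
    return out


def compare(a: str, b: str) -> bool:
    """Comparison: enforce both inputs, then match the outputs by octets."""
    return enforce(a).encode("utf-8") == enforce(b).encode("utf-8")
```

In this sketch a client would call prepare() before sending a string,
while a server would call enforce() on receipt and compare() when
matching it against a stored value. Note that prepare("StPeter")
returns the string unchanged, which is why preparation alone does not
establish conformance.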
376
386 4. String Classes
387
388 4.1. Overview
389
390 Starting in 2010, various "customers" of Stringprep began to discuss
391 the need to define a post-Stringprep approach to the preparation and
392 comparison of internationalized strings other than IDNs. This
393 community analyzed the existing Stringprep profiles and also weighed
394 the costs and benefits of defining a relatively small set of Unicode
395 code points that would minimize the potential for user confusion
396 caused by visually similar code points (and thus be relatively
397 "safe") vs. defining a much larger set of Unicode code points that
398 would maximize the potential for user creativity (and thus be
399 relatively "expressive"). As a result, the community concluded that
400 most existing uses could be addressed by two string classes:
401
402 IdentifierClass: a sequence of letters, numbers, and some symbols
403 that is used to identify or address a network entity such as a
404 user account, a venue (e.g., a chat room), an information source
405 (e.g., a data feed), or a collection of data (e.g., a file); the
406 intent is that this class will minimize user confusion in a wide
407 variety of application protocols, with the result that safety has
408 been prioritized over expressiveness for this class.
409
410 FreeformClass: a sequence of letters, numbers, symbols, spaces, and
411 other code points that is used for free-form strings, including
412 passwords as well as display elements such as human-friendly
413 nicknames for devices or for participants in a chat room; the
414 intent is that this class will allow nearly any Unicode code
415 point, with the result that expressiveness has been prioritized
416 over safety for this class. Note well that protocol designers,
417 application developers, service providers, and end users might not
418 understand or be able to enter all of the code points that can be
419 included in the FreeformClass (see Section 12.3 for details).
420
421 Future specifications might define additional PRECIS string classes,
422 such as a class that falls somewhere between the IdentifierClass and
423 the FreeformClass. At this time, it is not clear how useful such a
424 class would be. In any case, because application developers are able
425 to define profiles of PRECIS string classes, a protocol needing a
426 construct between the IdentifierClass and the FreeformClass could
427 define a restricted profile of the FreeformClass if needed.
428
429 The following subsections discuss the IdentifierClass and
430 FreeformClass in more detail, with reference to the dimensions
431 described in Section 5 of [RFC6885]. Each string class is defined by
432 the following behavioral rules:
433
441 Valid: Defines which code points are treated as valid for the
442 string.
443
444 Contextual Rule Required: Defines which code points are treated as
445 allowed only if the requirements of a contextual rule are met
446 (i.e., either CONTEXTJ or CONTEXTO as originally defined in the
447 IDNA2008 specifications).
448
449 Disallowed: Defines which code points need to be excluded from the
450 string.
451
452 Unassigned: Defines application behavior in the presence of code
453 points that are unknown (i.e., not yet designated) for the version
454 of Unicode used by the application.
455
456 This document defines the valid, contextual rule required,
457 disallowed, and unassigned rules for the IdentifierClass and
458 FreeformClass. As described under Section 5, profiles of these
459 string classes are responsible for defining the width mapping,
460 additional mapping, case mapping, normalization, and directionality
461 rules.
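
The four behavioral rules can be modeled as a small enumeration. The
Python sketch below is illustrative only; the names Disposition,
string_acceptable, and contextual_rule_ok are hypothetical, and the
contextual rule itself is left as a callback because its definition
lies outside this section.

```python
from enum import Enum


class Disposition(Enum):
    VALID = "Valid"
    CONTEXTUAL = "Contextual Rule Required"
    DISALLOWED = "Disallowed"
    UNASSIGNED = "Unassigned"


def string_acceptable(dispositions, contextual_rule_ok=lambda index: False):
    """Decide whether a string is acceptable.

    dispositions: one Disposition per code point, in string order.
    contextual_rule_ok: hypothetical callback that evaluates the
    contextual rule for the code point at the given index; by default
    every contextual code point is rejected.
    """
    for index, d in enumerate(dispositions):
        if d is Disposition.VALID:
            continue
        if d is Disposition.CONTEXTUAL and contextual_rule_ok(index):
            continue
        # Disallowed, Unassigned, or a contextual rule that was not met.
        return False
    return True
```

Treating Unassigned the same as Disallowed when deciding acceptance
follows Sections 4.2.4 and 4.3.4; keeping it as a separate value
simply preserves the reason for the rejection.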
462
463 4.2. IdentifierClass
464
465 Most application technologies need strings that can be used to refer
466 to, include, or communicate protocol strings like usernames,
467 filenames, data feed identifiers, and chat room names. We group such
468 strings into a class called "IdentifierClass" having the following
469 features.
470
471 4.2.1. Valid
472
473 o Code points traditionally used as letters and numbers in writing
474 systems, i.e., the LetterDigits ("A") category first defined in
475 [RFC5892] and listed here under Section 9.1.
476
477 o Code points in the range U+0021 through U+007E, i.e., the
478 (printable) ASCII7 ("K") category defined under Section 9.11.
479 These code points are "grandfathered" into PRECIS and thus are
480 valid even if they would otherwise be disallowed according to the
481 property-based rules specified in the next section.
482
483 Note: Although the PRECIS IdentifierClass reuses the LetterDigits
484 category from IDNA2008, the range of code points allowed in the
485 IdentifierClass is wider than the range of code points allowed in
486 IDNA2008. The main reason is that IDNA2008 applies the
487 Unstable ("B") category (Section 9.2) before the LetterDigits
496 category, thus disallowing uppercase code points, whereas the
497 IdentifierClass does not apply the Unstable category.
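
The effect of the Unstable category on uppercase code points can be
checked with a short Python sketch. The test used here,
toNFKC(toCaseFold(toNFKC(cp))) != cp, is the definition as recalled
from RFC 5892 and should be verified against that document; the
function name is hypothetical.

```python
import unicodedata


def is_unstable(ch: str) -> bool:
    """Return True if ch changes under NFKC, case folding, then NFKC."""
    nfkc = unicodedata.normalize("NFKC", ch)
    return unicodedata.normalize("NFKC", nfkc.casefold()) != ch


if __name__ == "__main__":
    for ch in ("A", "a"):
        print(ch, is_unstable(ch))
```

Under this test "A" is unstable because it folds to "a", whereas "a"
is stable. Since the IdentifierClass does not apply the test, both
remain candidates for validity there.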
498
499 4.2.2. Contextual Rule Required
500
501 o A number of code points from the Exceptions ("F") category defined
502 under Section 9.6.
503
504 o Joining code points, i.e., the JoinControl ("H") category defined
505 under Section 9.8.
506
507 4.2.3. Disallowed
508
509 o Old Hangul Jamo code points, i.e., the OldHangulJamo ("I")
510 category defined under Section 9.9.
511
512 o Control code points, i.e., the Controls ("L") category defined
513 under Section 9.12.
514
515 o Ignorable code points, i.e., the PrecisIgnorableProperties ("M")
516 category defined under Section 9.13.
517
518 o Space code points, i.e., the Spaces ("N") category defined under
519 Section 9.14.
520
521 o Symbol code points, i.e., the Symbols ("O") category defined under
522 Section 9.15.
523
524 o Punctuation code points, i.e., the Punctuation ("P") category
525 defined under Section 9.16.
526
527 o Any code point that is decomposed and recomposed into something
528 other than itself under Unicode Normalization Form KC, i.e., the
529 HasCompat ("Q") category defined under Section 9.17. These code
530 points are disallowed even if they would otherwise be valid
531 according to the property-based rules specified in the previous
532 section.
533
534 o Letters and digits other than the "traditional" letters and digits
535 allowed in IDNs, i.e., the OtherLetterDigits ("R") category
536 defined under Section 9.18.
537
538 4.2.4. Unassigned
539
540 Any code points that are not yet designated in the Unicode coded
541 character set are considered unassigned for purposes of the
542 IdentifierClass, and such code points are to be treated as
543 disallowed. See Section 9.10.
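
Because the set of unassigned code points depends on the Unicode
version in use, a check for them is tied to the Unicode database of
the implementation. The Python sketch below is a simplification: it
relies on general category "Cn" as reported by the standard
unicodedata module, which also covers noncharacters, and the function
name is hypothetical.

```python
import unicodedata


def unassigned_code_points(s: str) -> list:
    """List the code points in s that this Unicode database leaves
    unassigned (general category "Cn"; noncharacters are not separated
    out in this sketch)."""
    return [f"U+{ord(ch):04X}" for ch in s
            if unicodedata.category(ch) == "Cn"]


if __name__ == "__main__":
    # The result depends on the Unicode version built into the interpreter.
    print("Unicode version:", unicodedata.unidata_version)
    print(unassigned_code_points("ab\u0378"))
```

A string for which this list is non-empty would be rejected, since
unassigned code points are treated as disallowed.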
544
551 4.2.5. Examples
552
553 As described in the Introduction to this document, the string classes
554 do not handle all issues related to string preparation and comparison
555 (such as case mapping); instead, such issues are handled at the level
556 of profiles. Examples for profiles of the IdentifierClass can be
557 found in [RFC8265] (the UsernameCaseMapped and UsernameCasePreserved
558 profiles).
559
560 4.3. FreeformClass
561
Some application technologies need strings that can be used in a
free-form way, e.g., as a password in an authentication exchange (see
[RFC8265]) or a nickname in a chat room (see [RFC8266]). We group
such strings into a class called "FreeformClass" having the following
features.
567
568 Security Warning: As mentioned, the FreeformClass prioritizes
569 expressiveness over safety; Section 12.3 describes some of the
570 security hazards involved with using or profiling the
571 FreeformClass.
572
573 Security Warning: Consult Section 12.6 for relevant security
574 considerations when strings conforming to the FreeformClass, or a
575 profile thereof, are used as passwords.
576
577 4.3.1. Valid
578
579 o Traditional letters and numbers, i.e., the LetterDigits ("A")
580 category first defined in [RFC5892] and listed here under
581 Section 9.1.
582
583 o Code points in the range U+0021 through U+007E, i.e., the
584 (printable) ASCII7 ("K") category defined under Section 9.11.
585
586 o Space code points, i.e., the Spaces ("N") category defined under
587 Section 9.14.
588
589 o Symbol code points, i.e., the Symbols ("O") category defined under
590 Section 9.15.
591
592 o Punctuation code points, i.e., the Punctuation ("P") category
593 defined under Section 9.16.
594
595 o Any code point that is decomposed and recomposed into something
596 other than itself under Unicode Normalization Form KC, i.e., the
597 HasCompat ("Q") category defined under Section 9.17.
598
606 o Letters and digits other than the "traditional" letters and digits
607 allowed in IDNs, i.e., the OtherLetterDigits ("R") category
608 defined under Section 9.18.
609
610 4.3.2. Contextual Rule Required
611
612 o A number of code points from the Exceptions ("F") category defined
613 under Section 9.6.
614
615 o Joining code points, i.e., the JoinControl ("H") category defined
616 under Section 9.8.
617
618 4.3.3. Disallowed
619
620 o Old Hangul Jamo code points, i.e., the OldHangulJamo ("I")
621 category defined under Section 9.9.
622
623 o Control code points, i.e., the Controls ("L") category defined
624 under Section 9.12.
625
626 o Ignorable code points, i.e., the PrecisIgnorableProperties ("M")
627 category defined under Section 9.13.
628
629 4.3.4. Unassigned
630
631 Any code points that are not yet designated in the Unicode coded
632 character set are considered unassigned for purposes of the
633 FreeformClass, and such code points are to be treated as disallowed.
634
635 4.3.5. Examples
636
637 As described in the Introduction to this document, the string classes
638 do not handle all issues related to string preparation and comparison
639 (such as case mapping); instead, such issues are handled at the level
640 of profiles. Examples for profiles of the FreeformClass can be found
641 in [RFC8265] (the OpaqueString profile) and [RFC8266] (the Nickname
642 profile).
643
644 4.4. Summary
645
646 The following table summarizes the differences between the
647 IdentifierClass and the FreeformClass (i.e., the disposition of a
648 code point as valid, contextual rule required, disallowed, or
649 unassigned), depending on its PRECIS category.
650
+===============================+=================+===============+
| CATEGORY                      | IDENTIFIERCLASS | FREEFORMCLASS |
+===============================+=================+===============+
| (A) LetterDigits              | Valid           | Valid         |
+-------------------------------+-----------------+---------------+
| (B) Unstable                  |         [N/A (unused)]          |
+-------------------------------+-----------------+---------------+
| (C) IgnorableProperties       |         [N/A (unused)]          |
+-------------------------------+-----------------+---------------+
| (D) IgnorableBlocks           |         [N/A (unused)]          |
+-------------------------------+-----------------+---------------+
| (E) LDH                       |         [N/A (unused)]          |
+-------------------------------+-----------------+---------------+
| (F) Exceptions                | Contextual      | Contextual    |
|                               | Rule Required   | Rule Required |
+-------------------------------+-----------------+---------------+
| (G) BackwardCompatible        |     [Handled by IDNA Rules]     |
+-------------------------------+-----------------+---------------+
| (H) JoinControl               | Contextual      | Contextual    |
|                               | Rule Required   | Rule Required |
+-------------------------------+-----------------+---------------+
| (I) OldHangulJamo             | Disallowed      | Disallowed    |
+-------------------------------+-----------------+---------------+
| (J) Unassigned                | Unassigned      | Unassigned    |
+-------------------------------+-----------------+---------------+
| (K) ASCII7                    | Valid           | Valid         |
+-------------------------------+-----------------+---------------+
| (L) Controls                  | Disallowed      | Disallowed    |
+-------------------------------+-----------------+---------------+
| (M) PrecisIgnorableProperties | Disallowed      | Disallowed    |
+-------------------------------+-----------------+---------------+
| (N) Spaces                    | Disallowed      | Valid         |
+-------------------------------+-----------------+---------------+
| (O) Symbols                   | Disallowed      | Valid         |
+-------------------------------+-----------------+---------------+
| (P) Punctuation               | Disallowed      | Valid         |
+-------------------------------+-----------------+---------------+
| (Q) HasCompat                 | Disallowed      | Valid         |
+-------------------------------+-----------------+---------------+
| (R) OtherLetterDigits         | Disallowed      | Valid         |
+-------------------------------+-----------------+---------------+
702
703 Table 1: Comparative Disposition of Code Points
704
715
716 5. Profiles
717
718 This framework document defines the valid, contextual rule required,
719 disallowed, and unassigned rules for the IdentifierClass and the
720 FreeformClass. A profile of a PRECIS string class MUST define the
721 width mapping, additional mapping (if any), case mapping,
722 normalization, and directionality rules. A profile MAY also restrict
723 the allowable code points above and beyond the definition of the
724 relevant PRECIS string class (but MUST NOT add as valid any code
725 points that are disallowed by the relevant PRECIS string class).
726 These matters are discussed in the following subsections.
727
728 Profiles of the PRECIS string classes are registered with the IANA as
729 described under Section 11.3. Profile names use the following
730 convention: they are of the form "Profilename of BaseClass", where
731 the "Profilename" string is a differentiator and "BaseClass" is the
732 name of the PRECIS string class being profiled; for example, the
733 profile used for opaque strings such as passwords is the OpaqueString
734 profile of the FreeformClass [RFC8265].
735
736 5.1. Profiles Must Not Be Multiplied beyond Necessity
737
738 The risk of profile proliferation is significant because having too
739 many profiles will result in different behavior across various
740 applications, thus violating what is known in user interface design
741 as the "Principle of Least Astonishment".
742
743 Indeed, we already have too many profiles. Ideally, we would have at
744 most two or three profiles. Unfortunately, numerous application
745 protocols exist with their own quirks regarding protocol strings.
746 Domain names, email addresses, instant messaging addresses, chat room
747 names, user nicknames or display names, filenames, authentication
748 identifiers, passwords, and other strings already exist in the wild
749 and need to be supported in existing application protocols such as
750 DNS, SMTP, the Extensible Messaging and Presence Protocol (XMPP),
751 Internet Relay Chat (IRC), NFS, the Internet Small Computer System
752 Interface (iSCSI), the Extensible Authentication Protocol (EAP), and
753 the Simple Authentication and Security Layer (SASL) [RFC4422], among
754 others.
755
756 Nevertheless, profiles must not be multiplied beyond necessity.
757
758 To help prevent profile proliferation, this document recommends
759 sensible defaults for the various options offered to profile creators
760 (such as width mapping and Unicode normalization). In addition, the
761 guidelines for designated experts provided under Section 10 are meant
762 to encourage a high level of due diligence regarding new profiles.
763
770
771 5.2. Rules
772
773 5.2.1. Width Mapping Rule
774
775 The width mapping rule of a profile specifies whether width mapping
776 is performed on a string and how the mapping is done. Typically,
777 such mapping consists of mapping fullwidth and halfwidth code points,
778 i.e., code points with a Decomposition Type of Wide or Narrow, to
779 their decomposition mappings; as an example, "０" (FULLWIDTH DIGIT
780 ZERO, U+FF10) would be mapped to "0" (DIGIT ZERO, U+0030).
781
782 The normalization form specified by a profile (see below) has an
783 impact on the need for width mapping. Because width mapping is
784 performed as a part of compatibility decomposition, a profile
785 employing either Normalization Form KD (NFKD) or Normalization
786 Form KC (NFKC) does not need to specify width mapping. However, if
787 Unicode Normalization Form C (NFC) is used (as is recommended), then
788 the profile needs to specify whether to apply width mapping; in this
789 case, width mapping is in general RECOMMENDED because allowing
790 fullwidth and halfwidth code points to remain unmapped to their
791 compatibility variants would violate the "Principle of Least
792 Astonishment". For more information about the concept of width in
793 East Asian scripts within Unicode, see Unicode Standard Annex #11
794 [UAX11].
795
796 Note: Because the East Asian width property is not guaranteed to
797 be stable by the Unicode Standard (see
798 <http://unicode.org/policies/stability_policy.html> for details),
799 the results of applying a given width mapping rule might not be
800 consistent across different versions of Unicode.
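
As an illustration only, the typical width mapping described above
can be sketched in Python with the standard unicodedata module. The
function name width_map is an assumption of this sketch, not
something defined by this document; the sketch replaces only code
points whose decomposition is tagged <wide> or <narrow> and leaves
all other compatibility decompositions untouched.

```python
import unicodedata


def width_map(s: str) -> str:
    """Map fullwidth/halfwidth code points to their decomposition mappings.

    Hypothetical helper: only decompositions tagged <wide> or <narrow>
    are applied; every other code point is passed through unchanged.
    """
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0030", or ""
        if decomp.startswith(("<wide>", "<narrow>")):
            # Drop the leading tag and keep the mapped code point(s).
            out.extend(chr(int(cp, 16)) for cp in decomp.split()[1:])
        else:
            out.append(ch)
    return "".join(out)
```

Because the result depends on the Unicode data shipped with the
Python runtime, the same input might map differently under a
different Unicode version, as the note above cautions.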
801
802 5.2.2. Additional Mapping Rule
803
804 The additional mapping rule of a profile specifies whether additional
805 mappings are performed on a string, such as:
806
807 o Mapping of delimiter code points (such as '@', ':', '/', '+',
808 and '-').
809
810 o Mapping of special code points (e.g., non-ASCII space code points
811 to SPACE (U+0020) or control code points to nothing).
812
813 The PRECIS mappings document [RFC7790] describes such mappings in
814 more detail.
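
A minimal sketch of one such additional mapping follows, assuming
that "non-ASCII space code point" means any code point with a
General_Category of Zs other than SPACE (U+0020). The function name
map_spaces is hypothetical, and a real profile might define a
different set of mappings.

```python
import unicodedata


def map_spaces(s: str) -> str:
    """Map every space separator (General_Category Zs) to SPACE (U+0020).

    Assumption of this sketch: "space code point" is taken to mean Zs.
    SPACE itself is Zs and therefore maps to itself.
    """
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in s)
```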
815
825
826 5.2.3. Case Mapping Rule
827
828 The case mapping rule of a profile specifies whether case mapping
829 (instead of case preservation) is performed on a string and how the
830 mapping is applied (e.g., mapping uppercase and titlecase code points
831 to their lowercase equivalents).
832
833 If case mapping is desired (instead of case preservation), it is
834 RECOMMENDED to use the Unicode toLowerCase() operation defined in the
835 Unicode Standard [Unicode]. In contrast to the Unicode toCaseFold()
836 operation, the toLowerCase() operation is less likely to violate the
837 "Principle of Least Astonishment", especially when an application
838 merely wishes to convert uppercase and titlecase code points to their
839 lowercase equivalents while preserving lowercase code points.
840 Although the toCaseFold() operation can be appropriate when an
841 application needs to compare two strings (such as in search
842 operations), in general few application developers and even fewer
843 users understand its implications, so toLowerCase() is almost always
844 the safer choice.
845
846 Note: Neither toLowerCase() nor toCaseFold() is designed to handle
847 various language-specific issues, such as the character "ı" (LATIN
848 SMALL LETTER DOTLESS I, U+0131) in several Turkic languages. The
849 reader is referred to the PRECIS mappings document [RFC7790],
850 which describes these issues in greater detail.
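
The difference between the two operations can be observed with
Python's built-in str.lower() and str.casefold(), which are used
here as stand-ins for toLowerCase() and toCaseFold(). That
correspondence, and the wrapper names below, are assumptions of this
sketch rather than statements of this document.

```python
def to_lower_case(s: str) -> str:
    """Stand-in for toLowerCase(): locale-independent lowercasing."""
    return s.lower()


def to_case_fold(s: str) -> str:
    """Stand-in for toCaseFold(): full case folding, intended for comparison."""
    return s.casefold()
```

For the input "Straße", lowercasing preserves LATIN SMALL LETTER
SHARP S (U+00DF), whereas case folding expands it to "ss". Neither
operation maps "I" to LATIN SMALL LETTER DOTLESS I (U+0131), which
is the language-specific issue described in the note above.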
851
852 In order to maximize entropy and minimize the potential for false
853 accepts, it is NOT RECOMMENDED for application protocols to map
854 uppercase and titlecase code points to their lowercase equivalents
855 when strings conforming to the FreeformClass, or a profile thereof,
856 are used in passwords; instead, it is RECOMMENDED to preserve the
857 case of all code points contained in such strings and then perform
858 case-sensitive comparison. See also the related discussion in
859 Section 12.6 of this document and in [RFC8265].
860
861 5.2.4. Normalization Rule
862
863 The normalization rule of a profile specifies which Unicode
864 Normalization Form (D, KD, C, or KC) is to be applied (see Unicode
865 Standard Annex #15 [UAX15] for background information).
866
867 In accordance with [RFC5198], Normalization Form C (NFC) is
868 RECOMMENDED.
869
870 Protocol designers and application developers need to understand that
871 certain Unicode normalization forms, especially NFKC and NFKD, can
872 result in significant loss of information in various circumstances
873 and that these circumstances can depend on the language and script of
881 the strings to which the normalization forms are applied. Extreme
882 care should be taken when specifying the use of these normalization
883 forms.
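
The loss of information can be seen by comparing NFC and NFKC output
for the same input. The following sketch uses Python's
unicodedata.normalize(); the helper name compare_forms is
hypothetical.

```python
import unicodedata


def compare_forms(s: str) -> dict:
    """Return the NFC and NFKC forms of s, keyed by form name."""
    return {form: unicodedata.normalize(form, s) for form in ("NFC", "NFKC")}
```

For example, LATIN SMALL LIGATURE FI (U+FB01) is unchanged under NFC
but becomes the two code points "fi" under NFKC, and SUPERSCRIPT TWO
(U+00B2) becomes "2"; in both cases, the original distinction cannot
be recovered from the normalized string.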
884
885 5.2.5. Directionality Rule
886
887 The directionality rule of a profile specifies how to treat strings
888 containing what are often called "right-to-left" (RTL) code points
889 (see Unicode Standard Annex #9 [UAX9]). RTL code points come from
890 scripts that are normally written from right to left and are
891 considered by Unicode to, themselves, have right-to-left
892 directionality. Some strings containing RTL code points also contain
893 "left-to-right" (LTR) code points, such as ASCII numerals, as well as
894 code points without directional properties. Consequently, such
895 strings are known as "bidirectional strings".
896
897 Presenting bidirectional strings in different layout systems (e.g., a
898 user interface that is configured to handle primarily an RTL script
899 vs. an interface that is configured to handle primarily an LTR
900 script) can yield display results that, while predictable to those
901 who understand the display rules, are counterintuitive to casual
902 users. In particular, the same bidirectional string (in PRECIS
903 terms) might not be presented in the same way to users of those
904 different layout systems, even though the presentation is consistent
905 within any particular layout system. In some applications, these
906 presentation differences might be considered problematic and thus the
907 application designers might wish to restrict the use of bidirectional
908 strings by specifying a directionality rule. In other applications,
909 these presentation differences might not be considered problematic
910 (this especially tends to be true of more "free-form" strings) and
911 thus no directionality rule is needed.
912
913 The PRECIS framework does not directly address how to deal with
914 bidirectional strings across all string classes and profiles nor does
915 it define any new directionality rules, because at present there is
916 no widely accepted and implemented solution for the safe display of
917 arbitrary bidirectional strings beyond the Unicode bidirectional
918 algorithm [UAX9]. Although rules for management and display of
919 bidirectional strings have been defined for domain name labels and
920 similar identifiers through the "Bidi Rule" specified in the IDNA2008
921 specification on right-to-left scripts [RFC5893], those rules are
922 quite restrictive and are not necessarily applicable to all
923 bidirectional strings.
924
925 The authors of a PRECIS profile might believe that they need to
926 define a new directionality rule of their own. Because of the
927 complexity of the issues involved, such a belief is almost always
928 misguided, even if the authors have done a great deal of careful
936 research into the challenges of displaying bidirectional strings.
937 This document strongly suggests that profile authors who are thinking
938 about defining a new directionality rule should think again and
939 instead consider using the "Bidi Rule" [RFC5893] (for profiles based
940 on the IdentifierClass) or following the Unicode bidirectional
941 algorithm [UAX9] (for profiles based on the FreeformClass or in
942 situations where the IdentifierClass is not appropriate).
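
As a first step before applying any directionality rule, an
implementation typically needs to know whether a string contains RTL
code points at all. The sketch below does only that detection,
assuming (as in [RFC5893]) that the relevant Bidi_Class values are
R, AL, and AN. It does not implement the "Bidi Rule" or the Unicode
bidirectional algorithm, and the function name is hypothetical.

```python
import unicodedata

# Bidi_Class values treated as right-to-left in this sketch (an assumption).
RTL_BIDI_CLASSES = {"R", "AL", "AN"}


def contains_rtl(s: str) -> bool:
    """Return True if any code point in s has an RTL Bidi_Class."""
    return any(unicodedata.bidirectional(ch) in RTL_BIDI_CLASSES for ch in s)
```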
943
944 5.3. A Note about Spaces
945
946 With regard to the IdentifierClass, the consensus of the PRECIS
947 Working Group was that spaces are problematic for many reasons,
948 including the following:
949
950 o Many Unicode code points are confusable with SPACE (U+0020).
951
952 o Even if non-ASCII space code points are mapped to SPACE (U+0020),
953 space code points are often not rendered in user interfaces,
954 leading to the possibility that a human user might consider a
955 string containing spaces to be equivalent to the same string
956 without spaces.
957
958 o In some locales, some devices are known to generate a code point
959 other than SPACE (U+0020), such as ZERO WIDTH JOINER (U+200D),
960 when a user performs an action like pressing the space bar on a
961 keyboard.
962
963 One consequence of disallowing space code points in the
964 IdentifierClass might be to effectively discourage their use within
965 identifiers created in newer application protocols; given the
966 challenges involved with properly handling space code points
967 (especially non-ASCII space code points) in identifiers and other
968 protocol strings, the PRECIS Working Group considered this to be a
969 feature, not a bug.
970
971 However, the FreeformClass does allow spaces; this in turn enables
972 application protocols to define profiles of the FreeformClass that
973 are more flexible than any profiles of the IdentifierClass. In
974 addition, as explained in Section 6.3, application protocols can also
975 define application-layer constructs containing spaces.
976
977 6. Applications
978
979 6.1. How to Use PRECIS in Applications
980
981 Although PRECIS has been designed with applications in mind,
982 internationalization is not suddenly made easy through the use of
983 PRECIS. Indeed, because it is extremely difficult for protocol
991 designers and application developers to do the right thing for all
992 users when supporting internationalized strings, often the safest
993 option is to support only the ASCII range [RFC20] in various protocol
994 slots. This state of affairs is unfortunate but is the direct result
995 of the complexities involved with human languages (e.g., the vast
996 number of code points, scripts, user communities, and rules with
997 their inevitable exceptions), the kinds of strings that application
998 developers and their users wish to support, the wide range of devices
999 that users employ to access services enabled by various Internet
1000 protocols, and so on.
1001
1002 Despite these significant challenges, application and protocol
1003 developers sometimes persevere in attempting to support
1004 internationalized strings in their systems. These developers need to
1005 think carefully about how they will use the PRECIS string classes, or
1006 profiles thereof, in their applications. This section provides some
1007 guidelines to application developers (and to expert reviewers of
1008 application-protocol specifications).
1009
1010 o Don't define your own profile unless absolutely necessary (see
1011 Section 5.1). Existing profiles have been designed for wide
1012 reuse. It is highly likely that an existing profile will meet
1013 your needs, especially given the ability to specify further
1014 excluded code points (Section 6.2) and to build application-layer
1015 constructs (see Section 6.3).
1016
1017 o Do specify:
1018
1019 * Exactly which entities are responsible for preparation,
1020 enforcement, and comparison of internationalized strings (e.g.,
1021 servers or clients).
1022
1023 * Exactly when those entities need to complete their tasks (e.g.,
1024 a server might need to enforce the rules of a profile before
1025 allowing a client to gain network access).
1026
1027 * Exactly which protocol slots need to be checked against which
1028 profiles (e.g., checking the address of a message's intended
1029 recipient against the UsernameCaseMapped profile [RFC8265] of
1030 the IdentifierClass or checking the password of a user against
1031 the OpaqueString profile [RFC8265] of the FreeformClass).
1032
1033 See [RFC8265] and [RFC7622] for definitions of these matters for
1034 several applications.
1035
1045
1046 6.2. Further Excluded Characters
1047
1048 An application protocol that uses a profile MAY specify particular
1049 code points that are not allowed in relevant slots within that
1050 application protocol, above and beyond those excluded by the string
1051 class or profile.
1052
1053 That is, an application protocol MAY do either of the following:
1054
1055 1. Exclude specific code points that are allowed by the relevant
1056 string class.
1057
1058 2. Exclude code points matching certain Unicode properties (e.g.,
1059 math symbols) that are included in the relevant PRECIS string
1060 class.
1061
1062 As a result of such exclusions, code points that are defined as valid
1063 for the PRECIS string class or profile will be defined as disallowed
1064 for the relevant protocol slot.
1065
1066 Typically, such exclusions are defined for the purpose of backward
1067 compatibility with legacy formats within an application protocol.
1068 These are defined for application protocols, not profiles, in order
1069 to prevent multiplication of profiles beyond necessity (see
1070 Section 5.1).
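
A minimal sketch of how an application protocol might layer both
kinds of exclusion on top of a profile is shown below. The factory
name make_slot_check and its parameters are hypothetical, and the
check is meant to run in addition to, not instead of, enforcement of
the relevant string class and profile.

```python
import unicodedata


def make_slot_check(excluded_points=frozenset(), excluded_categories=frozenset()):
    """Build a predicate that rejects strings containing excluded code points.

    excluded_points: individual characters excluded by the application.
    excluded_categories: General_Category values excluded as a group
    (for example, "Sm" for math symbols).
    """
    def allowed(s: str) -> bool:
        return not any(
            ch in excluded_points or unicodedata.category(ch) in excluded_categories
            for ch in s
        )
    return allowed
```

For instance, make_slot_check({"@"}, {"Sm"}) yields a predicate that
rejects COMMERCIAL AT (U+0040) as well as every math symbol.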
1071
1072 6.3. Building Application-Layer Constructs
1073
1074 Sometimes, an application-layer construct does not map in a
1075 straightforward manner to one of the PRECIS string classes or a
1076 profile thereof. Consider, for example, the "simple username"
1077 construct in SASL [RFC4422]. Depending on the deployment, a simple
1078 username might take the form of a user's full name (e.g., the user's
1079 personal name followed by a space and then the user's family name).
1080 Such a simple username cannot be defined as an instance of the
1081 IdentifierClass or a profile thereof, because space code points are
1082 not allowed in the IdentifierClass; however, it could be defined
1083 using a space-separated sequence of IdentifierClass instances, as in
1084 the following ABNF [RFC5234] from [RFC8265]:
1085
1086 username = userpart *(1*SP userpart)
1087 userpart = 1*(idpoint)
1088 ;
1089 ; an "idpoint" is a Unicode code point that
1090 ; can be contained in a string conforming to
1091 ; the PRECIS IdentifierClass
1092 ;
1093
1100
1101 Similar techniques could be used to define many application-layer
1102 constructs, say of the form "user@domain" or "/path/to/file".
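
The "username" ABNF above can be sketched in Python as follows. The
predicate passed as is_idpoint is a placeholder for a real
IdentifierClass check; the toy predicate shown here (alphanumeric
code points only) is an assumption made purely for illustration and
is not the IdentifierClass.

```python
from typing import Callable, List, Optional


def split_username(s: str, is_idpoint: Callable[[str], bool]) -> Optional[List[str]]:
    """Parse username = userpart *(1*SP userpart); return userparts or None."""
    # The string must begin and end with a userpart, so no outer spaces.
    if not s or s.startswith(" ") or s.endswith(" "):
        return None
    # 1*SP permits runs of one or more SPACE (U+0020) between userparts.
    parts = [p for p in s.split(" ") if p]
    if all(is_idpoint(ch) for part in parts for ch in part):
        return parts
    return None


def toy_is_idpoint(ch: str) -> bool:
    """Placeholder predicate (assumption): accept alphanumeric code points."""
    return ch.isalnum()
```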
1103
1104 7. Order of Operations
1105
1106 To ensure proper comparison, the rules specified for a particular
1107 string class or profile MUST be applied in the following order:
1108
1109 1. Width Mapping Rule
1110
1111 2. Additional Mapping Rule
1112
1113 3. Case Mapping Rule
1114
1115 4. Normalization Rule
1116
1117 5. Directionality Rule
1118
1119 6. Behavioral rules for determining whether a code point is valid,
1120 allowed under a contextual rule, disallowed, or unassigned
1121
1122 As already described, the width mapping, additional mapping, case
1123 mapping, normalization, and directionality rules are specified for
1124 each profile, whereas the behavioral rules are specified for each
1125 string class. Some of the logic behind this order is provided under
1126 Section 5.2.1 (see also the PRECIS mappings document [RFC7790]). In
1127 addition, this order is consistent with IDNA2008, and with both
1128 IDNA2003 and Stringprep before then, for the purpose of enabling code
1129 reuse and of ensuring as much continuity as possible with the
1130 Stringprep profiles that are obsoleted by several PRECIS profiles.
1131
1132 Because of the order of operations specified here, applying the rules
1133 for any given PRECIS profile is not necessarily an idempotent
1134 procedure (e.g., under certain circumstances, such as when Unicode
1135 Normalization Form KC is used, performing Unicode normalization after
1136 case mapping can still yield uppercase characters for certain code
1137 points). Therefore, an implementation SHOULD apply the rules
1138 repeatedly until the output string is stable; if the output string
1139 does not stabilize after reapplying the rules three (3) additional
1140 times after the first application, the implementation SHOULD
1141 terminate application of the rules and reject the input string as
1142 invalid.
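
The reapplication procedure described above can be sketched as
follows. The sketch models only the string-transforming rules (the
mapping and normalization steps); the directionality and behavioral
checks are omitted. The function names and the EXAMPLE_RULES list
(lowercasing followed by NFKC) are assumptions chosen to reproduce
the non-idempotent case mentioned above, not a real profile.

```python
import unicodedata
from typing import Callable, Sequence

Rule = Callable[[str], str]


def apply_once(s: str, rules: Sequence[Rule]) -> str:
    """Apply each rule once, in the given order."""
    for rule in rules:
        s = rule(s)
    return s


def enforce(s: str, rules: Sequence[Rule], max_extra: int = 3) -> str:
    """Apply the rules, then reapply up to max_extra times until stable."""
    out = apply_once(s, rules)
    for _ in range(max_extra):
        again = apply_once(out, rules)
        if again == out:
            return out  # a further pass changes nothing: output is stable
        out = again
    raise ValueError("output did not stabilize; input rejected as invalid")


# Hypothetical rule list: case mapping followed by NFKC normalization.
EXAMPLE_RULES = [str.lower, lambda s: unicodedata.normalize("NFKC", s)]
```

With these example rules, TELEPHONE SIGN (U+2121) has no lowercase
mapping and is then decomposed by NFKC to "TEL", so a single pass
yields uppercase output; the second pass lowercases it to "tel",
after which the output is stable.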
1143
1144 8. Code Point Properties
1145
1146 In order to implement the string classes described above, this
1147 document does the following:
1148
1156 1. Reviews and classifies the collections of code points in the
1157 Unicode coded character set by examining various code point
1158 properties.
1159
1160 2. Defines an algorithm for determining a derived property value,
1161 which can depend on the string class being used by the relevant
1162 application protocol.
1163
1164 This document is not intended to specify precisely how derived
1165 property values are to be applied in protocol strings. That
1166 information is the responsibility of the protocol specification that
1167 uses or profiles a PRECIS string class from this document. The value
1168 of the property is to be interpreted as follows.
1169
1170 PROTOCOL VALID Those code points that are allowed to be used in any
1171 PRECIS string class (currently, IdentifierClass and
1172 FreeformClass). The abbreviated term "PVALID" is used to refer to
1173 this value in the remainder of this document.
1174
1175 SPECIFIC CLASS PROTOCOL VALID Those code points that are allowed to
1176 be used in specific string classes. In the remainder of this
1177 document, the abbreviated term *_PVAL is used, where * = (ID |
1178 FREE), i.e., either "FREE_PVAL" for the FreeformClass or "ID_PVAL"
1179 for the IdentifierClass. In practice, the derived property
1180 ID_PVAL is not used in this specification, because every ID_PVAL
1181 code point is PVALID.
1182
1183 CONTEXTUAL RULE REQUIRED Some characteristics of the code point,
1184 such as its being invisible in certain contexts or problematic in
1185 others, require that it not be used in a string unless specific
1186 other code points or properties are present in the string. As in
1187 IDNA2008, there are two subdivisions of CONTEXTUAL RULE REQUIRED:
1188 the first for Join_controls (called "CONTEXTJ") and the second for
1189 other code points (called "CONTEXTO"). A string MUST NOT contain
1190 any characters whose validity is context-dependent, unless the
1191 validity is positively confirmed by a contextual rule. To check
1192 this, each code point identified as CONTEXTJ or CONTEXTO in the
1193 "PRECIS Derived Property Value" registry (Section 11.1) MUST have
1194 a non-null rule. If such a code point is missing a rule, the
1195 string is invalid. If the rule exists but the result of applying
1196 the rule is negative or inconclusive, the proposed string is
1197 invalid. The most notable of the CONTEXTUAL RULE REQUIRED code
1198 points are the Join Control code points ZERO WIDTH JOINER (U+200D)
1199 and ZERO WIDTH NON-JOINER (U+200C), which have a derived property
1200 value of CONTEXTJ. See Appendix A of [RFC5892] for more
1201 information.
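
The handling of CONTEXTUAL RULE REQUIRED code points described above
(a missing rule or a negative or inconclusive result makes the
string invalid) can be sketched as follows. The function and
parameter names are hypothetical. The sample rule for ZERO WIDTH
JOINER reflects the authors' reading of Appendix A.2 of [RFC5892],
under which the preceding code point must have
Canonical_Combining_Class Virama (9); the authoritative rules are
those in the registry.

```python
import unicodedata
from typing import Callable, Dict, Optional, Set

# A rule receives the string and the index of the code point under test and
# returns True (confirmed), False (negative), or None (inconclusive).
ContextRule = Callable[[str, int], Optional[bool]]

VIRAMA_CCC = 9  # Canonical_Combining_Class value named "Virama"


def zwj_rule(s: str, i: int) -> bool:
    """Sample rule: the code point before ZWJ must be a virama."""
    return i > 0 and unicodedata.combining(s[i - 1]) == VIRAMA_CCC


def check_contextual(s: str, needs_rule: Set[int],
                     rules: Dict[int, ContextRule]) -> bool:
    """Return True only if every context-dependent code point is confirmed."""
    for i, ch in enumerate(s):
        cp = ord(ch)
        if cp not in needs_rule:
            continue
        rule = rules.get(cp)
        if rule is None:
            return False  # missing rule: the string is invalid
        if rule(s, i) is not True:
            return False  # negative or inconclusive: the string is invalid
    return True
```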
1202
1211 DISALLOWED Those code points that are not permitted in any PRECIS
1212 string class.
1213
1214 SPECIFIC CLASS DISALLOWED Those code points that are not to be
1215 included in one of the string classes but that might be permitted
1216 in others. In the remainder of this document, the abbreviated
1217 term *_DIS is used, where * = (ID | FREE), i.e., either "FREE_DIS"
1218 for the FreeformClass or "ID_DIS" for the IdentifierClass. In
1219 practice, the derived property FREE_DIS is not used in this
1220 specification, because every FREE_DIS code point is DISALLOWED.
1221
1222 UNASSIGNED Those code points that are not designated (i.e., are
1223 unassigned) in the Unicode Standard.
1224
1225 The algorithm to calculate the value of the derived property is as
1226 follows (implementations MUST NOT modify the order of operations
1227 within this algorithm, because doing so would cause inconsistent
1228 results across implementations):
1229
1230 If .cp. .in. Exceptions Then Exceptions(cp);
1231 Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp);
1232 Else If .cp. .in. Unassigned Then UNASSIGNED;
1233 Else If .cp. .in. ASCII7 Then PVALID;
1234 Else If .cp. .in. JoinControl Then CONTEXTJ;
1235 Else If .cp. .in. OldHangulJamo Then DISALLOWED;
1236 Else If .cp. .in. PrecisIgnorableProperties Then DISALLOWED;
1237 Else If .cp. .in. Controls Then DISALLOWED;
1238 Else If .cp. .in. HasCompat Then ID_DIS or FREE_PVAL;
1239 Else If .cp. .in. LetterDigits Then PVALID;
1240 Else If .cp. .in. OtherLetterDigits Then ID_DIS or FREE_PVAL;
1241 Else If .cp. .in. Spaces Then ID_DIS or FREE_PVAL;
1242 Else If .cp. .in. Symbols Then ID_DIS or FREE_PVAL;
1243 Else If .cp. .in. Punctuation Then ID_DIS or FREE_PVAL;
1244 Else DISALLOWED;
1245
1246 The value of the derived property calculated can depend on the string
1247 class; for example, if an identifier used in an application protocol
1248 is defined as profiling the PRECIS IdentifierClass then a space
1249 character such as SPACE (U+0020) would be assigned to ID_DIS, whereas
1250 if an identifier is defined as profiling the PRECIS FreeformClass
1251 then the character would be assigned to FREE_PVAL. For the sake of
1252 brevity, the designation "FREE_PVAL" is used herein, instead of the
1253 longer designation "ID_DIS or FREE_PVAL". In practice, the derived
1254 properties ID_PVAL and FREE_DIS are not used in this specification,
1255 because every ID_PVAL code point is PVALID and every FREE_DIS code
1256 point is DISALLOWED.
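
The ordered cascade above can be sketched in Python as follows. This
is an approximation under stated assumptions, not a conforming
implementation: the Exceptions and BackwardCompatible tables and the
Default_Ignorable_Code_Point set are supplied by the caller (and are
empty by default), the General_Category sets are taken from
[RFC5892] and from the category definitions in Section 9, and
OldHangulJamo is approximated by fixed code point ranges. All names
are hypothetical, and results depend on the Unicode version of the
Python runtime.

```python
import unicodedata

# General_Category sets assumed for the categories used in the cascade.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}
OTHER_LETTER_DIGITS = {"Lt", "Nl", "No", "Me"}
SYMBOLS = {"Sm", "Sc", "Sk", "So"}
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}
# Approximation of Hangul_Syllable_Type in {L, V, T} as code point ranges.
OLD_HANGUL_JAMO = ((0x1100, 0x11FF), (0xA960, 0xA97C),
                   (0xD7B0, 0xD7C6), (0xD7CB, 0xD7FB))


def is_noncharacter(cp: int) -> bool:
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def derived_property(cp: int, string_class: str, exceptions=None,
                     backward_compatible=None,
                     default_ignorable=frozenset()) -> str:
    """Evaluate the cascade for one code point; string_class is "ID" or "FREE"."""
    if string_class not in ("ID", "FREE"):
        raise ValueError("string_class must be 'ID' or 'FREE'")
    exceptions = exceptions or {}
    backward_compatible = backward_compatible or {}
    class_specific = "ID_DIS" if string_class == "ID" else "FREE_PVAL"
    ch = chr(cp)
    gc = unicodedata.category(ch)

    # The order of the tests below mirrors the algorithm and must not change.
    if cp in exceptions:
        return exceptions[cp]
    if cp in backward_compatible:
        return backward_compatible[cp]
    if gc == "Cn" and not is_noncharacter(cp):
        return "UNASSIGNED"
    if 0x21 <= cp <= 0x7E:                                  # ASCII7
        return "PVALID"
    if cp in (0x200C, 0x200D):                              # JoinControl
        return "CONTEXTJ"
    if any(lo <= cp <= hi for lo, hi in OLD_HANGUL_JAMO):   # OldHangulJamo
        return "DISALLOWED"
    if cp in default_ignorable or is_noncharacter(cp):      # PrecisIgnorable
        return "DISALLOWED"
    if gc == "Cc":                                          # Controls
        return "DISALLOWED"
    if unicodedata.normalize("NFKC", ch) != ch:             # HasCompat
        return class_specific
    if gc in LETTER_DIGITS:
        return "PVALID"
    if gc in OTHER_LETTER_DIGITS or gc == "Zs" or gc in SYMBOLS \
            or gc in PUNCTUATION:
        return class_specific
    return "DISALLOWED"
```

The early tests matter: FULLWIDTH DIGIT ZERO (U+FF10) has
General_Category Nd, yet it is caught by HasCompat before
LetterDigits is consulted and so is not PVALID.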
1257
1266 Use of the name of a rule (such as "Exceptions") implies the set of
1267 code points that the rule defines, whereas the same name as a
1268 function call (such as "Exceptions(cp)") implies the value that the
1269 code point has in the Exceptions table.
1270
1271 The mechanisms described here allow determination of the value of the
1272 property for future versions of Unicode (including code points added
1273 after Unicode 5.2 or 7.0, depending on the category, because some
1274 categories mentioned in this document are simply pointers to IDNA2008
1275 and therefore were defined at the time of Unicode 5.2). Changes in
1276 Unicode properties that do not affect the outcome of this process
1277 therefore do not affect this framework. For example, a code point
1278 can have its Unicode General_Category value change from So to Sm, or
1279 from Lo to Ll, without affecting the algorithm results. Moreover,
1280 even if such changes were to result, the BackwardCompatible list
1281 (Section 9.7) can be adjusted to ensure the stability of the results.
1282
1283 9. Category Definitions Used to Calculate Derived Property
1284
1285 The derived property obtains its value based on a two-step procedure:
1286
1287 1. Code points are placed in one or more character categories either
1288 (1) based on core properties defined by the Unicode Standard or
1289 (2) by treating the code point as an exception and addressing the
1290 code point based on its code point value. These categories are
1291 not mutually exclusive.
1292
1293 2. Set operations are used with these categories to determine the
1294 values for a property specific to a given string class. These
1295 operations are specified under Section 8.
1296
1297 Note: Unicode property names and property value names might have
1298 short abbreviations, such as "gc" for the General_Category
1299 property and "Ll" for the Lowercase_Letter property value of the
1300 gc property.
1301
1302 In the following specification of character categories, the operation
1303 that returns the value of a particular Unicode code point property
1304 for a code point is designated by using the formal name of that
1305 property (from the Unicode PropertyAliases.txt file [PropertyAliases])
1306 followed by "(cp)" for "code point". For example, the value of the
1307 General_Category property for a code point is indicated by
1308 General_Category(cp).
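
For readers who want to experiment with this notation, the operation
General_Category(cp) corresponds closely to unicodedata.category()
in Python's standard library, which returns the abbreviated property
value. The wrapper name below is hypothetical, and the values
returned reflect the Unicode version of the Python runtime.

```python
import unicodedata


def general_category(cp: int) -> str:
    """Return the abbreviated General_Category value for a code point."""
    return unicodedata.category(chr(cp))
```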
1309
1310 The first ten categories (A-J) shown below were previously defined
1311 for IDNA2008 and are referenced from [RFC5892] to ease the
1312 understanding of how PRECIS handles various code points. Some of
1313 these categories are reused in PRECIS, and some of them are not;
1321 however, the lettering of categories is retained to prevent overlap
1322 and to ease implementation of both IDNA2008 and PRECIS in a single
1323 software application. The next eight categories (K-R) are specific
1324 to PRECIS.
1325
1326 9.1. LetterDigits (A)
1327
1328 This category is defined in Section 2.1 of [RFC5892] and is included
1329 by reference for use in PRECIS.
1330
1331 9.2. Unstable (B)
1332
1333 This category is defined in Section 2.2 of [RFC5892]. However, it is
1334 not used in PRECIS.
1335
1336 9.3. IgnorableProperties (C)
1337
1338 This category is defined in Section 2.3 of [RFC5892]. However, it is
1339 not used in PRECIS.
1340
1341 Note: See the PrecisIgnorableProperties ("M") category below for a
1342 more inclusive category used in PRECIS identifiers.
1343
1344 9.4. IgnorableBlocks (D)
1345
1346 This category is defined in Section 2.4 of [RFC5892]. However, it is
1347 not used in PRECIS.
1348
1349 9.5. LDH (E)
1350
1351 This category is defined in Section 2.5 of [RFC5892]. However, it is
1352 not used in PRECIS.
1353
1354 Note: See the ASCII7 ("K") category below for a more inclusive
1355 category used in PRECIS identifiers.
1356
1357 9.6. Exceptions (F)
1358
1359 This category is defined in Section 2.6 of [RFC5892] and is included
1360 by reference for use in PRECIS.
1361
1362 9.7. BackwardCompatible (G)
1363
1364 This category is defined in Section 2.7 of [RFC5892] and is included
1365 by reference for use in PRECIS.
1366
1367 Note: Management of this category is handled via the processes
1368 specified in [RFC5892]. At the time of this writing (and also at the
1376 time that RFC 5892 was published), this category consisted of the
1377 empty set; however, that is subject to change as described in
1378 RFC 5892.
1379
1380 9.8. JoinControl (H)
1381
1382 This category is defined in Section 2.8 of [RFC5892] and is included
1383 by reference for use in PRECIS.
1384
1385 Note: In particular, the code points ZERO WIDTH JOINER (U+200D) and
1386 ZERO WIDTH NON-JOINER (U+200C) are necessary to produce certain
1387 combinations of characters in certain scripts (e.g., Arabic, Persian,
1388 and Indic scripts), but if used in other contexts, they can have
1389 consequences that violate the "Principle of Least Astonishment".
1390 Therefore, these code points are allowed only in contexts where they
1391 are appropriate, specifically where the relevant rule (CONTEXTJ or
1392 CONTEXTO) has been defined. See [RFC5892] and [RFC5894] for further
1393 discussion.
1394
1395 9.9. OldHangulJamo (I)
1396
1397 This category is defined in Section 2.9 of [RFC5892] and is included
1398 by reference for use in PRECIS.
1399
1400 Note: Exclusion of these code points results in disallowing certain
1401 archaic Korean syllables and in restricting supported Korean
1402 syllables to preformed, modern Hangul characters.
1403
1404 9.10. Unassigned (J)
1405
1406 This category is defined in Section 2.10 of [RFC5892] and is included
1407 by reference for use in PRECIS.
1408
1409 9.11. ASCII7 (K)
1410
1411 This PRECIS-specific category consists of all printable, non-space
1412 code points from the 7-bit ASCII range. By applying this category,
1413 the algorithm specified under Section 8 exempts these code points
1414 from other rules that might be applied during PRECIS processing, on
1415 the assumption that these code points are in such wide use that
1416 disallowing them would be counterproductive.
1417
1418 K: cp is in {0021..007E}
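
As an informal illustration (not part of the normative definition), the ASCII7 test can be sketched in Python as follows. The function name is_ascii7 is hypothetical and chosen only for this example.

```python
def is_ascii7(ch: str) -> bool:
    """Return True if ch is a printable, non-space ASCII code point.

    Mirrors "K: cp is in {0021..007E}"; the name is_ascii7 is illustrative.
    """
    return 0x21 <= ord(ch) <= 0x7E
```

Note that SPACE (U+0020) and DELETE (U+007F) fall outside the range and are therefore not covered by this category.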
1419
1431 9.12. Controls (L)
1432
1433 This PRECIS-specific category consists of all control code points,
1434 such as LINE FEED (U+000A).
1435
1436 L: Control(cp) = True
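
A minimal Python sketch of this test follows. It assumes that Control(cp) can be approximated by the Unicode General_Category value "Cc" as exposed by the standard unicodedata module; the function name is_control is hypothetical.

```python
import unicodedata


def is_control(ch: str) -> bool:
    """Return True if ch is a control code point.

    Assumption: Control(cp) is approximated here by General_Category "Cc".
    """
    return unicodedata.category(ch) == "Cc"
```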
1437
1438 9.13. PrecisIgnorableProperties (M)
1439
1440 This PRECIS-specific category is used to group code points that are
1441 discouraged from use in PRECIS string classes.
1442
1443 M: Default_Ignorable_Code_Point(cp) = True or
1444 Noncharacter_Code_Point(cp) = True
1445
1446 The definition for Default_Ignorable_Code_Point can be found in the
1447 DerivedCoreProperties.txt file [DerivedCoreProperties].
1448
1449 Note: In general, these code points are constructs such as so-called
1450 "soft hyphens", certain joining code points, various specialized code
1451 points for use within Unicode itself (e.g., language tags and
1452 variation selectors), and so on. Disallowing these code points in
1453 PRECIS reduces the potential for unexpected results in the use of
1454 internationalized strings.
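
The following Python sketch shows one way such a test might be built from the DerivedCoreProperties.txt line format. The SAMPLE text is only a three-line illustrative excerpt, and the function names are hypothetical; a real implementation would read the complete file for the Unicode version it supports.

```python
from typing import List, Tuple

# Illustrative excerpt in the DerivedCoreProperties.txt line format.
# A real implementation would load the full file instead.
SAMPLE = """\
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
200B..200F    ; Default_Ignorable_Code_Point # Cf   [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
FE00..FE0F    ; Default_Ignorable_Code_Point # Mn  [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
"""


def parse_ranges(text: str, prop: str) -> List[Tuple[int, int]]:
    """Collect (first, last) code point ranges listed for property prop."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if ";" not in data:
            continue
        cps, name = (field.strip() for field in data.split(";", 1))
        if name != prop:
            continue
        first, _, last = cps.partition("..")
        ranges.append((int(first, 16), int(last or first, 16)))
    return ranges


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_precis_ignorable(cp: int, ranges: List[Tuple[int, int]]) -> bool:
    """Category M: Default_Ignorable_Code_Point or Noncharacter_Code_Point."""
    return is_noncharacter(cp) or any(lo <= cp <= hi for lo, hi in ranges)
```

With the excerpt above, ZERO WIDTH JOINER (U+200D) and the noncharacter U+FFFE both test true, whereas LATIN SMALL LETTER A (U+0061) does not.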
1455
1456 9.14. Spaces (N)
1457
1458 This PRECIS-specific category is used to group code points that are
1459 spaces.
1460
1461 N: General_Category(cp) is in {Zs}
1462
1463 9.15. Symbols (O)
1464
1465 This PRECIS-specific category is used to group code points that are
1466 symbols.
1467
1468 O: General_Category(cp) is in {Sm, Sc, Sk, So}
1469
1470 9.16. Punctuation (P)
1471
1472 This PRECIS-specific category is used to group code points that are
1473 punctuation.
1474
1475 P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po}
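
The Spaces, Symbols, and Punctuation categories are all defined by General_Category values, so they can be sketched together. The following Python example uses the standard unicodedata module; the function name precis_nop and its return convention are hypothetical.

```python
import unicodedata

SPACES = {"Zs"}                                            # category N
SYMBOLS = {"Sm", "Sc", "Sk", "So"}                         # category O
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}   # category P


def precis_nop(ch: str) -> str:
    """Return "N", "O", or "P" for ch, or "" if none of the three applies."""
    gc = unicodedata.category(ch)
    if gc in SPACES:
        return "N"
    if gc in SYMBOLS:
        return "O"
    if gc in PUNCTUATION:
        return "P"
    return ""
```

The result depends on the version of the Unicode Character Database bundled with the Python runtime in use.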
1476
1486 9.17. HasCompat (Q)
1487
1488 This PRECIS-specific category is used to group any code point that is
1489 decomposed and recomposed into something other than itself under
1490 Unicode Normalization Form KC.
1491
1492 Q: toNFKC(cp) != cp
1493
1494 Typically, this category is true of code points that are
1495 "compatibility decomposable characters" as defined in the Unicode
1496 Standard.
1497
1498 The toNFKC() operation returns the code point in Normalization
1499 Form KC. For more information, see Unicode Standard Annex #15
1500 [UAX15].
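
As a minimal sketch, the HasCompat test maps directly onto the normalize() function of the Python unicodedata module. The function name has_compat is hypothetical.

```python
import unicodedata


def has_compat(ch: str) -> bool:
    """Category Q: True if NFKC normalization changes the code point."""
    return unicodedata.normalize("NFKC", ch) != ch
```

For instance, LATIN SMALL LIGATURE FI (U+FB01) and FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) test true, whereas LATIN SMALL LETTER A (U+0061) and the precomposed LATIN SMALL LETTER E WITH ACUTE (U+00E9) test false.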
1501
1502 9.18. OtherLetterDigits (R)
1503
1504 This PRECIS-specific category is used to group code points that are
1505 letters and digits other than the "traditional" letters and digits
1506 grouped under the LetterDigits ("A") category (see Section 9.1).
1507
1508 R: General_Category(cp) is in {Lt, Nl, No, Me}
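
An informal Python sketch of this test, using the standard unicodedata module, is shown below. The function name is hypothetical.

```python
import unicodedata

OTHER_LETTER_DIGITS = {"Lt", "Nl", "No", "Me"}


def is_other_letter_digit(ch: str) -> bool:
    """Category R: General_Category(cp) is in {Lt, Nl, No, Me}."""
    return unicodedata.category(ch) in OTHER_LETTER_DIGITS
```

Examples that test true are U+01C5 (Lt), ROMAN NUMERAL EIGHT (U+2167, Nl), SUPERSCRIPT TWO (U+00B2, No), and COMBINING ENCLOSING CIRCLE (U+20DD, Me).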
1509
1510 10. Guidelines for Designated Experts
1511
1512 Experience with internationalization in application protocols has
1513 shown that protocol designers and application developers usually do
1514 not understand the subtleties and trade-offs involved with
1515 internationalization and that they need considerable guidance in
1516 making reasonable decisions with regard to the options before them.
1517
1518 Therefore:
1519
1520 o Protocol designers are strongly encouraged to question the
1521 assumption that they need to define new profiles, because existing
1522 profiles are designed for wide reuse (see Section 5 for further
1523 discussion).
1524
1525 o Those who persist in defining new profiles are strongly encouraged
1526 to provide a clear and strong justification for doing so and to
1527 publish a stable specification that provides all of the
1528 information described under Section 11.3.
1529
1530 o The designated experts for profile registration requests ought to
1531 seek answers to all of the questions provided under Section 11.3
1532 and ought to encourage applicants to provide a stable
1533 specification documenting the profile (even though the
1541 registration policy for PRECIS profiles is "Expert Review" and a
1542 stable specification is not strictly required).
1543
1544 o Developers of applications that use PRECIS are strongly encouraged
1545 to apply the guidelines provided under Section 6 and to seek out
1546 the advice of the designated experts or other knowledgeable
1547 individuals in doing so.
1548
1549 o All parties are strongly encouraged to help prevent the
1550 multiplication of profiles beyond necessity, as described under
1551 Section 5.1, and to use PRECIS in ways that will minimize user
1552 confusion and insecure application behavior.
1553
1554 Internationalization can be difficult and contentious; designated
1555 experts, profile registrants, and application developers are strongly
1556 encouraged to work together in a spirit of good faith and mutual
1557 understanding to achieve rough consensus on profile registration
1558 requests and the use of PRECIS in particular applications. They are
1559 also encouraged to bring additional expertise into the discussion if
1560 that would be helpful in adding perspective or otherwise resolving
1561 issues.
1562
1563 11. IANA Considerations
1564
1565 11.1. PRECIS Derived Property Value Registry
1566
1567 IANA has created and now maintains the "PRECIS Derived Property
1568 Value" registry (<https://www.iana.org/assignments/precis-tables/>),
1569 which records the derived properties for each version of Unicode
1570 released starting from version 6.3. The derived property value is to
1571 be calculated in cooperation with a designated expert [RFC8126]
1572 according to the rules specified under Sections 8 and 9.
1573
1574 The IESG is to be notified if backward-incompatible changes to the
1575 table of derived properties are discovered or if other problems arise
1576 during the process of creating the table of derived property values
1577 or during Expert Review. Changes to the rules defined under
1578 Sections 8 and 9 require IETF Review.
1579
1580 Note: IANA is requested to not make further updates to this registry
1581 until it receives notice from the IESG that the issues described in
1582 [IAB-Statement] and Section 13.5 of this document have been settled.
1583
1584 11.2. PRECIS Base Classes Registry
1585
1586 IANA has created the "PRECIS Base Classes" registry
1587 (<https://www.iana.org/assignments/precis-parameters/>). In
1588 accordance with [RFC8126], the registration policy is "RFC Required".
1589
1596 The registration template is as follows:
1597
1598 Base Class: [the name of the PRECIS string class]
1599
1600 Description: [a brief description of the PRECIS string class and its
1601 intended use, e.g., "A sequence of letters, numbers, and symbols
1602 that is used to identify or address a network entity."]
1603
1604 Specification: [the RFC number]
1605
1606 The initial registrations are as follows:
1607
1608 Base Class: FreeformClass
1609 Description: A sequence of letters, numbers, symbols, spaces, and
1610 other code points that is used for free-form strings.
1611 Specification: Section 4.3 of RFC 8264
1612
1613 Base Class: IdentifierClass
1614 Description: A sequence of letters, numbers, and symbols that is
1615 used to identify or address a network entity.
1616 Specification: Section 4.2 of RFC 8264
1617
1618 11.3. PRECIS Profiles Registry
1619
1620 IANA has created the "PRECIS Profiles" registry
1621 (<https://www.iana.org/assignments/precis-parameters/>) to identify
1622 profiles that use the PRECIS string classes. In accordance with
1623 [RFC8126], the registration policy is "Expert Review". This policy
1624 was chosen in order to ease the burden of registration while ensuring
1625 that "customers" of PRECIS receive appropriate guidance regarding the
1626 sometimes complex and subtle internationalization issues related to
1627 profiles of PRECIS string classes.
1628
1629 The registration template is as follows:
1630
1631 Name: [the name of the profile]
1632
1633 Base Class: [which PRECIS string class is being profiled]
1634
1635 Applicability: [the specific protocol elements to which this profile
1636 applies, e.g., "Usernames in security and application protocols."]
1637
1638 Replaces: [the Stringprep profile that this PRECIS profile replaces,
1639 if any]
1640
1641 Width Mapping Rule: [the behavioral rule for handling of width,
1642 e.g., "Map fullwidth and halfwidth code points to their
1643 compatibility variants."]
1644
1651 Additional Mapping Rule: [any additional mappings that are required
1652 or recommended, e.g., "Map non-ASCII space code points to SPACE
1653 (U+0020)."]
1654
1655 Case Mapping Rule: [the behavioral rule for handling of case, e.g.,
1656 "Apply the Unicode toLowerCase() operation."]
1657
1658 Normalization Rule: [which Unicode normalization form is applied,
1659 e.g., "NFC"]
1660
1661 Directionality Rule: [the behavioral rule for handling of right-to-
1662 left code points, e.g., "The 'Bidi Rule' defined in RFC 5893
1663 applies."]
1664
1665 Enforcement: [which entities enforce the rules, and when that
1666 enforcement occurs during protocol operations]
1667
1668 Specification: [a pointer to relevant documentation, such as an RFC
1669 or Internet-Draft]
1670
1671 In order to request a review, the registrant shall send a completed
1672 template to the <precis@ietf.org> list or its designated successor.
1673
1674 Factors to focus on while defining profiles and reviewing profile
1675 registrations include the following:
1676
1677 o Would an existing PRECIS string class or profile solve the
1678 problem? If not, why not? (See Section 5.1 for related
1679 considerations.)
1680
1681 o Is the problem being addressed by this profile well defined?
1682
1683 o Does the specification define what kinds of applications are
1684 involved and the protocol elements to which this profile applies?
1685
1686 o Is the profile clearly defined?
1687
1688 o Is the profile based on an appropriate dividing line between user
1689 interface (culture, context, intent, locale, device limitations,
1690 etc.) and the use of conformant strings in protocol elements?
1691
1692 o Are the width mapping, case mapping, additional mapping,
1693 normalization, and directionality rules appropriate for the
1694 intended use?
1695
1696 o Does the profile explain which entities enforce the rules and when
1697 such enforcement occurs during protocol operations?
1698
1706 o Does the profile reduce the degree to which human users could be
1707 surprised or confused by application behavior (the "Principle of
1708 Least Astonishment")?
1709
1710 o Does the profile introduce any new security concerns such as those
1711 described under Section 12 of this document (e.g., false accepts
1712 for authentication or authorization)?
1713
1714 12. Security Considerations
1715
1716 12.1. General Issues
1717
1718 If input strings that appear "the same" to users are programmatically
1719 considered to be distinct in different systems or if input strings
1720 that appear distinct to users are programmatically considered to be
1721 "the same" in different systems, then users can be confused. Such
1722 confusion can have security implications, such as the false accepts
1723 and false rejects discussed in [RFC6943] (the terms "false positives"
1724 and "false negatives" are used in that document). One starting goal
1725 of work on the PRECIS framework was to limit the number of times that
1726 users are confused (consistent with the "Principle of Least
1727 Astonishment"). Unfortunately, this goal has been difficult to
1728 achieve given the large number of application protocols already in
1729 existence. Despite these difficulties, profiles should not be
1730 multiplied beyond necessity (see Section 5.1). In particular,
1731 designers of application protocols should think long and hard before
1732 defining a new profile instead of using one that has already been
1733 defined, and if they decide to define a new profile then they should
1734 clearly explain their reasons for doing so.
1735
1736 The security of applications that use this framework can depend in
1737 part on the proper preparation, enforcement, and comparison of
1738 internationalized strings. For example, such strings can be used to
1739 make authentication and authorization decisions, and the security of
1740 an application could be compromised if an entity providing a given
1741 string is connected to the wrong account or online resource based on
1742 different interpretations of the string (again, see [RFC6943]).
1743
1744 Specifications of application protocols that use this framework are
1745 strongly encouraged to describe how internationalized strings are
1746 used in the protocol, including the security implications of any
1747 false accepts and false rejects that might result from various
1748 enforcement and comparison operations. For some helpful guidelines,
1749 refer to [RFC6943], [RFC5890], [UTR36], and [UTS39].
1750
1761 12.2. Use of the IdentifierClass
1762
1763 Strings that conform to the IdentifierClass, and any profile thereof,
1764 are intended to be relatively safe for use in a broad range of
1765 applications, primarily because they include only letters, digits,
1766 and "grandfathered" non-space code points from the ASCII range; thus,
1767 they exclude spaces, code points with compatibility equivalents, and
1768 almost all symbols and punctuation marks. However, because such
1769 strings can still include so-called "confusable code points" (see
1770 Section 12.5), protocol designers and implementers are encouraged to
1771 pay close attention to the security considerations described
1772 elsewhere in this document.
1773
1774 12.3. Use of the FreeformClass
1775
1776 Strings that conform to the FreeformClass, and many profiles thereof,
1777 can include virtually any Unicode code point. This makes the
1778 FreeformClass quite expressive, but also problematic from the
1779 perspective of possible user confusion. Protocol designers are
1780 hereby warned that the FreeformClass contains code points they might
1781 not understand, and they are encouraged to profile the
1782 IdentifierClass wherever feasible; however, if an application
1783 protocol requires more code points than are allowed by the
1784 IdentifierClass, protocol designers are encouraged to define a
1785 profile of the FreeformClass that restricts the allowable code points
1786 as tightly as possible. (The PRECIS Working Group considered the
1787 option of allowing "superclasses" as well as profiles of PRECIS
1788 string classes but decided against allowing superclasses to reduce
1789 the likelihood of security and interoperability problems.)
1790
1791 12.4. Local Character Set Issues
1792
1793 When systems use local character sets other than ASCII and Unicode,
1794 this specification leaves the problem of converting between the local
1795 character set and Unicode up to the application or local system. If
1796 different applications (or different versions of one application)
1797 implement different rules for conversions among coded character sets,
1798 they could interpret the same name differently and contact different
1799 application servers or other network entities. This problem is not
1800 solved by security protocols, such as Transport Layer Security (TLS)
1801 [RFC5246] and SASL [RFC4422], that do not take local character sets
1802 into account.
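
The effect of divergent conversion rules can be illustrated with a short Python sketch. The byte string and the two charset names are chosen only for illustration, and decode_name is a hypothetical helper.

```python
def decode_name(raw: bytes, charset: str) -> str:
    """Convert a locally encoded name to a Unicode string."""
    return raw.decode(charset)


# The same four bytes, interpreted under two different local character sets.
RAW_NAME = b"caf\xe9"
AS_LATIN1 = decode_name(RAW_NAME, "latin-1")
AS_CP437 = decode_name(RAW_NAME, "cp437")
```

Here AS_LATIN1 and AS_CP437 are different Unicode strings even though the input bytes are identical, so two systems applying these two conversions before PRECIS processing would be handling different names.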
1803
1804 12.5. Visually Similar Characters
1805
1806 Some code points are visually similar and thus can cause confusion
1807 among humans. Such characters are often called "confusable
1808 characters" or "confusables".
1809
1816 The problem of confusable characters is not necessarily caused by the
1817 use of Unicode code points outside the ASCII range. For example, in
1818 some presentations and to some individuals the string "ju1iet"
1819 (spelled with DIGIT ONE (U+0031) as the third character) might appear
1820 to be the same as "juliet" (spelled with LATIN SMALL LETTER L
1821 (U+006C)), especially on casual visual inspection. This phenomenon
1822 is sometimes called "typejacking".
1823
1824 However, the problem is made more serious by introducing the full
1825 range of Unicode code points into protocol strings. A well-known
1826 example is confusion between "а" CYRILLIC SMALL LETTER A (U+0430) and
1827 "a" LATIN SMALL LETTER A (U+0061). As another example, the
1828 characters "ᏚᎢᎵᎬᎢᎬᏒ" (U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC
1829 U+13D2) from the Cherokee block look similar to the ASCII code points
1830 representing "STPETER" as they might appear when presented using a
1831 "creative" font family. Confusion among such characters is perhaps
1832 not unexpected, given that the alphabetic writing systems involved
1833 all bear a family resemblance or historical lineage. Perhaps more
1834 surprising is confusion among characters from disparate writing
1835 systems, such as "O" (LATIN CAPITAL LETTER O, U+004F), "0" (DIGIT
1836 ZERO, U+0030), "໐" (LAO DIGIT ZERO, U+0ED0), "ዐ" (ETHIOPIC SYLLABLE
1837 PHARYNGEAL A, U+12D0), and other graphemes that have the appearance
1838 of open circles. And the reader needs to be aware that the foregoing
1839 represent merely a small sample of characters that are confusable in
1840 Unicode.
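
Although confusable characters can look identical when rendered, they remain distinct code points. A small Python sketch such as the following (the helper name describe is hypothetical) can make the difference visible during debugging or review.

```python
import unicodedata
from typing import List


def describe(s: str) -> List[str]:
    """List each code point in s as "U+XXXX NAME"."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]
```

Applied to the Cyrillic and Latin letters mentioned above, describe() yields "U+0430 CYRILLIC SMALL LETTER A" and "U+0061 LATIN SMALL LETTER A", respectively.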
1841
1842 In some instances of confusable characters, it is unlikely that the
1843 average human could tell the difference between the real string and
1844 the fake string. (Indeed, there is no programmatic way to
1845 distinguish with full certainty which is the fake string and which is
1846 the real string; in some contexts, the string formed of Cherokee code
1847 points might be the real string and the string formed of ASCII code
1848 points might be the fake string.) Because PRECIS-compliant strings
1849 can contain almost any properly encoded Unicode code point, it can be
1850 relatively easy to fake or mimic some strings in systems that use the
1851 PRECIS framework. The fact that some strings are easily confused
1852 introduces security vulnerabilities of the kind that have also
1853 plagued the World Wide Web, specifically the phenomenon known as
1854 phishing.
1855
1856 Despite the fact that some specific suggestions about identification
1857 and handling of confusable characters appear in the Unicode Security
1858 Considerations [UTR36] and the Unicode Security Mechanisms [UTS39],
1859 it is also true (as noted in [RFC5890]) that "there are no
1860 comprehensive technical solutions to the problems of confusable
1861 characters." Because it is impossible to map visually similar
1862 characters without a great deal of context (such as knowing the font
1863 families used), the PRECIS framework does nothing to map similar-
1871 looking characters together, nor does it prohibit some characters
1872 because they look like others.
1873
1874 Nevertheless, specifications for application protocols that use this
1875 framework are strongly encouraged to describe how confusable
1876 characters can be abused to compromise the security of systems that
1877 use the protocol in question, along with any protocol-specific
1878 suggestions for overcoming those threats. In particular, software
1879 implementations and service deployments that use PRECIS-based
1880 technologies are strongly encouraged to define and implement
1881 consistent policies regarding the registration, storage, and
1882 presentation of visually similar characters. The following
1883 recommendations are appropriate:
1884
1885 1. An application service SHOULD define a policy that specifies the
1886 scripts or blocks of code points that the service will allow to
1887 be registered (e.g., in an account name) or stored (e.g., in a
1888 filename). Such a policy SHOULD be informed by the languages and
1889 scripts that are used to write registered account names; in
1890 particular, to reduce confusion, the service SHOULD forbid
1891 registration or storage of strings that contain code points from
1892 more than one script and SHOULD restrict registrations to code
1893 points drawn from a very small number of scripts (e.g., scripts
1894 that are well understood by the administrators of the service, to
1895 improve manageability).
1896
1897 2. User-oriented application software SHOULD define a policy that
1898 specifies how internationalized strings will be presented to a
1899 human user. Because every human user of such software has a
1900 preferred language or a small set of preferred languages, the
1901 software SHOULD gather that information either explicitly from
1902 the user or implicitly via the operating system of the user's
1903 device.
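
The single-script restriction in the first recommendation might be sketched in Python as follows. This is only a rough heuristic under a stated assumption: the Python standard library does not expose the Unicode Script property, so the first word of each letter's character name (e.g., "LATIN" or "CYRILLIC") is used as a stand-in. The function names and the default policy are hypothetical; a production implementation would use real Script data instead.

```python
import unicodedata
from typing import FrozenSet, Set


def letter_scripts(s: str) -> Set[str]:
    """Approximate the set of scripts used by the letters in s.

    Heuristic: the first word of the Unicode character name stands in
    for the Script property, which the standard library lacks.
    """
    scripts = set()
    for ch in s:
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def allowed_by_policy(s: str,
                      allowed: FrozenSet[str] = frozenset({"LATIN"})) -> bool:
    """Reject strings that mix scripts or use a script outside the policy."""
    scripts = letter_scripts(s)
    return len(scripts) <= 1 and scripts <= allowed
```

Under this sketch, "juliet" is accepted, and a string that substitutes CYRILLIC SMALL LETTER U (U+0443) for the Latin "u" is rejected. The string "ju1iet" is still accepted because digits are not letters, which shows that a script policy alone does not address same-script confusables.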
1904
1905 The challenges inherent in supporting the full range of Unicode code
1906 points have in the past led some to hope for a way to
1907 programmatically negotiate more restrictive ranges based on locale,
1908 script, or other relevant factors; to tag the locale associated with
1909 a particular string; etc. As a general-purpose internationalization
1910 technology, the PRECIS framework does not include such mechanisms.
1911
1912 12.6. Security of Passwords
1913
1914 Two goals of passwords are to maximize the amount of entropy and to
1915 minimize the potential for false accepts. These goals can be
1916 achieved in part by allowing a wide range of code points and by
1917 ensuring that passwords are handled in such a way that code points
1918 are not compared aggressively. Therefore, it is NOT RECOMMENDED for
1926 application protocols to profile the FreeformClass for use in
1927 passwords in a way that removes entire categories (e.g., by
1928 disallowing symbols or punctuation). Furthermore, it is
1929 NOT RECOMMENDED for application protocols to map uppercase and
1930 titlecase code points to their lowercase equivalents in such strings;
1931 instead, it is RECOMMENDED to preserve the case of all code points
1932 contained in such strings and to compare them in a case-sensitive
1933 manner.
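
A minimal Python sketch of case-preserving, case-sensitive password comparison follows. The use of NFC normalization is an assumption made for this example (this section does not mandate a normalization form), and the function names are hypothetical.

```python
import hmac
import unicodedata


def prepare_password(password: str) -> bytes:
    """Normalize without any case mapping, then encode as UTF-8.

    Assumption: NFC is the normalization form chosen by the profile.
    """
    return unicodedata.normalize("NFC", password).encode("utf-8")


def passwords_match(a: str, b: str) -> bool:
    """Compare prepared passwords case-sensitively in constant time."""
    return hmac.compare_digest(prepare_password(a), prepare_password(b))
```

Canonically equivalent spellings (precomposed versus decomposed) compare as equal, whereas strings that differ only in case do not.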
1934
1935 That said, software implementers need to be aware that there exist
1936 trade-offs between entropy and usability. For example, allowing a
1937 user to establish a password containing "uncommon" code points might
1938 make it difficult for the user to access a service when using an
1939 unfamiliar or constrained input device.
1940
1941 Some application protocols use passwords directly, whereas others
1942 reuse technologies that themselves process passwords (one example of
1943 such a technology is SASL [RFC4422]). Moreover, passwords are often
1944 carried by a sequence of protocols with backend authentication
1945 systems or data storage systems such as RADIUS [RFC2865] and the
1946 Lightweight Directory Access Protocol (LDAP) [RFC4510]. Developers
1947 of application protocols are encouraged to look into reusing
1948 existing password profiles (see [RFC8265]) instead of defining new
1949 ones, so that end-user expectations about passwords are consistent
1950 no matter which application protocol is used.
1951
1952 In protocols that provide passwords as input to a cryptographic
1953 algorithm such as a hash function, the client will need to perform
1954 proper preparation of the password before applying the algorithm,
1955 because the password is not available to the server in plaintext
1956 form.
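
The ordering described above (prepare first, then apply the algorithm) can be sketched in Python as follows. The choice of NFC, of PBKDF2 with SHA-256, and of the iteration count are all assumptions made for illustration; an actual protocol specifies its own preparation rules and algorithm.

```python
import hashlib
import unicodedata


def password_digest(password: str, salt: bytes, iterations: int = 10000) -> str:
    """Prepare the password on the client, then derive a digest from it.

    Assumptions: NFC preparation and PBKDF2-HMAC-SHA-256 stand in for
    whatever the application protocol actually specifies.
    """
    prepared = unicodedata.normalize("NFC", password).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", prepared, salt, iterations).hex()
```

Because preparation happens before hashing, a password typed in precomposed form and the same password typed in decomposed form yield the same digest; without that step, the server would have no opportunity to reconcile them.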
1957
1958 Further discussion of password handling can be found in [RFC8265].
1959
1960 13. Interoperability Considerations
1961
1962 13.1. Coded Character Sets
1963
1964 It is known that some existing applications and systems do not
1965 support the full Unicode coded character set, or even any characters
1966 outside the ASCII repertoire [RFC20]. If two (or more) applications
1967 or systems need to interoperate when exchanging data (e.g., for the
1968 purpose of authenticating the combination of a username and
1969 password), naturally they will need to have in common at least one
1970 coded character set and the repertoire of characters being exchanged
1971 (see [RFC6365] for definitions of these terms). Establishing such a
1972 baseline is a matter for the application or system that uses PRECIS,
1973 not for the PRECIS framework.
1974
1981 13.2. Dependency on Unicode
1982
1983 The only coded character set supported by PRECIS is Unicode. If an
1984 application or system does not support Unicode or uses a different
1985 coded character set [RFC6365], then the PRECIS rules cannot be
1986 applied to that application or system.
1987
1988 13.3. Encoding
1989
1990 Although strings that are consumed in PRECIS-based application
1991 protocols are often encoded using UTF-8 [RFC3629], the exact encoding
1992 is a matter for the application protocol that uses PRECIS, not for
1993 the PRECIS framework or for specifications that define PRECIS string
1994 classes or profiles thereof.
1995
1996 13.4. Unicode Versions
1997
1998 It is extremely important for protocol designers and application
1999 developers to understand that various changes can occur across
2000 versions of the Unicode Standard, and such changes can result in
2001 instability of PRECIS categories. The following are merely a few
2002 examples:
2003
2004 o As described in [RFC6452], between Unicode 5.2 (current at the
2005 time IDNA2008 was originally published) and Unicode 6.0, three
2006 code points underwent changes in their GeneralCategory, resulting
2007 in modified handling, depending on which version of Unicode is
2008 available on the underlying system.
2009
2010 o The HasCompat() categorization of a given input string could
2011 change if, for example, the string includes a precomposed
2012 character that was added in a recent version of Unicode.
2013
2014 o The East Asian width property, which is used in many PRECIS width
2015 mapping rules, is not guaranteed to be stable across Unicode
2016 versions.
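
The dependence on Unicode version can be observed directly in runtimes that ship more than one copy of the Unicode Character Database. The following Python sketch relies on the standard unicodedata module, which exposes both its current database and a frozen Unicode 3.2.0 database (ucd_3_2_0); the function name is hypothetical.

```python
import unicodedata
from typing import Dict


def category_by_version(ch: str) -> Dict[str, str]:
    """Map each available Unicode database version to ch's General_Category."""
    old = unicodedata.ucd_3_2_0
    return {
        old.unidata_version: old.category(ch),
        unicodedata.unidata_version: unicodedata.category(ch),
    }
```

For a code point assigned after Unicode 3.2, such as U+1F600, the older database reports "Cn" (unassigned) while the current one reports "So". Implementations may therefore wish to record the value of unicodedata.unidata_version (or its equivalent) alongside any stored derived property values.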
2017
2018 13.5. Potential Changes to Handling of Certain Unicode Code Points
2019
2020 As part of the review of Unicode 7.0 for IDNA, a question was raised
2021 about a newly added code point that led to a re-analysis of the
2022 normalization rules used by IDNA and inherited by this document
2023 (Section 5.2.4). Some of the general issues are described in
2024 [IAB-Statement] and pursued in more detail in [IDNA-Unicode].
2025
2026 At the time of this writing, these issues have yet to be settled.
2027 However, implementers need to be aware that this specification is
2036 likely to be updated in the future to address these issues. The
2037 potential changes include but might not be limited to the following:
2038
2039 o The range of code points in the LetterDigits category
2040 (Sections 4.2.1 and 9.1) might be narrowed.
2041
2042 o Some code points with special properties that are now allowed
2043 might be excluded.
2044
2045 o More additional mapping rules (Section 5.2.2) might be defined.
2046
2047 o Alternative normalization methods might be added.
2048
2049 As described in Section 11.1, until these issues are settled, it is
2050 reasonable for the IANA to apply the same precautionary principle
2051 described in [IAB-Statement] to the "PRECIS Derived Property Value"
2052 registry as is applied to the "IDNA Parameters" registry
2053 <https://www.iana.org/assignments/idna-tables/>: that is, to not make
2054 further updates to the registry.
2055
2056 Nevertheless, implementations and deployments are unlikely to
2057 encounter significant problems as a consequence of these issues or
2058 potential changes if they follow the advice given in this
2059 specification to use the more restrictive IdentifierClass whenever
2060 possible or, if using the FreeformClass, to allow only a restricted
2061 set of code points, particularly avoiding code points whose
2062 implications they do not understand.
2063
2064 14. References
2065
2066 14.1. Normative References
2067
2068 [RFC20] Cerf, V., "ASCII format for network interchange", STD 80,
2069 RFC 20, DOI 10.17487/RFC0020, October 1969,
2070 <https://www.rfc-editor.org/info/rfc20>.
2071
2072 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
2073 Requirement Levels", BCP 14, RFC 2119,
2074 DOI 10.17487/RFC2119, March 1997,
2075 <https://www.rfc-editor.org/info/rfc2119>.
2076
2077 [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network
2078 Interchange", RFC 5198, DOI 10.17487/RFC5198, March 2008,
2079 <https://www.rfc-editor.org/info/rfc5198>.
2080
2091 [RFC6365] Hoffman, P. and J. Klensin, "Terminology Used in
2092 Internationalization in the IETF", BCP 166, RFC 6365,
2093 DOI 10.17487/RFC6365, September 2011,
2094 <https://www.rfc-editor.org/info/rfc6365>.
2095
2096 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2097 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
2098 May 2017, <https://www.rfc-editor.org/info/rfc8174>.
2099
2100 [Unicode] The Unicode Consortium, "The Unicode Standard",
2101 <http://www.unicode.org/versions/latest/>.
2102
2103 14.2. Informative References
2104
2105 [DerivedCoreProperties]
2106 The Unicode Consortium, "DerivedCoreProperties-
2107 10.0.0.txt", Unicode Character Database, March 2017,
2108 <http://www.unicode.org/Public/UCD/latest/ucd/
2109 DerivedCoreProperties.txt>.
2110
2111 [Err4568] RFC Errata, Erratum ID 4568, RFC 7564,
2112 <https://www.rfc-editor.org/errata/eid4568>.
2113
2114 [IAB-Statement]
2115 Internet Architecture Board, "IAB Statement on Identifiers
2116 and Unicode 7.0.0", February 2015,
2117 <https://www.iab.org/documents/
2118 correspondence-reports-documents/2015-2/
2119 iab-statement-on-identifiers-and-unicode-7-0-0/>.
2120
2121 [IDNA-Unicode]
2122 Klensin, J. and P. Faltstrom, "IDNA Update for Unicode
2123 7.0.0", Work in Progress, draft-klensin-idna-5892upd-
2124 unicode70-04, March 2015.
2125
2126 [PropertyAliases]
2127 The Unicode Consortium, "PropertyAliases-10.0.0.txt",
2128 Unicode Character Database, February 2017,
2129 <http://www.unicode.org/Public/UCD/latest/ucd/
2130 PropertyAliases.txt>.
2131
2132 [RFC2865] Rigney, C., Willens, S., Rubens, A., and W. Simpson,
2133 "Remote Authentication Dial In User Service (RADIUS)",
2134 RFC 2865, DOI 10.17487/RFC2865, June 2000,
2135 <https://www.rfc-editor.org/info/rfc2865>.
2136
2146 [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
2147 Internationalized Strings ("stringprep")", RFC 3454,
2148 DOI 10.17487/RFC3454, December 2002,
2149 <https://www.rfc-editor.org/info/rfc3454>.
2150
2151 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
2152 "Internationalizing Domain Names in Applications (IDNA)",
2153 RFC 3490, DOI 10.17487/RFC3490, March 2003,
2154 <https://www.rfc-editor.org/info/rfc3490>.
2155
2156 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
2157 Profile for Internationalized Domain Names (IDN)",
2158 RFC 3491, DOI 10.17487/RFC3491, March 2003,
2159 <https://www.rfc-editor.org/info/rfc3491>.
2160
2161 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
2162 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
2163 2003, <https://www.rfc-editor.org/info/rfc3629>.
2164
2165 [RFC4422] Melnikov, A., Ed. and K. Zeilenga, Ed., "Simple
2166 Authentication and Security Layer (SASL)", RFC 4422,
2167 DOI 10.17487/RFC4422, June 2006,
2168 <https://www.rfc-editor.org/info/rfc4422>.
2169
2170 [RFC4510] Zeilenga, K., Ed., "Lightweight Directory Access Protocol
2171 (LDAP): Technical Specification Road Map", RFC 4510,
2172 DOI 10.17487/RFC4510, June 2006,
2173 <https://www.rfc-editor.org/info/rfc4510>.
2174
2175 [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
2176 Recommendations for Internationalized Domain Names
2177 (IDNs)", RFC 4690, DOI 10.17487/RFC4690, September 2006,
2178 <https://www.rfc-editor.org/info/rfc4690>.
2179
2180 [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
2181 Specifications: ABNF", STD 68, RFC 5234,
2182 DOI 10.17487/RFC5234, January 2008,
2183 <https://www.rfc-editor.org/info/rfc5234>.
2184
2185 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security
2186 (TLS) Protocol Version 1.2", RFC 5246,
2187 DOI 10.17487/RFC5246, August 2008,
2188 <https://www.rfc-editor.org/info/rfc5246>.
2189
2190 [RFC5890] Klensin, J., "Internationalized Domain Names for
2191 Applications (IDNA): Definitions and Document Framework",
2192 RFC 5890, DOI 10.17487/RFC5890, August 2010,
2193 <https://www.rfc-editor.org/info/rfc5890>.
2194
2201 [RFC5891] Klensin, J., "Internationalized Domain Names in
2202 Applications (IDNA): Protocol", RFC 5891,
2203 DOI 10.17487/RFC5891, August 2010,
2204 <https://www.rfc-editor.org/info/rfc5891>.
2205
2206 [RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and
2207 Internationalized Domain Names for Applications (IDNA)",
2208 RFC 5892, DOI 10.17487/RFC5892, August 2010,
2209 <https://www.rfc-editor.org/info/rfc5892>.
2210
2211 [RFC5893] Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
2212 for Internationalized Domain Names for Applications
2213 (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010,
2214 <https://www.rfc-editor.org/info/rfc5893>.
2215
2216 [RFC5894] Klensin, J., "Internationalized Domain Names for
2217 Applications (IDNA): Background, Explanation, and
2218 Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010,
2219 <https://www.rfc-editor.org/info/rfc5894>.
2220
2221 [RFC5895] Resnick, P. and P. Hoffman, "Mapping Characters for
2222 Internationalized Domain Names in Applications (IDNA)
2223 2008", RFC 5895, DOI 10.17487/RFC5895, September 2010,
2224 <https://www.rfc-editor.org/info/rfc5895>.
2225
2226 [RFC6452] Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code
2227 Points and Internationalized Domain Names for Applications
2228 (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452,
2229 November 2011, <https://www.rfc-editor.org/info/rfc6452>.
2230
2231 [RFC6885] Blanchet, M. and A. Sullivan, "Stringprep Revision and
2232 Problem Statement for the Preparation and Comparison of
2233 Internationalized Strings (PRECIS)", RFC 6885,
2234 DOI 10.17487/RFC6885, March 2013,
2235 <https://www.rfc-editor.org/info/rfc6885>.
2236
2237 [RFC6943] Thaler, D., Ed., "Issues in Identifier Comparison for
2238 Security Purposes", RFC 6943, DOI 10.17487/RFC6943, May
2239 2013, <https://www.rfc-editor.org/info/rfc6943>.
2240
2241 [RFC7564] Saint-Andre, P. and M. Blanchet, "PRECIS Framework:
2242 Preparation, Enforcement, and Comparison of
2243 Internationalized Strings in Application Protocols",
2244 RFC 7564, DOI 10.17487/RFC7564, May 2015,
2245 <https://www.rfc-editor.org/info/rfc7564>.
2246
2256 [RFC7622] Saint-Andre, P., "Extensible Messaging and Presence
2257 Protocol (XMPP): Address Format", RFC 7622,
2258 DOI 10.17487/RFC7622, September 2015,
2259 <https://www.rfc-editor.org/info/rfc7622>.
2260
2261 [RFC7790] Yoneya, Y. and T. Nemoto, "Mapping Characters for Classes
2262 of the Preparation, Enforcement, and Comparison of
2263 Internationalized Strings (PRECIS)", RFC 7790,
2264 DOI 10.17487/RFC7790, February 2016,
2265 <https://www.rfc-editor.org/info/rfc7790>.
2266
2267 [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for
2268 Writing an IANA Considerations Section in RFCs", BCP 26,
2269 RFC 8126, DOI 10.17487/RFC8126, June 2017,
2270 <https://www.rfc-editor.org/info/rfc8126>.
2271
2272 [RFC8265] Saint-Andre, P. and A. Melnikov, "Preparation,
2273 Enforcement, and Comparison of Internationalized Strings
2274 Representing Usernames and Passwords", RFC 8265,
2275 DOI 10.17487/RFC8265, October 2017,
2276 <https://www.rfc-editor.org/info/rfc8265>.
2277
2278 [RFC8266] Saint-Andre, P., "Preparation, Enforcement, and Comparison
2279 of Internationalized Strings Representing Nicknames",
2280 RFC 8266, DOI 10.17487/RFC8266, October 2017,
2281 <https://www.rfc-editor.org/info/rfc8266>.
2282
2283 [UAX11] Unicode Standard Annex #11, "East Asian Width", edited by
2284 Ken Lunde. An integral part of The Unicode Standard,
2285 <http://unicode.org/reports/tr11/>.
2286
2287 [UAX15] Unicode Standard Annex #15, "Unicode Normalization Forms",
2288 edited by Mark Davis and Ken Whistler. An integral part
2289 of The Unicode Standard,
2290 <http://unicode.org/reports/tr15/>.
2291
2292 [UAX9] Unicode Standard Annex #9, "Unicode Bidirectional
2293 Algorithm", edited by Mark Davis, Aharon Lanin, and Andrew
2294 Glass. An integral part of The Unicode Standard,
2295 <http://unicode.org/reports/tr9/>.
2296
2297 [UTR36] Unicode Technical Report #36, "Unicode Security
2298 Considerations", edited by Mark Davis and Michel Suignard,
2299 <http://unicode.org/reports/tr36/>.
2300
2301 [UTS39] Unicode Technical Standard #39, "Unicode Security
2302 Mechanisms", edited by Mark Davis and Michel Suignard,
2303 <http://unicode.org/reports/tr39/>.
2304
2311 Appendix A. Changes from RFC 7564
2312
2313 The following changes were made from [RFC7564].
2314
2315 o Recommended the Unicode toLowerCase() operation over the Unicode
2316 toCaseFold() operation in most PRECIS applications.
2317
2318 o Clarified the meaning of "preparation" and described the
2319 motivation for including it in PRECIS.
2320
2321 o Updated references.
2322
2323 See [RFC7564] for a description of the differences from [RFC3454].
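
As an illustration of the first change listed above, the following minimal sketch contrasts lowercasing with case folding. It assumes that Python's built-in str.lower() and str.casefold() are acceptable stand-ins for the Unicode toLowerCase() and toCaseFold() operations; the helper names to_lower and to_case_fold are hypothetical and are not defined by this document.

```python
# Sketch only: Python built-ins stand in for the Unicode operations.
# to_lower and to_case_fold are hypothetical helper names.

def to_lower(s: str) -> str:
    """Approximate Unicode toLowerCase() with str.lower()."""
    return s.lower()


def to_case_fold(s: str) -> str:
    """Approximate Unicode toCaseFold() with str.casefold()."""
    return s.casefold()


# U+00DF is LATIN SMALL LETTER SHARP S; U+FB01 is LATIN SMALL LIGATURE FI.
samples = ["ALICE", "Stra\u00dfe", "\ufb01"]

for s in samples:
    lowered = to_lower(s)
    folded = to_case_fold(s)
    # Print code-point-safe representations so the difference is visible
    # even on terminals that cannot render the characters.
    print(ascii(s), ascii(lowered), ascii(folded), lowered == folded)
```

In this sketch, case folding rewrites U+00DF and U+FB01 as ASCII letter sequences, whereas lowercasing leaves both characters intact, so the two operations can yield different strings for the same input.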
2324
2325 Acknowledgements
2326
2327 Thanks to Martin Duerst, William Fisher, John Klensin, Christian
2328 Schudt, and Sam Whited for their feedback. Thanks to Sam Whited also
2329 for submitting [Err4568].
2330
2331 See [RFC7564] for acknowledgements related to the specification that
2332 this document supersedes.
2333
2334 Some algorithms and textual descriptions have been borrowed from
2335 [RFC5892]. Some text regarding security has been borrowed from
2336 [RFC5890], [RFC8265], and [RFC7622].
2337
2338 Authors' Addresses
2339
2340 Peter Saint-Andre
2341 Jabber.org
2342 P.O. Box 787
2343 Parker, CO 80134
2344 United States of America
2345
2346 Phone: +1 720 256 6756
2347 Email: stpeter@jabber.org
2348 URI: https://www.jabber.org/
2349
2350
2351 Marc Blanchet
2352 Viagenie
2353 246 Aberdeen
2354 Québec, QC G1R 2E1
2355 Canada
2356
2357 Email: Marc.Blanchet@viagenie.ca
2358 URI: http://www.viagenie.ca/