From: Nicolas Williams (Nicolas.Williams@sun.com)
Date: 01/24/03-02:06:03 PM Z
Date: Fri, 24 Jan 2003 14:06:03 -0600
Subject: Re: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis- 05 draft]
Message-ID: <20030124140603.Z16765@binky.central.sun.com>
On Fri, Jan 24, 2003 at 10:33:10AM -0800, Mike Eisler wrote:
> Nicolas Williams wrote:
>
> > Er, no, utf8str_cs requires that a normalization form be used, just not
> > on the wire. So the problem of legacy filesystems remains even if we do
> > not act to recommend or require a specific normalization form on the
> > wire.
>
> It requires a normalization form only if the server is case insensitive.
This is not my reading of the draft.
Section 11.1.1 clearly specifies what to do with filenames that have
equal names but different encodings [due to the different normalization
forms used by the clients that create them]. Section 11.1.1 necessarily
requires that the server normalize client inputs to some form of the
server's choosing.
The other utf8str_* types are not claimed to be usable for filenames:
- "The utf8str_cis type is a case insensitive string of UTF-8
characters. Its primary use in NFS Version 4 is for naming NFS
servers."
- "The utf8str_mixed type is a string of UTF-8 characters, with a
prefix that is case sensitive, a separator equal to '@', and a
suffix that is fully qualified domain name. Its primary use in NFS
Version 4 is for naming principals identified in an Access Control
Entry."
Only utf8str_cs is said to be usable for filename components.
So, my reading is that utf8str_cs is for filename components, that the
server must normalize client filename component inputs to some
normalization form preferred by the server, and that duplicate filenames
are not allowed.
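A minimal sketch of that reading, assuming a server that normalizes every incoming filename component to one form of its choosing (NFD here, an arbitrary choice; the draft leaves the form to the server) and keys directory entries by the normalized name. The Directory class is hypothetical, and FileExistsError merely stands in for NFS4ERR_EXIST.

```python
import unicodedata

SERVER_FORM = "NFD"  # assumption: the server's preferred normalization form


class Directory:
    """Hypothetical directory keyed by normalized filename components."""

    def __init__(self) -> None:
        self.entries: dict[str, object] = {}

    def create(self, name: str, obj: object) -> str:
        # Normalize the client-supplied component, then reject duplicates.
        key = unicodedata.normalize(SERVER_FORM, name)
        if key in self.entries:
            # Same name, different encoding on the wire: stands in for
            # NFS4ERR_EXIST.
            raise FileExistsError(name)
        self.entries[key] = obj
        return key

    def lookup(self, name: str) -> object:
        return self.entries[unicodedata.normalize(SERVER_FORM, name)]


d = Directory()
d.create("caf\u00e9", "file-1")        # precomposed e-acute
try:
    d.create("cafe\u0301", "file-2")   # e + combining acute accent
except FileExistsError:
    print("duplicate rejected")        # prints "duplicate rejected"
print(d.lookup("caf\u00e9"))           # prints "file-1"
```

Both spellings of "café" end up under the same key, so the second create fails instead of producing two visually identical entries.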
> > > At least I feel the need for a "Normalization Forms for Dummies" document.
> > > Maybe other working group members do as well. Any pointers to something
> > > that will explain this stuff to those who have not already immersed
> > > themselves in this area?
> >
> > There are several books with "Unicode" in the title (none in my office
> > right now). And the Unicode home page is a good place to start:
> >
> > http://www.unicode.org/
>
> I find each visit to unicode.org ever more confusing. My most
> recent visit revealed that "16 bit" Unicode now has over 2^64 characters.
> It is absolutely impenetrable.
?? There's no such thing as 16-bit Unicode. There's UTF-16, an encoding
for Unicode; UTF-16 is not limited to 2^16 codepoints: using surrogate
pairs it can represent every codepoint up to U+10FFFF (17 planes, or
1,114,112 codepoints, which fit in 21 bits), including all of the BMP.
UTF-8 was originally able to represent a 31-bit codepoint space (2^31),
though it has recently been restricted to the same U+10FFFF limit, since
that now seems to be all that will ever be necessary (ha! where have we
heard that before? still, seems reasonable).
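To make those ranges concrete, a short Python check (standard codecs only) of how the highest code point, U+10FFFF, is encoded: UTF-16 needs a surrogate pair, and restricted UTF-8 needs four bytes.

```python
MAX_CP = 0x10FFFF  # highest Unicode code point: 17 planes * 65536 - 1

top = chr(MAX_CP)

# UTF-16 reaches beyond the BMP with a surrogate pair (two 16-bit units).
utf16 = top.encode("utf-16-be")
print(utf16.hex(), len(utf16))             # prints "dbffdfff 4"

# Restricted UTF-8 never needs more than four bytes.
utf8 = top.encode("utf-8")
print(utf8.hex(), len(utf8))               # prints "f48fbfbf 4"

# The whole code space fits in 21 bits.
print(MAX_CP + 1, MAX_CP + 1 <= 2 ** 21)   # prints "1114112 True"

# Python, like restricted UTF-8, rejects anything above U+10FFFF.
try:
    chr(MAX_CP + 1)
except ValueError:
    print("U+110000 is not a valid code point")
```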
> The nfsv4 i18n follows the lead of IETF's stringprep RFC (which we
> were asked to do by IESG). Presumably the folks who wrote it were
> experts, and they strongly recommend KC for case insensitive matching.
> Now, in the last 24 hours, two other experts have disagreed, offering
> two more opinions.
Er, form KC is the form recommended by the IETF, yes. That does not
change the fact that form D is lighter weight, nor the fact that the
NFSv4 draft does not specify any form for utf8str_cs.
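For anyone wanting the "Normalization Forms for Dummies" version, a small Python comparison of the four forms (standard unicodedata module; the two sample strings are arbitrary). Form D only decomposes (and canonically reorders combining marks), while C and KC add a recomposition pass; the K forms additionally fold compatibility characters, such as the "fi" ligature, into their plain equivalents.

```python
import unicodedata

samples = {
    "precomposed e-acute": "\u00e9",
    "fi ligature": "\ufb01",
}

for label, s in samples.items():
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        out = unicodedata.normalize(form, s)
        # Show the resulting code points so the differences are visible.
        print(f"{label:20} {form:5} {[hex(ord(c)) for c in out]}")
```

The e-acute comes out as one code point under C and KC but as "e" plus U+0301 under D and KD; the ligature survives C and D unchanged but becomes plain "f" + "i" under KC and KD, which is why KC can merge names that look different.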
> Clearly, no matter what we do in this area, it is highly
> probable we'll get it wrong.
It's not clear that what NFSv4 specifies is wrong; utf8str_cs is
certainly sub-optimal, but it's not incorrect.
> Just as clearly, normalization is not well thought out by the
> people specifying this; otherwise it would be much easier to grasp.
I disagree.
And note that my preference for form D over C has to do with code bloat
and performance; I think that case-insensitive comparisons can be made
on Unicode strings normalized to form D just as well as on strings
normalized to form C (just to be sure I'll hit the books again).
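As a check on the form D point: the Unicode Standard defines canonical caseless matching as NFD(casefold(NFD(X))), so form D alone suffices. A minimal Python sketch, assuming str.casefold() (full case folding) is an acceptable stand-in for the standard's case folding mapping; the function names are illustrative only.

```python
import unicodedata


def canonical_caseless(s: str) -> str:
    # NFD(casefold(NFD(s))): only form D is ever used.
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def names_match_ci(a: str, b: str) -> bool:
    return canonical_caseless(a) == canonical_caseless(b)


print(names_match_ci("CAF\u00c9", "cafe\u0301"))  # prints True
print(names_match_ci("Stra\u00dfe", "STRASSE"))   # prints True (full folding)
print(names_match_ci("caf\u00e9", "cafe"))        # prints False
```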
> So what do we want to do with the I-D that IESG has approved for
> publication? My inclination is to leave it alone, since I suspect we
> could delay it for 12 more months and still not reach consensus.
> Only via real experience will a practical truth emerge ... it may be that
> IESG is right, it may be that Dan is right, Nico is right, Dave is right. My
> guess is it will be none of the above. Fixed in NFSv4.x (x > 0).
I'd like to see the WG RECOMMEND to clients _a_ normalization form for
utf8str_cs.
I think we may not be able to change the spec to require that that
normalization form be used on the wire for utf8str_cs types, not at this
late a date, though that is what I would have preferred. The problem is
that if we don't make this change now, I don't think it will ever be
possible, or worthwhile, to make it later. Is that a big deal? Well, only
if you think that having full-blown Unicode normalization facilities in
the server implementations of NFSv4 (which may be required for
multi-protocol servers anyway) is a big deal. Hindsight is 20/20...
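To illustrate the trade-off: if the WG only RECOMMENDs a form, a server still cannot drop its normalization code, because it must handle clients that ignore the recommendation; at best it gets a fast path. A sketch assuming Python 3.8+ (for unicodedata.is_normalized) and a hypothetical recommended form of NFD; client_encode and server_accept are illustrative names, not NFSv4 operations.

```python
import unicodedata

RECOMMENDED_FORM = "NFD"  # hypothetical WG-recommended form for utf8str_cs


def client_encode(component: str) -> bytes:
    # A client following the recommendation normalizes before sending.
    return unicodedata.normalize(RECOMMENDED_FORM, component).encode("utf-8")


def server_accept(wire: bytes) -> str:
    name = wire.decode("utf-8")  # raises UnicodeDecodeError on bad UTF-8
    # Fast path: input from conforming clients is already normalized.
    if unicodedata.is_normalized(RECOMMENDED_FORM, name):
        return name
    # Slow path: full normalization is still needed for other clients.
    return unicodedata.normalize(RECOMMENDED_FORM, name)


print(server_accept(client_encode("caf\u00e9")) == "cafe\u0301")   # prints True
print(server_accept("caf\u00e9".encode("utf-8")) == "cafe\u0301")  # prints True
```

Only a requirement on the wire form would let the slow path become a simple rejection.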
So, if nothing else, we're all aware now that NFSv4 i18n is tricky to
implement correctly; being aware is better than nothing, and if that's
all that comes out of this thread, it will still have been worthwhile.
Also, at some point we need to test i18n at Connectathon.
> -mre
Cheers,
Nico