Re: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis-05 draft]

From: Dan Oscarsson (Dan.Oscarsson@kiconsulting.se)
Date: 01/29/03-04:03:05 AM Z


Message-Id: <200301291003.h0TA35AT016339@valinor.malmo.trab.se>
Date: Wed, 29 Jan 2003 11:03:05 +0100 (CET)
From: Dan Oscarsson <Dan.Oscarsson@kiconsulting.se>
Subject: Re: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis-05 draft]

>> That depends on what you normalise from. If you start with any of the
>> ISO 8859-1 character sets then converting and normalising into
>> UCS form C is easy done without any knowledge on form D.
>
>That's codeset conversion.  In the NFSv4 case we're talking about the
>client sending unnormalized UTF-8 (therefore Unicode) filename strings.
>The server has to then normalize to a canonical form (to prevent equal
>name / unequal encoding conflicts).  This process ALWAYS starts with
>normalization to form D; end of story.

Even so, when normalising to form C I expect the form D step can be
folded into the form C code at little or no extra cost. I doubt that
normalising to form C needs more table data than normalising to form D.
Form D can take a lot more data space in the kernel, so it is not
suitable for kernels with little available memory to work in.
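
To give a feel for the size difference, here is a minimal Python sketch
using the standard unicodedata module. The sample file name and the
helper utf8_len are assumptions made up for illustration; the sketch
only compares the UTF-8 byte length of one string in form C and form D.

```python
import unicodedata

def utf8_len(text, form):
    """Return the UTF-8 byte length of text after normalising to form."""
    return len(unicodedata.normalize(form, text).encode("utf-8"))

# Hypothetical sample file name with three precomposed letters.
name = "r\u00e4ksm\u00f6rg\u00e5s"

nfc_len = utf8_len(name, "NFC")   # precomposed letters, 2 bytes each
nfd_len = utf8_len(name, "NFD")   # base letter + 2-byte combining mark
print(nfc_len, nfd_len)
```

For this name form D needs 16 bytes where form C needs 13. The gap
grows with the share of accented letters in the name.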



>> >I believe that this is what the draft specifies.  An on-the-wire
>> >normalization form specification would be an optimization, but is not
>> >absolutely necessary.
>> 
>> OK. What will happen if we do not require it?
>
>The server has to normalize the client's filenames to avoid equal name /
>unequal encoding conflicts.
>
>If it were required the server would only have to check that clients
>send normalized filenames and return some error if they don't.
>
>In other words: nothing.  I.e., NFSv4 is NOT broken wrt i18n by not
>specifying an on-the-wire normalization form for filenames.  I've said
>this more than once now - do you take issue with this statement?

It is not broken, but it will be a lot more difficult to implement
and failures will be more likely. It will result in big
normalisation tables everywhere, as the wire format will never
match a system's internal needs (except on systems using unnormalised
data, which must be very rare). Optimising will be difficult.
The end mounting a file system from a server will need code both
to handle normalisation and to convert to the local character set.
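
The two server-side options discussed above can be sketched roughly as
follows. This is a Python sketch only: the function names, and the use
of ValueError to stand in for "some error", are assumptions and not
anything the draft defines. A real server would likely use a cheaper
check than a full normalisation pass.

```python
import unicodedata

def accept_any(name):
    """No wire form required: the server normalises whatever arrives."""
    return unicodedata.normalize("NFC", name)

def require_nfc(name):
    """Wire form required: the server only checks, and rejects the rest."""
    if unicodedata.normalize("NFC", name) != name:
        raise ValueError("file name is not in form C")
    return name

composed = "\u00e5"       # a-ring as one code point
decomposed = "a\u030a"    # a followed by combining ring above

# Same name, different encoding: compared as-is they are two names,
# after normalisation they are one.
print(composed == decomposed)                          # False
print(accept_any(composed) == accept_any(decomposed))  # True
```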


>> I assume the same as that which happens now with NFSv3.
>
>Oh no, not at all.  NFSv4 uses UTF-8, and therefore Unicode, on the
>wire for filenames - not so for NFSv3.  Big difference.

I would not call it big. I get problems with NFSv3 due to not having a
standardised character set and encoding.
With NFSv4, if the mounted file system does not normalise and convert into
my legacy character set, it will be just as bad.
Even if I switched to UTF-8 as my local character set it would fail if
the UTF-8 encoded text were not in normalisation form C. No other form
is acceptable, because of things like invalid semantics, too
much data space, and complex, CPU-consuming handling of that format.

You cannot expect systems to switch to unnormalised UTF-8 in their
file system to help NFSv4. It will break most applications.

Just like all other protocols that communicate between systems, NFS
needs to convert between the local and the on-the-wire format at the end
points of the communication link. And looking at history and common sense,
allowing more than one possible format on the wire results in
failed communication. I can create compact and fast conversion and handling
for one format. I do not have time to write code to handle everything.
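
As a rough sketch of such end-point conversion, assume a client whose
local character set is ISO 8859-1 and a wire format fixed to UTF-8 in
form C. The names LOCAL_CHARSET, local_to_wire and wire_to_local are
made up for this example. With a single wire form the conversion needs
no normalisation step at all, and it also shows what happens when a
name arrives in another form.

```python
LOCAL_CHARSET = "iso-8859-1"   # assumed legacy local character set

def local_to_wire(local_bytes):
    """Local file name bytes -> UTF-8 for the wire.

    ISO 8859-1 decodes straight to precomposed characters, which are
    already in form C, so no normalisation step is needed here.
    """
    return local_bytes.decode(LOCAL_CHARSET).encode("utf-8")

def wire_to_local(wire_bytes):
    """UTF-8 in form C from the wire -> local file name bytes.

    Raises UnicodeEncodeError when a character has no local mapping,
    which includes combining marks from a name not sent in form C.
    """
    return wire_bytes.decode("utf-8").encode(LOCAL_CHARSET)

wire = local_to_wire(b"\xe5")        # a-ring in ISO 8859-1
assert wire_to_local(wire) == b"\xe5"

try:
    wire_to_local("a\u030a".encode("utf-8"))   # same letter in form D
    converted = True
except UnicodeEncodeError:
    converted = False
print(converted)
```

To accept form D as well, wire_to_local would need a normalisation step
and its tables in front of the character set conversion.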

(I tried to find out what CIFS uses. From what I could find,
Microsoft uses UCS-2/UTF-16 with precomposed characters, which is closest
to form C.)


>> Normalisation form C is for the current version of Unicode (3.2).
>> Form C have most characters that have been "precomposed" before in
>> that form (in legacy character sets). People are used to working with
>> precomposed characters.
>
>No, normalization form C is limited to using composed characters defined
>in Unicode 3.0.  I'll search for a reference tonight and post it
>tomorrow, but I'm quite sure of this.

Normalisation form C is driven by the tables that Unicode defines for each
version, so it automatically follows each version.

-
So I still think NFSv4 will be much better and easier to implement
by defining that all UCS data should be in form C. I am sure I will
get interoperability problems otherwise.

   Dan


