RE: [nfsv4] re: re: NFS4ERR_ADMIN_REVOKE


From: Noveck, Dave (Dave.Noveck@netapp.com)
Date: 01/12/05-05:25:34 PM Z


Subject: RE: [nfsv4] re: re: NFS4ERR_ADMIN_REVOKE
Date: Wed, 12 Jan 2005 18:25:34 -0500
Message-ID: <C98692FD98048C41885E0B0FACD9DFB840D241@exnane01.hq.netapp.com>
From: "Noveck, Dave" <Dave.Noveck@netapp.com>

> So the server can determine that it should return _BAD_STATEID.  If
> that can be done, would it be possible to verify the structure of the
> stateid and that it was provided during the current server instance
> and return _EXPIRED instead (without holding all of the associated
> state)? 

We certainly can determine if it was provided during the current 
server instance.  We have to do that so we know when to return
STALE_STATEID.

It seems to me that you would also need, for the EXPIRED distinction,
to determine whether it was created for the current client instance.

I think it is possible to do the latter, but not without significant 
change to our stateid format.  We have three 32-bit fields: a server
instance id (boot time), an index into a table, and a generation 
number for recycling stateid's with the same index field.
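
As a rough illustration of the three-field layout just described, here is a
minimal Python sketch.  The big-endian byte order, the field order, and the
names pack_stateid/unpack_stateid are assumptions made for the example; the
message only says that there are three 32-bit fields.

```python
import struct

# Assumed layout: three big-endian unsigned 32-bit fields (12 bytes total).
# Byte order and field order are illustrative, not taken from the message.
_STATEID_FMT = ">III"


def pack_stateid(boot_time, index, generation):
    """Pack (server instance id, table index, generation) into 12 bytes."""
    return struct.pack(_STATEID_FMT, boot_time, index, generation)


def unpack_stateid(raw):
    """Return (boot_time, index, generation); reject a malformed length."""
    if len(raw) != struct.calcsize(_STATEID_FMT):
        raise ValueError("stateid must be exactly 12 bytes")
    return struct.unpack(_STATEID_FMT, raw)
```

The generation field lets a table slot be reused: a stateid that carries an
old generation for the same index can be told apart from the slot's current
occupant.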

The best route to determine client id would be to replace the server
instance with a 32-bit client-server instance id that uniquely 
describes the clientid.  You would have to store on disk the last
client-server instance id assigned, and thus you could figure out for
any such id whether it was assigned by the current server instance, as
well as determining (by equality of such id's) whether a given stateid
was associated with the current client instance.

So:

      If id instance is less than the first assigned by this server
      instance --> STALE_STATEID

      else if the table index is invalid --> BAD_STATEID

      else if the id instance matches one for a current client instance
      {
          if the generation number doesn't match what is stored in the 
          table --> EXPIRED

          if the instance id does not match the one stored in the 
          table --> BAD_STATEID
      }

      else --> BAD_STATEID
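
The checks above can be sketched in Python roughly as follows.  This is only
an illustration under stated assumptions: the Slot record, the set of live
client-server instance ids, and the "OK" result for a stateid that passes
every check are hypothetical additions, and 32-bit wraparound of the instance
id is ignored.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass
class Slot:
    """Hypothetical table entry: the instance id and generation it holds."""
    instance_id: int
    generation: int


def classify(instance_id: int, index: int, generation: int,
             first_instance_this_boot: int,
             live_instances: Set[int],
             table: Sequence[Optional[Slot]]) -> str:
    # Assigned before this server instance started handing out ids.
    if instance_id < first_instance_this_boot:
        return "STALE_STATEID"
    # Index out of range or pointing at an empty slot.
    if not (0 <= index < len(table)) or table[index] is None:
        return "BAD_STATEID"
    # Instance id belongs to a client instance that is still current.
    if instance_id in live_instances:
        slot = table[index]
        if generation != slot.generation:
            return "EXPIRED"
        if instance_id != slot.instance_id:
            return "BAD_STATEID"
        return "OK"  # assumed fall-through: every check passed
    # Issued by this server instance, but for an old client instance.
    return "BAD_STATEID"
```

The two inner checks are applied in the order they are written above, so a
generation mismatch is reported as EXPIRED before the stored instance id is
compared.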
     
Note that, just as when you get a STALE_STATEID, the server isn't 
really telling you that the stateid was associated with a previous 
server instance but rather that, if it is valid, it was associated
with a previous one.  Similarly, EXPIRED would mean that if the stateid
was ever valid, it was associated with the current client instance.
BAD_STATEID would tell you that either the stateid was never associated 
with any server instance or it was associated with an old client
instance talking to the current server instance.

> If BAD_STATEID is received within the lease period (or the client's
> perception of it), it is difficult to determine what happened and the
> best assumption is that this piece of state has been munged somehow
> (either administratively or because of a broken implementation).

BAD_STATEID within the lease period tells you something is wrong (either
the client is sending a bad state and the server is validly rejecting
it, the client is sending a valid state and the server is invalidly
rejecting it, or the big bad administrator has decided to pick on our
poor stateid) and doesn't help you figure out which.

BAD_STATEID outside the lease period tells you something is wrong (any
of the previous three items plus a failure to get the renew to the
server in time due to network problems or a bad implementation) and 
doesn't help you figure out which.




-----Original Message-----
From: Spencer Shepler [mailto:spencer.shepler@sun.com] 
Sent: Friday, January 07, 2005 4:29 PM
To: nfsv4@ietf.org
Subject: Re: [nfsv4] re: re: NFS4ERR_ADMIN_REVOKE


On Fri, Noveck, Dave wrote:
> Spencer Shepler wrote.
> > It might help me if we split the question of returning NFS4ERR_EXPIRED
> > into two pieces: with and without the use of SETCLIENTID...
> 
> That makes sense.  I'll try my best to avoid splitting those into
> four sub-cases :-)
> 
> > So in the case of a network partition between client and server in
> > which the partition lasts longer than the lease period, the server
> > presumably has to return some error to the client.  It wouldn't allow
> > the client to continue using state associated with the now-expired
> > lease.  NFS4ERR_EXPIRED is the most appropriate.  Not sure what else
> > the server could do in this case.
> 
> As you note below, BAD_STATEID would give the client the message that
> his state-id is no longer usable.  It doesn't give the reason but 
> I'm not sure the reason is all that helpful to the client.  The
> fact that is important is that the stateid is no longer valid and
> it isn't clear what the client would do with the information
> that lease expiration was the reason.  In many cases, it is the
> only possible reason, though.

For a well-behaved client and server, yes, it is the most likely cause.
I suppose the best viewpoint to take throughout this discussion
is that we have a client and server that are plainly broken.
The Solaris client will handle either error return.  Recovery is
started upon receipt of the _EXPIRED error (SETCLIENTID sequence) and in
the case of _BAD_STATEID, the client will check to see if it is outside
of the lease period; if so, it will start the recovery sequence.
I believe the _BAD_STATEID handling was put into place because of
a server bug that existed for some period in early development and
was left behind. :-)
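
The client-side handling described above could be sketched as below.  This
is a hedged illustration, not the Solaris code: the function name, the
string error names, and the use of "seconds since the last successful
renewal" as the client's view of the lease are all assumptions.

```python
def should_start_recovery(error, seconds_since_renewal, lease_seconds):
    """Decide whether to begin the SETCLIENTID recovery sequence."""
    if error == "NFS4ERR_EXPIRED":
        # _EXPIRED always triggers recovery.
        return True
    if error == "NFS4ERR_BAD_STATEID":
        # _BAD_STATEID triggers recovery only outside the lease period,
        # as judged by the client's own clock.
        return seconds_since_renewal > lease_seconds
    return False
```

What the client does with _BAD_STATEID inside the lease period is left open
here, since the message does not say.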

> The problem with EXPIRED is the one Rick mentioned, that there is
> no way to delimit when it is no longer needed and so you have a piece
> of state that there is no way to deallocate, at least within a client
> instance, and it is troubling to have something where there is a
> resource leak by design, even when the magnitude of leakage means
> that it is not a big issue in practice.  Note also that if the stateid's
> can't go away, neither can the owner, and so there is another thing 
> which leaks.
>
> By the way, one interesting side question about delimiting the 
> scope of EXPIRED concerns RELEASE_OWNER.  If I do a RELEASE_OWNER
> and all of the associated stateid's are EXPIRED, then I would say
> that this would go through, allowing deallocation of those stateid's
> and the lockowner.  This leaves open stateid's and allowing CLOSE
> of those would allow us a way to avoid leakage if the client takes
> care to get rid of such stuff.
> 
> But, if you deallocate revoked state, then you don't face leakage
> issues, and the client still gets the message ("That state you just
> handed me is gone.  Live with it.") via the BAD_STATEID and life goes 
> on.

So the server can determine that it should return _BAD_STATEID.  If
that can be done, would it be possible to verify the structure of the
stateid and that it was provided during the current server instance
and return _EXPIRED instead (without holding all of the associated
state)?  

Not sure it really matters given that we seem to be agreeing that the
client should interpret the receipt of _BAD_STATEID outside of the
lease period as equivalent to _EXPIRED.

> Bias alert:  My approval of returning BAD_STATEID may have something
> to do with the fact that that is what our server currently does.

Then the rest of the client implementations should take note. :-)

> > In the other case, use of SETCLIENTID, the client may still have some
> > inflight requests.  For example, the client may have received a
> > NFS4ERR_EXPIRED error on one request and started its recovery and sent
> > the SETCLIENTID whilst other requests were still outstanding (which
> > happen to use state from the previous client/server instantiation).
> 
> If you do setcl/setcl-cf with a new verifier and you have outstanding
> requests that refer to states within the state corpus of the previous
> client instance that this setcl/setcl-cf will trash, you need to deal
> with the consequences.  My inclination, if I had to write a client, 
> would be to simplify things and drain that stuff before proceeding to 
> what is, from the v4 state point of view, the moral equivalent of a 
> client reboot.

Agreed and this is what the Solaris client does because it does make
it easier to deal with a lot of things if all of the requests are
"drained" first.

My comments were made in the context of the RFC in that it does not
dictate the client's implementation in this area.

> I want to leave aside expired stateid's for the moment.  When you issue
> the setcl/setcl-cf you don't know that all of the locks, and thus the
> stateids, are in the expired state; some may be fine.  For those
> non-expired stateids that are trashed by the new setcl/setcl-cf, the
> client has to be prepared for BAD_STATEID.  You can't return EXPIRED
> and they are not valid.  They are bad stateid's and there is nothing
> the server can return but BAD_STATEID.
> 
> > It seems that the server will again need to return some error based on
> > checking the stateid and NFS4ERR_EXPIRED seems most appropriate.  I
> > suppose that NFS4ERR_BAD_STATEID may be appropriate but _EXPIRED is
> > friendlier.
> 
> Despite my resolution to avoid four sub-cases, I appear forced to it:
> 
> If you have setcl racing with other state referencing requests, then 
> you may have the request hit the server BEFORE or AFTER the setcl
> and it may encounter a state that was OK or REVOKED (due to lease
> expiration).  So, we have four cases:
> 
> 1) AFTER/OK
> 
>    It seems like the only thing that could be returned here is
>    BAD_STATEID.  So the client has to be prepared for that case.
> 
> 2) AFTER/REVOKED
> 
>    I would return BAD_STATEID here, indicating that if there were any
>    state corresponding to that stateid, it has been trashed, at the
>    client's request.
> 
>    Returning EXPIRED to indicate that the state which was trashed was
>    revoked at the time does not seem helpful to me.  You might describe
>    it as friendlier, but I would consider it obsessively friendly, in
>    giving you dubiously helpful information you don't care about.
> 
> 3) BEFORE/OK
> 
>    State is OK and the operation goes through.
> 
> 4) BEFORE/REVOKED
> 
>    I agree that EXPIRED can be returned here, but I would argue that
>    BAD_STATEID is just as good.  The client knows that the requested
>    operation did not happen and that when the setcl/setcl-cf completes,
>    he has a clean slate, statewise.
> 
> So the client has to be prepared for OK, EXPIRED and BAD_STATEID and it
> isn't clear what he would do differently in the two error cases.
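
The four cases in the quoted text can be tabulated as a small Python sketch.
The table reflects the replies argued for in the quoted message (including
its preference for BAD_STATEID in the AFTER/REVOKED case); the names
OUTCOMES and possible_replies are made up for the illustration.

```python
# Key: (request arrival relative to the setcl, state at that moment).
# Value: replies the quoted message considers acceptable for that case.
OUTCOMES = {
    ("AFTER", "OK"): ("BAD_STATEID",),
    ("AFTER", "REVOKED"): ("BAD_STATEID",),
    ("BEFORE", "OK"): ("OK",),
    ("BEFORE", "REVOKED"): ("EXPIRED", "BAD_STATEID"),
}


def possible_replies():
    """Every reply a racing request might see across the four cases."""
    return {reply for replies in OUTCOMES.values() for reply in replies}
```

Taking the union over all four rows gives the three results the client has
to be prepared for: OK, EXPIRED and BAD_STATEID.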

If BAD_STATEID is received within the lease period (or the client's
perception of it), it is difficult to determine what happened and the
best assumption is that this piece of state has been munged somehow
(either administratively or because of a broken implementation).

> > I agree that the RFC doesn't mandate the return of _EXPIRED but as
> > mentioned above, it seems most appropriate.
> 
> Not to me.  If the client issues a setcl/setcl-cf with a new verifier
> then it is asking for the state corpus associated with the same id and
> a different verifier to be trashed/eliminated.  All the stateids, locks,
> etc, become invalid.  You cannot issue a new lock request and have it
> conflict with a lock from the previous instance.  Stateids from that
> instance become invalid and trying to maintain some across the instance
> boundary seems wrong to me (and besides that I don't want to do the
> work to do that unless it is *really* needed).

I agree that the client, through implementation, can assure that
outstanding requests are cleared and will be able to deal with the
respective errors.  Outside of the lease period, seeing _EXPIRED makes
recovery easier, but _BAD_STATEID is not impossible to deal with (as the
Solaris implementation has shown).

> > Are you concerned about exhaustion of the stateid space and reuse such
> > that the server will be unable to meet a MUST statement about _EXPIRED
> > error returns?
> 
> That's certainly part of it.  The other part, to be honest, is that our
> server does not do this, and we've got enough work to do that we are
> reluctant to make changes unless they are either clearly required by
> the spec or are for other reasons something that clients truly need.
> 
> If clients would find EXPIRED helpful, I can see returning it on a
> best-effort basis, and only within the code of a single client setcl
> instance.  Keeping the EXPIRED state for a few lease times and then
> freeing it, allowing any subsequent references to get BAD_STATEID,
> seems a reasonable way of providing this more detailed error
> information to clients who want it and are interested enough to obtain
> it within a reasonable time.
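
The best-effort scheme proposed in the quoted text (keep the EXPIRED state
for a few lease times, then free it) might look something like this Python
sketch.  The class name, the default of three lease times, and the explicit
"now" argument are assumptions made for the example; only revoked stateids
are tracked here, and valid ones are outside the sketch.

```python
class ExpiredTombstones:
    """Remember revoked stateids for a bounded time, then forget them."""

    def __init__(self, lease_seconds, keep_leases=3):
        # "A few lease times"; the multiplier 3 is an arbitrary choice.
        self._ttl = lease_seconds * keep_leases
        self._revoked_at = {}

    def revoke(self, stateid, now):
        """Record that this stateid was revoked at time `now` (seconds)."""
        self._revoked_at[stateid] = now

    def lookup(self, stateid, now):
        """Return EXPIRED while the record is kept, else BAD_STATEID."""
        revoked = self._revoked_at.get(stateid)
        if revoked is None:
            return "BAD_STATEID"
        if now - revoked > self._ttl:
            # Retention window has passed: free the record for good.
            del self._revoked_at[stateid]
            return "BAD_STATEID"
        return "EXPIRED"
```

Because records are dropped after the retention window, the table cannot
grow without bound, which addresses the leak concern raised earlier in the
thread.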

Seems reasonable.

Any other opinions?

Spencer

_______________________________________________
nfsv4 mailing list
nfsv4@ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4




This archive was generated by hypermail 2.1.2 : 03/04/05-02:13:51 AM Z CST