[nfsv4] Re: stable storage for server restart


From: William A.(Andy) Adamson (andros@citi.umich.edu)
Date: 02/21/05-01:32:17 PM Z


Date: Mon, 21 Feb 2005 14:32:17 -0500
From: "William A.(Andy) Adamson" <andros@citi.umich.edu>
Message-Id: <20050221193217.BD6BA1BC53@citi.umich.edu>
Subject: [nfsv4] Re: stable storage for server restart 

rick@snowhite.cis.uoguelph.ca said:
> (I cc'd nfsv4@ietf.org, just in case anyone not on the linux list is
> interested. Apologies in advance for cluttering up your email. Also, I'd be
> interested in hearing any thoughts others have on the design.)

i'm also interested in thoughts on our design for the linux server which is 
similar.

instead of appending to a recovery file we use the 'file system as a data base' 
approach, populating a (configurable) recovery directory with one directory 
per clientid. the clientid directory name is the md5 hash of the 
client-supplied client id, which can be up to 1024 bytes long. the md5 hash is 
calculated at SETCLIENTID, and we return CLIENTID_IN_USE for md5 cache hits, 
which should be negligible. no upcalls are involved; all operations are done 
in-kernel.
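
A minimal user-space sketch of the naming scheme described above, for illustration only (the real server does this in-kernel). It assumes that "md5 cache hits" means a different client id mapping to an already-used md5 name; the function names, the `known` dict, and the string status values are hypothetical.

```python
import hashlib

def recovery_dirname(client_id: bytes) -> str:
    """Map a client-supplied id (1..1024 bytes) to a fixed-length directory name."""
    if not 1 <= len(client_id) <= 1024:
        raise ValueError("client id must be 1..1024 bytes")
    return hashlib.md5(client_id).hexdigest()  # always 32 hex characters

def setclientid(known: dict, client_id: bytes) -> str:
    """Return "OK" or "CLIENTID_IN_USE" (illustrative status strings)."""
    name = recovery_dirname(client_id)
    owner = known.get(name)
    if owner is not None and owner != client_id:
        # A different id already maps to this md5 name (assumed reading
        # of "md5 cache hit"): refuse rather than share a directory.
        return "CLIENTID_IN_USE"
    known[name] = client_id
    return "OK"
```

The point of hashing is that a client id of any permitted length becomes a short, fixed-length name that is safe to use as a directory entry.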
 
>   - at server startup (before any Compounds are performed), the log is
>     read and an in-memory structure is created, indicating what clients
>     can reclaim state

we read the recovery directory, populating in-memory structures indicating 
which clients can reclaim state.
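
The startup scan can be sketched in user space as follows. This is an assumption-laden illustration, not the kernel code: `load_reclaimable` and the "reclaimed" flag stored per client are hypothetical names.

```python
import os

def load_reclaimable(recovery_dir: str) -> dict:
    """Return {clientid_dirname: reclaimed_flag} for each client directory."""
    clients = {}
    with os.scandir(recovery_dir) as entries:
        for entry in entries:
            # Only directories represent clients; stray files are ignored.
            if entry.is_dir(follow_symlinks=False):
                clients[entry.name] = False  # may reclaim, has not done so yet
    return clients
```

A client whose directory name is absent from the returned dict would be refused reclaim; the flag is flipped to True when a reclaim succeeds during grace.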

>  - during reclaim, the in-memory structure is used to check for Grace and
>     is marked for successful reclaims, per client

we do the same.

> * - at the end of the grace period, the file is truncated to 0 length and
>     a new append log is written from the in-memory structure, with one record
>     for each client that successfully reclaimed some state

at the end of the grace period, we remove the clientid directories from the 
recovery directory for those clients that did not reclaim state.
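
A user-space sketch of the end-of-grace cleanup, assuming an in-memory dict that maps each clientid directory name to a flag saying whether that client reclaimed state (the dict layout and `end_of_grace` are hypothetical):

```python
import os

def end_of_grace(recovery_dir: str, clients: dict) -> list:
    """Remove directories of clients that did not reclaim; return their names."""
    removed = []
    for name, reclaimed in list(clients.items()):
        if not reclaimed:
            # os.rmdir only removes empty directories, which fits a design
            # where a client directory holds no files.
            os.rmdir(os.path.join(recovery_dir, name))
            del clients[name]
            removed.append(name)
    return sorted(removed)
```

Unlike truncating and rewriting a log, each removal is an independent step, so stopping partway leaves the entries of the clients that did reclaim untouched.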

>  - then normal, non-grace operation starts...
>     - records are appended to the log when a client acquires the first state
>       (first Open) after a SetClientID and when state is revoked for a
>       client (I do not support revocation of only some state for a client)
>       (nb: The first Open records are only done for clients that didn't
>        successfully reclaim during grace.) 

normal, non-grace operation starts:
  - we add a clientid directory to the recovery directory when a client makes 
its first successful open confirm. we remove a clientid directory from the 
recovery directory whenever either its lease expires or admin action removes 
the client state.
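
The two steady-state operations map directly onto mkdir and rmdir. A hedged user-space sketch (the function names and the 0o700 mode are assumptions, not taken from the server code):

```python
import os

def record_client(recovery_dir: str, name: str) -> bool:
    """On a client's first successful open confirm. True if newly recorded."""
    try:
        os.mkdir(os.path.join(recovery_dir, name), 0o700)
        return True
    except FileExistsError:
        return False  # already recorded, e.g. it reclaimed during grace

def forget_client(recovery_dir: str, name: str) -> bool:
    """On lease expiry or admin removal. True if a directory was removed."""
    try:
        os.rmdir(os.path.join(recovery_dir, name))
        return True
    except FileNotFoundError:
        return False  # nothing recorded for this client
```

Because mkdir either creates the entry or fails with "exists", two callers racing to record the same client cannot both succeed, so the directory never gains a duplicate record.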

> - I had to lock the other nfsd threads out when updating stable storage. The
>   reason was:

for us this is all auto-magically handled by existing directory operations 
(mkdir, rmdir), and our nfs state lock.

>   - I needed to record the revocation before issuing conflicting lock state,
>     and I wanted to avoid races between multiple clients trying to acquire
>     conflicting locks while the write(s) to disk were in progress. Since
>     revocation is a rare event, I didn't see this as a serious performance
>     hit.
>   - For the case of first Open, the record indicates successful lock
>     state acquisition. If another client acquires a conflicting lock
>     while the disk write(s) for the log are in progress, there would be
>     a record indicating that the client had successfully acquired state
>     although the lock failed, due to a conflict. Is this actually a
>     problem? I'm not convinced it is, but my code "plays it safe" for now.
>     I could see this being a significant performance hit, if lots of new
>     clients did SetClientIDs followed by Opens at the same time. (Ones
>     that haven't already reclaimed locks at server restart.)
>   - The other two cases (when server first starts up and at end-of-grace)
>     only occur once per server reboot and only add a little time to the
>     grace period. 
>
> I think the weakest part of this design is that, if the server crashes again
> while at "*", the append only log is not complete (possibly empty). This will
> result in clients not being allowed to reclaim, that otherwise should be able
> to (ie. no entry->no reclaim->NFS4ERR_NOGRACE for all reclaim requests).

when we crash at *, our recovery directory has all the clients that have just 
successfully reclaimed state, plus potentially some that could have reclaimed 
state, but didn't during the last grace period. so, i feel ok about crashing 
at * and recovering with the data in the recovery directory.
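
The argument can be checked with a small model. The sketch below assumes stale entries are removed one at a time and enumerates what the recovery directory could contain if the server dies after any number of removals; the names are hypothetical.

```python
def states_after_crash(reclaimed: set, not_reclaimed: list) -> list:
    """Possible directory contents if the server dies after k removals."""
    states = []
    for k in range(len(not_reclaimed) + 1):
        # The first k stale entries are gone, the rest are still present.
        states.append(reclaimed | set(not_reclaimed[k:]))
    return states

reclaimed = {"client-a", "client-b"}
stale = ["client-c", "client-d"]
for contents in states_after_crash(reclaimed, stale):
    # Every client that reclaimed is still recorded at every crash point.
    assert reclaimed <= contents
```

In every state the directory is a superset of the clients that reclaimed, so a second restart can only be too permissive toward clients that had the chance to reclaim and did not, never too strict toward clients that did.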

> The append log will grow, but I only see a problem if clients go hogwild with
> SetClientIDs. It does get truncated when the server restarts, so a sysadmin
> can just reboot when it gets too big:-)

our recovery directory only holds active clients; this is a strength of our 
design.

> It also doesn't support the notion of only some state for a client being
> revoked. (I've looked at that one a bit and it seems to get quite
> challenging. Maybe someday I'll come up with a simple scheme I'm convinced
> works for that case.) 

another strength of this design is that the clientid directories can easily be 
populated with files containing additional info. for example, we plan on adding 
a file to hold SETCLIENTID principal info.
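
One way such a per-client file might look, as a user-space sketch only. The file name `principal`, its plain-text format, and both helper names are hypothetical; the message above says only that a file for principal info is planned.

```python
import os

PRINCIPAL_FILE = "principal"  # hypothetical file name inside a client directory

def save_principal(client_dir: str, principal: str) -> None:
    """Write the principal string and push it to stable storage."""
    path = os.path.join(client_dir, PRINCIPAL_FILE)
    with open(path, "w", encoding="utf-8") as f:
        f.write(principal)
        f.flush()
        os.fsync(f.fileno())

def load_principal(client_dir: str):
    """Return the stored principal, or None if no file was written."""
    try:
        with open(os.path.join(client_dir, PRINCIPAL_FILE), encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

One consequence worth noting: once client directories contain files, removing a client takes an unlink of each file before the rmdir, since rmdir fails on a non-empty directory.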

-->Andy



_______________________________________________
nfsv4 mailing list
nfsv4@ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4



This archive was generated by hypermail 2.1.2 : 03/04/05-02:13:54 AM Z CST