[nfsv4] Re: stable storage for server restart

From: Spencer Shepler (spencer.shepler@sun.com)
Date: 02/22/05-02:39:37 AM Z


Date: Tue, 22 Feb 2005 00:39:37 -0800
From: Spencer Shepler <spencer.shepler@sun.com>
Message-ID: <20050222083937.GF130156@jurassic.eng.sun.com>
Subject: [nfsv4] Re: stable storage for server restart


I won't go through this point by point for Solaris, but the approach
is similar to what Andy describes for Linux.  The Solaris server
has a single directory with a file entry for each clientid.
The filenames are ip-address/clientid with the contents being
the full client supplied clientid.  This offers a "quick" way
to determine which clients are active with a server.

The same type of "move state directory on server recovery" methodology
is used and the methodology will survive the cascading failure issue.

Spencer

On Mon, William A.(Andy) Adamson wrote:
> rick@snowhite.cis.uoguelph.ca said:
> > (I cc'd nfsv4@ietf.org, just in case anyone not on the linux list is
> > interested. Apologies in advance for cluttering up your email. Also, I'd be
> > interested in hearing any thoughts others have on the design.)
> 
> i'm also interested in thoughts on our design for the linux server which is 
> similar.
> 
> instead of appending a recovery file we use the 'file system as a data base' 
> approach, populating a (configurable) recovery directory with one directory 
> per clientid. the clientid directory name is the md5 hash of the client 
> supplied client id which can be of length 1024. the md5 hash is calculated at 
> SETCLIENTID, and we return CLIENTID_IN_USE for md5 cache hits, which should be 
> negligible. no upcalls are involved, all operations are done in-kernel.
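
As a rough illustration of the naming scheme described above, here is a
user-space Python sketch. The Linux server does this in-kernel; the function
names, the hex encoding of the digest, and the boolean return value are all
assumptions made for the example, not the actual nfsd code.

```python
import hashlib
import os

def recovery_name(client_id: bytes) -> str:
    """Map a client-supplied id (up to 1024 bytes) to a fixed-length name.

    Hypothetical helper: the hex form of the md5 digest is an assumption.
    """
    return hashlib.md5(client_id).hexdigest()

def record_client(recovery_dir: str, client_id: bytes) -> bool:
    """Create the per-client directory under recovery_dir.

    Returns False if a directory with the same hashed name already exists,
    which is the case a server would have to treat as "client id in use".
    """
    try:
        os.mkdir(os.path.join(recovery_dir, recovery_name(client_id)))
        return True
    except FileExistsError:
        return False
```

The point of hashing is that the directory name has a fixed, short length no
matter how long the client-supplied id is.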
>  
> >   - at server startup (before any Compounds are performed), the log is
> >     read and an in-memory structure is created, indicating what clients
> >     can reclaim state
> 
> we read the recovery directory, populating in-memory structures indicating 
> which clients can reclaim state.
> 
> >  - during reclaim, the in-memory structure is used to check for Grace and
> >     is marked for successful reclaims, per client
> 
> we do the same.
> 
> > * - at the end of the grace period, the file is truncated to 0 length and
> >     a new append log is written from the in-memory structure, with one record
> >     for each client that successfully reclaimed some state
> 
> at the end of the grace period, we remove the clientid directories from the 
> recovery directory for those clients that did not reclaim state.
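
The startup, reclaim, and end-of-grace steps described above could be sketched
roughly as follows. This is a user-space Python illustration under assumed
names (load_reclaimable, mark_reclaimed, end_of_grace are hypothetical), not
the in-kernel implementation.

```python
import os

def load_reclaimable(recovery_dir):
    """At startup: each entry in the recovery directory names a client
    that may reclaim state. The value tracks whether it has done so."""
    return {name: False for name in os.listdir(recovery_dir)}

def mark_reclaimed(clients, name):
    """During grace: accept a reclaim only from a recorded client.

    Returns False for unknown clients, where a server would refuse
    the reclaim request.
    """
    if name not in clients:
        return False
    clients[name] = True
    return True

def end_of_grace(recovery_dir, clients):
    """At the end of grace: drop the directories of clients that never
    reclaimed, keeping the in-memory table in step with the disk."""
    for name, reclaimed in list(clients.items()):
        if not reclaimed:
            os.rmdir(os.path.join(recovery_dir, name))
            del clients[name]
```

Nothing is rewritten for clients that did reclaim: their directories simply
stay where they are.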
> 
> >  - then normal, non-grace operation starts...
> >     - records are appended to the log when a client acquires the first state
> >       (first Open) after a SetClientID and when state is revoked for a
> >       client (I do not support revocation of only some state for a client)
> >       (nb: The first Open records are only done for clients that didn't
> >        successfully reclaim during grace.) 
> 
> normal, non-grace operation starts:
>   - we add a clientid directory to the recovery directory when a client makes 
> their first successful open confirm. we remove a clientid directory from the 
> list whenever either its lease expires or admin action removes the client 
> state.
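
A minimal sketch of the normal-operation bookkeeping described above, again in
user-space Python with hypothetical names (RecoveryDir and its methods are
assumptions for illustration only):

```python
import os

class RecoveryDir:
    """Tracks active clients as subdirectories of a recovery directory."""

    def __init__(self, path):
        self.path = path

    def on_first_open_confirm(self, name):
        # exist_ok covers clients whose directory survived from before
        # the restart because they reclaimed state during grace.
        os.makedirs(os.path.join(self.path, name), exist_ok=True)

    def on_client_gone(self, name):
        # Called on lease expiry or when an admin removes the client state.
        try:
            os.rmdir(os.path.join(self.path, name))
        except FileNotFoundError:
            pass  # already removed; nothing to do
```

Because both operations are idempotent here, repeating one after a retry does
not corrupt the directory contents.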
> 
> > - I had to lock the other nfsd threads out when updating stable storage. The
> >   reason was:
> 
> for us this is all auto-magically handled by existing directory operations 
> (mkdir,rmdir), and our nfs state lock.
> 
> >   - I needed to record the revocation before issuing conflicting lock state,
> >     and I wanted to avoid races between multiple clients trying to acquire
> >     conflicting locks while the write(s) to disk were in progress. Since
> >     revocation is a rare event, I didn't see this as a serious performance
> >     hit.
> >   - For the case of first Open, the record indicates successful lock
> >     state acquisition. If another client acquires a conflicting lock
> >     while the disk write(s) for the log are in progress, there would be
> >     a record indicating that the client had successfully acquired state
> >     although the lock failed, due to a conflict. Is this actually a
> >     problem? I'm not convinced it is, but my code "plays it safe" for now.
> >     I could see this being a significant performance hit, if lots of new
> >     clients did SetClientIDs followed by Opens at the same time. (Ones
> >     that haven't already reclaimed locks at server restart.)
> >   - The other two cases (when server first starts up and at end-of-grace)
> >     only occur once per server reboot and only add a little time to the
> >     grace period. 
> >
> > I think the weakest part of this design is that, if the server crashes again
> > while at "*", the append only log is not complete (possibly empty). This will
> > result in clients not being allowed to reclaim, that otherwise should be able
> > to (i.e. no entry->no reclaim->NFS4ERR_NO_GRACE for all reclaim requests).
> 
> when we crash at *, our recovery directory has all the clients that have just 
> successfully reclaimed state, plus potentially some that could have reclaimed 
> state, but didn't during the last grace period. so, i feel ok about crashing 
> at * and recovering with the data in the recovery directory.
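
The argument above can be checked with a small simulation: if the server stops
partway through removing the directories of clients that did not reclaim, what
remains on disk is still a superset of the clients that did. The sketch below
is user-space Python; purge_unreclaimed and its crash_after parameter are
hypothetical and exist only to simulate the interruption.

```python
import os

def purge_unreclaimed(recovery_dir, reclaimed, crash_after=None):
    """Remove directories of clients that did not reclaim state.

    crash_after is a hypothetical test knob: stop after that many
    removals to imitate a crash in the middle of the purge.
    """
    removed = 0
    for name in sorted(os.listdir(recovery_dir)):
        if name in reclaimed:
            continue  # never touch a client that reclaimed
        if crash_after is not None and removed >= crash_after:
            return  # simulated crash: the rest of the purge never runs
        os.rmdir(os.path.join(recovery_dir, name))
        removed += 1
```

Since the purge only ever deletes entries for clients that did not reclaim, an
interrupted run can leave extra entries behind but cannot lose a valid one.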
> 
> > The append log will grow, but I only see a problem if clients go hogwild with
> > SetClientIDs. It does get truncated when the server restarts, so a sysadmin
> > can just reboot when it gets too big:-)
> 
> our recovery directory only holds active clients; this is a strength of our 
> design.
> 
> > It also doesn't support the notion of only some state for a client being
> > revoked. (I've looked at that one a bit and it seems to get quite
> > challenging. Maybe someday I'll come up with a simple scheme I'm convinced
> > works for that case.) 
> 
> another strength of this design is that the clientid directories can easily be 
> populated with files containing additional info. for example, we plan on adding 
> a file to hold SETCLIENTID principal info.
> 
> -->Andy
> 
> 
> 
> _______________________________________________
> NFSv4 mailing list
> NFSv4@linux-nfs.org
> http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4

_______________________________________________
nfsv4 mailing list
nfsv4@ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4


This archive was generated by hypermail 2.1.2 : 03/04/05-02:13:55 AM Z CST