Re: [nfsv4] How_much_to_cache


From: Erblichs (erblichs@earthlink.net)
Date: 01/14/05-04:40:12 PM Z


Message-ID: <41E84A4C.CCAA7E14@earthlink.net>
Date: Fri, 14 Jan 2005 14:40:12 -0800
From: Erblichs <erblichs@earthlink.net>
Subject: Re: [nfsv4] How_much_to_cache

Ok,

	Maybe some of this will help.

	1) Are you implementing write-through caching
	or write-behind caching? This will determine
	the consistency of the cached data if a
	crash occurs.

	2) Can you cache data temporarily so that a disconnect
	from a server does not disrupt access to
	the full set of data? The same can be applied
	to source files read from CD-ROM, which are then
	modified and not written back to the original location.

	3) Do you accept holes (discontiguous sections)
	within your file object?

	4) Lastly, what are the access latencies for
	   read operations?
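
	As a rough illustration of question 1, here is a minimal
	sketch of how the two policies differ when the client
	crashes. The class and method names (Backing, Cache,
	crash) are hypothetical and exist only for this example;
	a real client cache is far more involved.

```python
class Backing:
    """Stand-in for the server or disk; it survives a client crash."""
    def __init__(self):
        self.blocks = {}


class Cache:
    """Toy client cache; write_through selects the policy."""
    def __init__(self, backing, write_through):
        self.backing = backing
        self.write_through = write_through
        self.data = {}       # cached blocks (lost on crash)
        self.dirty = set()   # blocks not yet written to the backing store

    def write(self, block, value):
        self.data[block] = value
        if self.write_through:
            self.backing.blocks[block] = value   # stored before returning
        else:
            self.dirty.add(block)                # deferred until flush()

    def flush(self):
        for block in sorted(self.dirty):
            self.backing.blocks[block] = self.data[block]
        self.dirty.clear()

    def crash(self):
        """Drop in-memory state; return the blocks whose updates were lost."""
        lost = sorted(self.dirty)
        self.data.clear()
        self.dirty.clear()
        return lost


wt = Cache(Backing(), write_through=True)
wt.write(0, "a")

wb = Cache(Backing(), write_through=False)
wb.write(0, "a")
wb.write(1, "b")
wb.flush()
wb.write(1, "c")          # newer value exists only in the cache

lost_wt = wt.crash()      # nothing pending under write-through
lost_wb = wb.crash()      # block 1 reverts to the last flushed value
```

	With write-behind, the backing store is left holding the
	older value of block 1, which is the consistency question
	raised above.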

	Ok, with all that said: in a WAN environment the access
	latencies can approach a maximum of 150 ms. With GBs
	of memory allocated for file operations, my rule
	of thumb in the past was that any file smaller than
	16 MB gets read. For me this is the largest
	working set size (MMU dependent).
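
	One way to read that rule of thumb, sketched under the
	assumption that "gets read" means the whole file is
	fetched at once. The function names and the serial
	request model are assumptions made for illustration.

```python
WHOLE_FILE_LIMIT = 16 * 1024 * 1024   # 16 MB rule of thumb from the text
WAN_LATENCY_S = 0.150                 # worst-case WAN latency from the text


def range_to_fetch(file_size, offset, length):
    """Return (start, count) of the byte range to read from the server."""
    if file_size < WHOLE_FILE_LIMIT:
        return 0, file_size                     # small file: read it all
    start = min(offset, file_size)
    return start, min(length, file_size - start)


def latency_cost(num_requests, latency_s=WAN_LATENCY_S):
    """Seconds spent on round trips alone, assuming serial requests."""
    return num_requests * latency_s


four_mb = 4 * 1024 * 1024
small = range_to_fetch(four_mb, offset=8192, length=4096)
large = range_to_fetch(64 * 1024 * 1024, offset=8192, length=4096)

# Hypothetical comparison: a 4 MB file fetched in 32 KB requests
# versus a single whole-file request.
piecewise = latency_cost(four_mb // (32 * 1024))   # 128 round trips
whole = latency_cost(1)
```

	Under these assumptions the piecewise reads spend about
	19 seconds waiting on round trips, against 0.15 seconds
	for the single request, before any transfer time is
	counted.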

	Most files (cached objects) will be significantly
	smaller than that, so if the total amount of memory
	consumed is less, the main question is aging: when
	do you consider the data no longer useful, even to
	yourself (assuming that you made no modifications),
	and discard the cached contents?
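
	A minimal sketch of one possible aging policy, where an
	unmodified entry is discarded once it has gone unused
	for some idle period. The 60 second threshold and the
	AgingCache name are hypothetical; the right value depends
	on the workload.

```python
class AgingCache:
    """Toy cache of unmodified objects, aged by time since last use."""
    def __init__(self, max_idle_s):
        self.max_idle_s = max_idle_s
        self.entries = {}   # name -> (data, time of last use)

    def put(self, name, data, now):
        self.entries[name] = (data, now)

    def get(self, name, now):
        entry = self.entries.get(name)
        if entry is None:
            return None
        data, _ = entry
        self.entries[name] = (data, now)   # using an entry resets its age
        return data

    def expire(self, now):
        """Discard entries idle for longer than max_idle_s; return their names."""
        stale = sorted(name for name, (_, used) in self.entries.items()
                       if now - used > self.max_idle_s)
        for name in stale:
            del self.entries[name]
        return stale


cache = AgingCache(max_idle_s=60)
cache.put("a", b"x", now=0)
cache.put("b", b"y", now=0)
cache.get("a", now=50)
expired = cache.expire(now=70)   # "b" idle for 70 s, "a" for only 20 s
```

	The clock is passed in explicitly so the policy can be
	exercised without waiting for real time to pass.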

	If the cache was write-back, then every 15-30 seconds
	cached data was marked for flushing back to the
	original source location.
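
	The periodic marking can be sketched roughly as follows.
	This assumes a block becomes due once it has been dirty
	for a full interval; the WriteBackQueue name and the
	choice of 30 seconds (the upper end of the range) are
	illustrative only.

```python
FLUSH_INTERVAL_S = 30   # upper end of the 15-30 second range in the text


class WriteBackQueue:
    """Tracks dirty blocks and marks them for flushing once they are old enough."""
    def __init__(self, interval_s=FLUSH_INTERVAL_S):
        self.interval_s = interval_s
        self.dirty_since = {}   # block -> time it first became dirty
        self.to_flush = []      # blocks marked for flushing, in marking order

    def write(self, block, now):
        # Keep the earliest dirty time so rewrites cannot postpone the flush.
        self.dirty_since.setdefault(block, now)

    def tick(self, now):
        """Mark every block that has been dirty for at least one interval."""
        due = sorted(block for block, since in self.dirty_since.items()
                     if now - since >= self.interval_s)
        for block in due:
            del self.dirty_since[block]
            self.to_flush.append(block)
        return due


queue = WriteBackQueue()
queue.write(1, now=0)
queue.write(2, now=20)
queue.write(1, now=25)        # rewrite; block 1 stays dirty since t=0
first = queue.tick(now=30)    # only block 1 is 30 s old
second = queue.tick(now=50)   # block 2 is now 30 s old
```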

	Mitchell Erblich
	----------------
	   

"Michael E. Thomadakis" wrote:
> 
> Prasanna Wakhare wrote:
> 
> >Hi all,
> >I have a simple question in mind. I want to introduce client-side
> >caching in a network or cluster file system. The caches are
> >in-memory caches, as in Sprite, not secondary caches as in AFS.
> >My question: the whole cache is in kernel space, so what happens if a
> >large file, say 2 GB, is mmapped or read/written?
> >What would be the consequence if my cache has a very high limit?
> >We know kernel-space virtual addresses run from 3GB to 4GB. If I start
> >allocating cache blocks as soon as I get file data from the
> >storage node, and my cache size is 5GB, then this will hang the
> >system, as the kernel can't address that much memory. In that
> >case we need a CPU with 64-bit addressing or something like that.
> >But even then there is a limit on caching the file data.
> >So how large should I make my cache to get the best performance?
> >I hope I'm clear in what I asked.
> >Thanks
> >Prasanna
> >
> >
> >
> 
> This service is very similar to the 'buffer cache' service in the kernel,
> where file system data are treated as VM pages. UNIX and, increasingly,
> Linux use VM to buffer file pages.
> 
> In my opinion there should be a way for a user to communicate
> performance 'hints' to the kernel about the behavior of a file or the
> entire file system. You have the entire spectrum of control here, from fixed
> to adaptive. 'Hints' are a good idea when a particular file or file
> system has known or expected behavior or service requirements. However,
> they should always be evaluated in the context of current resource
> availability. Assuming large file sizes, let me give a few simple
> examples.
> 
> - A user scans a file from beginning to end and never rewrites it or
> modifies it. Thus, read-ahead could be of benefit. The question then is
> how much to read ahead, if the user does not know this value. One way to
> answer this is to monitor within a (sliding) window of time the request
> pattern of the process. The disclaimer is that the recent past is not always
> a good indicator of the near future, but in many useful cases it
> is. Given the read-once pattern, there is no reason to keep the
> buffer cache pages for this file.
> 
> - A user always writes at the end of a file (appending to it), as is the
> case with log files, results from computations, reports, etc. Output
> buffering is of benefit and caching is not needed.
> 
> - A process that deviates from the two ideal ones above is one that
> re-reads or (worse) re-writes a portion of the file it read previously.
> Existing VM caching algorithms can be used here for the benefit of the
> system (LRU page replacement, etc).
> 
> - It would be interesting if a user could specify a heuristic or at
> least hint values (such as read ahead counts or 'release behind') at
> mount time for the entire file system.
> 
> In general, I don't think that one needs to cache entire files in the
> limited kernel memory space. This opens a Pandora's box, with all other
> system and user activities suffering degradation and eventually
> leading to VM thrashing. Notice, though, that the above discussion
> becomes immaterial for small files, so one parameter of interest may
> be how much (in KB) to cache per file. A more interesting parameter
> is the total number of real memory pages to dedicate to file caching.
> You will be very thankful later if you can enforce a true upper bound on
> this.
> 
> Finally, since all heuristics can be defeated by certain request
> patterns, the meta-heuristic should be to guard against overusing scarce
> or bottleneck resources, whichever those may be at different times. In
> short, it is worthwhile to pursue decent heuristics, but you should always
> monitor the system to avoid forming bottlenecks.
> 
> -MT
> 
> >_______________________________________________
> >nfsv4 mailing list
> >nfsv4@ietf.org
> >https://www1.ietf.org/mailman/listinfo/nfsv4
> >
> >
> 
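
The sliding-window monitoring described in the quoted message could be sketched along these lines. This is only one possible heuristic: it watches the last few block requests and scales the read-ahead amount by how sequential they were. The ReadAheadAdvisor name, the window of 8 requests, and the cap of 32 blocks are all assumptions made for the example.

```python
from collections import deque


class ReadAheadAdvisor:
    """Suggests a read-ahead size from a sliding window of recent requests."""
    def __init__(self, window=8, max_readahead=32):
        self.history = deque(maxlen=window)   # most recent block numbers
        self.max_readahead = max_readahead

    def record(self, block):
        self.history.append(block)

    def sequential_fraction(self):
        """Fraction of consecutive request pairs that advance by one block."""
        blocks = list(self.history)
        pairs = list(zip(blocks, blocks[1:]))
        if not pairs:
            return 0.0
        hits = sum(1 for prev, cur in pairs if cur == prev + 1)
        return hits / len(pairs)

    def readahead_blocks(self):
        return int(self.sequential_fraction() * self.max_readahead)


scan = ReadAheadAdvisor()
for block in range(10):                 # a front-to-back scan
    scan.record(block)
scan_hint = scan.readahead_blocks()

scattered = ReadAheadAdvisor()
for block in [5, 90, 3, 41, 7, 66, 2, 58]:
    scattered.record(block)
scattered_hint = scattered.readahead_blocks()
```

As the quoted text warns, the recent past is not always a good predictor, so a hint like this would still have to be weighed against current resource availability.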

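The "true upper bound" on file-cache pages mentioned in the quoted message, combined with LRU replacement, might look roughly like this. It is a minimal sketch: BoundedPageCache is a hypothetical name, pages are keyed by an assumed (file, page number) pair, and a real kernel tracks far more state per page.

```python
from collections import OrderedDict


class BoundedPageCache:
    """LRU page cache that never holds more than max_pages pages."""
    def __init__(self, max_pages):
        if max_pages < 1:
            raise ValueError("max_pages must be at least 1")
        self.max_pages = max_pages
        self.pages = OrderedDict()   # (file, page_no) -> data, oldest first

    def get(self, key):
        if key not in self.pages:
            return None
        self.pages.move_to_end(key)          # mark as most recently used
        return self.pages[key]

    def put(self, key, data):
        """Insert a page; return the keys evicted to stay within the bound."""
        self.pages[key] = data
        self.pages.move_to_end(key)
        evicted = []
        while len(self.pages) > self.max_pages:
            old_key, _ = self.pages.popitem(last=False)   # least recently used
            evicted.append(old_key)
        return evicted


cache = BoundedPageCache(max_pages=2)
cache.put(("f", 0), b"page0")
cache.put(("f", 1), b"page1")
cache.get(("f", 0))                          # page 0 is now the most recent
evicted = cache.put(("f", 2), b"page2")      # page 1 is the LRU victim
```

Because eviction happens inside put, the bound holds after every insertion no matter how large the file being read is.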



This archive was generated by hypermail 2.1.2 : 03/04/05-02:13:51 AM Z CST