Re: [nfsv4] How_much_to_cache


From: Michael E. Thomadakis (miket@hellas.tamu.edu)
Date: 01/14/05-02:06:37 AM Z


Message-ID: <41E77D8D.7050204@hellas.tamu.edu>
Date: Fri, 14 Jan 2005 02:06:37 -0600
From: "Michael E. Thomadakis" <miket@hellas.tamu.edu>
Subject: Re: [nfsv4] How_much_to_cache

Prasanna Wakhare wrote:

>Hi all,
>I have a simple question in mind. I want to introduce client side
>caching in any network or cluster file system. The caches are
>in-memory cache not secondary cache as in AFS but as sprite.
>My question is all the cache is in kernel space and if large file
>happens to mmap or read/write say 2 GB etc.
>What would be consequence if there is great limit on my cache.
>We know kernel space virtual address are from 3GB to 4GB. If i start
>allocating cache block as soon as i'm getting file from
>storage node and if my cache size is 5GB then this will hang the
>system as kernel cant have that much addresses refer to. And in that
>case we need either CPU to address 64 bit or something like that.
>But even then there is limit on cacheing the file data.
>So how much shall i keep my cache size to have best performance.
>I hope i'm pretty clear in what i asked?
>Thanks
>Prasanna
>
>  
>

This service is very similar to the 'buffer cache' service in the kernel 
where file system data are treated as VM pages. UNIX and increasingly 
Linux are using VM to buffer file pages.

In my opinion there should be a way for a user to communicate
performance 'hints' to the kernel about the expected behavior of a file
or of the entire file system. You have the entire spectrum of control
here: fixed to adaptive. 'Hints' are a good idea when a particular file
or file system has a known / expected behavior or service requirements.
However, they should always be evaluated in the context of the current
resource availability. Assuming large file sizes, let me just give a few
simple examples.

- A user scans a file from beginning to end and never rewrites it or 
modifies it. Thus, read-ahead could be of benefit. The question then is 
how much to read ahead, if the user does not know this value. One way to 
answer this is to monitor within a (sliding) window of time the request 
pattern of the process. The disclaimer is that recent past is not always 
a good indicator of the near future, but again, in many useful cases it 
is. Given the read once pattern, there is no reason to maintain the 
buffer cache pages for this file.

- A user always writes at the end of a file (appending to it), as is the
case with log files or results from computation/reports, etc. Output
buffering is of benefit and caching is not needed.

- A process that deviates from the two ideal ones above is one that 
re-reads or (worse) re-writes a portion of the file it read previously. 
Existing VM caching algorithms can be used here for the benefit of the 
system (LRU page replacement, etc).

- It would be interesting if a user could specify a heuristic or at
least hint values (such as read-ahead counts or 'release behind') at
mount time for the entire file system.
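
To make the sliding-window idea from the first example concrete, here is
a minimal sketch in Python. It is only an illustration, not how any
particular kernel does it. The class name, the window length, the 50%
threshold, and the cap of 64 blocks are all assumptions chosen for the
example.

```python
from collections import deque


class ReadAheadEstimator:
    """Hypothetical helper: guesses a read-ahead size from recent requests."""

    def __init__(self, window=8, max_blocks=64):
        # Sliding window of the most recent (offset, length) requests.
        self.window = deque(maxlen=window)
        # Assumed upper limit on how many blocks to read ahead.
        self.max_blocks = max_blocks

    def record(self, offset, length):
        """Remember one read request made by the process."""
        self.window.append((offset, length))

    def sequential_fraction(self):
        """Fraction of recent requests that start where the previous one ended."""
        reqs = list(self.window)
        if len(reqs) < 2:
            return 0.0
        hits = sum(
            1
            for (prev_off, prev_len), (next_off, _) in zip(reqs, reqs[1:])
            if next_off == prev_off + prev_len
        )
        return hits / (len(reqs) - 1)

    def readahead_blocks(self):
        """Read ahead only when the recent past looks mostly sequential."""
        frac = self.sequential_fraction()
        if frac < 0.5:  # assumed threshold: mostly random access
            return 0
        return max(1, int(frac * self.max_blocks))
```

A process that scans a file front to back fills the window with
back-to-back requests and gets the full read-ahead; a process that jumps
around gets none. As noted above, the recent past is not always a good
predictor, so the estimate should be recomputed as new requests arrive.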

In general, I don't think that one needs to cache entire files in the
limited kernel memory space. This opens a Pandora's box, with all other
system and user activities suffering degradation and eventually
leading to VM thrashing. Notice though that the above discussion
becomes immaterial for small files, so one parameter of interest may
be how much (in KB) to cache per file. A more interesting parameter
is the total number of real memory pages to dedicate to file caching.
You will be very thankful later if you can enforce a true upper bound on
this.
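
As a rough sketch of what those two parameters could look like together,
the following Python class keeps an LRU-ordered page cache with a hard
cap on the total number of pages and a second cap per file. The class
and parameter names are hypothetical, and a real client cache would live
in the kernel and deal with dirty pages, locking, and so on, none of
which is shown here.

```python
from collections import OrderedDict


class BoundedPageCache:
    """Hypothetical LRU page cache with a global and a per-file page limit."""

    def __init__(self, max_pages, max_pages_per_file):
        if max_pages < 1 or max_pages_per_file < 1:
            raise ValueError("both limits must be at least 1")
        self.max_pages = max_pages
        self.max_pages_per_file = max_pages_per_file
        self.pages = OrderedDict()  # (file_id, page_no) -> data, oldest first
        self.per_file = {}          # file_id -> number of cached pages

    def get(self, file_id, page_no):
        """Return the cached page or None; a hit makes the page most recent."""
        key = (file_id, page_no)
        if key not in self.pages:
            return None
        self.pages.move_to_end(key)
        return self.pages[key]

    def put(self, file_id, page_no, data):
        """Insert a page, evicting the oldest page if a limit would be exceeded."""
        key = (file_id, page_no)
        if key in self.pages:
            self.pages[key] = data
            self.pages.move_to_end(key)
            return
        if self.per_file.get(file_id, 0) >= self.max_pages_per_file:
            # This file is at its own limit: recycle one of its own pages.
            self._evict_oldest(file_id)
        elif len(self.pages) >= self.max_pages:
            # The whole cache is full: recycle the globally oldest page.
            self._evict_oldest()
        self.pages[key] = data
        self.per_file[file_id] = self.per_file.get(file_id, 0) + 1

    def _evict_oldest(self, file_id=None):
        """Drop the least recently used page, optionally restricted to one file."""
        victim = next(
            k for k in self.pages if file_id is None or k[0] == file_id
        )
        del self.pages[victim]
        self.per_file[victim[0]] -= 1
```

With this shape, a single 2 GB file can never occupy more than
max_pages_per_file pages, and the cache as a whole can never grow past
max_pages, no matter what the request pattern looks like.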

Finally, since all heuristics can be defeated by certain request
patterns, the meta-heuristic should be to guard against overusing scarce
or bottleneck resources, whichever those may be at different times. In
short, it is worth pursuing decent heuristics, but you should always
monitor the system to avoid forming bottlenecks.
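
One possible shape for such a guard, sketched in Python under the
assumption that free memory is the scarce resource at the moment: scale
whatever read-ahead a heuristic asks for by how much free memory is left.
The function name and the two watermark parameters are hypothetical and
only serve the illustration.

```python
def throttled_readahead(requested_blocks, free_pages, low_watermark, high_watermark):
    """Scale a requested read-ahead by current memory pressure.

    Below low_watermark no read-ahead is done at all; above high_watermark
    the request is granted in full; in between it is scaled linearly.
    """
    if low_watermark >= high_watermark:
        raise ValueError("low_watermark must be below high_watermark")
    if free_pages <= low_watermark:
        return 0
    if free_pages >= high_watermark:
        return requested_blocks
    scale = (free_pages - low_watermark) / (high_watermark - low_watermark)
    return int(requested_blocks * scale)
```

The same pattern applies if the bottleneck is network bandwidth or
server load instead of memory; only the monitored quantity changes.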

-MT

>_______________________________________________
>nfsv4 mailing list
>nfsv4@ietf.org
>https://www1.ietf.org/mailman/listinfo/nfsv4
>  
>



