From: Erblichs (erblichs@earthlink.net)
Date: 01/14/05-04:40:12 PM Z
Message-ID: <41E84A4C.CCAA7E14@earthlink.net>
Date: Fri, 14 Jan 2005 14:40:12 -0800
From: Erblichs <erblichs@earthlink.net>
Subject: Re: [nfsv4] How_much_to_cache

Ok,

Maybe some of this will help.

1) Are you implementing write-through caching or write-behind
   caching? This will determine the consistency of the cached
   data if a crash occurs.

2) Can you cache data temporarily, so that a disconnect from a
   server will not cause service issues for the full set of data?
   This can also be applied to CD-ROM source files that are then
   modified and not restored to the original location.

3) Do you accept holes (discontiguous sections) within your
   file object?

4) Lastly, what are the access latencies for read operations?

Ok, with all that said, in a WAN environment the access latencies
can approach a maximum of 150 ms.

With GBs of memory allocated for file operations, a rule of thumb
for me in the past was that anything less than 16 MB of file gets
read. This for me is the largest working set size (MMU dependent).

Most files (cached objects) will be significantly smaller than
that, so if the total amount of memory consumed is less, the main
question is aging. When do you consider the data no longer useful
even to yourself (assuming that you made no modifications) and
discard the cached contents?

If the cache was write-back, then every 15-30 seconds cached data
was marked for flushing back to the original source location.

Mitchell Erblich
----------------

"Michael E. Thomadakis" wrote:
>
> Prasanna Wakhare wrote:
>
> >Hi all,
> >I have a simple question in mind. I want to introduce client side
> >caching in any network or cluster file system. The caches are
> >in-memory caches, not secondary caches as in AFS, but as in Sprite.
> >My question is: all the cache is in kernel space, and what if a large
> >file happens to be mmap'ed or read/written, say 2 GB etc.?
> >What would be the consequence if there is a great limit on my cache?
> >We know kernel space virtual addresses are from 3GB to 4GB.
> >If I start allocating cache blocks as soon as I'm getting the file from
> >the storage node and if my cache size is 5GB then this will hang the
> >system, as the kernel can't have that many addresses to refer to. And in
> >that case we need either a CPU that addresses 64 bits or something like that.
> >But even then there is a limit on caching the file data.
> >So how large shall I keep my cache size to have the best performance?
> >I hope I'm pretty clear in what I asked?
> >Thanks
> >Prasanna
> >
>
> This service is very similar to the 'buffer cache' service in the kernel
> where file system data are treated as VM pages. UNIX and increasingly
> Linux are using VM to buffer file pages.
>
> In my opinion there should be a method by which a user can communicate
> performance 'hints' to the kernel as to the behavior of a file or the
> entire file system. You have the entire spectrum of control here: fixed
> to adaptive. 'Hints' are a good idea when a particular file or file
> system has a known / expected behavior or service requirements. However
> they should always be evaluated in the context of the current resource
> availability. Assuming large file sizes, let me just give a few simple
> examples.
>
> - A user scans a file from beginning to end and never rewrites it or
> modifies it. Thus, read-ahead could be of benefit. The question then is
> how much to read ahead, if the user does not know this value. One way to
> answer this is to monitor the request pattern of the process within a
> (sliding) window of time. The disclaimer is that the recent past is not
> always a good indicator of the near future, but again, in many useful
> cases it is. Given the read-once pattern, there is no reason to maintain
> the buffer cache pages for this file.
>
> - A user always writes at the end of a file (appending to it), as is the
> case with log files or results from computation/reports, etc. Output
> buffering is of benefit and caching is not needed.
>
> - A process that deviates from the two ideal ones above is one that
> re-reads or (worse) re-writes a portion of the file it read previously.
> Existing VM caching algorithms can be used here for the benefit of the
> system (LRU page replacement, etc).
>
> - It would be interesting if a user could specify a heuristic or at
> least hint values (such as read-ahead counts or 'release behind') at
> mount time for the entire file system.
>
> In general, I don't think that one needs to cache entire files in the
> limited kernel memory space. This opens a Pandora's box, with all other
> system and user activities suffering degradation and then eventually
> leading to VM thrashing. Notice though that the above discussion
> becomes immaterial for small files, so one parameter of interest may
> be how much (in KBs) to cache for a file. A more interesting parameter
> is the total number of real memory pages to dedicate to file caching.
> You will be very thankful later if you can enforce a true upper bound on
> this.
>
> Finally, since all heuristics can be defeated by certain request
> patterns, the meta-heuristic should be to guard against overusing scarce
> or bottleneck resources, whichever those may be at different times. In
> short, it is worthwhile pursuing decent heuristics but you should always
> monitor the system to avoid forming bottlenecks.
>
> -MT

_______________________________________________
nfsv4 mailing list
nfsv4@ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4
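
Editorial sketch (not part of the original thread): the advice above to
enforce a true upper bound on the pages given to file caching, to use LRU
replacement, and to age out data that is no longer useful can be
illustrated with a toy user-space model. This is only a sketch under
stated assumptions, not kernel code and not any poster's implementation.
The class name BoundedPageCache, its methods, and the max_pages and
max_age_secs parameters are all hypothetical; the 30-second value in the
example is borrowed from the 15-30 second flush interval mentioned above
purely for illustration. Timestamps are passed in explicitly so the
behavior is deterministic.

```python
from collections import OrderedDict


class BoundedPageCache:
    """Toy model of a file page cache with a hard page limit (hypothetical)."""

    def __init__(self, max_pages, max_age_secs):
        self.max_pages = max_pages        # true upper bound on cached pages
        self.max_age_secs = max_age_secs  # untouched pages older than this expire
        self.pages = OrderedDict()        # (file_id, page_no) -> (data, last_used)

    def get(self, file_id, page_no, now):
        key = (file_id, page_no)
        entry = self.pages.get(key)
        if entry is None:
            return None                    # miss: caller would fetch from the server
        self.pages[key] = (entry[0], now)  # refresh the last-used timestamp
        self.pages.move_to_end(key)        # mark as most recently used
        return entry[0]

    def put(self, file_id, page_no, data, now):
        key = (file_id, page_no)
        self.pages[key] = (data, now)
        self.pages.move_to_end(key)
        while len(self.pages) > self.max_pages:
            self.pages.popitem(last=False)  # evict the least recently used page

    def expire(self, now):
        """Discard pages not touched within max_age_secs; return the count."""
        stale = [key for key, (_, used) in self.pages.items()
                 if now - used > self.max_age_secs]
        for key in stale:
            del self.pages[key]
        return len(stale)


# Example: a two-page cache; page 1 is evicted because page 0 was used later.
cache = BoundedPageCache(max_pages=2, max_age_secs=30)
cache.put("f", 0, b"a", now=0)
cache.put("f", 1, b"b", now=1)
cache.get("f", 0, now=2)        # page 0 becomes most recently used
cache.put("f", 2, b"c", now=3)  # over the limit, so page 1 is dropped
```

A real client cache would also have to track dirty pages and flush them
before eviction, which this sketch deliberately leaves out.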
This archive was generated by hypermail 2.1.2 : 03/04/05-02:13:51 AM Z CST