Analyze BDO Cache Files

DESCRIPTION:

Analyze a directory containing big data cache files and return information about cache files, reference counts, and unknown files.

These functions require the bigdata library section to be loaded.

USAGE:

bd.cache.info(where=1)
bd.cache.cleanup(where=1, caches=T, files=T, registered=F)

OPTIONAL ARGUMENTS:

where
identifies the S-PLUS database whose __bdo subdirectory will be analyzed or cleaned.
caches
if true, bd.cache.cleanup deletes all unknown cache files in the cache directory.
files
if true, bd.cache.cleanup deletes all unknown files in the cache directory, other than unknown cache files.
registered
if true, bd.cache.cleanup unregisters any registered memory references to cache files. Warning: this may invalidate caches that are in use. It should be safe to call this from the top-level prompt.

VALUE:

bd.cache.info returns a list with the following elements:
cache.file.directory
the file name of the directory containing the cache files.

var.caches
a list of all variables containing big data objects. The list names are the variable names. The list elements are vectors of the caches accessed with each variable. Names such as "???__4" mark entries whose variable name could not be determined.

ref.counts
a named vector giving the reference count for each cache.

mem.known
a character vector of all of the cache names with current known references.

mem.filerefs
a numeric vector giving the number of references from any saved variables to each of the known caches.

mem.memrefs
a numeric vector giving the number of references from in-memory objects to each of the known caches.

mem.viewrefs
a numeric vector giving the number of references from data viewers to each of the known caches.

mem.new
a character vector giving any caches that have been newly created, but have no in-memory reference. This is normally empty.

mem.registered
a character vector giving any caches that have been registered, indicating that they have an in-memory reference.

mem.registered.loc
a numeric vector giving, for each registered cache, a memory location used to identify it.

unknown.caches
all caches in the cache directory that are not in the reference count data. A file is assumed to be a cache file if it has one of the extensions dcf, dbf, or mdf.

unknown.files
all files in the cache directory that are not caches or other known files. If any files appear here, this usually indicates that some operation is creating temporary files and not deleting them.

DETAILS:

Big data objects store their (potentially huge) data in external cache files. These cache files are stored in a __bdo subdirectory within the .Data directory where the objects that reference them are stored. An automatic garbage-collection system deletes these cache files when they are no longer needed.

The bd.cache.info function analyzes the files within the __bdo subdirectory. It is primarily used when developing and debugging the garbage-collection system. The bd.cache.cleanup function can be used to clean up cache files that the garbage-collection system has not deleted. Such leftover files are most likely to occur if the entire system crashes.

EXAMPLES:

## analyze database 1 (the default, where=1)
bd.cache.info()
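
## A hedged sketch, not part of the original examples: inspect the
## analysis before cleaning up. It assumes the bigdata library section
## is loaded and that database 1 holds the caches of interest. The
## variable name "info" is illustrative only.
info <- bd.cache.info(where=1)
info$cache.file.directory   # directory that was analyzed
info$ref.counts             # reference count for each known cache
info$unknown.caches         # cache files missing from the reference count data
info$unknown.files          # other unrecognized files, if any

## delete unknown cache files only; other unknown files and registered
## memory references are left untouched
if(length(info$unknown.caches) > 0)
	bd.cache.cleanup(where=1, caches=T, files=F, registered=F)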