Cache File Temporary Directory

DESCRIPTION:

Sets and retrieves the directory used for creating temporary cache files. Assigning a big data object to a variable may copy data cache files for the object. Setting the temporary directory appropriately may avoid some cache file copies.

This function requires the bigdata library section to be loaded.

USAGE:

bd.cache.temp.dir(dir="")

OPTIONAL ARGUMENTS:

dir
The new directory file path. If dir is an empty string, the cache file directory is not changed. Currently, the cache file directory path must end with the directory name "__bdo", so if dir does not end with "__bdo", that name is appended to dir. As a special case, if dir is a directory containing a directory ".Data", then ".Data/__bdo" is appended to dir. This allows passing an S-PLUS chapter such as searchPaths()[1] as the value of dir, to specify the associated "__bdo" directory.

VALUE:

The previous value of the cache file temporary directory path.

SIDE EFFECTS:

If the dir argument is not an empty string, the cache file temporary directory is changed to the new path.

DETAILS:

A big data object is represented by several "cache files" containing the data and meta-data for the object. In the current implementation of big data objects, if a database variable stored in "somepath/.Data" contains a big data object, the cache files for that object must be stored in "somepath/.Data/__bdo".

This raises the question of where to write the cache files when initially creating a big data object. When a big data object is created, there is no way to know whether it will eventually be stored in a database variable, or where that variable will be written. The solution is to create the cache files in the "cache file temporary directory". If a variable containing a big data object is later assigned to a different directory, the cache files are copied.

When the big data library is loaded, the cache file temporary directory is initialized to the "__bdo" directory within the first database on the search path. Most new variables are stored in this database, so most of the time it is not necessary to copy any big data cache files: they are created in the right directory.

If you are attaching and detaching databases and assigning variables in multiple databases, it is possible for performance to suffer because of time spent copying big data cache files. In some situations, using bd.cache.temp.dir to change the cache file temporary directory may help. For example, if you are performing a series of big data operations and assigning the results as variables within a new database, it may help to call bd.cache.temp.dir so that new big data caches will be created within the new database.

One way to detect that you may have excessive copying of cache files is to call , and see if the copy.bytes value is large.

bd.cache.temp.dir should be used with care. If you set it to a directory where files cannot be written, because of file protections or a full disk, big data operations may fail.

SEE ALSO:

EXAMPLES:

# retrieve current value of directory
bd.cache.temp.dir()
# set to the first database in the search path
bd.cache.temp.dir(searchPaths()[1])
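# save the previous directory while setting a new one, then restore it
# (a minimal sketch based on the VALUE section above; old.dir is an
# illustrative variable name, and the bigdata library is assumed loaded)
old.dir <- bd.cache.temp.dir(searchPaths()[1])
# ... perform big data operations and assign the results ...
bd.cache.temp.dir(old.dir)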