This function requires the bigdata library section to be loaded.
bd.cache.temp.dir(dir="")
If dir is an empty string, the cache file temporary directory will not be changed.
Currently, the cache file directory path must end with the directory name "__bdo". Therefore, if dir does not end with "__bdo", it will be appended to the end of dir. As a special case, if dir is a directory containing a directory ".Data", then ".Data/__bdo" is appended to the end of dir. This allows passing an S-PLUS chapter such as searchPaths()[1] as the value of dir, to specify the associated "__bdo" directory.
If the dir argument is not an empty string, the cache file temporary directory is changed to the specified directory.
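As a sketch of the normalization described above (the paths shown are hypothetical, for illustration only):

bd.cache.temp.dir("/tmp/scratch")        # "__bdo" is appended: "/tmp/scratch/__bdo"
bd.cache.temp.dir("/tmp/scratch/__bdo")  # already ends in "__bdo"; used as given
bd.cache.temp.dir(searchPaths()[1])      # a chapter contains ".Data", so
                                         # ".Data/__bdo" is appended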
A big data object is represented by several "cache files" containing the data and meta-data for the object. In the current implementation of big data objects, if a database variable stored in "somepath/.Data" contains a big data object, the cache files for that object must be stored in "somepath/.Data/__bdo".
This raises the question of where to write the cache files when initially creating a big data object. When a big data object is created, there is no way to know whether it will eventually be stored in a database variable, or where that variable will be written. The solution is to create the cache files in the "cache file temporary directory". If a variable containing a big data object is later assigned to a different directory, the cache files are copied.
When the big data library is loaded, the cache file temporary directory is initialized to the "__bdo" directory within the first database on the search path. Most new variables are stored in this database, so most of the time it is not necessary to copy any big data cache files: they are created in the right directory.
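For example (the path below is illustrative, not from the source): after loading the library, the current setting reflects the first database on the search path.

searchPaths()[1]     # e.g. "/home/user/myproject"
bd.cache.temp.dir()  # would then report the associated
                     # "/home/user/myproject/.Data/__bdo" directory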
If you are attaching and detaching databases and assigning variables in
multiple databases, it is possible for performance to suffer because of
time spent copying big data cache files.
In some situations, using bd.cache.temp.dir to change the cache file temporary directory may help. For example, if you are performing a series of big data operations and assigning the results as variables within a new database, it may help to call bd.cache.temp.dir so that new big data caches will be created within the new database.
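A sketch of that workflow (the database name "newdir" and the use of pos=1 are assumptions for illustration):

attach("newdir", pos=1)              # attach the new database at position 1
bd.cache.temp.dir(searchPaths()[1])  # create new cache files within it
# ... big data results assigned in "newdir" now need no cache copying ...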
One way to detect that you may have excessive copying of cache files is to check whether the copy.bytes value is large.
bd.cache.temp.dir should be used with care. If you set it to a directory where you cannot write files, due to file protections or disk full problems, big data operations may fail.
# retrieve current value of directory
bd.cache.temp.dir()

# set to the first database in the search path
bd.cache.temp.dir(searchPaths()[1])