Measure Internal Bigdata Operations.

DESCRIPTION:

Read internal counters of the number of bigdata operations executed and the number of bytes read and written during these operations. This information can be used to pinpoint bigdata operations that may be causing performance problems.

This function requires the bigdata library section to be loaded.

USAGE:

bd.tally(expr=NULL, reset=F)

OPTIONAL ARGUMENTS:

expr
If this argument is given, it should be an expression to be evaluated. In that case, the returned vector is the difference between the internal counters after and before evaluating the expression.
reset
If this argument is true, the internal counters are reset to zero.
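For instance, the two calls below are roughly equivalent ways to measure a single expression: the first takes the before/after difference of the counters manually, while the second lets bd.tally do so itself. (A sketch only; the exact counter values depend on your session.)

```
# Manual differencing of the counters
before <- bd.tally()
x <- as.bdFrame(fuel.frame)
after <- bd.tally()
after - before

# Equivalent: bd.tally evaluates the expression and returns the difference
bd.tally(as.bdFrame(fuel.frame))
```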

VALUE:

A named numeric vector, with the following elements:

"ms": The current time in milliseconds. This will be zero the first time bd.tally is called. This is reset to zero when the reset argument is true.

pipelines: The number of bigdata "pipelines" that have been executed. Currently each pipeline contains a single "node". Normally, this element equals the sum of the nodes and dc.nodes values.

errors: The number of bigdata pipelines that have terminated with an error.

nodes: The number of normal nodes that have been executed. Most simple bigdata operations are implemented with normal nodes.

dc.nodes: The number of "data cache" nodes that have been executed. These nodes are implemented differently than normal nodes.

splus.scripts: The number of nodes executed that run Splus scripts. Functions such as bd.block.apply run Splus scripts.

sorts: The number of sorting operations performed.

blocks: The number of data blocks processed by normal nodes.

in.bytes: The number of bytes read from file caches. If a given file cache is read repeatedly in multiple passes, this number will only include the bytes for one read pass.

out.bytes: The number of bytes written to file caches.

scan.bytes: Some data cache nodes need to scan output data caches to collect statistics. This value is the number of bytes read by data cache nodes for this purpose.

sort.bytes: The number of bytes written during sort operations. This may be much larger than the final sorted data, since it includes temporary files written while sorting large datasets.

s2b.bytes: The number of bytes transferred from small data objects to bigdata objects, via functions such as as.bdFrame.

b2s.bytes: The number of bytes transferred from bigdata objects to small data objects, via functions like bd.coerce. Note that printing a bdFrame or bdVector extracts part of a bigdata object and converts it to a small data object.

copy.bytes: The number of bytes copied between database directories when assigning variables to a different database. If this is excessive, you may be able to reduce it by calling .

DETAILS:

This function is used to examine the performance of bigdata operations. If an expression EXPR is taking an unexpectedly long time, it may be useful to execute bd.tally(EXPR) and review the returned value, to see whether the expression executes a large number of bigdata nodes, or reads or writes many bytes.

EXAMPLES:

bd.tally(as.bdFrame(fuel.frame))
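A session might also reset the counters and then examine cumulative totals across several operations. (A sketch only; the reported values vary from run to run.)

```
bd.tally(reset=T)                          # zero the counters
x <- as.bdFrame(fuel.frame)                # perform some bigdata work
counts <- bd.tally()                       # read the cumulative counters
counts["in.bytes"] + counts["out.bytes"]   # total cache I/O in bytes
```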