Manage Compute-Intensive Iteration

DESCRIPTION:

Run a new S-PLUS process in which each step of a for loop is done as a separate top-level expression, or create a file of top-level expressions corresponding to the steps in a loop. This file is suitable for use with Splus SBATCH.

Note: This function is deprecated. The For function will not work in the S+Workbench unless it is called with exec=F.

USAGE:

For(steps, body, first, wait=T, sync=T, quit=T, grain.size=1,
    debug=F, exec=T) 

REQUIRED ARGUMENTS:

steps
an expression giving the steps over which to loop.
body
an expression forming the body of the loop.

OPTIONAL ARGUMENTS:

first
optional expression to be evaluated once, before the start of the loop calculations, in the new S-PLUS process or BATCH job. A typical use is to invoke a graphics device function if the iteration in the loop involves plotting.
wait
should the current S-PLUS process wait while the new process is spawned? If you want to use the results, you need to wait. On the other hand, for truly large computations, you may be happier to spawn a background job.

WARNING: It's generally a bad idea to spawn a large number of S-PLUS processes simultaneously.

Firing off many calls to For() with wait=F is likely to slow down the machine and (on a multi-user system) make you unpopular.
sync
should S-PLUS synchronize its assignments before spawning the new process? Often it doesn't matter. If there were some permanent assignments in the same top-level expression as the call to For(), synchronization must be done for the new process to see these assignments. Note that in any case, synchronize(1) will be done after the new process completes; otherwise, you wouldn't see any of the results. Do not, however, rely on making permanent assignments within functions and having those assignments synchronized using calls to synchronize(1). Such calls are executed only at the end of each top-level expression. (If this makes no sense to you, ignore it and all will likely be well.)
quit
should the spawned process or BATCH job quit if it encounters an error? If TRUE, the behavior is more similar to an ordinary for loop in that no steps will be taken after an error occurs. If FALSE, execution will continue regardless of errors. (In either case, note that assignments in steps of the iteration before the error will be committed, whereas they would not be with a standard for loop.)
grain.size
a positive integer giving the number of copies of the body expression to clump into one top-level expression. There is a fair bit of disk activity at the beginning and end of each top-level expression, which can add quite a bit of overhead to the computation. If grain.size is too large, the subprocess or BATCH job may run out of memory; if grain.size is too small, the subprocess or BATCH job will take a long time to run.
debug
If TRUE, show (with the page function) the file of S-PLUS commands before it is fed to the subprocess. page() generally uses the less utility, which lets you type v to edit this file.
exec
If TRUE, spawn another S-PLUS process which executes the commands in the command file generated by this function, then delete the file. If FALSE, just make the command file, do not delete it, and return the name of the command file. You can then run S-PLUS with this command file as its input, for instance, with Splus BATCH. The default value for exec is TRUE on multitasking operating systems (like the various flavors of Unix) and FALSE on non-multitasking operating systems (like DOS).
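
The clumping controlled by grain.size can be pictured with a small sketch. The helper clump.steps below is hypothetical and is not part of S-PLUS. It only illustrates how several steps might be grouped into one top-level expression, and it makes no claim about the actual contents of the command file that For() writes.

```r
# Hypothetical helper (an assumption for illustration, not an S-PLUS function):
# build the text of top-level expressions for a loop, putting grain.size
# consecutive steps into each one.
clump.steps <- function(index, steps, body, grain.size = 1)
{
    # one "index <- value; body" pair per step
    one.step <- paste(index, " <- ", steps, "; ", body, sep = "")
    # group number for each step: 0,0,...,1,1,...
    group <- (seq(along = steps) - 1) %/% grain.size
    # join the steps of each group inside braces to form one expression
    clumps <- split(one.step, group)
    out <- character(length(clumps))
    for(k in seq(along = clumps))
        out[k] <- paste("{ ", paste(clumps[[k]], collapse = "; "), " }", sep = "")
    out
}

# six steps, three per top-level expression, so two expressions in all
exprs <- clump.steps("i", 1:6, "x[i] <- i^2", grain.size = 3)
print(exprs)
```

With grain.size = 1 the same call would give six expressions, one per step. Fewer, larger expressions mean the per-expression overhead is paid less often, but more work is held within each expression.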

VALUE:

If exec is TRUE, this function returns NULL. If exec is FALSE, this function returns the name of the command file.

SIDE EFFECTS:

If exec is FALSE, creates a file of top-level S-PLUS commands corresponding to the steps in an S-PLUS for loop.

DETAILS:

Consider the following expression:

for(i in 1:10) fits[[i]] <- lm(diddle(y) ~ x)

This does a computation at each step of a loop and stores the results as the elements of a list, fits. The For function will carry out the same calculations as a result of the call:

For(i=1:10, fits[[i]] <- lm(diddle(y) ~ x))

For a quick loop, there is no advantage to For; on the contrary, it will be slower because it must start a new process. For very large computations, on the other hand, running each iteration separately can use less memory and perhaps execute faster as a result. The first two arguments to For() can have any names or no names; these arguments are always the first two arguments, positionally, in the call. As the example shows, the name of the first argument is the name of the loop index. It can appear in the expression in the second argument, just as it can appear in the body of an ordinary for loop.

Here are some special techniques and semantic details. Two extensions to the capabilities of ordinary for loops are given by the naming options. If the body argument is named, then the results of each step in the loop will be automatically stored as corresponding elements in the object of that name. The example above could have been written equivalently as

For(i=1:10, fits = lm(diddle(y) ~ x))

In the other direction, if you don't need to refer to the loop variable in the body, you don't need to name it, so a further reduction of the example would be:

For(1:10, fits = lm(diddle(y) ~ x))

There will be no automatic printing after each step in the loop. The value of the call to For() itself is NULL, not the value of the last iteration in the loop. Generally, user interaction with the new process is unpredictable and would best be avoided if it isn't necessary. Some things should work, like graphical interaction. However, reading from standard input probably is a bad idea, since both the current S-PLUS process and the new one have the same standard input.

Data sets used in For() must be permanent data sets, not variables in a function.

If you do not name your steps expression, For() will create and remove a variable named .Steps in your working directory, destroying any variable named .Steps you may have there. If you name your steps expression, a variable with that name will be created in your working directory and will contain the element of steps used in the last successfully completed body expression.

For() is somewhat experimental and you may expect changes to it in the future.

EXAMPLES:

# do a plot each time, start up a device driver first 
For(i = 1:10, qqnorm(fits[[i]]$resid), first = postscript()) 
# demonstrate effect of grain.size 
x <- numeric(100) ; d <- rnorm(10) - rnorm(10) + 1 
print(unix.time(For(i = 1:100, x[i] <- mean(sign(runif(10)-.5)*d), 
      grain.size = 1))) 
# [1]  0.5333333  0.4000001 46.0000000  4.6500001  7.5666666 
print(unix.time(For(i = 1:100, x[i] <- mean(sign(runif(10)-.5)*d), 
      grain.size = 10))) 
# [1]  0.5333333  0.3333335 10.0000000  2.8499985  2.2999992