Generate Random Sample or Permutation

DESCRIPTION:

Produces a random balanced sample, with minimal replacement.

USAGE:

balancedSample(n, size = n, prob = NULL, full.partition = "none") 

REQUIRED ARGUMENTS:

n
population size.

OPTIONAL ARGUMENTS:

size
sample size. A sample of size size is generated from values 1:n.
prob
vector of probabilities of length n, or NULL (indicating equal probabilities). The vector is normalized internally to sum to one. Index i is chosen with expected frequency size*prob[i], which is its selection probability whenever that value does not exceed 1.
full.partition
character, one of "first", "last", or "none". Return the initial (if "first") or final (if "last") size elements of a full sample of size n. If "none", do not generate a full sample. Valid only if size < n; ignored otherwise. See DETAILS.

VALUE:

vector of length size of indices drawn without replacement if possible, i.e. if size<=n and size*max(prob)<=1. In particular, the default values size=n, prob=NULL generate a simple permutation of 1:n.

Otherwise samples are drawn with as little replacement as possible: the actual frequency of each index is the goal (size/n, or size*prob when prob is supplied) rounded up or down, and the probability that result[i]=j is 1/n (equal probabilities) or prob[j].
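The rounding rule above can be illustrated with a little arithmetic. The following is only a sketch in base R; the variable names goal, lo and hi are introduced here for illustration and are not part of the function.

```r
# Illustration only: frequency bounds for size = 8, prob = 1:4
size <- 8
prob <- 1:4
goal <- size * prob / sum(prob)   # goal frequencies 0.8, 1.6, 2.4, 3.2
lo <- floor(goal)                 # smallest frequency each index can have
hi <- ceiling(goal)               # largest frequency each index can have
rbind(goal, lo, hi)
```

Under this rule index 1 appears at most once and index 4 appears three or four times.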

SIDE EFFECTS:

This function causes creation of the object .Random.seed if it does not already exist. Otherwise its value is updated.

DETAILS:

The algorithm when prob is supplied uses a random permutation, systematic sampling, and a final random permutation. First, prob is randomly permuted, together with the indices 1:n. The interval (0,1) is divided into n subintervals I[i] with length(I[i]) proportional to the ith permuted probability. Next, size values u[j] are generated on the interval (0,1) using systematic sampling: u[1] is random uniform on (0,1/size) and u[j] = u[j-1] + 1/size for j in 2:size. Thus the u[j] all lie in (0,1) and are equally spaced.

If u[j] is in interval I[i] then the jth component of a temporary result is the ith permuted index. At this point, if I[i] has length greater than k/size then there are k or more consecutive copies of the ith permuted index in the temporary result. A final random permutation of the result ensures that repeats do not always appear together.
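The steps above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the function's actual source: the name systematicSketch is hypothetical, and the use of sample, runif and findInterval is one possible way to carry out the description.

```r
# Hypothetical helper, not part of the library: permute, sample
# systematically, then permute again, as described in DETAILS.
systematicSketch <- function(n, size = n, prob = rep(1/n, n)) {
  prob <- prob / sum(prob)        # normalize to sum to one
  perm <- sample(n)               # random permutation of the indices 1:n
  upper <- cumsum(prob[perm])     # right endpoints of I[1], ..., I[n]
  # u[1] uniform on (0, 1/size); later values spaced 1/size apart
  u <- runif(1, 0, 1/size) + (0:(size - 1)) / size
  # subinterval containing each u[j]; pmin guards against rounding at 1
  k <- pmin(findInterval(u, upper) + 1, n)
  temp <- perm[k]                 # repeated indices are still adjacent here
  temp[sample(size)]              # final permutation separates the repeats
}

set.seed(1)
table(systematicSketch(4, 8, prob = 1:4))
```

Because the u[j] are equally spaced, each index appears either floor(size*prob[i]) or ceiling(size*prob[i]) times in the sketch's output.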

Calling balancedSample with full.partition = "first", size = m and then (after resetting the seed) with full.partition = "last", size = n-m produces complementary sets of indices. Concatenated, they are equivalent to the result of calling balancedSample with size = n: if prob is supplied, the concatenated results are the same as the results with size = n up to a permutation; if prob is not supplied, they are identical.
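For the equal-probability case, the idea of complementary parts can be illustrated with base R's sample. This is only an analogy under the assumption that a full permutation is generated and then split; it does not call balancedSample, and the names full, first and last are introduced here for illustration.

```r
# Illustration only: split one permutation of 1:n into complementary parts
n <- 10
m <- 4
set.seed(17); full  <- sample(n)              # full sample of size n
set.seed(17); first <- sample(n)[1:m]         # initial m elements
set.seed(17); last  <- sample(n)[(m + 1):n]   # final n-m elements
identical(c(first, last), full)               # TRUE
```

Resetting the seed before each call is what makes the two parts come from the same underlying permutation.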

SEE ALSO:

returns multiple samples using the same algorithm, uses this to sample a vector, matrix or data frame, .

EXAMPLES:

balancedSample(4)    # random permutation 
balancedSample(4, 2) # two observations chosen without replacement 
balancedSample(4, 6) # each observation once, two observations twice 
balancedSample(4, 8) # each observation twice 
balancedSample(4, 8, prob=(1:4)) # expected frequencies .8, 1.6, 2.4, 3.2 
 
# These are equivalent (in the long run; they vary randomly) 
balancedSample(5, 100) 
sample(rep(1:5, length=100), 100, replace=FALSE)