balancedSample(n, size = n, prob = NULL, full.partition = "none")
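n: a sample of length size is generated from the values 1:n.
size: the sample size, i.e. the number of indices returned.
prob: a vector of length n, or NULL (indicating equal probabilities). The vector is normalized internally to sum to one. When sampling without replacement, index i is chosen with probability size*prob[i]; in general, size*prob[i] is the expected number of times index i appears in the result.
full.partition: one of "first", "last", or "none". Return the initial (if "first") or final (if "last") size elements of a full sample of size n. If "none", do not generate a full sample. Valid only if size < n; ignored otherwise. See below.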
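The following R lines check the normalization of prob and the meaning of size*prob[i]. They reproduce the expected frequencies quoted in the examples for n = 4, size = 8, prob = 1:4. They only do the arithmetic and do not call balancedSample.

```r
n <- 4
size <- 8
prob <- 1:4

# Normalize so the weights sum to one, as balancedSample does internally.
p <- prob / sum(prob)      # 0.1 0.2 0.3 0.4

# Expected number of times each index in 1:n appears in the result.
expected <- size * p       # 0.8 1.6 2.4 3.2
print(expected)
```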
The result is a vector of length size of indices, drawn without replacement if possible, i.e. if size <= n and size*max(prob) <= 1. In particular, the default values size = n, prob = NULL generate a simple permutation of 1:n.
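As an illustration of the without-replacement condition, here is a sketch of a helper that evaluates size <= n and size*max(prob) <= 1 after normalizing prob. The function can_sample_without_replacement and its tol argument are hypothetical and are not part of balancedSample. The small tolerance is an assumption added to absorb floating-point rounding.

```r
# Hypothetical helper, not part of balancedSample: TRUE when a request can be
# met without replacement, i.e. size <= n and size * max(prob) <= 1,
# with prob normalized to sum to one (NULL means equal probabilities).
can_sample_without_replacement <- function(n, size = n, prob = NULL,
                                           tol = sqrt(.Machine$double.eps)) {
  if (is.null(prob)) prob <- rep(1, n)
  p <- prob / sum(prob)
  # tol absorbs floating-point noise when size * max(p) is 1 in exact arithmetic
  size <= n && size * max(p) <= 1 + tol
}

can_sample_without_replacement(4)                 # TRUE: simple permutation
can_sample_without_replacement(4, 6)              # FALSE: size > n
can_sample_without_replacement(4, 2, prob = 1:4)  # TRUE:  2 * 0.4 = 0.8
can_sample_without_replacement(4, 3, prob = 1:4)  # FALSE: 3 * 0.4 = 1.2
```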
Otherwise the sample is drawn with as little replacement as possible: actual frequencies are rounded up or down from the goal size/n (equal probabilities) or size*prob, and the probability that result[i] = j is 1/n or prob[j], respectively.
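The "rounded up or down" property can be checked mechanically. The sketch below uses a hypothetical checker, is_balanced, that tests whether each index's frequency is its goal length(result)*prob rounded down or up. It is applied here to ordinary base R vectors, not to output of balancedSample. Rounding the goal to 10 decimals is an assumption meant to keep floating-point noise from shifting floor() or ceiling().

```r
# Hypothetical checker, not part of balancedSample: TRUE when every index's
# frequency in result is its goal length(result) * prob rounded down or up.
is_balanced <- function(result, n, prob = NULL) {
  if (is.null(prob)) prob <- rep(1, n)
  p <- prob / sum(prob)
  goal <- round(length(result) * p, 10)   # strip floating-point noise
  counts <- tabulate(result, nbins = n)
  all(counts >= floor(goal) & counts <= ceiling(goal))
}

set.seed(1)
x <- sample(rep(1:5, length.out = 12))  # counts 3, 3, 2, 2, 2; goal 2.4 each
is_balanced(x, 5)                       # TRUE
is_balanced(c(1, 1, 1, 1, 2, 3), 4)     # FALSE: goal is 1.5 each
```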
Calling balancedSample creates .Random.seed if it does not already exist; otherwise its value is updated.
When prob is supplied, the algorithm uses a random permutation, systematic sampling, and a final random permutation. First, prob is randomly permuted together with the indices 1:n.
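The first step can be sketched in base R as follows. The probabilities are an assumed example, and the code illustrates the description above rather than reproducing the package source. The key point is that each probability stays paired with its original index.

```r
n <- 4
prob <- (1:4) / 10          # assumed example, already normalized

set.seed(42)
perm <- sample.int(n)       # random permutation of the indices 1:n
pprob <- prob[perm]         # pprob[k] is the probability of index perm[k]

# The pairing is preserved: undoing the permutation recovers prob.
all.equal(pprob[order(perm)], prob)   # TRUE
```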
The interval (0,1) is divided into n subintervals I[i], with length(I[i]) proportional to prob[i].
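One way to represent the subintervals is through cumulative sums. In this sketch, I[i] runs from breaks[i] to breaks[i + 1], so its length equals the i-th permuted probability. The values in pprob are assumed for illustration.

```r
pprob <- c(0.3, 0.1, 0.4, 0.2)   # assumed permuted, normalized probabilities
n <- length(pprob)

# I[i] runs from breaks[i] to breaks[i + 1], so its length is pprob[i].
breaks <- c(0, cumsum(pprob))
print(breaks)
all.equal(diff(breaks), pprob)    # TRUE
```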
Next, size values u[j] are generated uniform on the interval (0,1) using systematic sampling. Let u[1] be random uniform on (0, 1/size) and u[j] = u[j-1] + 1/size for j in 2:size. Thus the u[j] all lie in (0,1) and are equally spaced.
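The systematic draw needs only one call to runif(). All the remaining points are fixed offsets of 1/size from the first. The following sketch uses size = 8 as an assumed value.

```r
size <- 8
set.seed(1)

# u[1] is uniform on (0, 1/size); each later u[j] adds 1/size.
u1 <- runif(1, min = 0, max = 1 / size)
u <- u1 + (0:(size - 1)) / size

range(u)        # both ends inside (0, 1)
diff(u)         # all equal to 1/size = 0.125
```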
If u[j] is in interval I[i], then the jth component of a temporary result is the ith permuted index. At this point, if I[i] has length greater than k/size, then there are k or more consecutive copies of the ith permuted index in the temporary result.
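Base R's findInterval() can perform the lookup "which I[i] contains u[j]" when it is given the left endpoints of the subintervals. The permuted indices and probabilities below are assumed values. Because u is increasing, the interval numbers come out in non-decreasing order, so repeated indices form consecutive runs.

```r
perm  <- c(3, 1, 4, 2)             # assumed permuted indices
pprob <- c(0.1, 0.2, 0.3, 0.4)     # their probabilities, in the same order
size  <- 8
n     <- length(perm)

left <- c(0, cumsum(pprob)[-n])    # left endpoints of I[1], ..., I[n]
set.seed(7)
u <- runif(1, 0, 1 / size) + (0:(size - 1)) / size

# findInterval() gives the i with left[i] <= u[j] < left[i + 1].
i <- findInterval(u, left)
temp <- perm[i]
print(temp)

# I[4] has length 0.4 > 3/size, so perm[4] = 2 appears at least 3 times,
# and because u is increasing the copies are consecutive.
rle(temp)
```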
A final random permutation of the result
ensures that repeats do not always appear together.
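Putting the steps together gives the following sketch of the procedure for the case where prob is supplied. The name balanced_sample_sketch is hypothetical. The code follows the description in this section but is not the package's source, and it will not reproduce balancedSample's output for a given seed. It omits argument checking and the full.partition option.

```r
# Sketch of the procedure described above (prob supplied). The name
# balanced_sample_sketch is hypothetical; this is not the package source.
balanced_sample_sketch <- function(n, size = n, prob = rep(1, n)) {
  p <- prob / sum(prob)                           # normalize to sum to one
  perm <- sample.int(n)                           # random permutation of 1:n
  pp <- p[perm]                                   # probabilities follow their indices
  left <- c(0, cumsum(pp)[-n])                    # left endpoints of I[1..n]
  u <- runif(1, 0, 1 / size) + (0:(size - 1)) / size  # systematic uniforms
  temp <- perm[findInterval(u, left)]             # interval containing each u[j]
  temp[sample.int(size)]                          # final random permutation
}

set.seed(123)
x <- balanced_sample_sketch(4, 8, prob = 1:4)
tabulate(x, nbins = 4)   # each count is 0.8, 1.6, 2.4, 3.2 rounded down or up
```

Systematic sampling is what gives the "as little replacement as possible" behaviour. An interval of length L contains either floor(L*size) or ceiling(L*size) of the equally spaced points.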
Calling balancedSample with full.partition = "first", size = m and then (after re-setting the seed) with full.partition = "last", size = n-m produces complementary indices. When concatenated, these are equivalent to the result of calling balancedSample with size = n. With prob supplied, the concatenated results are the same as the results with size = n, up to a permutation; with prob = NULL, they are identical to the results with size = n.
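For the equal-probability case, the idea behind full.partition can be illustrated by taking the first or last elements of a single full permutation drawn after resetting the seed. The function partition_sketch is hypothetical. It assumes the full sample is a plain sample.int(n) permutation, which may differ from how balancedSample uses the random number stream, so it will not reproduce balancedSample's actual output.

```r
# Hypothetical illustration of full.partition for equal probabilities:
# draw one full permutation of 1:n and keep its first or last 'size' elements.
partition_sketch <- function(n, size, which = c("first", "last")) {
  which <- match.arg(which)
  full <- sample.int(n)                    # the full sample of size n
  if (which == "first") head(full, size) else tail(full, size)
}

n <- 10
m <- 4
set.seed(2024); a <- partition_sketch(n, m, "first")
set.seed(2024); b <- partition_sketch(n, n - m, "last")
set.seed(2024); full <- sample.int(n)

identical(c(a, b), full)   # TRUE: complementary pieces of one permutation
```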
balancedSample(4)                  # random permutation
balancedSample(4, 2)               # two observations chosen without replacement
balancedSample(4, 6)               # each observation once, two observations twice
balancedSample(4, 8)               # each observation twice
balancedSample(4, 8, prob = 1:4)   # expected frequencies .8, 1.6, 2.4, 3.2

# These are equivalent (in the long run; they vary randomly)
balancedSample(5, 100)
sample(rep(1:5, length.out = 100), 100, replace = FALSE)