Order a Multicolumn Real Matrix into a Quad Tree

DESCRIPTION:

The quad.tree function recursively partitions a numeric matrix, returning row and column index vectors and a list of medians that may be used to sort the matrix. The resulting quad tree object is subsequently used in a nearest neighbor search of the matrix (see find.neighbor).

USAGE:

quad.tree(x, bucket.size=1) 

REQUIRED ARGUMENTS:

x
the numeric matrix from which the quad tree is constructed.

OPTIONAL ARGUMENTS:

bucket.size
the maximum number of observations in any leaf of the quad tree. This parameter affects the computational time of neighbor search algorithms. The larger the bucket size, the more observations must be examined within each leaf when searching the quad tree for nearest neighbors. On the other hand, larger bucket sizes may mean that fewer leaves need to be considered. A bucket size of, say, five seems acceptable in many circumstances, though our default of 1 provides a safe value.
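
To make the trade-off concrete, the following sketch counts the leaves produced when n rows are halved repeatedly. It is only an illustration: leaf.summary is a hypothetical helper, not part of quad.tree, and it assumes that splitting stops as soon as a block holds at most bucket.size rows.

```r
# Hypothetical helper (not part of quad.tree): count the leaves, and the size
# of the largest leaf, when n rows are halved until a block is small enough.
# Assumption: splitting stops once a block holds at most bucket.size rows.
leaf.summary <- function(n, bucket.size = 1) {
  if (n <= max(bucket.size, 1)) return(c(leaves = 1, largest = n))
  left <- (n + 1) %/% 2                      # size of the left child
  a <- leaf.summary(left, bucket.size)
  b <- leaf.summary(n - left, bucket.size)
  c(leaves = a[["leaves"]] + b[["leaves"]],
    largest = max(a[["largest"]], b[["largest"]]))
}

# Compare the default bucket size with a bucket size of five for 50 rows.
leaf.summary(50, bucket.size = 1)
leaf.summary(50, bucket.size = 5)
```

Under these assumptions a larger bucket size gives fewer leaves, each holding more rows, which is the balance described above.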

VALUE:

an object of class "quad.tree" containing the following components:
data
the matrix x.
nrow
the number of rows in data.
ncol
the number of columns in data.
bucket.size
the bucket size used in the quad tree search.
row.index
the row ordering of the quad tree.
col.index
the columns used to partition the data into a quad tree.
medians
the medians used in partitioning the data into a quad tree.

DETAILS:

A quad tree is a partitioning of the rows in a matrix which can subsequently be used to efficiently find the rows in the matrix closest (using a variety of metrics) to any given point. Quad trees (also called k-d trees) are thought to be efficient for finding nearest neighbors when the number of columns in the matrix is less than or equal to 10.

The partitioning algorithm proceeds as follows:

1. Set lower = 1 and upper = nrow (the indices of the first and last rows in the matrix x). In the following, only consider the rows of x from lower to upper.
2. Compute the range of each column of x over the observations from lower to upper. Set icol to the column number with the maximum range.
3. Find the median for column icol, and order the rows in the data matrix x over the range lower to upper such that the median evenly splits the rows.
4. If upper - lower = bucket.size, return.
5. Go to step 2 with lower unchanged and upper set to (upper+lower)/2 (the left child of the tree).
6. Go to step 2 with lower set to (upper+lower)/2 + 1, and upper unchanged (the right child of the tree).
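
The steps above can be sketched in a few lines of R-compatible S. This is a minimal illustration, not the code used by quad.tree: partition.rows is a hypothetical name, it returns only the row ordering (not the column indices or medians), it fully sorts each block rather than merely splitting it around the median, and it assumes that recursion stops when a block holds at most bucket.size rows.

```r
# Hypothetical sketch of the partitioning steps; not the quad.tree source.
# x: numeric matrix; rows: row indices of the current block (lower to upper).
partition.rows <- function(x, rows, bucket.size = 1) {
  n <- length(rows)
  # Assumed stopping rule: a block of at most bucket.size rows is a leaf.
  if (n <= max(bucket.size, 1)) return(rows)
  block <- x[rows, , drop = FALSE]
  # Step 2: pick the column with the largest range over the block.
  ranges <- apply(block, 2, function(v) max(v) - min(v))
  icol <- which.max(ranges)
  # Step 3: order the block on that column so the median splits it evenly.
  rows <- rows[order(block[, icol])]
  # Steps 5 and 6: recurse on the left and right halves.
  mid <- (n + 1) %/% 2
  c(partition.rows(x, rows[1:mid], bucket.size),
    partition.rows(x, rows[(mid + 1):n], bucket.size))
}

# Usage: reorder a random 20 x 2 matrix into the sketched tree order.
x <- matrix(runif(40), 20, 2)
idx <- partition.rows(x, 1:nrow(x), bucket.size = 5)
x.ordered <- x[idx, ]
```

After the first split, every row in the first half of x.ordered has a value in the widest column no greater than any row in the second half; the same property holds recursively within each half.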

REFERENCES:

Friedman, J., Bentley, J. L., and Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software 3, 209-226.

SEE ALSO:

find.neighbor.

EXAMPLES:

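# quad tree for a random 50 x 10 matrix, using the default bucket size of 1
x <- matrix(runif(500),50,10)
quad <- quad.tree(x)
# quad tree for the easting and northing coordinates in sids
y <- cbind(sids$easting,sids$northing)
sids.quad <- quad.tree(y)
# the same matrix with up to five observations per leaf
sids.quad <- quad.tree(y, bucket.size=5)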