cmdscale(d, k=2, eig=F, add=F)
The argument d is a distance structure of the kind returned by dist, or a full, symmetric
matrix. The data are assumed to be dissimilarities or relative distances.
If eig is TRUE, return the eigenvalues computed by the algorithm.
They can be used as an aid in determining the appropriate
dimensionality of the solution.
If add is TRUE, compute the additive constant (see component ac below).
The value is the matrix points when eig and add are both FALSE.
Otherwise it is a list containing points, plus eig and/or ac.
The component points is a matrix with k columns and as many rows as there were
objects whose distances were given in d. Row i gives the
coordinates in k-space of the i-th object.
The component eig holds the k eigenvalues, returned only
when the eig argument is TRUE.
The component ac is the constant added to d to transform dissimilarities
(or relative distances) into absolute distances.
The Unidimensional Subspace procedure
(Torgerson, 1958, p. 276) is used to determine the additive constant.
It is returned only if add=TRUE.
If add=FALSE, no constant is added.
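As a rough illustration of the two return shapes, the sketch below uses R's cmdscale from the stats package, on the assumption that it follows the same calling convention as the function documented here. The data are made up, and the components R returns may differ in detail from those described above.

```r
# Assumption: R's stats::cmdscale stands in for the function documented here.
set.seed(42)
x <- matrix(rnorm(18), nrow = 6)   # six made-up objects in three dimensions
d <- dist(x)

pts <- cmdscale(d, k = 2)                           # eig and add both FALSE
full <- cmdscale(d, k = 2, eig = TRUE, add = TRUE)  # request extra components

is.matrix(pts)      # a bare matrix of coordinates
dim(pts)            # one row per object, k columns
names(full)         # a list whose names include points, eig and ac
dim(full$points)    # same shape as the bare matrix
```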
The cmdscale function is an implementation of metric multidimensional
scaling; that is, the distances between points in the result are
as close as possible (in a certain sense) to the input distances, subject
to being Euclidean distances in a k-dimensional space.
The solution for k+1 dimensions has the same first k columns in points
(up to numerical error) as the solution for dimension k.
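The nesting property can be checked with a small sketch of classical scaling written out in base R. This is an illustration under stated assumptions, not the code behind cmdscale: classical_mds is a hypothetical helper, the data are random, and the coordinates are taken from the leading eigenvectors of the doubly centered matrix of squared distances.

```r
# Hypothetical helper: classical scaling written out in base R.
classical_mds <- function(d, k = 2) {
  dmat <- as.matrix(d)                        # full symmetric distance matrix
  n <- nrow(dmat)
  centering <- diag(n) - matrix(1 / n, n, n)  # removes row and column means
  b <- -0.5 * centering %*% dmat^2 %*% centering
  e <- eigen((b + t(b)) / 2, symmetric = TRUE)  # eigenvalues in decreasing order
  lambda <- e$values[seq_len(k)]
  # scale the leading eigenvectors by the square roots of their eigenvalues
  e$vectors[, seq_len(k), drop = FALSE] %*% diag(sqrt(pmax(lambda, 0)), k)
}

set.seed(1)
x <- matrix(rnorm(40), nrow = 10)   # ten made-up objects in four dimensions
d <- dist(x)

fit2 <- classical_mds(d, k = 2)
fit3 <- classical_mds(d, k = 3)
fit4 <- classical_mds(d, k = 4)

# Largest difference between the 2-d solution and the first two columns
# of the 3-d solution
nesting_error <- max(abs(fit2 - fit3[, 1:2]))

# With k equal to the dimension of the data, the input distances are
# reproduced up to rounding error
recovery_error <- max(abs(as.vector(dist(fit4)) - as.vector(d)))
```

Because each column comes from a separate eigenvector, adding a dimension appends a column without changing the earlier ones.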
The additive constant is typically used when the "distances" in d are
subjective dissimilarities. The ac constant attempts to make the distances
conform to a Euclidean space of as small a dimension as possible.
The estimation of ac is done under the assumption that the Euclidean space
has only one dimension, an assumption that simplifies computation.
A more technical explanation is that the constant attempts to eliminate
negative eigenvalues of the doubly centered matrix of the squared distances.
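The effect on the eigenvalues can be seen in a small base R sketch. It rests on several assumptions: the dissimilarities are made up, the helper names are hypothetical, and the constant is chosen by hand and added to the off-diagonal elements, rather than estimated by the Unidimensional Subspace procedure.

```r
# Eigenvalues of the doubly centered matrix of -1/2 * squared dissimilarities
centered_eigenvalues <- function(dmat) {
  n <- nrow(dmat)
  centering <- diag(n) - matrix(1 / n, n, n)
  b <- -0.5 * centering %*% dmat^2 %*% centering
  eigen((b + t(b)) / 2, symmetric = TRUE, only.values = TRUE)$values
}

# Add a constant to every off-diagonal element, leaving the diagonal at zero
add_constant <- function(dmat, ac) dmat + ac * (1 - diag(nrow(dmat)))

# Made-up dissimilarities that violate the triangle inequality: 1 + 1 < 3
dmat <- matrix(c(0, 1, 3,
                 1, 0, 1,
                 3, 1, 0), nrow = 3, byrow = TRUE)

ev_before <- centered_eigenvalues(dmat)
ev_after <- centered_eigenvalues(add_constant(dmat, 2))  # 2 is hand-picked

min(ev_before)   # negative: no Euclidean configuration has these distances
min(ev_after)    # zero up to rounding error: sides 3, 3, 5 form a triangle
```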
There are various measures of the goodness of fit of a solution
in the literature.
Two of them are given in the function in the example section below; see
Mardia, Kent and Bibby (1979, p. 408).
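Reading the definitions off the example function, and writing \lambda_1, \lambda_2, ... for the values it stores in eig (the symbols are introduced here for illustration only), the two measures for a solution with j dimensions appear to be:

```latex
\mathrm{gof1}_j = 1 - \frac{\sum_{i=1}^{j} |\lambda_i|}{\sum_{i} |\lambda_i|},
\qquad
\mathrm{gof2}_j = 1 - \frac{\sum_{i=1}^{j} \lambda_i^{2}}{\sum_{i} \lambda_i^{2}}
```

Both are non-increasing in j and reach zero once every value is included in the numerator.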
Results are currently computed to single-precision accuracy only.
Multidimensional scaling is the process of representing,
in a low-dimensional space, the distances
(or dissimilarities) among a group of objects.
It is somewhat similar to cluster analysis but returns points in space
rather than distinct groupings.
Some examples of its use are: anthropologists studying cultural differences
based on language, art, etc.; and marketing researchers assessing product
similarity.
The technique can be used to "serialize" data if the result is close to a
curve in two dimensions or a string in three. For example, archeologists
might try to place several cultures into a time order.
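A minimal base R sketch of this kind of ordering follows. The dates are invented, the dissimilarity is simply the difference in date, and the one-dimensional solution is computed by hand from the doubly centered matrix rather than by cmdscale itself.

```r
# Made-up dates for six cultures, listed in no particular order
dates <- c(300, 100, 450, 150, 900, 260)
dmat <- as.matrix(dist(dates))   # dissimilarity = absolute difference in date

n <- nrow(dmat)
centering <- diag(n) - matrix(1 / n, n, n)
b <- -0.5 * centering %*% dmat^2 %*% centering
e <- eigen((b + t(b)) / 2, symmetric = TRUE)
coord <- e$vectors[, 1] * sqrt(e$values[1])   # one-dimensional solution

# The recovered order matches the true order, possibly reversed,
# because the sign of an eigenvector is arbitrary
recovered <- order(coord)
truth <- order(dates)
```

Only the order is recovered, not its direction, so outside information is needed to decide which end is earliest.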
Many multivariate statistics books include a discussion of multidimensional
scaling. Below are some examples.
Johnson, R. A. and Wichern, D. W. (1982).
Applied Multivariate Statistical Analysis.
Prentice-Hall, Englewood Cliffs, New Jersey.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979).
Multivariate Analysis.
Academic Press, London.
Torgerson, W. S. (1958).
Theory and Methods of Scaling.
Wiley, New York.
x <- cmdscale(dist.x) #default 2-space
coord1 <- x[,1]; coord2 <- x[,2]
par( pty="s" ) #set up square plot
r <- range(x) #get overall max, min
plot(coord1, coord2, type="n", xlim=r, ylim=r) #set up plot
# note units per inch same on x and y axes
text(coord1, coord2, seq(coord1)) #plot integers

# use brush to explore a 3-dimensional scaling
dis.vote <- dist(votes.repub)
vote.scale <- cmdscale(dis.vote, 4)
brush(vote.scale, rowlab=state.abb)

# below is a function that calculates two measures of stress
# it is fairly slow for datasets of more than 50 or so.
cmdscale.gof <- function(dis, k = 4)
{
        amat <- -0.5 * (dist2full(dis))^2 # see dist help file
        bmat <- sweep(amat, 1, apply(amat, 1, mean))
        bmat <- sweep(bmat, 2, apply(bmat, 2, mean))
        eigs <- svd(bmat, 0, 0)
        gof1 <- 1 - (cumsum(abs(eigs$d[1:k]))/sum(abs(eigs$d)))
        gof2 <- 1 - (cumsum(eigs$d[1:k]^2)/sum(eigs$d^2))
        list(gof1 = gof1, gof2 = gof2, eig = eigs$d)
}

vote.scale <- cmdscale(dist(votes.repub))
plot(vote.scale, type="n")
text(vote.scale, state.abb)