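abbreviate(names, minlength = 4, use.classes = T, dot = F)

Abbreviates the character strings in names so that distinct names get distinct abbreviations (with a "." appended when dot=TRUE).

ARGUMENTS
names: character vector of names to abbreviate.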
minlength: the desired minimum length of the abbreviations (default 4). The abbreviations are not guaranteed to have this length: when necessary, the algorithm increases minlength until it produces unique abbreviations.
use.classes: logical flag. If TRUE, some special character classes are used to keep the characters thought to be more meaningful in the abbreviation; see the discussion of the algorithm in the DETAILS section. To see the effect, abbreviate state.name as in the first example below, but with use.classes=FALSE.
dot: logical flag: should a "." be appended to each abbreviation?
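To try the comparison suggested under use.classes, and to see what dot does, something like the following should work. It assumes a session in which abbreviate() and the state.name dataset are available; the variable names are illustrative only.

```r
# Assumes abbreviate() and the state.name dataset are available.
with_classes    <- abbreviate(state.name)                        # default use.classes
without_classes <- abbreviate(state.name, use.classes = FALSE)   # white space vs. the rest

# Show side by side only the states whose abbreviation changes
differs <- with_classes != without_classes
cbind(with_classes, without_classes)[differs, ]

# dot = TRUE marks the abbreviations with a trailing "."
dotted <- abbreviate(state.name[1:3], dot = TRUE)
dotted
```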

VALUE
a character vector of abbreviations with a "names" attribute containing the original names argument. This attribute makes it convenient to subscript the result by the original names (see the second example).
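Because the original names travel with the result, lookups work in both directions. This is a small sketch assuming abbreviate() and state.name are available; the reverse lookup relies on the abbreviations being unique, which the algorithm aims to guarantee for distinct names.

```r
# Assumes abbreviate() and the state.name dataset are available.
ab <- abbreviate(state.name)
ab["Ohio"]                 # forward lookup: subscript by the original name

# Reverse lookup: swap values and names so an abbreviation finds its name.
# Relies on every abbreviation being unique.
full <- names(ab)
names(full) <- ab
full[ab["Ohio"]]           # recovers "Ohio"
```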

DETAILS
The abbreviations do not depend on the order of the names argument, except when the algorithm produces duplicate abbreviations and has to resolve them.
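The order-independence can be checked on a few names whose abbreviations do not collide (an assumption about this particular input): reversing the input only reorders the output. The three names here are chosen for illustration.

```r
# Assumes abbreviate() is available; these names have no colliding abbreviations.
x <- c("Alabama", "Colorado", "Georgia")
forward  <- abbreviate(x)
backward <- abbreviate(rev(x))

# Reorder the second result by the original names; the two should agree.
backward[x]
```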
THE ALGORITHM.
The abbreviation algorithm does not simply truncate. It uses a threshold that determines which characters it may drop, in this order:
1) non-printing characters and white space,
2) lower case vowels,
3) lower case consonants and punctuation,
4) upper case letters and special characters.
If use.classes is FALSE, the only distinction made is between white space and other characters.
Each string is broken up into words separated by white space. For a given value of the threshold, eligible letters are dropped from the end of each word, one more letter from each word on each iteration, until the desired minimum length is reached. At least one letter is kept from each word. If the abbreviation is still too long, the threshold is raised and the process is repeated.
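The threshold procedure described above can be sketched for a single name as follows. This is a minimal sketch, not the S-PLUS source: char_class and abbreviate_one are hypothetical names, and it assumes that the four classes map onto POSIX character classes as shown (digits and other characters count as "special", class 4), that words are visited from last to first, and that the letter kept in each word is its first. Under these assumptions it yields the ten abbreviations shown in the first example below and "NJ" for New Jersey with minlength 2.

```r
# Hypothetical helper: rank one character by how readily it is dropped
# (1 = dropped first). The class boundaries are assumptions.
char_class <- function(ch, use.classes = TRUE) {
  if (grepl("[[:space:][:cntrl:]]", ch)) return(1L)  # white space, non-printing
  if (!use.classes) return(2L)                       # everything else is one class
  if (grepl("^[aeiou]$", ch)) return(2L)             # lower case vowels
  if (grepl("^[[:lower:]]$", ch) || grepl("^[[:punct:]]$", ch))
    return(3L)                                       # lower case consonants, punctuation
  4L                                                 # upper case, digits, other
}

# Hypothetical sketch of abbreviating one name with a rising threshold.
abbreviate_one <- function(name, minlength = 4, use.classes = TRUE) {
  words <- strsplit(name, "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]                      # white space between words is dropped
  chars <- lapply(words, function(w) strsplit(w, "")[[1]])
  total <- function() sum(lengths(chars))            # current abbreviation length
  max_class <- if (use.classes) 4L else 2L
  for (threshold in seq_len(max_class)) {            # raise the threshold step by step
    repeat {
      dropped <- FALSE
      for (i in rev(seq_along(chars))) {             # assumption: last word first
        if (total() <= minlength) return(paste(unlist(chars), collapse = ""))
        w <- chars[[i]]
        cls <- vapply(w, char_class, integer(1), use.classes = use.classes)
        # eligible characters, never the first letter of the word
        cand <- which(cls <= threshold & seq_along(w) > 1)
        if (length(cand) > 0) {
          chars[[i]] <- w[-max(cand)]                # drop the rightmost eligible one
          dropped <- TRUE
        }
      }
      if (!dropped) break                            # nothing left at this threshold
    }
  }
  paste(unlist(chars), collapse = "")                # may still exceed minlength
}

abbreviate_one("California")      # "Clfr"
abbreviate_one("New Jersey", 2)   # "NJ"
```

In "California", for instance, the vowels go first ("Clfrn"), and only when no vowel is left does the threshold rise far enough to drop the trailing consonant.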
This algorithm may still not produce unique abbreviations. If it does not, minlength is increased and the algorithm is applied again, but only to the names that the previous round did not distinguish. As a result, some abbreviations may be longer than the requested length, but as few of them as the algorithm allows (see the third example below).
The method assumes that identical names should produce identical abbreviations.
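The repeated rounds can be sketched as a loop that re-abbreviates only the colliding names with a larger minlength. In this minimal sketch, resolve_duplicates and truncate_to are hypothetical names, and truncate_to (plain truncation with substr) stands in for the real abbreviation step only to keep the example short; it is not the S-PLUS algorithm. The input is deduplicated first so that identical names share one abbreviation, and the loop assumes the supplied function eventually returns distinct strings as minlength grows (truncation does, since it ends up returning the full names).

```r
# Hypothetical sketch of duplicate resolution. `abbrev_fun` is any
# function(x, minlength) returning one abbreviation per element.
resolve_duplicates <- function(x, minlength, abbrev_fun) {
  u <- unique(x)                          # identical names share one abbreviation
  ab <- abbrev_fun(u, minlength)
  repeat {
    # every distinct name whose abbreviation collides with another's
    clash <- ab %in% ab[duplicated(ab)]
    if (!any(clash)) break
    minlength <- minlength + 1            # lengthen only the undistinguished names
    ab[clash] <- abbrev_fun(u[clash], minlength)
  }
  out <- ab[match(x, u)]                  # map back, preserving input order
  names(out) <- x                         # keep the original names on the result
  out
}

truncate_to <- function(x, n) substr(x, 1, n)  # hypothetical stand-in, NOT the real algorithm
resolve_duplicates(c("Mississippi", "Missouri", "Ohio", "Ohio"), 4, truncate_to)
# values: "Missi" "Misso" "Ohio" "Ohio"
```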
The resulting abbreviations tend to look unlike anything you have seen before, but they are usually fairly intuitive when the input names are English text.

EXAMPLES
abbreviate(state.name[1:10])
#     Alabama      Alaska     Arizona    Arkansas  California    Colorado
#      "Albm"      "Alsk"      "Arzn"      "Arkn"      "Clfr"      "Clrd"
# Connecticut    Delaware     Florida     Georgia
#      "Cnnc"      "Dlwr"      "Flrd"      "Gerg"

abbreviate(state.name, 2)["New Jersey"]
# New Jersey
#       "NJ"

ab2 <- abbreviate(state.name, 2)
table(nchar(ab2))
#  2  3  4
# 32 15  3

ab2[nchar(ab2) == 4]
# Massachusetts   Mississippi      Missouri
#        "Mssc"        "Msss"        "Mssr"