This function requires the bigdata library section to be loaded.
bd.cor(data, x.columns=NULL, y.columns=NULL, cov=F)
data: a bdFrame or data.frame.
x.columns: the columns of data that determine the rows in the output. The correlation (or covariance) will be computed between the columns of data specified in x.columns and in y.columns. If missing, all numeric columns of data will be used.
y.columns: the columns of data that determine the columns in the output. If missing, all numeric columns of data will be used.
cov: if FALSE (the default), correlations are computed; if TRUE, covariances are computed.
An object of class "bdFrame" or "data.frame" (the same class as the input data) containing the correlations or covariances for the variables specified. The first column in the output contains the names of the target columns.
The covariance of two variables, X and Y, is the average value of the product of the deviation of X from its mean and the deviation of Y from its mean. The variables are positively associated if, when X is larger than its mean, Y tends to be larger than its mean as well (or, when X is smaller than its mean, Y tends to be smaller than its mean as well). In this case, the covariance is a positive number. The variables are negatively associated if, when X is larger than its mean, Y tends to be smaller than its mean (or vice versa). Here, the covariance is a negative number. The scale of the covariance depends on the scale of the data values in X and Y; it is possible to have very large or very small covariance values.
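The definition above can be checked by hand on a small vector. The following is a minimal sketch in R-compatible S that uses only base functions and does not call bd.cor; the data values and the names x, y, dx, dy, cov.xy and cov.scaled are hypothetical. It assumes the sample form of the covariance, which divides by n - 1 (the divisor used by var() in R) rather than by n; which divisor bd.cor uses is not stated here.

```r
# Hypothetical example data, not taken from fuel.frame.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Deviation of each variable from its own mean.
dx <- x - mean(x)
dy <- y - mean(y)

# Sample covariance: sum of the products of the deviations, divided by n - 1.
# (Dividing by n instead gives the plain average of the products.)
cov.xy <- sum(dx * dy) / (length(x) - 1)

# Rescaling x by 1000 rescales the covariance by 1000, even though the
# association between the two variables is unchanged.
cov.scaled <- sum((1000 * dx) * dy) / (length(x) - 1)
```

Here cov.xy is positive because large values of x tend to occur with large values of y, and cov.scaled shows why the magnitude of a covariance is hard to interpret on its own.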
The correlation of two variables is a dimensionless measure of association based on the covariance; it is the covariance divided by the product of the standard deviations for the two variables. Correlation is always in the range -1 to 1.
Correlation measures the strength of the linear relationship between two variables. If you create a scatter plot for two variables that have correlation near 1, the points will appear as a line with positive slope. Likewise, if you create a scatter plot for two variables that have correlation near -1, you will see points along a line with negative slope.
A correlation near zero implies that two variables do not have a linear relationship. However, this does not necessarily mean the variables are completely unrelated. It is possible, for example, that the variables are related quadratically or cubically, associations which are not detected by the correlation measure.
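The points made above about linear and non-linear relationships can be illustrated with a short sketch. It is written in R-compatible S with base functions only and does not call bd.cor; the helper corr and the vectors x, y.linear, y.neg and y.quad are hypothetical names introduced for this illustration.

```r
# Hypothetical data: x is symmetric about zero.
x <- -5:5

# Correlation as defined above: the covariance divided by the product of
# the two standard deviations (sqrt(var(a) * var(b)) is that product).
corr <- function(a, b) var(a, b) / sqrt(var(a) * var(b))

y.linear <- 3 * x + 2    # exact linear relationship, positive slope
y.neg    <- -3 * x + 2   # exact linear relationship, negative slope
y.quad   <- x^2          # exact quadratic relationship, not linear

r.linear <- corr(x, y.linear)   # 1, up to floating-point rounding
r.neg    <- corr(x, y.neg)      # -1, up to floating-point rounding
r.quad   <- corr(x, y.quad)     # 0: the quadratic dependence goes undetected
```

Although y.quad is completely determined by x, its correlation with x is zero, because the positive and negative deviations of x cancel when multiplied by the deviations of x^2.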
# Compute correlations of numeric variables in fuel.frame with the
# variable Fuel:
bd.cor(fuel.frame, "Fuel")

# Compute correlations only between Fuel and Disp.
bd.cor(fuel.frame, "Fuel", "Disp.")

# Compute covariance of numeric variables in fuel.frame with the
# variable Fuel:
bd.cor(fuel.frame, "Fuel", cov=T)