Title: | Visualising Multiple Pairwise Variable Correlations and Other Scores |
---|---|
Description: | We provide a tidy data structure and visualisations for multiple or grouped variable correlations, general association measures, scagnostics and other pairwise scores suitable for numerical, ordinal and nominal variables. Supported measures include distance correlation, maximal information, ace correlation, Kendall's tau, and polychoric correlation. |
Authors: | Amit Chinwan [aut], Catherine Hurley [aut, cre] |
Maintainer: | Catherine Hurley <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.2 |
Built: | 2025-01-28 10:29:48 UTC |
Source: | https://github.com/cbhurley/bullseye |
Calculates ace based transformations and correlation, handling missing values and factors.
ace_cor(x, y, handle.na = TRUE)
x |
a numeric vector or factor |
y |
a numeric vector or factor |
handle.na |
If TRUE uses pairwise complete observations. |
result of acepack::ace
ace_cor(iris$Sepal.Length, iris$Species)
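The "pairwise complete observations" behaviour of handle.na amounts to dropping every observation where either variable is missing before the score is computed. The following is a minimal base-R sketch of that idea, using Pearson correlation as a stand-in for the ACE score. The vectors x and y are made-up illustration data, and this is not the package's internal code.

```r
# Illustration data with missing values (hypothetical)
x <- c(1.0, 2.0, NA, 4.0, 5.0, 6.0)
y <- c(2.1, NA, 6.2, 7.9, 10.1, 12.2)

# Keep only the observations where both x and y are present
ok <- complete.cases(x, y)
x_ok <- x[ok]
y_ok <- y[ok]

# Any pairwise score can now be computed on the complete pairs
r_manual <- cor(x_ok, y_ok)

# Base R offers the same behaviour through the `use` argument
r_builtin <- cor(x, y, use = "pairwise.complete.obs")
```

Both r_manual and r_builtin are computed from the same four complete pairs.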
Converts a pairwise to a symmetric matrix. Uses the first entry for each (x,y) pair.
## S3 method for class 'pairwise'
as.matrix(x, ...)
x |
An object of class pairwise |
... |
other arguments |
A symmetric matrix
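As a rough illustration of the conversion, the base-R sketch below turns a long table of pairwise scores into a symmetric matrix, keeping the first entry for each (x,y) pair. The column names x, y and value and the toy data are assumptions made for this example, and each pair is assumed to be stored in a single orientation; the package method may differ in its details.

```r
# A long table of pairwise scores; column names x, y, value are assumed here
pw <- data.frame(
  x = c("a", "a", "b", "a"),
  y = c("b", "c", "c", "b"),
  value = c(0.5, -0.2, 0.9, 0.1),  # the last row repeats the (a, b) pair
  stringsAsFactors = FALSE
)

# Keep only the first entry for each (x, y) pair
pw <- pw[!duplicated(pw[, c("x", "y")]), ]

vars <- sort(unique(c(pw$x, pw$y)))
m <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
            dimnames = list(vars, vars))

# Fill both triangles so the result is symmetric; the diagonal stays NA
m[cbind(pw$x, pw$y)] <- pw$value
m[cbind(pw$y, pw$x)] <- pw$value
```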
Calculates the maximal correlation coefficient from the alternating conditional expectations (ACE) algorithm for every variable pair in a dataset.
pair_ace(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
If TRUE uses pairwise complete observations, otherwise NAs not handled. |
... |
other arguments |
The maximal correlation is calculated using the alternating conditional expectations algorithm, which finds the transformations of the variables such that the squared correlation is maximised. The ace function from the acepack package is used for the calculation.
A tibble of class pairwise with a maximal correlation from the alternating conditional expectations algorithm for every variable pair
Breiman, Leo, and Jerome H. Friedman. "Estimating optimal transformations for multiple regression and correlation." Journal of the American Statistical Association 80.391 (1985): 580-598.
pair_ace(iris)
Calculates canonical correlation for every variable pair in a dataset.
pair_cancor(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
If TRUE uses pairwise complete observations to calculate correlation coefficient, otherwise NAs not handled. |
... |
other arguments |
A tibble of class pairwise with canonical correlation for every numeric or factor or mixed variable pair
pair_cancor(iris)
Calculates Pearson's Contingency coefficient for every factor variable pair in a dataset.
pair_chi(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
ignored. Pairwise complete observations are used automatically. |
... |
other arguments |
Pearson's contingency coefficient is calculated using ContCoef. NAs are automatically handled by pairwise omit.
A tibble of class pairwise with calculated Pearson's contingency coefficient for every factor variable pair, or NULL if there are not at least two factor variables
pair_chi(iris)
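For orientation, Pearson's contingency coefficient for a two-way table is C = sqrt(chi2 / (chi2 + n)), where chi2 is the Pearson chi-squared statistic and n the number of observations. The base-R sketch below computes it directly for two categorical columns of mtcars. It is an illustration of the formula, not the code path used by pair_chi.

```r
# Cross-tabulate two categorical variables from a built-in dataset
tab <- table(mtcars$cyl, mtcars$gear)

# Pearson chi-squared statistic (warnings about small expected counts are
# suppressed because only the statistic is needed here)
chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
n <- sum(tab)

# Pearson's contingency coefficient: C = sqrt(chi2 / (chi2 + n))
cont_coef <- sqrt(chi2 / (chi2 + n))
```

The coefficient is 0 when chi2 is 0 and is always strictly below 1.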
Gives a list specifying the function to be used by pairwise_scores for two numeric (nn) variables, two factors (ff), two ordinals (oo) and for a factor-numeric pair (fn).
pair_control( nn = "pair_cor", oo = "pair_polychor", ff = "pair_cancor", fn = "pair_cancor", nnargs = NULL, ooargs = NULL, ffargs = NULL, fnargs = NULL )
nn |
function for numeric pairs of variables, should return object of class pairwise |
oo |
function for ordered factor pairs of variables, should return object of class pairwise |
ff |
function for factor pairs of variables (not ordered), should return object of class pairwise |
fn |
function for factor-numeric pairs of variables, should return object of class pairwise |
nnargs |
other arguments for the nn function |
ooargs |
other arguments for the oo function |
ffargs |
other arguments for the ff function |
fnargs |
other arguments for the fn function |
list
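To make the nn, oo, ff and fn labels concrete, here is a small base-R sketch that classifies a pair of variables by type. The helpers var_kind and pair_kind are hypothetical names introduced for this example. How the package itself treats mixed cases (for instance an ordered factor paired with an unordered factor or with a numeric variable) is not stated here, so the mapping of those cases below is an assumption.

```r
# Classify a single variable as numeric ("n"), ordered factor ("o")
# or unordered factor ("f")
var_kind <- function(v) {
  if (is.numeric(v)) {
    "n"
  } else if (is.ordered(v)) {
    "o"
  } else if (is.factor(v)) {
    "f"
  } else {
    NA_character_
  }
}

# Combine two kinds into a pair type. Assumption: a mixed ordered/unordered
# pair is treated as "ff" and any factor-numeric pair as "fn".
pair_kind <- function(a, b) {
  k <- c(var_kind(a), var_kind(b))
  if (anyNA(k)) return(NA_character_)
  switch(paste(sort(k), collapse = ""),
         nn = "nn", oo = "oo", ff = "ff", fo = "ff", fn = "fn", no = "fn")
}

pair_kind(iris$Sepal.Length, iris$Petal.Width)
pair_kind(iris$Sepal.Length, iris$Species)
```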
Calculates the Pearson, Spearman or Kendall correlation for every numeric variable pair in a dataset.
pair_cor(d, method = "pearson", handle.na = TRUE, ...)
d |
A dataframe |
method |
A character string for the correlation coefficient to be calculated. Either "pearson" (default), "spearman", or "kendall". If the value is "all", then all three correlations are calculated. |
handle.na |
If TRUE uses pairwise complete observations to calculate correlation coefficient, otherwise NAs not handled. |
... |
other arguments |
A tibble of class pairwise with calculated association value for every numeric variable pair, or NULL if there are not at least two numeric variables
See pair_methods for other score options.
pair_cor(iris)
pair_cor(iris, method="kendall")
pair_cor(iris, method="spearman")
pair_cor(iris, method="all")
Calculates distance correlation for every numeric variable pair in a dataset.
pair_dcor(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
If TRUE uses pairwise complete observations to calculate distance correlation, otherwise NAs not handled. |
... |
other arguments |
The distance correlation is calculated using dcor2d from the energy package.
A tibble of class pairwise with distance correlation for every numeric variable pair, or NULL if there are not at least two numeric variables
pair_dcor(iris)
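For readers unfamiliar with the measure, the sketch below computes the sample distance correlation of two numeric vectors in base R from double-centred distance matrices. The function name dcor_sketch is hypothetical. It uses the straightforward O(n^2) definition rather than the algorithm in the energy package, so it is meant only to show what is being estimated.

```r
dcor_sketch <- function(x, y) {
  # Pairwise Euclidean distances within each variable
  A <- as.matrix(dist(x))
  B <- as.matrix(dist(y))

  # Double-centre: subtract row and column means, add back the grand mean
  centre <- function(D) {
    D - outer(rowMeans(D), colMeans(D), "+") + mean(D)
  }
  A <- centre(A)
  B <- centre(B)

  # Squared sample distance covariance and distance variances
  dcov2 <- mean(A * B)
  dvar2_x <- mean(A * A)
  dvar2_y <- mean(B * B)

  sqrt(dcov2 / sqrt(dvar2_x * dvar2_y))
}

dcor_sketch(iris$Sepal.Length, iris$Petal.Length)
```

Unlike Pearson correlation, the result lies between 0 and 1, and it equals 1 for an exact linear relationship.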
Calculates Goodman Kruskal's Gamma coefficient for every factor variable pair in a dataset.
pair_gkGamma(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
ignored. Pairwise complete observations are used automatically. |
... |
other arguments |
The Goodman Kruskal's Gamma coefficient is calculated using the GoodmanKruskalGamma function from the DescTools package. Assumes factor levels are in the given order. NAs are automatically handled by pairwise omit.
A tibble of class pairwise with factor variable pairs and Goodman Kruskal's Gamma coefficient, or NULL if there are not at least two factor variables
pair_gkGamma(iris)
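Goodman and Kruskal's gamma compares concordant and discordant pairs of observations: gamma = (C - D) / (C + D), with tied pairs ignored. The base-R sketch below computes it from the integer level codes of two factors, which is what "levels are in the given order" means in practice. The function gk_gamma and the toy factor are hypothetical and serve only to illustrate the definition.

```r
gk_gamma <- function(x, y) {
  # Use the integer level codes, i.e. assume levels are already in order
  x <- as.integer(x)
  y <- as.integer(y)

  # Sign of the difference for every pair of observations
  dx <- sign(outer(x, x, "-"))
  dy <- sign(outer(y, y, "-"))

  # Each unordered pair is counted twice in the outer products
  concordant <- sum(dx * dy > 0) / 2
  discordant <- sum(dx * dy < 0) / 2

  (concordant - discordant) / (concordant + discordant)
}

# A toy ordered scale; the level order is set explicitly
f <- factor(c("lo", "lo", "mid", "hi", "hi"), levels = c("lo", "mid", "hi"))
gk_gamma(f, f)
gk_gamma(f, rev(f))
```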
Calculates Goodman Kruskal's Tau coefficient for every factor variable pair in a dataset.
pair_gkTau(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
ignored. Pairwise complete observations are used automatically. |
... |
other arguments |
The Goodman Kruskal's Tau coefficient is calculated using the GoodmanKruskalTau function from the DescTools package. Assumes factor levels are in the given order. NAs are automatically handled by pairwise omit.
A tibble of class pairwise with Goodman Kruskal's Tau for every factor variable pair, or NULL if there are not at least two factor variables
pair_gkTau(iris)
A tibble of score functions along with the types of variable pairs these functions can be applied to. It also contains information regarding the packages used to calculate scores and the range of the values calculated.
pair_methods
An object of class tbl_df (inherits from tbl, data.frame) with 17 rows and 7 columns.
tibble
pair_methods
Calculates MINE family values for every numeric variable pair in a dataset.
pair_mine(d, method = "MIC", handle.na = TRUE, ...)
d |
A dataframe |
method |
character vector for the MINE value to be calculated. Subset of "MIC","MAS","MEV","MCN","MICR2", "GMIC", "TIC" |
handle.na |
If TRUE uses pairwise complete observations to calculate score, otherwise NAs not handled. |
... |
other arguments |
The values are calculated using mine from the minerva package.
A tibble of class pairwise with scores for numeric variable pairs, or NULL if there are not at least two numeric variables
Reshef, David N., et al. "Detecting novel associations in large data sets." Science 334.6062 (2011): 1518-1524.
pair_mine(iris)
pair_mine(iris, method="MAS")
Calculates normalized mutual information for every numeric or factor or mixed variable pair in a dataset.
pair_nmi(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
If TRUE uses pairwise complete observations to calculate normalized mutual information, otherwise NAs not handled. |
... |
other arguments |
The normalized mutual information is calculated using maxNMI from the linkspotter package.
A tibble of class pairwise
if (requireNamespace("linkspotter", quietly = TRUE)) { pair_nmi(iris) }
Calculates polychoric correlation for every factor variable pair in a dataset.
pair_polychor(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
ignored. Pairwise complete observations are used automatically. |
... |
other arguments |
The polychoric correlation is calculated using the polychor function from the polycor package, and assumes factor levels are in the given order. NAs are automatically handled by pairwise omit.
A tibble of class pairwise with polychoric correlation for factor pairs, or NULL if there are not at least two factor variables
pair_polychor(iris)
Calculates polyserial correlation for every factor-numeric variable pair in a dataset.
pair_polyserial(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
ignored. Pairwise complete observations are used automatically. |
... |
other arguments |
The polyserial correlation is calculated using the polyserial function from the polycor package, and assumes factor levels are in the given order. NAs are automatically handled by pairwise omit.
A tibble of class pairwise with polyserial correlation for factor-numeric pairs, or NULL if there is not at least one such pair.
pair_polyserial(iris)
Calculates scagnostic values for every numeric variable pair in a dataset.
pair_scagnostics( d, scagnostic = c("Outlying", "Skewed", "Clumpy", "Sparse", "Striated", "Convex", "Skinny", "Stringy", "Monotonic"), handle.na = TRUE, ... )
d |
A dataframe |
scagnostic |
a character vector for the scagnostic to be calculated. Subset of "Outlying", "Stringy", "Striated", "Clumpy", "Sparse", "Skewed", "Convex", "Skinny" or "Monotonic" |
handle.na |
If TRUE uses pairwise complete observations. |
... |
other arguments |
The scagnostic values are calculated using the scagnostics function from the scagnostics package.
A tibble of class pairwise with scagnostic values for every numeric variable pair, or NULL if there are not at least two numeric variables
Wilkinson, Leland, Anushka Anand, and Robert Grossman. "Graph-theoretic scagnostics." Information Visualization, IEEE Symposium on. IEEE Computer Society, 2005
pair_scagnostics(iris)
Calculates Kendall's tau A for every factor variable pair in a dataset.
pair_tauA(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
ignored. Pairwise complete observations are used automatically. |
... |
other arguments |
Calculated using KendallTauA. Assumes factor levels are in the given order. NAs are automatically handled by pairwise omit.
A tibble of class pairwise with factor pairs, or NULL if there are not at least two factor variables
d <- data.frame(x=rnorm(20), y=factor(sample(3,20, replace=TRUE)), z=factor(sample(2,20, replace=TRUE)))
pair_tauA(d)
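Kendall's tau A divides the difference between concordant and discordant pairs by the total number of pairs, n(n-1)/2, so tied pairs lower the value rather than being dropped. The base-R sketch below implements that definition on integer level codes. The function tau_a is a hypothetical helper for illustration and is not the DescTools implementation.

```r
tau_a <- function(x, y) {
  # Integer codes: factor levels are assumed to be in their intended order
  x <- as.integer(x)
  y <- as.integer(y)
  n <- length(x)

  # +1 for concordant, -1 for discordant, 0 for tied pairs
  s <- sign(outer(x, x, "-")) * sign(outer(y, y, "-"))

  # sum(s) counts every unordered pair twice, so C - D = sum(s) / 2
  (sum(s) / 2) / (n * (n - 1) / 2)
}

# Without ties this agrees with base R's Kendall correlation
tau_a(1:5, c(1, 3, 2, 4, 5))
cor(1:5, c(1, 3, 2, 4, 5), method = "kendall")
```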
Calculates Kendall's tau B for every factor variable pair in a dataset.
pair_tauB(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
ignored. Pairwise complete observations are used automatically. |
... |
other arguments |
Calculated using KendallTauB. Assumes factor levels are in the given order. NAs are automatically handled by pairwise omit.
A tibble of class pairwise with factor pairs, or NULL if there are not at least two factor variables
d <- data.frame(x=rnorm(20), y=factor(sample(3,20, replace=TRUE)), z=factor(sample(2,20, replace=TRUE)))
pair_tauB(d)
Calculates Stuart's tau C for every factor variable pair in a dataset.
pair_tauC(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
ignored. Pairwise complete observations are used automatically. |
... |
other arguments |
Calculated using StuartTauC. Assumes factor levels are in the given order. NAs are automatically handled by pairwise omit.
A tibble of class pairwise with factor pairs, or NULL if there are not at least two factor variables
d <- data.frame(x=rnorm(20), y=factor(sample(3,20, replace=TRUE)), z=factor(sample(2,20, replace=TRUE)))
pair_tauC(d)
Calculates Kendall's W for every factor variable pair in a dataset.
pair_tauW(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
ignored. Pairwise complete observations are used automatically. |
... |
other arguments |
Calculated using KendallW. Assumes factor levels are in the given order. NAs are automatically handled by pairwise omit.
A tibble of class pairwise with factor pairs, or NULL if there are not at least two factor variables
d <- data.frame(x=rnorm(20), y=factor(sample(3,20, replace=TRUE)), z=factor(sample(2,20, replace=TRUE)))
pair_tauW(d)
Calculates uncertainty coefficient for every factor variable pair in a dataset.
pair_uncertainty(d, handle.na = TRUE, ...)
d |
A dataframe |
handle.na |
ignored. Pairwise complete observations are used automatically. |
... |
other arguments |
The uncertainty coefficient is calculated using the UncertCoef function from the DescTools package.
A tibble of class pairwise with every factor variable pair and uncertainty coefficient value, or NULL if there are not at least two factor variables
pair_uncertainty(iris)
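The uncertainty coefficient expresses the mutual information between two categorical variables as a fraction of their entropy. The base-R sketch below computes a symmetric version, U = 2 (H(X) + H(Y) - H(X,Y)) / (H(X) + H(Y)). The helper names entropy and uncert_sym are hypothetical, and whether pair_uncertainty reports the symmetric or a directional variant is not stated here, so treat this as an illustration of the idea only.

```r
# Shannon entropy of a probability vector or table
entropy <- function(p) {
  p <- p[p > 0]  # treat 0 * log(0) as 0
  -sum(p * log(p))
}

uncert_sym <- function(x, y) {
  tab <- table(x, y)
  joint <- tab / sum(tab)

  h_x <- entropy(rowSums(joint))   # marginal entropy of x
  h_y <- entropy(colSums(joint))   # marginal entropy of y
  h_xy <- entropy(joint)           # joint entropy

  mutual_info <- h_x + h_y - h_xy
  2 * mutual_info / (h_x + h_y)
}

# A variable is perfectly informative about itself
uncert_sym(iris$Species, iris$Species)
```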
Creates a data structure for every variable pair in a dataset.
pairwise(x, score = NA_character_, pair_type = NA_character_)
## S3 method for class 'matrix'
pairwise(x, score = NA_character_, pair_type = NA_character_)
## S3 method for class 'data.frame'
pairwise(x, score = NA_character_, pair_type = NA_character_)
## S3 method for class 'easycorrelation'
pairwise(x, score = NA_character_, pair_type = NA_character_)
as.pairwise(x, score = NA_character_, pair_type = NA_character_)
x |
A dataframe or symmetric matrix. |
score |
a character string indicating the type of association value, for example "pearson". |
pair_type |
a character string specifying the type of variable pair, either "nn", "fn", "ff". |
A tbl_df of class pairwise for pairs of variables with a column value for the score value, score for a type of association value and pair_type for the type of variable pair.
pairwise(matrix): pairwise method
pairwise(data.frame): pairwise method
pairwise(easycorrelation): pairwise method
as.pairwise(): Same as pairwise
pairwise(cor(iris[,1:4]), score="pearson")
pairwise(iris)
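To give a feel for the long layout, the base-R sketch below flattens a symmetric correlation matrix into one row per variable pair. The columns value, score and pair_type follow the description above, while the x and y column names and the use of a plain data.frame are assumptions for this example. The actual pairwise object may order or label pairs differently.

```r
# Symmetric matrix of Pearson correlations for the numeric iris columns
m <- cor(iris[, 1:4])

# Row and column positions of the upper triangle: one entry per pair
idx <- which(upper.tri(m), arr.ind = TRUE)

long <- data.frame(
  x = rownames(m)[idx[, "row"]],
  y = colnames(m)[idx[, "col"]],
  score = "pearson",
  pair_type = "nn",       # both variables in every pair are numeric
  value = m[idx],
  stringsAsFactors = FALSE
)

long
```

Four variables give choose(4, 2) = 6 rows.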
Constructs a pairwise result for each level of a by variable.
pairwise_by(d, by, pair_fun, ungrouped = TRUE)
d |
a dataframe |
by |
a character string for the name of the conditioning variable. |
pair_fun |
A function returning a pairwise object |
ungrouped |
If TRUE calculates the ungrouped score in addition to grouped scores. |
tibble of class "pairwise"
pairwise_by(iris, by="Species", pair_cor)
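Conceptually, grouped scores come from splitting the data by the conditioning variable and scoring each piece, optionally adding the ungrouped result. A minimal base-R sketch of that pattern with plain correlation matrices follows. It illustrates the idea rather than the package internals, and the label "all" for the ungrouped result is an assumption.

```r
num_vars <- iris[, 1:4]

# One data frame of numeric columns per level of the conditioning variable
groups <- split(num_vars, iris$Species)

# Score each group separately
by_group <- lapply(groups, cor)

# Optionally add the ungrouped score, analogous to ungrouped = TRUE
overall <- cor(num_vars)
all_scores <- c(by_group, list(all = overall))

names(all_scores)
```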
Calculates multiple scores for every variable pair in a dataset.
pairwise_multi( d, scores = c("pair_cor", "pair_dcor", "pair_mine", "pair_ace", "pair_cancor", "pair_nmi", "pair_uncertainty", "pair_chi"), handle.na = TRUE )
d |
dataframe |
scores |
a vector naming functions returning a pairwise object |
handle.na |
If TRUE uses pairwise complete observations to calculate pairwise score, otherwise NAs not handled. |
tibble of class "pairwise"
iris1 <- iris
iris1$Sepal.Length <- cut(iris1$Sepal.Length,3)
pairwise_multi(iris1)
Calculates scores for every variable pair in a dataset when by is NULL. If by is a name of a variable in the dataset, conditional scores for every variable pair at different levels of the grouping variable are calculated.
pairwise_scores( d, by = NULL, ungrouped = TRUE, control = pair_control(), handle.na = TRUE )
d |
a dataframe |
by |
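a character string for the name of the conditioning variable. Set to NULL by default. |
ungrouped |
Ignored if by is NULL. If TRUE calculates the ungrouped score in addition to grouped scores. |
control |
a list for the measures to be calculated for different variable types. The default is pair_control(). |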
handle.na |
If TRUE uses pairwise complete observations to calculate measure of association. |
Returns a tibble with class pairwise.
irisc <- pairwise_scores(iris)
irisc <- pairwise_scores(iris, control=pair_control(nnargs= c(method="spearman")))
irisc <- pairwise_scores(iris, control=pair_control(fn="pair_ace"))
#Lots of numerical measures
irisc <- pairwise_scores(iris, control=pair_control(nn="pairwise_multi", fn=NULL))
irisc <- pairwise_scores(iris, control=pair_control(nn="pairwise_multi", nnargs="pair_cor", fn=NULL))
#conditional measures
cond_iris <- pairwise_scores(iris, by = "Species")
cond_iris_wo <- pairwise_scores(iris, by = "Species",ungrouped=FALSE) # without overall
irisc <- pairwise_scores(iris, control=pair_control(nn="pairwise_multi", fn=NULL))
irisc <- pairwise_scores(iris, by = "Species",control=pair_control(nn="pairwise_multi", fn=NULL))
#scagnostics
sc <- pairwise_scores(iris, control=pair_control(nn="pair_scagnostics", fn=NULL)) # ignore fn pairs
sc <- pairwise_scores(iris, by = "Species", control=pair_control(nn="pair_scagnostics", fn=NULL)) # ignore fn pairs
Plots multiple pairwise variable scores in a matrix layout.
plot_pairwise( scores, var_order = "seriate_max", score_limits = NULL, inner_width = 0.5, center_level = "all", na.value = "grey80", interactive = FALSE )
scores |
The scores for the matrix plot. Either of class pairwise or identical in structure to an object of class pairwise. |
var_order |
The variable order to be used. A value of NULL means variables in scores are ordered alphabetically. A value of "seriate_max" (the default) means variables are re-ordered to emphasize pairs with maximum absolute scores. A value of "seriate_max_diff" means variables are re-ordered to emphasize pairs with maximum score differences. Otherwise var_order must be a subset of variables in scores. |
score_limits |
a numeric vector of length 2 specifying the limits of the scale. |
inner_width |
A number between 0 and 1 specifying radius of the inner bullseye. |
center_level |
Specifies which level of group goes into the inner bullseye. Defaults to "all". |
na.value |
used for scores with a value of NA |
interactive |
defaults to FALSE |
A girafe object if interactive==TRUE, otherwise a ggplot2.
If scores has one value for each x,y pair, then a filled circle is drawn with fill representing the score value. If there are multiple values for each x,y pair then the filled circle is split into wedges, with the wedge fill representing the values.
If some rows have group=center_level, then the glyph is drawn as a bullseye.
plot_pairwise(pair_cor(iris))
plot_pairwise(pairwise_scores(iris,by="Species"))
Plots the calculated measures of association among different variable pairs for a dataset in a linear layout.
plot_pairwise_linear( scores, pair_order = "seriate_max", geom = c("tile", "point"), add_lines = FALSE, score_limits = NULL, na.value = "grey80", interactive = FALSE )
scores |
A tibble with the calculated association measures for the matrix plot. Either of class pairwise or identical in structure to an object of class pairwise. |
pair_order |
The variable pair order to be used. A value of NULL means pairs are in order of their first appearance in scores. |
geom |
The geom to be used. Should be "point" or "tile". |
add_lines |
When geom= "point" is used, should the points be connected by lines? Defaults to FALSE. |
score_limits |
a numeric vector of length 2 specifying the limits of the scale. |
na.value |
used for geom_tile with a value of NA |
interactive |
defaults to FALSE |
A girafe object if interactive==TRUE, otherwise a ggplot2.
plot_pairwise_linear(pairwise_scores(iris))
plot_pairwise_linear(pairwise_scores(iris,by="Species"))
plot_pairwise_linear(pairwise_multi(iris), geom="point")
Plot method for class pairwise.
## S3 method for class 'pairwise'
plot(x, type = c("matrix", "linear"), ...)
x |
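An object of class pairwise |
type |
If "matrix", calls plot_pairwise; if "linear", calls plot_pairwise_linear |
... |
further arguments to plot_pairwise or plot_pairwise_linear |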
a plot
plot(pairwise_scores(iris))