Title: | Interactive Conditional Visualization for Supervised and Unsupervised Models in Shiny |
---|---|
Description: | Constructs a shiny app function with interactive displays for conditional visualization of models, data and density functions. An extended version of package 'condvis'. Catherine B. Hurley, Mark O'Connell,Katarina Domijan (2021) <10.1080/10618600.2021.1983439>. |
Authors: | Catherine Hurley [aut, cre], Mark OConnell [aut], Katarina Domijan [aut] |
Maintainer: | Catherine Hurley <[email protected]> |
License: | GPL (>= 2.0) |
Version: | 0.1.2 |
Built: | 2025-01-30 03:08:35 UTC |
Source: | https://github.com/cbhurley/condvis2 |
Constructs tours of data space based on centers of clusters
kmeansPath(data, length = 10, reorder = TRUE, conditionvars = NULL, ...) pamPath( data, length = 10, reorder = TRUE, conditionvars = NULL, maxn = 4000, ... ) claraPath(data, length = 10, reorder = TRUE, conditionvars = NULL, ...) medoidPath(data, cl, reorder = FALSE) centroidPath(data, cl, reorder = FALSE)
kmeansPath(data, length = 10, reorder = TRUE, conditionvars = NULL, ...) pamPath( data, length = 10, reorder = TRUE, conditionvars = NULL, maxn = 4000, ... ) claraPath(data, length = 10, reorder = TRUE, conditionvars = NULL, ...) medoidPath(data, cl, reorder = FALSE) centroidPath(data, cl, reorder = FALSE)
data |
A dataframe |
length |
Path length, defaults to 10 |
reorder |
If TRUE uses DendSer to reorder the path |
conditionvars |
A vector of variable names. The returned tour is for this subset of variables. |
... |
ignored |
maxn |
(pamPath only) For datasets with more than maxn rows, use maxn randomly selected rows. |
cl |
A vector specifying cluster membership for rows of data. |
A dataframe with the path
kmeansPath()
: Constructs a tour of data space following length k-means centroids
pamPath()
: Constructs a tour of data space following length pam
medoids
claraPath()
: Constructs a tour of data space following length clara
medoids
medoidPath()
: Returns a path visiting cluster medoids
centroidPath()
: Returns a path visiting cluster centroids
kmeansPath(mtcars,length=4) pamPath(mtcars,length=4) claraPath(mtcars,length=4) medoidPath(mtcars,cl=rep(1:3, length.out=nrow(mtcars)))
kmeansPath(mtcars,length=4) pamPath(mtcars,length=4) claraPath(mtcars,length=4) medoidPath(mtcars,cl=rep(1:3, length.out=nrow(mtcars)))
Plots a conditionPlot, showing one, two or many predictors. The predictor setting in varVal
is drawn in magenta.
conditionPlot( CVdata, var, varVal, pointColor = "steelblue", sim = NULL, resetpar = TRUE, plotrows = NULL )
conditionPlot( CVdata, var, varVal, pointColor = "steelblue", sim = NULL, resetpar = TRUE, plotrows = NULL )
CVdata |
the dataset used for the fit |
var |
one more condition vars. Draws a parallel coordinate plot for more than two. |
varVal |
the current setting of the conditionvars, shown in magenta. |
pointColor |
a color, vector of colors,or the name of variable to be used for coloring |
sim |
If non-NULL should be a vector of similarity weights. |
resetpar |
For use withing shiny app. |
plotrows |
If non-NULL should be a vector of case indices |
conditionPlot(mtcars, c("wt","hp"), c("wt"=3, "hp"=200), pointColor="am") conditionPlot(mtcars, c("wt","hp"), mtcars[1,], pointColor="am") #Calculate similarity using wt, hp observations from first case sim <- similarityweight(mtcars[1, c("wt","hp")], mtcars[, c("wt","hp")], threshold=1) # Marks points with black border with positive sim values. These are points within 1 (threshold) sd #of pink cross. conditionPlot(mtcars, c("wt","hp"), mtcars[1,], pointColor="am", sim=sim) sim <- similarityweight(mtcars[1, ], mtcars, threshold=2) conditionPlot(mtcars, names(mtcars), mtcars[1,], sim=sim)
conditionPlot(mtcars, c("wt","hp"), c("wt"=3, "hp"=200), pointColor="am") conditionPlot(mtcars, c("wt","hp"), mtcars[1,], pointColor="am") #Calculate similarity using wt, hp observations from first case sim <- similarityweight(mtcars[1, c("wt","hp")], mtcars[, c("wt","hp")], threshold=1) # Marks points with black border with positive sim values. These are points within 1 (threshold) sd #of pink cross. conditionPlot(mtcars, c("wt","hp"), mtcars[1,], pointColor="am", sim=sim) sim <- similarityweight(mtcars[1, ], mtcars, threshold=2) conditionPlot(mtcars, names(mtcars), mtcars[1,], sim=sim)
Creates Condvis Shiny app
condvis( data, model = NULL, response = NULL, sectionvars = NULL, conditionvars = NULL, predsInit = NULL, pointColor = c("steelblue", "grey0"), cPlotPCP = FALSE, cPlotn = 1000, orderConditionVars = "default", threshold = 1, thresholdmax = NULL, linecols = NULL, showsim = NULL, theta3d = 45, phi3d = 20, dataplot = "pcp", tours = NULL, predictArgs = NULL, xlim = NULL, ylim = NULL, zlim = NULL, density = FALSE, showdata = density == FALSE, displayHeight = 950 )
condvis( data, model = NULL, response = NULL, sectionvars = NULL, conditionvars = NULL, predsInit = NULL, pointColor = c("steelblue", "grey0"), cPlotPCP = FALSE, cPlotn = 1000, orderConditionVars = "default", threshold = 1, thresholdmax = NULL, linecols = NULL, showsim = NULL, theta3d = 45, phi3d = 20, dataplot = "pcp", tours = NULL, predictArgs = NULL, xlim = NULL, ylim = NULL, zlim = NULL, density = FALSE, showdata = density == FALSE, displayHeight = 950 )
data |
the dataset used for the fit. Should not have NAs for response, sectionvars or conditionvars. |
model |
A fitted model or list of models. May be NULL. |
response |
name of response variable. If null, tries to extract from model. |
sectionvars |
names of sectionvars. If null, extracts from data. |
conditionvars |
names of condition vars. If null, extracts from data. |
predsInit |
Optionally provide starting value for some predictors. Defaults to medoid. |
pointColor |
a color, or the name of variable to be used for coloring. If the named variable is numeric, it is first converted to a factor with 3 levels. |
cPlotPCP |
if TRUE, conditionplots are drawn as a single PCP (for more than two conditionvars) |
cPlotn |
Defaults to 1000. Shows a sample of this number of points in conditionplots. Non-numeric values are ignored. |
orderConditionVars |
If supplied, a function to order the Condition Vars |
threshold |
used for similarity weights, defaults to 1. |
thresholdmax |
maximum value allowed of threshold. |
linecols |
vector of colors to be used for fits |
showsim |
if TRUE, shows sim in conditionplots with points/lines. Defaults to TRUE with 150 or fewer cases. |
theta3d , phi3d
|
Angles defining the viewing direction for 3d surface. |
dataplot |
"pcp" or "pairs". Used when there is no response, or more than two sectionvars. |
tours |
A list of pre-calculated tours |
predictArgs |
a list with one entry per fit, giving arguments for CVpredict |
xlim |
passed on to sectionplot |
ylim |
passed on to sectionplot |
zlim |
passed on to sectionplot |
density |
default FALSE. Use TRUE if model is a density function. |
showdata |
defaults to |
displayHeight |
supply a value for the display height |
fit <- lm(mpg ~ wt+hp+am, data=mtcars) if(interactive()){ condvis(mtcars,fit, response="mpg",sectionvars="wt", conditionvars=c("am", "hp"), pointColor ="red") }
fit <- lm(mpg ~ wt+hp+am, data=mtcars) if(interactive()){ condvis(mtcars,fit, response="mpg",sectionvars="wt", conditionvars=c("am", "hp"), pointColor ="red") }
A predict generic function for condvis
CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, pinterval = NULL, pinterval_level = 0.95 ) ## Default S3 method: CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, pinterval = NULL, pinterval_level = 0.95, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'lm' CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, pinterval = NULL, pinterval_level = 0.95, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'glm' CVpredict( fit, ..., type = "response", ptype = "pred", pthreshold = NULL, pinterval = NULL, pinterval_level = 0.95, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'lda' CVpredict( fit, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'qda' CVpredict( fit, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'nnet' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'randomForest' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'ranger' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'rpart' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'tree' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'C5.0' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'svm' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'gbm' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, n.trees = fit$n.trees, ptrans = NULL ) ## S3 method for class 'loess' CVpredict(fit, newdata = NULL, ...) ## S3 method for class 'ksvm' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'glmnet' CVpredict( fit, newdata, ..., type = "response", ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, s = NULL, makex = NULL ) ## S3 method for class 'cv.glmnet' CVpredict( fit, newdata, ..., type = "response", ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, makex = NULL ) ## S3 method for class 'glmnet.formula' CVpredict( fit, newdata, ..., type = "response", ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, s = NULL ) ## S3 method for class 'cv.glmnet.formula' CVpredict( fit, newdata, ..., type = "response", ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'keras.engine.training.Model' CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, batch_size = 32, response = NULL, predictors = NULL ) ## S3 method for class 'kde' CVpredict(fit, newdata = fit$x, ..., scale = TRUE) ## S3 method for class 'densityMclust' CVpredict( fit, newdata = NULL, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, scale = TRUE ) ## S3 method for class 'MclustDA' CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'MclustDR' CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'Mclust' CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'train' CVpredict( fit, newdata, ..., type = "response", ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'bartMachine' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'wbart' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'lbart' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'pbart' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'bart' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'model_fit' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, pinterval = NULL, pinterval_level = 0.95 ) ## S3 method for class 'WrappedModel' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, pinterval = NULL, pinterval_level = 0.95 ) ## S3 method for class 'Learner' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, pinterval = NULL, pinterval_level = 0.95 )
CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, pinterval = NULL, pinterval_level = 0.95 ) ## Default S3 method: CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, pinterval = NULL, pinterval_level = 0.95, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'lm' CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, pinterval = NULL, pinterval_level = 0.95, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'glm' CVpredict( fit, ..., type = "response", ptype = "pred", pthreshold = NULL, pinterval = NULL, pinterval_level = 0.95, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'lda' CVpredict( fit, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'qda' CVpredict( fit, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'nnet' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'randomForest' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'ranger' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'rpart' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'tree' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'C5.0' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'svm' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'gbm' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, n.trees = fit$n.trees, ptrans = NULL ) ## S3 method for class 'loess' CVpredict(fit, newdata = NULL, ...) ## S3 method for class 'ksvm' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'glmnet' CVpredict( fit, newdata, ..., type = "response", ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, s = NULL, makex = NULL ) ## S3 method for class 'cv.glmnet' CVpredict( fit, newdata, ..., type = "response", ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, makex = NULL ) ## S3 method for class 'glmnet.formula' CVpredict( fit, newdata, ..., type = "response", ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, s = NULL ) ## S3 method for class 'cv.glmnet.formula' CVpredict( fit, newdata, ..., type = "response", ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'keras.engine.training.Model' CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, batch_size = 32, response = NULL, predictors = NULL ) ## S3 method for class 'kde' CVpredict(fit, newdata = fit$x, ..., scale = TRUE) ## S3 method for class 'densityMclust' CVpredict( fit, newdata = NULL, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, scale = TRUE ) ## S3 method for class 'MclustDA' CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'MclustDR' CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'Mclust' CVpredict( fit, newdata, ..., ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'train' CVpredict( fit, newdata, ..., type = "response", ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'bartMachine' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'wbart' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'lbart' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'pbart' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'bart' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL ) ## S3 method for class 'model_fit' CVpredict( fit, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, pinterval = NULL, pinterval_level = 0.95 ) ## S3 method for class 'WrappedModel' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, pinterval = NULL, pinterval_level = 0.95 ) ## S3 method for class 'Learner' CVpredict( fit, newdata, ..., type = NULL, ptype = "pred", pthreshold = NULL, ylevels = NULL, ptrans = NULL, pinterval = NULL, pinterval_level = 0.95 )
fit |
A fitted model |
newdata |
Where to calculate predictions. |
... |
extra arguments to predict |
ptype |
One of "pred","prob" or "probmatrix" |
pthreshold |
Used for calculating classes from probs, in the two class case |
ylevels |
The levels of the response, when it is a factor |
ptrans |
A function to apply to the result |
pinterval |
NULL, "confidence" or "prediction". Only for lm, parsnip, mlr(regression, confidence only) |
pinterval_level |
Defaults to 0.95 |
type |
For some predict methods |
n.trees |
Used by CVpredict.gbm, passed to predict |
s |
Used by CVpredict.glmnet and CVpredict.cv.glmnet, passed to predict |
makex |
Used by CVpredict.glmnet and CVpredict.cv.glmnet. A function to construct xmatrix for predict. |
batch_size |
Used by CVpredict.keras.engine.training.Model, passed to predict |
response |
Used by CVpredict.keras.engine.training.Model. Name of response (optional) |
predictors |
Used by CVpredict.keras.engine.training.Model. Name of predictors |
scale |
Used by CVpredict for densities. If TRUE (default) rescales the conditional density to integrate to 1. |
This is a wrapper for predict used by condvis. When the model response is numeric, the result is a vector of predictions. When the model response is a factor the result depends on the value of ptype. If ptype="pred", the result is a factor. If also threshold is numeric, it is used to threshold a numeric prediction to construct the factor when the factor has two levels. For ptype="prob", the result is a vector of probabilities for the last factor level. For ptype="probmatrix", the result is a matrix of probabilities for each factor level.
a vector of predictions, or a matrix when type is "probmatrix"
CVpredict(default)
: CVpredict method
CVpredict(lm)
: CVpredict method
CVpredict(glm)
: CVpredict method
CVpredict(lda)
: CVpredict method
CVpredict(qda)
: CVpredict method
CVpredict(nnet)
: CVpredict method
CVpredict(randomForest)
: CVpredict method
CVpredict(ranger)
: CVpredict method
CVpredict(rpart)
: CVpredict method
CVpredict(tree)
: CVpredict method
CVpredict(C5.0)
: CVpredict method
CVpredict(svm)
: CVpredict method
CVpredict(gbm)
: CVpredict method
CVpredict(loess)
: CVpredict method
CVpredict(ksvm)
: CVpredict method
CVpredict(glmnet)
: CVpredict method
CVpredict(cv.glmnet)
: CVpredict method
CVpredict(glmnet.formula)
: CVpredict method
CVpredict(cv.glmnet.formula)
: CVpredict method
CVpredict(keras.engine.training.Model)
: CVpredict method
CVpredict(kde)
: CVpredict method
CVpredict(densityMclust)
: CVpredict method
CVpredict(MclustDA)
: CVpredict method
CVpredict(MclustDR)
: CVpredict method
CVpredict(Mclust)
: CVpredict method
CVpredict(train)
: CVpredict method for caret
CVpredict(bartMachine)
: CVpredict method
CVpredict(wbart)
: CVpredict method
CVpredict(lbart)
: CVpredict method
CVpredict(pbart)
: CVpredict method
CVpredict(bart)
: CVpredict method
CVpredict(model_fit)
: CVpredict method for parsnip
CVpredict(WrappedModel)
: CVpredict method for mlr
CVpredict(Learner)
: CVpredict method for mlr3
#Fit a model. f <- lm(Fertility~ ., data=swiss) CVpredict(f) #Fit a model with a factor response swiss1 <- swiss swiss1$Fertility <- cut(swiss$Fertility, c(0,80,100)) levels(swiss1$Fertility)<- c("lo", "hi") f <- glm(Fertility~ ., data=swiss1, family="binomial") CVpredict(f) # by default gives a factor CVpredict(f, ptype="prob") # gives prob of level hi CVpredict(f, ptype="probmatrix") # gives prob of both levels
#Fit a model. f <- lm(Fertility~ ., data=swiss) CVpredict(f) #Fit a model with a factor response swiss1 <- swiss swiss1$Fertility <- cut(swiss$Fertility, c(0,80,100)) levels(swiss1$Fertility)<- c("lo", "hi") f <- glm(Fertility~ ., data=swiss1, family="binomial") CVpredict(f) # by default gives a factor CVpredict(f, ptype="prob") # gives prob of level hi CVpredict(f, ptype="probmatrix") # gives prob of both levels
Constructs tours of data space based on fits.
lofPath( data, fits, length = 10, reorder = TRUE, conditionvars = NULL, predictArgs = NULL, response = NULL, ... ) diffitsPath( data, fits, length = 10, reorder = TRUE, conditionvars = NULL, predictArgs = NULL, ... ) hiresponsePath( data, response = NULL, length = 10, reorder = TRUE, conditionvars = NULL, ... ) loresponsePath( data, response = NULL, length = 10, reorder = TRUE, conditionvars = NULL, ... )
lofPath( data, fits, length = 10, reorder = TRUE, conditionvars = NULL, predictArgs = NULL, response = NULL, ... ) diffitsPath( data, fits, length = 10, reorder = TRUE, conditionvars = NULL, predictArgs = NULL, ... ) hiresponsePath( data, response = NULL, length = 10, reorder = TRUE, conditionvars = NULL, ... ) loresponsePath( data, response = NULL, length = 10, reorder = TRUE, conditionvars = NULL, ... )
data |
A dataframe |
fits |
A model fit or list of fits |
length |
Path length, defaults to 10 |
reorder |
If TRUE (default) uses DendSer to reorder the path |
conditionvars |
A vector of variable names. The returned tour is for this subset of variables. |
predictArgs |
Extra inputs to CVpredict |
response |
The name of the response variable |
... |
ignored |
A dataframe with the path
lofPath()
: Constructs a tour of data space showing biggest absolute residuals from fits.
diffitsPath()
: Constructs a tour of data space showing biggest differences in fits.
hiresponsePath()
: Constructs a tour of data space showing high (numeric) response values
loresponsePath()
: Constructs a tour of data space showing low (numeric) response values
fit1 <- lm(mpg ~ wt+hp+am, data=mtcars) fit2 <- lm(mpg ~ wt, data=mtcars) lofPath(mtcars,fit1, response="mpg") diffitsPath(mtcars,list(fit1,fit2))
fit1 <- lm(mpg ~ wt+hp+am, data=mtcars) fit2 <- lm(mpg ~ wt, data=mtcars) lofPath(mtcars,fit1, response="mpg") diffitsPath(mtcars,list(fit1,fit2))
Finds medoid of data
medoid(data, maxn = 4000)
medoid(data, maxn = 4000)
data |
A dataframe |
maxn |
For datasets with more than maxn rows, use maxn randomly selected rows. |
A dataframe with one row, which is the medoid of the data, based on (standardised) daisy dist
medoid(mtcars)
medoid(mtcars)
Interpolation
pathInterpolate(x, ninterp = 4) ## Default S3 method: pathInterpolate(x, ninterp = 4L) ## S3 method for class 'factor' pathInterpolate(x, ninterp = 4L) ## S3 method for class 'data.frame' pathInterpolate(x, ninterp = 4L)
pathInterpolate(x, ninterp = 4) ## Default S3 method: pathInterpolate(x, ninterp = 4L) ## S3 method for class 'factor' pathInterpolate(x, ninterp = 4L) ## S3 method for class 'data.frame' pathInterpolate(x, ninterp = 4L)
x |
a numeric or factor vector or dataframe |
ninterp |
number of interpolated steps |
interpolated version of x
pathInterpolate(default)
: Default interpolate method
pathInterpolate(factor)
: pathInterpolate method for factor
pathInterpolate(data.frame)
: pathInterpolate method for data.frame
Plots diagnostics for the tour supplied
plotTourDiagnostics( path, data, pathlen = nrow(path), threshold = 1, which = 1:3, ... )
plotTourDiagnostics( path, data, pathlen = nrow(path), threshold = 1, which = 1:3, ... )
path |
the tour |
data |
the dataset |
pathlen |
the pathlength |
threshold |
used for similarityweight |
which |
subset of 1:3 |
... |
other args for similarityweight |
The first plot shows approximately how much data are visible on each section, the second shows what proportion of data are visited by the tour, and the third a density estimate of max similarity values.
Table of max sims attained.
The section plot relates a fit or fits to one or two predictors (sectionvar
),
for fixed values of other predictors in conditionvals
.
sectionPlot( CVdata, CVfit = NULL, response = NULL, preds, sectionvar, conditionvals, pointColor = "steelblue", sim = NULL, threshold = 1, linecols = NULL, dataplot = "pcp", gridsize = 50, probs = FALSE, view3d = FALSE, theta3d = 45, phi3d = 20, xlim = NULL, ylim = NULL, zlim = NULL, pointSize = 1.5, predictArgs = NULL, resetpar = TRUE, density = FALSE, showdata = density == FALSE, returnInfo = FALSE, pointColorFromResponse = FALSE, pcolInfo = NULL )
sectionPlot( CVdata, CVfit = NULL, response = NULL, preds, sectionvar, conditionvals, pointColor = "steelblue", sim = NULL, threshold = 1, linecols = NULL, dataplot = "pcp", gridsize = 50, probs = FALSE, view3d = FALSE, theta3d = 45, phi3d = 20, xlim = NULL, ylim = NULL, zlim = NULL, pointSize = 1.5, predictArgs = NULL, resetpar = TRUE, density = FALSE, showdata = density == FALSE, returnInfo = FALSE, pointColorFromResponse = FALSE, pcolInfo = NULL )
CVdata |
the dataset used for the fit |
CVfit |
a fit or list of fits |
response |
name of response variable |
preds |
names of predictors |
sectionvar |
section variable, or variables. |
conditionvals |
conditioning values. A vector/list or dataframe with one row |
pointColor |
a color, vector of colors,or the name of variable to be used for coloring |
sim |
vector of similarity weights |
threshold |
used for similarity weights, defaults to 1. |
linecols |
vector of line colours |
dataplot |
"pcp" or "pairs". Used when there is no response, or more than two sectionvars. |
gridsize |
used to construct grid of fitted values. |
probs |
Logical; if |
view3d |
Logical; if |
theta3d , phi3d
|
Angles defining the viewing direction. |
xlim |
passed on to plot |
ylim |
passed on to plot |
zlim |
passed on to plot |
pointSize |
used for points |
predictArgs |
a list with one entry per fit, giving arguments for predict |
resetpar |
When TRUE (the default) resets pars after drawing. |
density |
default FALSE. Use TRUE if model is a density function. |
showdata |
If FALSE, data on section not shown. |
returnInfo |
If TRUE, returns coordinates for some plots |
pointColorFromResponse |
ignore–For interactive use only |
pcolInfo |
ignore–For interactive use only |
The type of plot depends on the fit and the section variables. Observations with non zero values of the similarity weights
sim
are shown. If no fit is provided, the data are shown as a parallel coordinate plot or pairs
plot, depending on dataplot
.
The fit could also be a density estimate.
plotted coordinates, for some plots
#Fit a model. f <- lm(Fertility~ ., data=swiss) svar <- "Education" preds <- variable.names(f)[-1] sectionPlot(swiss,f, "Fertility",preds,svar, swiss[12,]) sectionPlot(swiss,f, "Fertility",preds,svar, apply(swiss,2,median)) sectionPlot(swiss,f, "Fertility",preds,preds[1:2], apply(swiss,2,median)) sectionPlot(swiss,f, "Fertility",preds,preds[1:2], apply(swiss,2,median), view3d=TRUE) # PCP of swiss data, showing only cases whose percent catholic and infant.mortality are # similar to those of the first case sectionPlot(swiss,preds=names(swiss), sectionvar= names(swiss)[1:4],conditionvals=swiss[1,] ) # Use dataplot="pairs" to switch to a pairs plot # A density estimate example ## Not run: library(ks) fde <-kde(iris[,1:3]) sectionPlot(iris,list(kde=fde), response=NULL, preds=names(iris)[1:3], sectionvar=names(iris)[1], conditionvals=iris[1,],density=TRUE) ## End(Not run)
#Fit a model. f <- lm(Fertility~ ., data=swiss) svar <- "Education" preds <- variable.names(f)[-1] sectionPlot(swiss,f, "Fertility",preds,svar, swiss[12,]) sectionPlot(swiss,f, "Fertility",preds,svar, apply(swiss,2,median)) sectionPlot(swiss,f, "Fertility",preds,preds[1:2], apply(swiss,2,median)) sectionPlot(swiss,f, "Fertility",preds,preds[1:2], apply(swiss,2,median), view3d=TRUE) # PCP of swiss data, showing only cases whose percent catholic and infant.mortality are # similar to those of the first case sectionPlot(swiss,preds=names(swiss), sectionvar= names(swiss)[1:4],conditionvals=swiss[1,] ) # Use dataplot="pairs" to switch to a pairs plot # A density estimate example ## Not run: library(ks) fde <-kde(iris[,1:3]) sectionPlot(iris,list(kde=fde), response=NULL, preds=names(iris)[1:3], sectionvar=names(iris)[1], conditionvals=iris[1,],density=TRUE) ## End(Not run)
Calculate the similarity weight for a set of observations, based on their distance from some arbitrary points in data space. Observations which are very similar to the point under consideration are given weight 1, while observations which are dissimilar to the point are given weight zero.
similarityweight( x, data, threshold = 1, distance = "euclidean", lambda = NULL, scale = TRUE )
similarityweight( x, data, threshold = 1, distance = "euclidean", lambda = NULL, scale = TRUE )
x |
A dataframe describing arbitrary points in the space of the data
(i.e., with same |
data |
A dataframe representing observed data. |
threshold |
Threshold distance outside which observations will be assigned similarity weight zero. This is numeric and should be > 0. Defaults to 1. |
distance |
The type of distance measure to be used, currently just three
types of Minkowski distance: |
lambda |
A constant to multiply by the number of categorical
mismatches, before adding to the Minkowski distance, to give a general
dissimilarity measure. If left |
scale |
defaults to TRUE, in which case numeric variables are scaled to unit sd. |
Similarity weight is assigned to observations based on their distance from a given point. The distance is calculated as Minkowski distance between the numeric elements for the observations whose categorical elements match, or else the Gower distance.
A numeric vector or matrix, with values from 0 to 1. The similarity
weights for the observations in data
arranged in rows for each row
in x
.
O'Connell M, Hurley CB and Domijan K (2017). “Conditional Visualization for Statistical Models: An Introduction to the condvis Package in R.”Journal of Statistical Software, 81(5), pp. 1-20. <URL:http://dx.doi.org/10.18637/jss.v081.i05>.
## Say we want to find observations similar to the first observation. ## The first observation is identical to itself, so it gets weight 1. The ## second observation is similar, so it gets some weight. The rest are more ## different, and so get zero weight. data(mtcars) similarityweight(x = mtcars[1, ], data = mtcars) ## By increasing the threshold, we can find observations which are more ## approximately similar to the first row. Note that the second observation ## now has weight 1, so we lose some ability to discern how similar ## observations are by increasing the threshold. similarityweight(x = mtcars[1, ], data = mtcars, threshold = 5) ## Can provide a number of points to 'x'. Here we see that the Mazda RX4 Wag ## is more similar to the Merc 280 than the Mazda RX4 is. similarityweight(mtcars[1:2, ], mtcars, threshold = 3)
## Say we want to find observations similar to the first observation. ## The first observation is identical to itself, so it gets weight 1. The ## second observation is similar, so it gets some weight. The rest are more ## different, and so get zero weight. data(mtcars) similarityweight(x = mtcars[1, ], data = mtcars) ## By increasing the threshold, we can find observations which are more ## approximately similar to the first row. Note that the second observation ## now has weight 1, so we lose some ability to discern how similar ## observations are by increasing the threshold. similarityweight(x = mtcars[1, ], data = mtcars, threshold = 5) ## Can provide a number of points to 'x'. Here we see that the Mazda RX4 Wag ## is more similar to the Merc 280 than the Mazda RX4 is. similarityweight(mtcars[1:2, ], mtcars, threshold = 3)
Constructs a various tours of data space
randomPath(data, length = 10, reorder = TRUE, conditionvars = NULL, ...) seqPath(data, length = 10, reorder = FALSE, conditionvars = NULL, ...) alongPath(data, var, length = 10, current = NULL, ...)
randomPath(data, length = 10, reorder = TRUE, conditionvars = NULL, ...) seqPath(data, length = 10, reorder = FALSE, conditionvars = NULL, ...) alongPath(data, var, length = 10, current = NULL, ...)
data |
A dataframe |
length |
Path length, defaults to 10 |
reorder |
If TRUE (default) uses DendSer to reorder the path |
conditionvars |
A vector of variable names. The returned tour is for this subset of variables. |
... |
ignored |
var |
A variable name for alongPath |
current |
Default value of variables for alongPath |
A dataframe with the path
randomPath()
: Constructs a tour of data space following random observations
seqPath()
: Constructs a tour of data space following first length observations
alongPath()
: Constructs a tour of data space of length equi-spaced values in the range of var.
If var is a factor, its levels are used.
randomPath(mtcars,length=5) seqPath(mtcars,length=5) alongPath(mtcars,var="mpg", length=5, current=mtcars[1,])
randomPath(mtcars,length=5) seqPath(mtcars,length=5) alongPath(mtcars,var="mpg", length=5, current=mtcars[1,])
The colours whose weights are less than 1 are diluted. Colours whose weight is zero are returned as white,
other weights are grouped in nlevels
groups and colours diluted proportionally.
weightcolor(col, weights, nlevels = 5)
weightcolor(col, weights, nlevels = 5)
col |
A vector of colour |
weights |
A vector of weights, values between 0 and 1 |
nlevels |
The number of groups |
A vector of colours