Personal software


(GitHub site)

Bayesian PGMM

BayesianPGMM is an R package to infer a Bayesian piecewise growth mixture model with linear segments, for a given number of latent classes and a latent number of possible change points in each class, as described in this article (Psychometrika, PMID: 29150814, 2017). This is joint work with Nidhi Kohli and Maitreyee Bose. The package is available via GitHub at this link.
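
The package fits these models with full Bayesian inference. As a rough illustration of what "linear segments" with change points means for a single class, here is a minimal sketch of a continuous piecewise linear mean trajectory. It assumes a common hinge parameterization, max(t - c, 0), which may differ from the one used in the package. The function `piecewise_linear` and the class parameters are hypothetical.

```python
def piecewise_linear(t, intercept, slopes, change_points):
    """Mean of a continuous piecewise linear trajectory at time t.

    slopes[0] applies before the first change point; slopes[k] applies after
    change_points[k - 1]. Change points are assumed to be sorted.
    """
    if len(slopes) != len(change_points) + 1:
        raise ValueError("need exactly one more slope than change points")
    y = intercept + slopes[0] * t
    for k, c in enumerate(change_points):
        # The hinge max(t - c, 0) adds the change in slope only after time c,
        # which keeps the curve continuous at the change point.
        y += (slopes[k + 1] - slopes[k]) * max(t - c, 0.0)
    return y


# Hypothetical class-specific parameters: (intercept, slopes, change points)
classes = {
    "class 1": (10.0, [2.0, 0.5], [3.0]),
    "class 2": (8.0, [1.0, 3.0], [6.0]),
}
trajectories = {
    name: [piecewise_linear(t, b0, s, cps) for t in range(0, 10, 3)]
    for name, (b0, s, cps) in classes.items()
}
print(trajectories)  # {'class 1': [10.0, 16.0, 17.5, 19.0], 'class 2': [8.0, 11.0, 14.0, 23.0]}
```

In the mixture model each latent class has its own trajectory of this kind, and the number and location of change points are themselves uncertain. Neither of these features is modeled here.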



Multiway Regression

MultiwayRegression is an R package to predict one multi-way dataset (i.e., tensor) from another multi-way dataset, as described in this article (Journal of Computational and Graphical Statistics, 2017). The package is available via GitHub at this link.
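
Regression between multi-way datasets is usually made tractable by constraining the coefficient tensor to have low rank. The sketch below assumes a rank-R CANDECOMP/PARAFAC (CP) coefficient tensor that links a P1 x P2 predictor matrix per sample to a length-Q outcome. It shows only how predictions follow from given factor matrices. Estimating the factors from data, as the package does, is not shown. The name `cp_predict` and the random factors are hypothetical.

```python
import random


def cp_predict(X, A, B, C):
    """Predict Y[n][q] = sum_{i,j} X[n][i][j] * W[i][j][q] for a rank-R CP
    coefficient tensor W = sum_r A[r] o B[r] o C[r], without forming W."""
    R, Q = len(A), len(C[0])
    Y = []
    for Xn in X:
        # scores[r] = a_r^T X_n b_r contracts both predictor modes with factor r
        scores = [
            sum(A[r][i] * Xn[i][j] * B[r][j]
                for i in range(len(A[r])) for j in range(len(B[r])))
            for r in range(R)
        ]
        # Expand the R scores along the outcome mode with the c_r factors
        Y.append([sum(scores[r] * C[r][q] for r in range(R)) for q in range(Q)])
    return Y


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N, P1, P2, Q, R = 4, 3, 5, 2, 2
X = [rand_matrix(P1, P2, rng) for _ in range(N)]
A, B, C = rand_matrix(R, P1, rng), rand_matrix(R, P2, rng), rand_matrix(R, Q, rng)
Y_hat = cp_predict(X, A, B, C)

# Same prediction via the explicit P1 x P2 x Q coefficient tensor, for comparison
W = [[[sum(A[r][i] * B[r][j] * C[r][q] for r in range(R)) for q in range(Q)]
      for j in range(P2)] for i in range(P1)]
Y_full = [[sum(Xn[i][j] * W[i][j][q] for i in range(P1) for j in range(P2))
           for q in range(Q)] for Xn in X]
```

The low-rank form needs R(P1 + P2 + Q) coefficients instead of P1 * P2 * Q, which is what makes such models estimable when the arrays are large.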

Multiway Classification

MultiwayClassification is an R package to perform linear classification for data with multi-way structure, as described in this article (Biostatistics, PMID: 28115314, 2017). This is joint work with Tianmeng Lyu and Lynn Eberly. The package is available via GitHub at this link.
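
One way to exploit multi-way structure in a linear classifier is to give the weight array a low-rank form. The minimal sketch below assumes matrix-valued samples (P1 x P2) and a rank-1 weight matrix w = u v^T, so the score is u^T X v + b. The fitting criterion from the article is not reproduced, and the functions `bilinear_score` and `classify` and the weights are hypothetical.

```python
def bilinear_score(X, u, v, b=0.0):
    """Score u^T X v + b for a P1 x P2 sample X, i.e. <X, u v^T> + b."""
    return sum(u[i] * X[i][j] * v[j] for i in range(len(u)) for j in range(len(v))) + b


def classify(X, u, v, b=0.0):
    """Assign class +1 or -1 by the sign of the bilinear score."""
    return 1 if bilinear_score(X, u, v, b) > 0 else -1


# Hypothetical weights: u weights the first mode (e.g. variables), v the second (e.g. time points)
u = [1.0, -1.0]
v = [0.5, 0.5, 0.0]
X1 = [[2.0, 2.0, 9.0], [0.0, 0.0, 9.0]]
X2 = [[0.0, 0.0, 1.0], [2.0, 2.0, 1.0]]
print(classify(X1, u, v), classify(X2, u, v))  # 1 -1
```

The rank-1 weight uses P1 + P2 parameters rather than P1 * P2. It also separates the contribution of each mode, which makes the weights easier to interpret.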



Bayesian Screening

BayesianScreening is an R package to compute posteriors from a gene-level hierarchical prior, as described in this article (Biometrics, PMID: 28083869, 2017). It also includes functions to perform two-class testing with shared kernels for methylation array data (see this article in Biometrika). This is joint work with David Dunson. The package is available via GitHub at this link.


Bayesian Consensus Clustering

Bayesian consensus clustering (BCC) is a tool to cluster a set of objects based on multiple sources of data. The model permits a separate clustering of the objects for each data source, and these source-specific clusterings adhere loosely to an overall clustering. The method is described in this Bioinformatics article, and this zip folder contains R code with instructions and examples. This is joint work with David Dunson. A user-friendly R package to perform BCC, developed and maintained by Tim Triche, is available at this link.
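
The "loose adherence" can be made concrete with a per-source adherence parameter alpha. Under the assumption sketched here, an object's label in a given source matches its overall label with probability alpha and otherwise falls uniformly on one of the other K - 1 clusters. This only simulates source-specific labels from a fixed overall clustering. It does not fit the model, and the function names are hypothetical.

```python
import random


def source_label_probs(overall_label, K, alpha):
    """P(L_mn = k | C_n = overall_label) for k = 0..K-1: the source-specific
    label matches the overall label with probability alpha, and otherwise is
    uniform over the other K - 1 clusters."""
    if K < 2 or not 1.0 / K <= alpha <= 1.0:
        raise ValueError("need K >= 2 and alpha in [1/K, 1]")
    other = (1.0 - alpha) / (K - 1)
    return [alpha if k == overall_label else other for k in range(K)]


def sample_source_labels(overall_labels, K, alphas, rng):
    """Draw one clustering per data source (one alpha per source) given the overall clustering."""
    return [
        [rng.choices(range(K), weights=source_label_probs(c, K, a))[0] for c in overall_labels]
        for a in alphas
    ]


rng = random.Random(1)
C = [0, 0, 1, 1, 2, 2]  # hypothetical overall clustering of six objects
L = sample_source_labels(C, K=3, alphas=[0.9, 0.5], rng=rng)
print(L)  # the first source tends to agree with C more often than the second
```

An alpha of 1 forces a source to follow the overall clustering exactly, and an alpha of 1/K makes it independent of the overall clustering. Values in between give the loose adherence described above.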


Joint and Individual Variation Explained (JIVE)

JIVE is a flexible exploratory method for the integrated dimension reduction and visualization of multiple datatypes on the same set of samples. Matlab scripts and sample data are available here at the UNC Microarray Database, and a user-friendly R package with some enhancements is available here. The original manuscript is published in the Annals of Applied Statistics (link), and the R package is described in Bioinformatics (link). This is joint work with Katherine Hoadley, Steve Marron, and Andrew Nobel; the R package was developed with Michael O'Connell.
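
The minimal sketch below illustrates the decomposition behind JIVE. Each data block X_i (features by samples) is split into joint structure J_i, individual structure A_i, and noise. The sketch alternates two steps: a low-rank fit to all blocks stacked together, and a low-rank fit to each block after projecting out the joint sample space. It assumes the ranks are known and runs a fixed number of iterations. Rank selection and convergence checks from the published method are omitted. The functions `low_rank` and `jive` and the simulated data are hypothetical.

```python
import numpy as np


def low_rank(M, r):
    """Best rank-r approximation of M (truncated SVD) and its right singular vectors."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r], Vt[:r].T


def jive(blocks, r_joint, r_indiv, n_iter=100):
    """Simplified JIVE with fixed ranks: X_i = J_i + A_i + noise.

    Each block is a (features_i x n) array measured on the same n samples (columns).
    """
    n = blocks[0].shape[1]
    splits = np.cumsum([X.shape[0] for X in blocks])[:-1]
    indiv = [np.zeros_like(X) for X in blocks]
    for _ in range(n_iter):
        # Joint structure: low-rank fit to all blocks stacked, after removing individual parts
        J, V = low_rank(np.vstack([X - A for X, A in zip(blocks, indiv)]), r_joint)
        joint = np.split(J, splits)
        # Individual structure: low-rank fit after projecting out the joint sample space,
        # which makes each A_i orthogonal to the joint structure (J_i A_i^T = 0)
        P = np.eye(n) - V @ V.T
        indiv = [low_rank((X - Ji) @ P, r)[0] for X, Ji, r in zip(blocks, joint, r_indiv)]
    return joint, indiv


rng = np.random.default_rng(0)
n = 30
shared = rng.normal(size=(1, n))  # pattern across samples shared by both blocks
X1 = rng.normal(size=(20, 1)) @ shared + 0.1 * rng.normal(size=(20, n))
X2 = (rng.normal(size=(15, 1)) @ shared
      + rng.normal(size=(15, 1)) @ rng.normal(size=(1, n))  # pattern specific to X2
      + 0.1 * rng.normal(size=(15, n)))
joint, indiv = jive([X1, X2], r_joint=1, r_indiv=[1, 1])
```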


Primer Efficiency Analysis

A collection of statistical methods to analyze the efficiency of a set of primer-pairs for quantitative real-time PCR. The R script PEA.r contains code to provide individual efficiency estimates with confidence, identify and remove unreliable primers, cluster amplification efficiencies, and adjust CT values. See the PEA user's guide for setup instructions, function descriptions and illustrative examples. The related manuscript is published in BMC Bioinformatics (link). This is joint work with Dirk Dittmer.
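
For context, the textbook way to estimate a primer pair's amplification efficiency is to use a dilution series. Ct is regressed on log10 of the input amount, and the efficiency is E = 10^(-1/slope) - 1, so E = 1 means perfect doubling each cycle. The sketch below computes only this standard-curve estimate. It is not necessarily the estimator used in PEA.r, which also provides confidence assessments, primer screening, and clustering. The function name and the dilution data are hypothetical.

```python
import math


def standard_curve_efficiency(log10_input, ct):
    """Least-squares slope of Ct on log10(input) and the implied efficiency
    E = 10^(-1/slope) - 1 (E = 1 corresponds to perfect doubling)."""
    n = len(ct)
    mx = sum(log10_input) / n
    my = sum(ct) / n
    sxx = sum((x - mx) ** 2 for x in log10_input)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_input, ct))
    slope = sxy / sxx
    return 10 ** (-1.0 / slope) - 1.0, slope


log_input = [3.0, 2.0, 1.0, 0.0]  # hypothetical 10-fold dilution series
ct_ideal = [15.0 - x / math.log10(2) for x in log_input]  # Ct values under perfect doubling
eff, slope = standard_curve_efficiency(log_input, ct_ideal)
print(round(slope, 3), round(eff, 3))  # -3.322 1.0
```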

Binary Biclustering

Matlab code to bicluster a binary data matrix can be found here. This code is adapted from the LAS method, and uses a binomial score function. For more details, see this short report.
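
The exact binomial score is given in the short report. The sketch below is only an assumed LAS-style analogue. It penalizes the number of k x l submatrices, C(m, k) C(n, l), and measures significance by the binomial upper tail of the number of ones s in the submatrix, given the overall frequency p of ones. The search over submatrices is not shown, and the function names and toy matrix are hypothetical.

```python
import math


def log_binom_tail(s, N, p):
    """log P(Binomial(N, p) >= s), accumulated in log space for stability."""
    terms = [
        math.lgamma(N + 1) - math.lgamma(j + 1) - math.lgamma(N - j + 1)
        + j * math.log(p) + (N - j) * math.log1p(-p)
        for j in range(s, N + 1)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def binary_bicluster_score(data, rows, cols):
    """Hypothetical LAS-style score for a k x l submatrix of a 0/1 matrix:
    -log[ C(m, k) * C(n, l) * P(Binomial(k*l, p) >= s) ], where s counts the
    ones in the submatrix and p is the overall frequency of ones (0 < p < 1)."""
    m, n = len(data), len(data[0])
    k, l = len(rows), len(cols)
    p = sum(map(sum, data)) / (m * n)
    s = sum(data[i][j] for i in rows for j in cols)
    log_comb = math.log(math.comb(m, k)) + math.log(math.comb(n, l))
    return -(log_comb + log_binom_tail(s, k * l, p))


# 6 x 6 toy matrix: a block of ones in the top-left corner plus scattered ones
data = [[1 if (i < 3 and j < 3) or (i + j) % 5 == 0 else 0 for j in range(6)] for i in range(6)]
planted = binary_bicluster_score(data, range(3), range(3))
other = binary_bicluster_score(data, range(3, 6), range(3, 6))
print(round(planted, 2), round(other, 2))
```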

Classification Based Biclustering

Matlab code to search for biclusters that distinguish between two sample classes can be found here. For more details, see this short report. (This code gives a preliminary approach and has not been well tested; use at your own risk.)

Software developed in collaboration with others


ToxPi GUI

ToxPi is a flexible tool to prioritize environmental chemicals based on diverse toxicity data. Developed in collaboration with David Reif, Myroslav Sypa, Ivan Rusyn, Fred Wright and others.

StatKey

StatKey is a collection of easy-to-use online applets to visualize bootstrapping and randomization tests. It also includes online applets for descriptive statistics and theoretical probability distributions. Developed in collaboration with Rich Sharp, Ed Harcourt, Kevin Angstadt, Patti Frazer Lock, Robin Lock, Kari Lock Morgan and Dennis Lock.
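
The two resampling ideas that StatKey visualizes can be written in a few lines. The generic sketch below shows a percentile bootstrap interval for a mean and a two-sided randomization (permutation) test for a difference in means. It is not StatKey's own code, and the function names and data are hypothetical.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, reps=5000, level=0.95, rng=None):
    """Percentile bootstrap interval: resample with replacement, recompute the statistic."""
    rng = rng or random.Random()
    boots = sorted(stat(rng.choices(sample, k=len(sample))) for _ in range(reps))
    lo = boots[int((1 - level) / 2 * reps)]
    hi = boots[int((1 + level) / 2 * reps) - 1]
    return lo, hi


def permutation_test(a, b, reps=5000, rng=None):
    """Two-sided randomization test for a difference in means: shuffle group labels
    and count how often the shuffled difference is at least as extreme as observed."""
    rng = rng or random.Random()
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / reps


rng = random.Random(42)
sample = [4.1, 5.3, 6.0, 4.8, 5.5, 6.2, 5.1, 4.9]
lo, hi = bootstrap_ci(sample, rng=rng)
p = permutation_test([10, 11, 12, 13, 14], [1, 2, 3, 4, 5], rng=rng)
print(lo, hi, p)
```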

Software developed by others


Large Average Submatrices (LAS)

LAS (Shabalin et al. 2009) is a powerful biclustering algorithm for finding large average submatrices in high dimensional data. Executable software and Matlab code are available here at the UNC Microarray Database.
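
LAS ranks a candidate k x l submatrix with average tau, drawn from an m x n standardized data matrix, by the significance score S = -log[ C(m, k) C(n, l) Phi(-tau sqrt(kl)) ], where Phi is the standard normal CDF. The sketch below computes this score as described in Shabalin et al. (2009), assuming the data have already been standardized. It does not perform the iterative search over submatrices, and the function names and simulated data are hypothetical.

```python
import math
import random


def log_norm_sf(z):
    """log Phi(-z) = log P(N(0,1) > z); switches to the leading asymptotic term if erfc underflows."""
    tail = 0.5 * math.erfc(z / math.sqrt(2))
    if tail > 0:
        return math.log(tail)
    return -0.5 * z * z - math.log(z * math.sqrt(2 * math.pi))


def las_score(data, rows, cols):
    """-log[ C(m,k) * C(n,l) * Phi(-tau * sqrt(k*l)) ] for the submatrix data[rows][cols]."""
    m, n = len(data), len(data[0])
    k, l = len(rows), len(cols)
    tau = sum(data[i][j] for i in rows for j in cols) / (k * l)
    log_comb = math.log(math.comb(m, k)) + math.log(math.comb(n, l))
    return -(log_comb + log_norm_sf(tau * math.sqrt(k * l)))


rng = random.Random(0)
m = n = 30
data = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
for i in range(5):
    for j in range(5):
        data[i][j] += 2.0  # planted elevated submatrix in the first 5 rows and columns
planted = las_score(data, range(5), range(5))
other = las_score(data, range(10, 15), range(10, 15))
print(round(planted, 1), round(other, 1))
```

The combinatorial term penalizes large search spaces, so a submatrix scores well only if its average is surprisingly high given how many submatrices of that size exist.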

Sparse Singular Value Decomposition (SSVD)

SSVD (Lee et al. 2010) enforces sparsity on the left and right singular vectors of a data matrix through an L1-penalty or hard thresholding. Software is available in both Matlab and R: svds-code.rar.
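
The core idea of a sparse rank-1 decomposition can be sketched as power iterations that soft-threshold (the L1 case) the singular vectors at each step. This is a simplification. Lee et al. choose the penalties in a data-driven way, whereas the thresholds here are fixed and hand-picked, and hard thresholding could be swapped in for `soft`. This is not the svds-code implementation. The function names and simulated data are hypothetical.

```python
import math
import random


def soft(x, t):
    """Soft-thresholding operator: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def sparse_rank1(X, lam_u, lam_v, n_iter=100):
    """Sparse rank-1 approximation d * u v^T via alternating soft-thresholded
    power iterations with fixed thresholds lam_u and lam_v."""
    m, n = len(X), len(X[0])
    v = [1.0 / math.sqrt(n)] * n
    u = [0.0] * m
    for _ in range(n_iter):
        u = normalize([soft(sum(X[i][j] * v[j] for j in range(n)), lam_u) for i in range(m)])
        v = normalize([soft(sum(X[i][j] * u[i] for i in range(m)), lam_v) for j in range(n)])
    d = sum(u[i] * X[i][j] * v[j] for i in range(m) for j in range(n))
    return d, u, v


rng = random.Random(1)
a = [3.0, 2.0, 0.0, 0.0, 0.0, 0.0]  # sparse left vector of the simulated signal
b = [0.0, 0.0, 2.0, 1.5, 0.0]       # sparse right vector of the simulated signal
X = [[a[i] * b[j] + rng.gauss(0, 0.05) for j in range(5)] for i in range(6)]
d, u, v = sparse_rank1(X, lam_u=0.5, lam_v=0.5)
print(round(d, 2), [round(x, 2) for x in u], [round(x, 2) for x in v])
```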

GenABEL: an R library for genome-wide association analysis

GenABEL (Aulchenko et al. 2007) is a package for performing genome-wide association analysis in R, with extensive functionality. See this tutorial for a quick introduction.

LocusZoom

LocusZoom (Pruim et al. 2010) is an easy-to-use online tool for producing publication-quality regional plots of genome-wide association results.