Song Lab

Department of Biostatistics

Software

coxphGPLE: Global Partial Likelihood Estimation and Inference

Description This R package fits a Cox model with multiple functional covariate-environment interactions, in which covariate effects may be modified nonlinearly by mixtures of toxicants to which subjects are exposed. The package provides global partial likelihood estimation and inference.
Download [R]

R Tool for Analysis of Accelerometer Data

Description R code (including an R Shiny app) and an example data set for the analysis of accelerometer data, used to detect sleep periods in accelerometer tracking data.
Download [R Code and Example Data] [R Shiny App]

Tensor Decomposition

Description R Code for CP and non-negative tensor decompositions/factorizations. Also included is an app to study convergence and clustering properties of decomposition methods on a simulated tensor representing a dynamic paired kidney exchange program.
Download [R] [R Shiny App]
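
As a rough illustration of what a CP (CANDECOMP/PARAFAC) decomposition represents, the sketch below rebuilds a three-way tensor from its factor matrices as a sum of rank-one terms. It is written in plain Python rather than taken from the R code offered here; the function name cp_reconstruct and the small factor matrices are hypothetical and chosen only for illustration.

```python
def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factors A (I x R), B (J x R), C (K x R).

    Entry (i, j, k) is the sum over components r of A[i][r] * B[j][r] * C[k][r].
    """
    rank = len(A[0])
    return [
        [
            [sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
             for k in range(len(C))]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]


# Hypothetical rank-2 factors for a 2 x 2 x 2 tensor.
A = [[1.0, 0.0], [2.0, 1.0]]
B = [[1.0, 1.0], [0.0, 3.0]]
C = [[1.0, 2.0], [1.0, 0.0]]

X = cp_reconstruct(A, B, C)
```

A decomposition method works in the opposite direction: given X, it searches for factors A, B and C whose reconstruction is close to X, with the non-negative variant additionally requiring all factor entries to be non-negative.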

RCD: Scalable and Efficient Statistical Inference with Estimating Functions in the MapReduce Paradigm for Big Data

Description Python code in the form of Map-Reduce functions for conducting statistical inference based on estimating functions. The functions have been tested on the University of Michigan Flux Hadoop cluster and support the HDFS file format. Data examples are also provided.
Download [Python]
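
The general idea of solving an estimating equation with mapper and reducer functions can be sketched as follows. This is a minimal, single-machine illustration for simple linear regression, assuming each mapper sees one block of (x, y) pairs; it is not the RCD code, and the names mapper, combine and solve are hypothetical.

```python
from functools import reduce


def mapper(block):
    """Summarise one data block by the sums that enter the estimating equations."""
    n = len(block)
    sx = sum(x for x, _ in block)
    sy = sum(y for _, y in block)
    sxx = sum(x * x for x, _ in block)
    sxy = sum(x * y for x, y in block)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reducer step: block summaries are additive, so add them element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def solve(stats):
    """Solve sum_i (1, x_i)^T (y_i - b0 - b1 * x_i) = 0 for (b0, b1)."""
    n, sx, sy, sxx, sxy = stats
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = (sy - b1 * sx) / n
    return b0, b1


# Two hypothetical blocks generated from y = 1 + 2x without noise.
blocks = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
b0, b1 = solve(reduce(combine, map(mapper, blocks)))
```

Because the estimating function is a sum over observations, each block only has to report a handful of numbers, and the full data set never needs to sit on one machine.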

NGM: Bayesian Semi-parametric Stochastic Velocity Model with Ornstein-Uhlenbeck process prior (B-SSVM-OU)

Description Newton's Growth Model (NGM) fits longitudinal (or time-series) data when a study examines 1) the growth dynamics (trajectory, velocity, acceleration) of health outcomes (e.g., infants' body mass index) and 2) how growth acceleration (or velocity) is associated with observed exposures.
Download [R]

MODAC: Method of Divide-and-Combine in Regularized Generalized Linear Models for Big Data

Description Map-Reduce functions for fitting generalized linear models on a Hadoop cluster. When a dataset is extremely large (in terabytes), storing it and fitting GLMs on a local machine becomes impossible. The provided functions can fit GLMs and provide inference for big data on a distributed file system using mapper and reducer functions in a parallel framework. The method is numerically robust.
Download [R & Python]
References
1. Tang, L., Zhou, L., and Song, P.X.K. (2016). Method of Divide-and-Combine in Regularised Generalised Linear Models for Big Data. arXiv preprint arXiv:1611.06208.
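
The divide-and-combine idea can be illustrated in its simplest form: fit the model on each block separately, then pool the block estimates with weights proportional to their information. The sketch below does this for an unregularized one-parameter linear model (a slope through the origin, with a common error variance assumed), so it is far simpler than the regularized GLM setting that MODAC addresses; the function names fit_block and combine_fits and the data are hypothetical.

```python
def fit_block(block):
    """Least-squares slope through the origin for one block, with its information.

    The information is taken as sum(x^2), i.e. up to a common error variance.
    """
    sxx = sum(x * x for x, _ in block)
    sxy = sum(x * y for x, y in block)
    return sxy / sxx, sxx


def combine_fits(fits):
    """Information-weighted average of the block-level estimates."""
    total_info = sum(info for _, info in fits)
    return sum(est * info for est, info in fits) / total_info


# Two hypothetical blocks of (x, y) pairs.
blocks = [[(1.0, 2.1), (2.0, 3.9)], [(3.0, 6.2), (4.0, 7.8)]]

combined = combine_fits([fit_block(b) for b in blocks])

# Reference fit on the pooled data, feasible here only because the example is tiny.
pooled, _ = fit_block([pair for b in blocks for pair in b])
```

In this linear case the combined estimate agrees with the pooled fit up to floating-point error, which is the property a divide-and-combine scheme aims to approximate in more general models.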

HDDesign: Determine the Sample Size for High Dimensional Classification Studies

Description Determine the sample size requirement to achieve the target probability of correct classification (PCC) for studies employing high-dimensional features. The package implements functions to 1) determine the asymptotic feasibility of the classification problem; 2) compute the upper bounds of the PCC for any linear classifier; 3) estimate the PCC of three design methods given design assumptions; 4) determine the sample size requirement to achieve the target PCC for three design methods.
Download [R]
References
1. Sanchez, B.N., Wu, M., Song, P.X.K., and Wang, W. (2016). Study design in high-dimensional classification analysis. Biostatistics, doi: 10.1093/biostatistics/kxw018.

metaFuse: Fused Lasso Approach in Regression Coefficient Clustering

Description Used to detect parameter heterogeneity and cluster coefficients in data fusion when multiple similar data sets are combined. For each covariate, cluster its data-specific coefficients across different data sets. Supports Gaussian, logistic and Poisson regression models.
Download [R]
References
1. Tang, L., & Song, P.X.K. (2016). Fused Lasso Approach in Regression Coefficients Clustering -- Learning Parameter Heterogeneity in Data Integration. Journal of Machine Learning Research, 17(113):1−23.

FLAPO: Fused Lasso with the Adaptation of Parameter Ordering in Combining Multiple Studies with Repeated Measurements

Description FLAPO is R code implementing the fused lasso method for merging longitudinal data, as used in the simulation study of Wang, Wang and Song (2016), published in Biometrics.
Download [R]
References
1. Wang, F., Wang, L., & Song, P.X.K. (2016). Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements. Biometrics, DOI: 10.1111/biom.12496.

GeoCopula: GeoCopula Models for Spatial-Clustered Data

Description This software provides a unified modeling framework for the analysis of spatial-clustered continuous and binary data.
Download [R]
References
1. Bai, Y., Kang, J., & Song, P.X.K. (2014). Efficient pairwise composite likelihood estimation for spatial‐clustered data. Biometrics, 70(3), 661-670.

GDEP: Gene Network Construction Based on Time Course Microarray Data

Description GDEP is a FORTRAN program that computes transition dependency, a gene-gene interaction measure for time series microarray data, as described in Gao, Pu and Song (2009), published in the EURASIP Journal on Bioinformatics and Systems Biology.
Download [Fortran]
References
1. Gao, X., Pu, D.Q., & Song, P.X.K. (2009). Transition dependency: a gene-gene interaction measure for times series microarray data. EURASIP Journal on Bioinformatics and Systems Biology, 2009, 2.

QIF: Quadratic Inference Function

Description The R package QIF performs estimation and inference for regression coefficients in longitudinal marginal models using the method of quadratic inference functions.
Download [SAS] [R]
References
1. Qu, A., & Song, P.X.K. (2004). Assessing robustness of generalised estimating equations and quadratic inference functions. Biometrika, 91(2), 447-459.
2. Song, P.X.K., Jiang, Z., Park, E., & Qu, A. (2009). Quadratic inference functions in marginal models for longitudinal data. Statistics in Medicine, 28(29), 3683-3696.
Software Disclaimer
THE SOFTWARE PACKAGES ARE PROVIDED “AS IS”, AND ONLY FOR NON-PROFIT USE. CURRENTLY THERE IS NO FORMAL SUPPORT ON IT. FURTHER ASSISTANCE BY THE AUTHORS REGARDING APPLICATION OF SOFTWARE WILL NOT BE PROVIDED, IN GENERAL. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA; BUSINESS INTERRUPTION) CAUSED AND ON ANY THEORY OF LIABILITY ARISING IN ANY WAY OUT OF THE USE OF THESE SOFTWARE PACKAGES.

This page was last modified on: 06/10/2018

Questions or comments with the site? Contact the maintainer (Mathieu Bray).