ARC’s statistical consulting team comprises some of the most talented biostatisticians and statisticians in the industry. We handle even the most rigorous and theoretical approaches to Biostatistics research, supporting the entire data analysis phase of your project as well as any methodological and theoretical work needed to justify your statistical models. Our statisticians believe that “running the models” is actually the easy part; the real focus should be on determining exactly which research design is appropriate for your study and what value your research will offer to academia or your business organization.

We are here to provide full, customized Biostatistics research and analysis support to you and your organization. Our statistical consultants are familiar with just about every research design, along with cutting-edge analytical techniques for mitigating the limitations of your statistical research and producing viable findings that can advance fields including Genomics, Medicine, Nursing, and Public Health. One especially active area of current research is how machine learning techniques can be incorporated to improve the analysis of health-related data.

As an ARC client, you might be designing and implementing a randomized or non-randomized clinical experiment, or you might need support finding the best variable selection methodology for building predictive models within the LASSO framework. Whatever you need help with, we’re always here to assist! Please see below for some of our most popular research areas within Biostatistics:

Correlation does not prove causation. This is especially true in statistical inference, where a finite data set is used to draw conclusions that describe an entire population. Causal Inference research aims to move beyond mere association to establish cause-and-effect relationships between variables. In theory, a causal relationship can be “proven” by controlling for all relevant covariates (thereby reducing residual confounding in the model) and uncovering the structural relationship that governs the predictor and outcome variables.

Perhaps the most difficult aspect of establishing causality is that research in this setting relies on counterfactuals: for each subject, researchers can observe the outcome under only one of the many potential scenarios. Here, our researchers often rely on the Rubin Causal Model (RCM) developed by Donald Rubin, which established the potential outcomes framework for comparing a set of potential outcomes in order to estimate the causal effect.
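The potential outcomes idea can be illustrated with a short simulation (a hypothetical sketch in Python with NumPy; all numbers are made up for illustration): each subject has two potential outcomes, but only one is ever observed, and randomization lets the simple difference in group means recover the average treatment effect.

```python
import numpy as np

# Hypothetical potential-outcomes sketch: y1 and y0 are each subject's
# outcomes with and without treatment, but only one is observed.
rng = np.random.default_rng(0)
n = 10_000

y0 = rng.normal(10.0, 2.0, n)      # potential outcome without treatment
y1 = y0 + 3.0                      # potential outcome with treatment (true effect = 3)

z = rng.integers(0, 2, n)          # randomized treatment assignment
y_obs = np.where(z == 1, y1, y0)   # only one potential outcome is observed

# Under randomization, the difference in observed group means is an
# unbiased estimate of the average treatment effect (ATE).
ate_hat = y_obs[z == 1].mean() - y_obs[z == 0].mean()
```

Because assignment is randomized, the treated and control groups are exchangeable, so the estimate lands close to the true effect of 3 even though no subject's counterfactual outcome is ever seen.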

Clinical trials refer to a family of studies involving experiments, interventions, and observations aimed at generating clinical data. In its most basic form, a clinical trial is a simple randomized statistical experiment testing the efficacy of a proposed drug treatment regimen. Typically, half of a study’s participants receive the treatment while the other half receive a placebo. This systematic method of research helps ensure that, on average, the participant groups are balanced with respect to their baseline characteristics. In addition, the effect of the treatment being studied can be accurately estimated while also taking into account potential confounders, including sociodemographic variables and comorbidities.
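As a rough illustration of how a two-arm comparison can be analyzed (a hypothetical Python sketch on simulated data, not a real trial), a permutation test asks how often a treatment-versus-placebo difference as large as the observed one would arise if the treatment labels had been assigned purely by chance:

```python
import numpy as np

# Simulated two-arm trial: the drug arm has a genuinely higher mean outcome.
rng = np.random.default_rng(1)
treated = rng.normal(6.0, 1.0, 100)   # outcomes, treatment arm (illustrative)
placebo = rng.normal(5.0, 1.0, 100)   # outcomes, placebo arm (illustrative)
observed = treated.mean() - placebo.mean()

# Permutation test: reshuffle the group labels many times and see how often
# a difference at least as extreme as the observed one occurs by chance.
pooled = np.concatenate([treated, placebo])
n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    diff = perm[:100].mean() - perm[100:].mean()
    if abs(diff) >= abs(observed):
        count += 1

p_value = (count + 1) / (n_perm + 1)   # add-one correction avoids p = 0
```

Because the simulated treatment effect is real, the permutation p-value comes out small, and the test makes no distributional assumptions beyond the randomization itself.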

Clinical trials researchers often seek to catalyze improvements in health outcomes or in safety and efficacy procedures through the analysis of organized data gathered in both randomized and non-randomized experimental settings. ARC’s PhD-level statistical consultants provide professional clinical trials assistance to a wide range of researchers, and have assisted with both government-funded and privately funded projects. We work closely with academic researchers, medical institutions, and pharmaceutical companies, among other clinical researchers.

ARC’s research in Epidemiology covers a wide range of diseases and health conditions across the globe. Many of our clients in this field are interested in practice-focused problems including how to identify specific risk factors of a disease using statistical research techniques, and how to estimate distributions of a particular disease including the disease’s incidence, mortality, and prevalence. ARC’s Biostatistics expertise covers all areas of Epidemiology, with our support extending to clinical research as well as statistical data analysis.

ARC also specializes in statistical research relating to other health sciences including Environmental Health, Health Economics, Pharmacology, and Toxicology. We often collaborate with clinical researchers on their data analysis and publication projects, providing support in the form of clinical research assistance and statistical data analyses.

Closely associated with Big Data and Machine Learning, High-Dimensional Data and Statistical Learning takes an algorithmic approach to developing statistical methodologies that can be utilized to extract meaningful inferences. Data sets can range up to millions or tens of millions of rows and columns. With high-dimensional data, traditional regression approaches fail: when the number of predictors is large relative to the number of observations, ordinary least squares overfits the data, and when predictors outnumber observations its solution is not even unique. Penalized regression approaches such as the LASSO, ridge regression, and the elastic net (which combines the L1 and L2 penalties) are commonly used in these settings for variable selection and for building the best predictive models.
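A minimal sketch of the penalization idea (illustrative NumPy code on simulated data, using ridge regression because it has a closed-form solution; the LASSO and elastic net require iterative solvers): the L2 penalty adds a constant to the diagonal of X'X, which stabilizes the fit and shrinks coefficients toward zero.

```python
import numpy as np

# Simulated sparse problem: 50 predictors, of which only 5 truly matter.
rng = np.random.default_rng(42)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Ridge (L2-penalized) regression has the closed form
# beta = (X'X + lam * I)^{-1} X'y, where lam controls the shrinkage.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

The penalty trades a small amount of bias for a large reduction in variance; the LASSO's L1 penalty goes further by setting many coefficients exactly to zero, which is what makes it a variable selection tool.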

For Biostatistics research, high-dimensional learning approaches are most often applied to Genomics data as well as electronic health records databases. However, statistical learning approaches are applicable to all domains of statistical research, with pharmaceutical companies being some of our most active and engaged clients.

There is also a growing trend toward employing statistical learning methodologies in other fields of research, including Business, Finance, and other areas of Public Health. The wealth of available data only continues to grow!

Although missing data is a central feature of most real-life statistical data sets, advanced statistical methodologies are often needed to deal with it properly. In traditional regression settings, observations with any missing variables are excluded from the analysis, resulting in a loss of potentially valuable information. With larger sample sizes you may still be able to answer meaningful questions, but in smaller samples, even a few missing data points can amount to a substantial portion of the available information.

With advances in statistical theory, methodologies such as multiple imputation allow you to model the missing data points as a function of the information that is available. Often, the key assumption is that data points are missing completely at random (MCAR), although there are methodologies to mitigate bias and reasonably model the scientific problem even in non-random settings.
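A minimal multiple-imputation sketch (hypothetical Python/NumPy code on simulated data, assuming MCAR): missing values of one variable are drawn from a regression on a fully observed variable, with residual noise added so the imputations reflect uncertainty, and the analysis is repeated and pooled across imputed data sets.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000
x = rng.normal(50.0, 10.0, n)              # variable of interest
y = 2.0 * x + rng.normal(0.0, 5.0, n)      # fully observed, correlated with x

miss = rng.random(n) < 0.2                 # 20% of x missing completely at random
x_obs = np.where(miss, np.nan, x)

# Fit x ~ y on the complete cases.
obs = ~miss
slope, intercept = np.polyfit(y[obs], x_obs[obs], 1)
resid_sd = np.std(x_obs[obs] - (slope * y[obs] + intercept))

# Create m imputed data sets, adding residual noise to each imputation so
# the filled-in values are draws, not deterministic predictions.
m = 20
estimates = []
for _ in range(m):
    x_imp = x_obs.copy()
    x_imp[miss] = slope * y[miss] + intercept + rng.normal(0.0, resid_sd, miss.sum())
    estimates.append(x_imp.mean())          # re-run the analysis of interest

pooled_mean = float(np.mean(estimates))     # pool the point estimates
```

Averaging the estimates across imputed data sets is the point-estimate half of Rubin's pooling rules; the full rules also combine within- and between-imputation variance to get honest standard errors.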

In many clinical studies and other statistical experiments, measurements are taken at multiple follow-up times throughout the course of the study. In this setting, traditional regression approaches will not work because measurements on the same subject are correlated over time. To account for this correlation structure, we rely on longitudinal data analysis techniques including Generalized Estimating Equations (GEE) and Generalized Linear Mixed Models (GLMM). If the correlation structure is correctly specified in our methodology, we can obtain unbiased parameter estimates with lower standard errors.

It’s very important to capture the effect of clustering when performing statistical analyses: ignoring it typically understates standard errors and discards information contained in the data.
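A small simulation makes the point concrete (a hypothetical NumPy sketch, not GEE itself): when repeated measurements share a subject-level effect, a standard error that treats every observation as independent is markedly smaller than one computed from subject-level means, which respects the clustering.

```python
import numpy as np

# Simulated longitudinal data: 100 subjects, 5 visits each. Measurements
# within a subject share a random subject effect, so they are correlated.
rng = np.random.default_rng(3)
n_subjects, n_visits = 100, 5
subject_effect = rng.normal(0.0, 2.0, n_subjects)
y = subject_effect[:, None] + rng.normal(0.0, 1.0, (n_subjects, n_visits))

# Naive SE of the overall mean: pretends all 500 observations are independent.
naive_se = y.std(ddof=1) / np.sqrt(y.size)

# Cluster-aware SE: collapse to one mean per subject, then treat the 100
# subject means as the independent units.
cluster_se = y.mean(axis=1).std(ddof=1) / np.sqrt(n_subjects)
```

Here the cluster-aware standard error is roughly double the naive one, which is exactly the kind of correction that GEE working-correlation structures and GLMM random effects build into the model itself.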

In Statistical Genetics, we focus on drawing inferences from genetic data representing biomarkers, genotypes, and phenotypes, as well as molecular information associated with a particular disease or disorder. Due to the complex, latent correlation between observations in genetic data, it is often a challenge to determine an exact statistical analysis plan, or a methodology that accounts for the clustering of data points in addition to the sparsity present within millions of rows of data.

Perhaps the most popular Statistical Genetics projects in academia are genome-wide association studies (GWAS), in which scientists identify specific genetic variants associated with various diseases. Other common forms of genetics research in Statistics include heritability studies and ancestry studies aimed at identifying biomarkers that may trace back hundreds or even thousands of years.

In Survival Analysis, we model “time-to-event” data: survival times, or the time until a certain event occurs. One subtlety of survival data is the presence of a censoring mechanism, whereby subjects often drop out of a study, leaving no information about what happened to them after a certain time point. Throwing out such observations entirely would result in a loss of information; a better approach is to use survival analysis techniques that provide estimates accounting for the presence of censoring.

Implementing a survival model is typically fairly simple, with descriptive statistics consisting of Kaplan-Meier plots estimating the survival function and the Cox Proportional Hazards regression model providing hazard ratios comparing the levels of predictor variables. The difficulties more often lie in controlling for time-varying covariates and in situations where the censoring mechanism is informative rather than random.
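The Kaplan-Meier estimator itself is simple enough to sketch directly (illustrative Python on made-up follow-up times): at each observed event time, the survival estimate is multiplied by one minus the fraction of at-risk subjects who experienced the event, while censored subjects simply leave the risk set without contributing an event.

```python
import numpy as np

# Made-up follow-up times; event = 1 means the event occurred, 0 = censored.
times = np.array([2.0, 3.0, 3.0, 5.0, 6.0, 8.0, 9.0, 12.0])
event = np.array([1,   1,   0,   1,   0,   1,   1,   0])

order = np.argsort(times)
times, event = times[order], event[order]

# Kaplan-Meier: at each time t with d events among n subjects still at
# risk, multiply the running survival estimate by (1 - d / n).
survival = 1.0
km = {}
n_at_risk = len(times)
for t in np.unique(times):
    at_this_time = times == t
    d = int(event[at_this_time].sum())
    if d > 0:
        survival *= 1.0 - d / n_at_risk
    km[t] = survival
    n_at_risk -= int(at_this_time.sum())   # everyone observed at t leaves the risk set
```

Note how the subject censored at time 3 still counts in the risk set for the event at time 3 and then drops out; that partial follow-up is exactly the information a complete-case analysis would discard.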

Seeking expert support for your Biostatistics research? ARC is here to help.

Call us at (212) 609–1354 or email