San Francisco Bay Area Chapter of the American Statistical Association

Student Travel Award Seminar & SFASA Business Meeting

Thursday, June 14, 2012

4:30pm – 6:30pm

Evans Hall, Room 1011 (Jerzy Neyman room)

University of California, Berkeley

Candidates for Offices of SFASA

President Elect 2012-13

Clinton Brownley
Lu Tian

Treasurer	Doris (Yu) Shu
Vice President Biostatistics	Ruixiao Lu
Vice President General Applications	Megan Price
Secretary	Jacqueline Shaffer
Council of Chapters Representative	John Kornak

Award winners and speakers

Sharmodeep Bhattacharyya, University of California, Berkeley

Title: Bootstrap of count features in stochastic networks

Abstract:

Analysis of stochastic models of networks is quite important in light of the huge influx of network data in the social, information and biological sciences. But a proper statistical analysis of features of different stochastic models of networks is still underway. Theoretically determining the expectations and variances of count features, such as 'moments' (Bickel, Chen & Levina, AoS, 2011), 'motifs' (Kashtan et al., Bioinformatics, 2004) and smooth functions of these, can become highly difficult. We propose bootstrap methods for finding the empirical distribution of such count features of networks. The proposed resampling estimates depend on the size of the count features as well as the degree distribution of the network. Using these methods, we can not only estimate the variance of count features but also get good estimates of the feature counts themselves, which are usually expensive to compute numerically in large networks. In our paper, we prove theoretical properties of the bootstrap variance estimates of the count features and show their efficacy through simulation. We also apply the method to publicly available Facebook network data to estimate the variance and expectation of some count features. Our work can be generalized to bipartite graphs and multigraphs, relating it to the work of Owen (AoAS, 2007 & 2011).

Tessa Childers-Day, University of California, Berkeley

Title: Applications of HMMs to Financial Data: A Data Motivated Approach

Abstract:

The notion that the behaviors of financial markets are affected by unobservable economic states or regimes is one which has both a theoretical and an intuitive basis. Recognizing the existence of these regimes and including them in analyses improves the understanding of the workings of markets. Performing such tasks as regime identification, classification, and forecasting can lead to an understanding of the statistical properties of financial markets, positively affecting the behavior of individuals, institutions, and governments. Hidden Markov Models (HMMs) are an intuitive, flexible technique that easily accounts for the non-linearities and skewness often present in financial data. Unfortunately, inference is complex and many complicated techniques have not been adopted by the users of HMMs. Here, the application of HMMs to financial data is motivated and model selection is addressed (including identification of the number of regimes, specification testing, and examination of marginal distributions). Emphasizing parsimony, intuition, and computational simplicity, the literature is reviewed, and the process is illustrated using real-world financial data.

Brianna Heggeseth, University of California, Berkeley

Title: The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference: an application to trajectory modeling

Abstract:

Multivariate Gaussian mixtures are a class of models that provide a flexible parametric approach for the representation of heterogeneous multivariate outcomes. When the outcome is a trajectory vector of observations taken over time, there is often inherent dependence between measurements. However, one of the simplest and most commonly used covariance assumptions is conditional independence, which assumes that, given the mixture component label, the outcomes for an observation unit are independent of each other. In this paper, we study, through asymptotic bias calculations and simulation, the impact of covariance misspecification in multivariate Gaussian mixtures. Although maximum likelihood estimators of regression and mixing probability parameters are not consistent under misspecification, they have little asymptotic bias when mixture components are well-separated, even when outcomes are wrongly assumed to be conditionally independent. We also present a robust standard error estimator and show that it outperforms conventional estimators in simulations when the model is misspecified. Body mass index data from a national longitudinal study are used to demonstrate the effects of misspecification on potential inferences made in practice.

Christine Ho, University of California, Berkeley

Title: Biclustering of Linear Patterns in Gene Expression Data

Abstract:

Identifying a bicluster, or submatrix of a gene expression dataset wherein the genes express similar behavior over the columns, is useful for discovering novel functional gene interactions. In this article, we introduce a new algorithm for finding biClusters with Linear Patterns (CLiP). Instead of solely maximizing Pearson correlation, we introduce a fitness function that also considers the correlation of complementary genes and conditions. This eliminates the need for a priori determination of the bicluster size. We employ both greedy search and a genetic algorithm in optimization, incorporating resampling for more robust discovery. When applied to both real and simulated datasets, our results show that CLiP is superior to existing methods. In analyzing RNA-seq fly and worm time-course data from modENCODE, we uncover a set of similarly expressed genes suggesting maternal dependence.

Directions and Map: www.berkeley.edu/map/
