Biclustering

This work was a primary part of my dissertation.

Capturing the dynamics of module expression using condition-specific clustering

Work in conjunction with Dr. Richard Bonneau’s lab at NYU

Abstract

In recent years, computational approaches have been developed to identify functional modules (groups of genes that cooperate to perform a common function) through the integration of different types of large-scale functional genomics data. Many of these methods do not explicitly consider in vivo dynamics, that is, the spatial and temporal coordination of gene expression. In multicellular organisms, components of functional modules are reused throughout development in different cell types and are subject to context-dependent regulation. We are developing an integrative approach to characterize the dynamic coordination of functional modules in order to address questions pertaining to development and cell-fate determination.

Our study represents the first application of cMonkey, an iterative annealing biclustering algorithm, to identify condition-specific clusters (CSCs) in a metazoan, C. elegans.  cMonkey identifies CSCs from (minimally) a compendium of expression data, but is able to incorporate additional data types, such as sequence motifs and network interaction data.  It then identifies and optimizes CSCs by sampling from the conditional probability distribution of each data type and calculating the likelihood that a gene and/or condition belong to a cluster.  An advantage of this method is that genes are not assigned exclusively to a single cluster but can assume membership in multiple clusters.  Our cMonkey analysis of C. elegans identifies approximately 1000 biclusters (covering ~17000 genes) from over 300 unique experimental conditions, which we are beginning to use to characterize module usage and pleiotropy during development.
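To make the sampling idea above concrete, here is a minimal toy sketch of one probabilistic membership update for a single bicluster, using expression data only. It is not cMonkey's actual scoring or implementation: the function names, the mean-squared-deviation score, the logistic form, and the `cutoff` and `temperature` parameters are all assumptions chosen for illustration.

```python
import math
import random

def mean_profile(expr, genes, conditions):
    """Mean expression of the current member genes in each bicluster condition."""
    return [sum(expr[g][c] for g in genes) / len(genes) for c in conditions]

def membership_probability(expr, gene, genes, conditions,
                           cutoff=1.0, temperature=1.0):
    """Toy probability that `gene` belongs to the bicluster.

    Genes whose expression tracks the cluster's mean profile over the
    bicluster's conditions get a probability near 1. `cutoff` and
    `temperature` are illustrative assumptions, not cMonkey parameters.
    """
    profile = mean_profile(expr, genes, conditions)
    # Mean squared deviation of the gene from the cluster profile.
    msd = sum((expr[gene][c] - p) ** 2
              for c, p in zip(conditions, profile)) / len(conditions)
    # Logistic transform; the exponent is clamped to avoid overflow.
    z = max(-50.0, min(50.0, (msd - cutoff) / temperature))
    return 1.0 / (1.0 + math.exp(z))

def sample_members(expr, genes, conditions, temperature, rng):
    """One stochastic update: each gene is kept or added with its probability.

    Lowering `temperature` across iterations makes the update more
    deterministic, which is the annealing aspect of this sketch.
    """
    new_members = {
        g for g in expr
        if rng.random() < membership_probability(
            expr, g, genes, conditions, temperature=temperature)
    }
    # Keep the previous membership if the sampled set happens to be empty.
    return new_members or set(genes)
```

Because each bicluster is updated independently in a scheme like this, nothing prevents a gene from being sampled into several biclusters, which is the property that allows overlapping membership.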

By allowing CSCs to overlap, we can quantify the potential pleiotropic effects of genes that participate in multiple clusters (‘shared genes’). We have found that these genes are enriched for higher numbers of GO Biological Process terms and higher numbers of genetic interactions, and that they exhibit a bimodal distribution of predicted protein domain diversity. These observations could all support the hypothesis that shared genes play roles in multiple pathways or functions. Additionally, we have investigated how genes that are part of different biochemical pathways are distributed among CSCs that are enriched for specific conditions of interest (e.g. muscle tissues, neurons, larval stages) and have found that different pathways exhibit tissue-specific expression patterns.
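As a small illustration of how ‘shared genes’ could be pulled out of overlapping cluster assignments, the sketch below counts how many clusters each gene appears in. The input format (a mapping from cluster name to gene list), the function names, and the toy cluster and gene names are hypothetical and are not the output format of cMonkey.

```python
from collections import defaultdict

def cluster_counts(biclusters):
    """Map each gene to the number of biclusters that contain it."""
    counts = defaultdict(int)
    for genes in biclusters.values():
        for gene in set(genes):  # ignore duplicate entries within a cluster
            counts[gene] += 1
    return dict(counts)

def shared_genes(biclusters, min_clusters=2):
    """Genes that belong to at least `min_clusters` biclusters."""
    return {gene for gene, n in cluster_counts(biclusters).items()
            if n >= min_clusters}

# Toy input: cluster names and placeholder genes are made up for illustration.
toy = {
    "csc_embryo_mitosis": ["plk-1", "gene-a", "gene-b"],
    "csc_centrosome": ["plk-1", "gene-c"],
    "csc_muscle": ["gene-d"],
}
print(sorted(shared_genes(toy)))
```

Per-gene counts like these can then be compared against annotation counts (for example, the number of GO Biological Process terms per gene) for shared versus non-shared genes.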

An example of a ‘shared gene’ is plk-1, a serine/threonine polo-like kinase that regulates mitotic cell cycle progression and cytoplasmic polarity. plk-1 is found in two different CSCs: one corresponds to embryonic development and mitosis, while the other is enriched for centrosome. This protein was localized using GFP in the early embryo and shows a dynamic expression pattern supporting both functions: it is first enriched in the anterior of the embryo and then, beginning near the first cell division, it appears punctate and on the centrosome.

These CSCs will be used to address questions of cell lineage and pleiotropy, specifically how multiple, different pathways are organized and regulated to result in a common phenotype. Furthermore, identifying CSCs has the additional benefit of addressing more fundamental issues, such as predicting functional annotations and associations between genes.
