
GOALIE: Supplementary Material for the Yeast Cell Cycle Benchmark Analysis

We used GOALIE to analyze the benchmark Yeast Cell Cycle micro-array data-set described in [P. T. Spellman, G. Sherlock, M. Q. Zhang, V. R. Iyer, K. Anders, M. B. Eisen, P. O. Brown, D. Botstein and B. Futcher, Comprehensive Identification of Cell Cycle-Regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization, Molecular Biology of the Cell, 9:3273-3297, 1998].

The data-set was taken directly from the source, and each of the experimental conditions was clustered separately. The image below shows the top-level folder containing all the data; its organization reflects the way GOALIE handles any kind of data.

Yeast Cell Cycle data folder organization.

You can find the data in your GOALIE download zip file. The alpha, cdc15 and elu sub-folders contain the clustered data corresponding to the three experimental conditions.

The GO-categories sub-folder contains a set of associations between the genes present in the data-sets and GO (process) terms. (Note: GOALIE can access gene/probe annotations in several ways; plain text files are just the simplest one.)

The screenshots sub-folder contains several GOALIE screenshots and is irrelevant to this discussion.

The three *-wcd.lisp files are what we call complete cluster script set files; that is, they describe where and how GOALIE can find the files that make up all the clustered windows for a single experiment (experimental condition). Remember that GOALIE does not perform clustering itself; it relies on external clustering programs, e.g., Genesis.

Loading each of the three files in GOALIE as described in the "Getting Started" pages and running the comparison tool between pairs of experiments will yield the results we describe in the following.

Each time-course experiment (alpha, cdc15 and elu) is broken down into five overlapping "windows". Each "window" is clustered into 15 clusters using a standard k-means algorithm (as noted, in our specific case, the k-means implementation provided in Genesis).
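The windowing step can be sketched as follows. This is only an illustration, not GOALIE's or Genesis's code: the function name, the number of time points (18) and the window width (6) are assumptions chosen for the example, since the actual window boundaries are not stated here.

```python
def overlapping_windows(n_points, n_windows, width):
    """Return (start, end) index pairs, end exclusive, for evenly spaced
    overlapping windows over range(n_points).

    Hypothetical helper: the real window boundaries used for the yeast
    data are not given in this page.
    """
    if n_windows < 2:
        raise ValueError("need at least two windows")
    if not 1 <= width <= n_points:
        raise ValueError("width must be between 1 and n_points")
    span = n_points - width  # distance between first and last window start
    starts = [i * span // (n_windows - 1) for i in range(n_windows)]
    return [(s, s + width) for s in starts]


# Illustrative numbers only: 18 time points, 5 windows of 6 points each.
windows = overlapping_windows(n_points=18, n_windows=5, width=6)
for start, end in windows:
    print(f"window covers time points {start}..{end - 1}")
```

Each window would then be handed to the external clustering program separately; consecutive windows share time points, which is what makes it meaningful to follow clusters from one window to the next.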

GOALIE Sample Analysis Results

In the following we describe a few results of our analysis of the Yeast Cell Cycle data following the schema presented in the main paper.

To understand some of the pictures that follow, we need to introduce the following naming scheme, obtained by searching Entrez. The table closely follows the one presented in Spellman et al.'s paper; the main genes are grouped by category/process. Each gene is associated with the library identifier used in the microarray and in the dataset accompanying Spellman et al.'s paper.

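Category                      Gene    Identifier

DNA
Repair                        PMS1    YNL082W
Repair                        RAD27   YKL113C
Repair                        RDH54   YBR073W
Repair                        MSH6    YDR097C
Repair                        OGG1    YML060W
Repair                        HPR5    YJL092W
Synthesis                     CDC2    YDL102W
Synthesis                     POL12   YBL035C
Synthesis                     CDC9    YDL164C
Synthesis                     POL30   YBR088C
Synthesis                     POL32   YJR043C
Synthesis                     PRI2    YKL045W
Synthesis                     RFA1    YAR007C
Synthesis                     POL1    YNL102W
Synthesis                     RFA2    YNL312W
Replication initiation        CDC45   YLR103C
Replication initiation        CDC47   YBR202W
Replication initiation        MCM2    YBL023C

Budding
Site selection/Morphogenesis  GIN4    YDR507C
Site selection/Morphogenesis  SRO4    YIL140W
Site selection/Morphogenesis  CDC10   YCR002C
Site selection/Morphogenesis  RSR1    YGR152C
Site selection/Morphogenesis  GIC1    YHR061C
Site selection/Morphogenesis  BUD3    YCL014W
Site selection/Morphogenesis  BUD3    YCL012W
Site selection/Morphogenesis  BUD4    YJR092W
Glycosylation                 OCH1    YGL038C
Glycosylation                 PMT1    YDL095W
Glycosylation                 PSA1    YDL055C
Glycosylation                 GDA1    YEL042W
Glycosylation                 PMI40   YER003C
Glycosylation                 ALG7    YBR243C
Secretion (exocytosis)        EMP24   YGL200C
Secretion (exocytosis)        ERV25   YML012W
Cell wall synthesis           EXG1    YLR300W
Cell wall synthesis           CHS6    YJL099W
Cytokinesis                   CTS1    YLR286C

Mitosis
Chromatid cohesion            MCD1    YDL003W
Chromatid cohesion            SMC3    YJL074C

Miscellaneous
Cell cycle control            PCL2    YDL127W
Cell cycle control            CLB6    YGR109C
Cell cycle control            HSL1    YKL101W
Cell cycle control            CLN1    YMR199W
Cell cycle control            SWE1    YJL187C
Cell cycle control            PCL9    YDL179W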

For each of the main groups, GOALIE can produce a summary Gantt-chart-like diagram showing the overall "up" and "down" regulation of a biological process (in this specific case, a "biological process" in the sense of the Gene Ontology). Green bars indicate a "down-regulated" process, red bars indicate an "up-regulated" process, and black bars indicate a steady process.

GOALIE also displays interactions among clusters grouped in different time windows. The summary Gantt charts are constructed from the enrichment of each cluster in each window. The graph of interactions among clusters can be traversed interactively. Selecting genes and GO categories highlights the paths and nodes in the graph where those genes and categories are present.
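To make the idea of a cluster graph concrete, here is a minimal sketch that links clusters in adjacent windows. The linking rule (an edge whenever two clusters share at least min_shared genes) is an assumption made for illustration; this page does not state the criterion GOALIE actually uses. The identifiers come from the table above, but the cluster memberships are made up.

```python
def cluster_edges(windows, min_shared=1):
    """Link clusters in consecutive windows that share genes.

    windows: list of dicts mapping a cluster name to a set of gene ids,
             one dict per time window, in temporal order.
    Returns a list of ((w, a), (w + 1, b), shared_genes) tuples.
    The shared-gene rule is an illustrative assumption, not GOALIE's
    documented behavior.
    """
    edges = []
    for w in range(len(windows) - 1):
        for a, genes_a in windows[w].items():
            for b, genes_b in windows[w + 1].items():
                shared = genes_a & genes_b
                if len(shared) >= min_shared:
                    edges.append(((w, a), (w + 1, b), shared))
    return edges


# Made-up cluster memberships for two windows.
example_windows = [
    {"c1": {"YLR103C", "YBR202W"}, "c2": {"YDL003W"}},
    {"c1": {"YLR103C"}, "c2": {"YBR202W", "YDL003W"}},
]
edges = cluster_edges(example_windows)
for src, dst, shared in edges:
    print(src, "->", dst, sorted(shared))
```

Highlighting a gene in such a graph amounts to keeping only the edges whose shared set contains it.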


DNA: Repair, Synthesis and Replication

The following chart represents the concurrent behavior of the processes involving DNA repair, synthesis and replication.

DNA related processes Gantt chart.

The following screenshots show the cluster graph display highlighting the processes involving DNA repair, synthesis and replication.

DNA replication initiation.

Graph view of DNA replication initiation category alongside the group of selected genes.

DNA repair.

Graph view of DNA repair category with the group of selected genes.

Budding

The following chart represents the concurrent behavior of the processes involved in budding.

Budding processes Gantt chart.

The following screenshots show the cluster graph display highlighting the processes involved in budding.

Bud site selection.

Graph view of bud site selection category along with the group of selected genes.

Cell Cycle Control

The following chart represents the concurrent behavior of the processes involved in cell cycle control, as well as other categories.

Cell Cycle Control Gantt chart.

The following screenshots show the cluster graph display highlighting the processes involved in cell cycle control.

Cell Cycle Control graph display.

Graph view of regulation of cell cycle category alongside the group of selected genes.

Comparison Views

GOALIE allows for the side-by-side comparison of two different experiments. This comparison shows the Gantt-chart-like summary views one against the other, thus showing consistency and relevant differences between two experiments.

In Spellman et al.'s paper, two of the experiments are synchronized at different points in the cell cycle. The elutriation experiment synchronizes cells in G1, while the Cdc15 experiment synchronizes cells in late mitosis.

The following screenshots show the GOALIE comparison view for the two experiments. The highlighted GO category S phase of mitotic cell cycle correctly shows the up-regulation of S-phase related genes (which support the time-course enrichment yielding this summary view) in the second window of the elutriation experiment, which corresponds to the S phase of the cell cycle. In the Cdc15 view, the up-regulation of the S-phase related genes is correctly shifted to the third window.


Comparison view of Elutriation vs. Cdc15 synchronization experiments.

Recap

The screenshots show how GOALIE can be used to explore the relationships among the behaviors resulting from the enrichment of the clustering analysis. They also show how some of the results from Spellman et al.'s paper are reproduced by GOALIE, in the sense that the information provided by GOALIE time-course enrichment of the set of cluster windows is consistent with the content of Spellman et al.'s paper.

Moreover, we can extract more information from the set of graph relationships computed by GOALIE. In particular, we can reconstruct temporal diagrams (akin to Kripke models) of the interactions among the categories resulting from the enrichment.


Reconstructed Yeast Cell Cycle

This graph represents the reconstructed set of interactions among GO categories during the phases of the Yeast Cell Cycle.

Temporal Logic Re-descriptions

The time-course enrichment produced by GOALIE allows for the synthesis of Temporal Logic expressions (in either Linear or Branching Temporal Logic, LTL or CTL) [E. Emerson and E. Clarke, Using Branching Time Temporal Logic to Synthesize Synchronization Skeletons, Science of Computer Programming, 2:241-266, 1982], which can be used as a high-level representation of system properties.

As a simple example (not yet integrated in the visualization tool), the system can find all the connections which exhibit a constant set of GO categories. These paths indicate that certain GO categories persist throughout the time course measurements.
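A search for such persistent categories could look like the sketch below. It is an assumption-laden illustration rather than GOALIE's implementation: nodes are (window, cluster) pairs, labels holds hypothetical enriched GO categories per node, and successors is a hypothetical adjacency map between clusters in consecutive windows.

```python
def persistent_paths(labels, successors, n_windows):
    """Find paths from the first to the last window along which at least
    one GO category is present in every cluster.

    labels:     dict mapping (window, cluster) -> set of GO categories
    successors: dict mapping (window, cluster) -> list of next-window nodes
    Returns a list of (path, categories_common_to_the_whole_path).
    All names and the data layout are illustrative assumptions.
    """
    results = []

    def extend(path, common):
        node = path[-1]
        if node[0] == n_windows - 1:
            results.append((path, common))
            return
        for nxt in successors.get(node, []):
            shared = common & labels[nxt]
            if shared:  # prune as soon as nothing persists
                extend(path + [nxt], shared)

    for node, categories in labels.items():
        if node[0] == 0 and categories:
            extend([node], set(categories))
    return results


# Made-up labels for a three-window example.
labels = {
    (0, "a"): {"DNA repair", "budding"},
    (1, "a"): {"DNA repair"},
    (1, "b"): {"budding", "cytokinesis"},
    (2, "a"): {"DNA repair"},
    (2, "b"): {"cytokinesis"},
}
successors = {
    (0, "a"): [(1, "a"), (1, "b")],
    (1, "a"): [(2, "a")],
    (1, "b"): [(2, "b")],
}
found = persistent_paths(labels, successors, n_windows=3)
```

In this toy example only "DNA repair" survives along a complete path; "budding" is dropped at the last window, so that branch is pruned.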

Another example of the system's formula-generation capabilities is the construction of a CTL until formula by analyzing the connections between clusters. These formulae are of the form: some GO categories remain "active" until some other GO categories become active. Since we have been considering the biological process hierarchy so far, we can rephrase the CTL until formulae as: some process persists in the cell until some other process is activated.

Although we have not encountered this difficulty yet, larger data sets might cause GOALIE to generate many more formulae, which would necessitate heuristics to constrain the number of generated formulae. Criteria such as "novelty", as studied in the data mining community, can be used to filter formulae that may suggest new interpretations of the data and of the processes involved.

As an example of the kind of formulae that can currently be found, based on the information that GOALIE keeps, we show the following:

Exists_path(`sister chromatid cohesion'
             Until (`G2 phase' And `G2 specific transcription'))

Eventually(Exists_path((`G2 phase' And `G2 specific transcription')
                         Until `G2/M specific transcription'))

GOALIE has all the pre-processed information available to automatically generate these two temporal logic formulae. The first one states that there exists a directed path connecting a sequence of clusters in successive time windows such that the GO category `sister chromatid cohesion' holds until the cell enters G2 phase. The second formula states, albeit obviously, the following: the cell, after dwelling in G2 phase, enters M phase. Although this is a well known feature of the cell cycle, it is interesting as it derives automatically from numerical expression matrices and a static ontological annotation.
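For readers unfamiliar with CTL, the sketch below checks formulae of this shape on a toy graph using the textbook fixpoint computation for "exists a path where p holds until q". It is not GOALIE's code. The four-node graph and its labels are made up, Eventually is read here as the CTL operator EF (an assumption about the notation above), and paths are allowed to end at nodes without successors.

```python
def exists_until(states, succ, holds_p, holds_q):
    """Return the set of states satisfying E[p U q] (least fixpoint)."""
    sat = {s for s in states if holds_q(s)}
    changed = True
    while changed:
        changed = False
        for s in states:
            if s not in sat and holds_p(s) and any(
                t in sat for t in succ.get(s, ())
            ):
                sat.add(s)
                changed = True
    return sat


# Made-up chain of clusters in successive windows with GO labels.
labels = {
    "w1": {"sister chromatid cohesion"},
    "w2": {"sister chromatid cohesion"},
    "w3": {"G2 phase", "G2 specific transcription"},
    "w4": {"G2/M specific transcription"},
}
succ = {"w1": ["w2"], "w2": ["w3"], "w3": ["w4"]}
states = list(labels)


def in_g2(s):
    return {"G2 phase", "G2 specific transcription"} <= labels[s]


# First formula: cohesion Until (G2 phase And G2 specific transcription).
first = exists_until(
    states, succ,
    lambda s: "sister chromatid cohesion" in labels[s],
    in_g2,
)

# Second formula: EF applied to E[G2 Until G2/M transcription],
# using EF(x) = E[true U x].
inner = exists_until(
    states, succ,
    in_g2,
    lambda s: "G2/M specific transcription" in labels[s],
)
second = exists_until(states, succ, lambda s: True, lambda s: s in inner)
```

Generating formulae, as opposed to checking them, would run in the other direction: enumerate candidate category pairs from the edges and keep those whose satisfying set includes a first-window cluster.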

Future Work

Development on GOALIE is ongoing. In the future we will concentrate on the following tasks.