Special projects between Bavaria and Florida
Institute for Bioinformatics and Systems Biology
Department of Computational Sciences
Florida State University
Novel Data Mining Techniques Applied to the Analysis of Gene Expression and Polar Lipids Data of Glioblastoma Cell Line U87 MG
Glioblastomas are diffuse and highly invasive brain tumors. Conventional treatments for human gliomas have achieved extremely limited success. It is well established that transfecting glioma cells with wild-type tumor protein p53 (wt p53) triggers brisk apoptosis if the cell line harbors mutant p53, whereas the same transfection in cell lines that harbor wild-type p53, such as U87, reduces or eliminates invasion and motility.
A recent study compared polar lipid changes in such treated cells and control cells and revealed modulated regulation and distinct structural differences in glycosphingolipids (GSLs), sulfatides and phospholipids. The resulting abundance of experimental data needs to be mathematically analyzed and interpreted.
The objective of our research is to apply novel multivariate analysis techniques for the interpretation, visualization and classification of experimental data reflecting polar lipid, proteomic and gene array changes in treated and control cells. Our results will demonstrate the utility of novel computational methods, specifically adapted to lipidomic data, to interrogate and identify potentially useful targets for glioblastoma therapy by evaluating specific tumor responses to various therapies.
Despite recent progress in therapy and surgical intervention, glioblastoma multiforme, a malignant primary brain tumor, is nearly always fatal. The U87 MG cell line is used as an experimental model for glioblastoma. Current research indicates that gene transfection prior to chemotherapy treatment can successfully be applied to cause cell death in (immortal) glioblastoma cell lines. The U87 MG cell line carries the wt p53 tumor suppressor gene, not a mutant version. The major characteristic of these tumor cells is their resistance to apoptosis. Cells transfected with the tumor suppressor p53 prior to treatment with the chemotherapeutic SN38 do undergo apoptosis; chemotherapy alone does not trigger this desired phenotype. In U87 cells carrying wt p53, adenoviral p53 (Ad-p53) transfection followed by SN38 treatment results in modest apoptosis and G2 arrest. The reverse order, SN38 treatment prior to Ad-p53 transfection, results in almost complete apoptosis and complete G2 arrest.
Cell lysates of all manipulated cell lines were analysed for variations in lipid levels. Mass spectrometry (MS) yielded lipid concentrations for wild-type cells and various combinations of treatments/transfections. The MS/MS technique used, which is capable of separating the complex lipids of interest, is Fourier-transform ion cyclotron resonance (FT-ICR) MS. The group measured the concentrations of 167 lipids in the cell lysates twice for each of 8 combinations of treatments/transfections. As a result, the matrix analysed in this study holds the lipid concentrations for each cell line with or without treatments/transfections.
The biologically relevant readout is the combined treatment of wt p53 with SN38; it therefore emerges only from the comparison with (control) cells transfected with empty vector (DI312) prior to SN38 treatment. (Although the reverse order of the relevant treatment results in the ideal outcome of complete apoptosis, and thus cell death, the analysis of self-lysed cells is not feasible.)
To identify only those lipids that respond to the combined treatment, a number of control-treatment experiments were conducted to answer the key question:
Which lipids, lipid classes, or lipid metabolic pathways are affected by the p53 treatment prior to chemotherapy that triggers apoptosis of tumor cells?
We aimed to identify partial correlations of lipid concentrations similarly affected by drug treatments while accounting for the biological interpretation of the treatments/transfections. Several approaches will be applied:
Via a correlation network or directly from the raw data, a Gaussian Graphical Model, which is based on partial correlations, will be generated.
Subgroups of the network will then be validated with a lipid metabolism pathway database, which was manually curated.
Traditionally, correlation networks are generated to obtain information on the co-regulation of any compound measured across any type of sample. For the present metabolite data, the network gives the pairwise correlations for each measured lipid (the compound). The correlation is determined from the measured samples, that is, the cell lines with various transfections/treatments. The Pearson correlation is usually applied. However, (Pearson) correlation networks have two major drawbacks.
First, a correlation can be due either to direct (biological) co-regulation or to indirect associations. Second, the correlation network has to be arbitrarily thresholded before it can be analysed sensibly. To address these drawbacks, we chose to apply Gaussian graphical models (GGMs), both to extract only direct correlations from the measurements and to avoid arbitrary thresholding. A GGM holds the pairwise partial correlation coefficients, which are nonzero where a direct correlation exists; these are then subjected to statistical testing so that only the significant partial correlations are retained, without arbitrary thresholding.
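The difference between the two kinds of network can be illustrated with a minimal Python sketch on simulated data. The three variables x, y and z form a hypothetical chain (x drives y, y drives z) and are not lipids from the study; the 0.7 cutoff is an arbitrary choice made only for the illustration. The partial correlations are obtained from the inverse covariance matrix, which is possible here only because the number of samples exceeds the number of variables.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlations from the inverse covariance (precision) matrix.

    data: array of shape (samples, variables); requires samples > variables.
    """
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    # pcor_ij = -omega_ij / sqrt(omega_ii * omega_jj)
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Simulated chain x -> y -> z (hypothetical variables, not study data).
rng = np.random.default_rng(0)
n_samples = 5000
x = rng.normal(size=n_samples)
y = x + 0.5 * rng.normal(size=n_samples)
z = y + 0.5 * rng.normal(size=n_samples)
data = np.column_stack([x, y, z])

pearson = np.corrcoef(data, rowvar=False)
pcor = partial_correlations(data)

# Arbitrary cutoff for the Pearson network (assumption for illustration).
threshold = 0.7
pearson_edges = np.abs(pearson) > threshold

print("Pearson r(x, z):", round(pearson[0, 2], 3))
print("Partial r(x, z | y):", round(pcor[0, 2], 3))
```

In this simulation the indirect pair (x, z) passes the Pearson cutoff although the two variables are linked only through y, while its partial correlation is close to zero.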
Partial correlation coefficients are calculated in a straightforward manner if the number of samples n exceeds the number of variables p in the data set. The present lipidomics data show the opposite case, p>n, with n=16 samples and p=157. The straightforward calculation cannot be applied, since the estimated correlation/covariance matrix is not well conditioned and cannot be reliably inverted. To solve the GGM for p>n, Strimmer and colleagues introduced an all-in-one bootstrapping approach that uses the source data matrix directly to find significant edges in a GGM. They combine a bootstrap aggregation technique with an SVD of the correlation matrix P before estimating an optimal network with a Bayesian approach.
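Why the straightforward calculation fails for p>n, and how regularization restores it, can be sketched in Python. The sketch below is not the bootstrap/SVD procedure of Strimmer and colleagues; it substitutes a simple shrinkage of the correlation matrix toward the identity, with a fixed, hypothetical intensity `lam`, and uses random numbers in place of the lipid measurements.

```python
import numpy as np

def shrinkage_partial_correlations(data, lam=0.2):
    """Partial correlations from a shrunken correlation matrix.

    lam is a hypothetical, fixed shrinkage intensity in (0, 1]; it is an
    assumption of this sketch and is not estimated from the data.
    """
    corr = np.corrcoef(data, rowvar=False)
    p = corr.shape[0]
    # Shrinking toward the identity makes the matrix positive definite.
    shrunk = (1.0 - lam) * corr + lam * np.eye(p)
    precision = np.linalg.inv(shrunk)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Random stand-in data with the dimensions given in the text.
rng = np.random.default_rng(1)
n, p = 16, 157
data = rng.normal(size=(n, p))

# With n < p the sample correlation matrix is rank deficient
# (rank at most n - 1), so it has no ordinary inverse.
sample_corr = np.corrcoef(data, rowvar=False)
rank = np.linalg.matrix_rank(sample_corr)
print("rank of sample correlation matrix:", rank, "of", p)

pcor = shrinkage_partial_correlations(data)
print("shape of partial correlation matrix:", pcor.shape)
```

A fixed intensity keeps the sketch short; in practice the amount of regularization would have to be chosen from the data.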
The calculation of the GGM assumes that samples are independent; for dependent samples the covariance estimates are no longer optimal, as the standard deviation of the estimate increases monotonically with the correlation between samples. We inspected the present lipidome dataset and observed a generally high correlation between samples. Although the correlation between measurement replicates was higher than that between different transfection/treatment combinations, the overall correlation of disease and control samples was very high (>0.95). These results already indicate that the successful treatment of cells transfected with wt p53 prior to SN38 chemotherapy has strong effects on only a few lipids and not on the lipids in general.
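An inspection of this kind can be sketched in Python as follows. The data are simulated (a shared lipid profile with small sample-specific deviations), and the 5% noise level is an assumption chosen only to mimic highly similar samples; none of the numbers come from the study.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_lipids = 16, 157

# Hypothetical shared concentration profile across all samples.
profile = rng.lognormal(mean=0.0, sigma=1.0, size=n_lipids)
# Each sample deviates from the profile by about 5% per lipid (assumption).
data = profile * (1.0 + 0.05 * rng.normal(size=(n_samples, n_lipids)))

# np.corrcoef treats rows as variables, so this correlates samples.
sample_corr = np.corrcoef(data)
off_diag = sample_corr[~np.eye(n_samples, dtype=bool)]
print("lowest sample-sample correlation:", round(off_diag.min(), 3))
```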
To account for the high dependencies among all samples, we calculated the GGM as if all samples were replicates of one another. Since seven of the eight samples are measured only as controls, this approach is applicable in our study.
To finally identify the partial correlations of lipids that result only from the biologically relevant treatment, and not from any side effect of the individual transfections/treatments, we implemented the following technique. For simplicity, in the following we call the biologically relevant treatment "disease" in contrast to the controls, although this combination of transfection/treatment is the one that hinders tumor cells from further growth.
Imagine a significant correlation in the entire dataset (seven control duplicates and one disease duplicate). Such a correlation may result either from a perfect correlation among the controls that is not substantially disturbed by the disease samples, or from a correlation induced primarily by the disease samples. In other words: if a correlation exists only among the controls and has no specific relevance for the disease under analysis, we would still detect it in a GGM computed solely on the control samples. We therefore obtained the disease-relevant partial correlations by comparing the GGM of the entire dataset with the GGM of the control samples.
In addition to the described scenario, we have to account for the opposite case. If a correlation exists among the control samples but is suppressed when the disease samples are added to the data, the disease samples deviate from the pattern of the controls, and the correlation is thus also disease relevant. We obtained these suppressed correlations by extracting the correlations that exist at the control level but not in the entire dataset. As a result, we generated a GGM network containing only lipid-lipid correlations (co-regulation) that result from the combination of wt p53 transfection prior to SN38 chemotherapy treatment.
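The comparison of the two networks amounts to two set differences over the significant edges. The following Python sketch assumes that each GGM has already been reduced to a set of significant lipid pairs; the lipid names and edges are hypothetical placeholders, not results.

```python
def normalize(edges):
    """Store each undirected edge as a frozenset so (a, b) equals (b, a)."""
    return {frozenset(edge) for edge in edges}

def disease_relevant_edges(full_edges, control_edges):
    """Split disease-relevant edges into induced and suppressed ones."""
    full = normalize(full_edges)
    control = normalize(control_edges)
    induced = full - control      # present only once disease samples are included
    suppressed = control - full   # present in controls, lost in the entire dataset
    return induced, suppressed

# Hypothetical significant edges of the two GGMs.
full_ggm = [("lipid_A", "lipid_B"), ("lipid_B", "lipid_C"), ("lipid_D", "lipid_E")]
control_ggm = [("lipid_B", "lipid_A"), ("lipid_C", "lipid_D")]

induced, suppressed = disease_relevant_edges(full_ggm, control_ggm)
disease_network = induced | suppressed

for edge in sorted(tuple(sorted(e)) for e in disease_network):
    print(edge)
```

The edge between lipid_A and lipid_B appears in both networks and is therefore dropped, regardless of the order in which its two lipids are listed.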
With the disease-relevant GGM in hand, we were able to relate this network to lipid pathway information. We combined pathways of lipid head group remodeling with pathways of lipid chain remodeling. Since the GGM captures direct interactions, dysregulated metabolic pathways should be detectable. We mined both the GGM and the pathway dataset to identify the pathways that are affected by the combination of wt p53 transfection and SN38 chemotherapy in glioblastoma cell lines.
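One simple way to relate network edges to pathway information is to count, for each pathway, the edges whose two lipids both belong to it. The Python sketch below illustrates this under stated assumptions: the pathway names, lipid names and edges are hypothetical, and the counting rule is an illustration rather than the mining procedure used in the study.

```python
from collections import Counter

def edges_per_pathway(edges, pathways):
    """Count network edges whose two lipids both belong to the same pathway."""
    counts = Counter()
    for a, b in edges:
        for name, members in pathways.items():
            if a in members and b in members:
                counts[name] += 1
    return counts

# Hypothetical pathway membership (placeholder names, not curated data).
pathways = {
    "head_group_remodeling": {"lipid_A", "lipid_B", "lipid_C"},
    "chain_remodeling": {"lipid_C", "lipid_D", "lipid_E"},
}

# Hypothetical disease-relevant edges.
edges = [
    ("lipid_A", "lipid_B"),
    ("lipid_B", "lipid_C"),
    ("lipid_D", "lipid_E"),
    ("lipid_A", "lipid_E"),  # spans two pathways, so it is not counted
]

counts = edges_per_pathway(edges, pathways)
ranked = counts.most_common()
for name, n_edges in ranked:
    print(name, n_edges)
```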