A higher-order generalized singular value decomposition for comparison of global mRNA expression from multiple organisms
- PMID: 22216090
- PMCID: PMC3245232
- DOI: 10.1371/journal.pone.0028072
Abstract
The number of high-dimensional datasets recording multiple aspects of a single phenomenon is increasing in many areas of science, accompanied by a need for mathematical frameworks that can compare multiple large-scale matrices with different row dimensions. The only such framework to date, the generalized singular value decomposition (GSVD), is limited to two matrices. We mathematically define a higher-order GSVD (HO GSVD) for $N \ge 2$ matrices $D_i \in \mathbb{R}^{m_i \times n}$, each with full column rank. Each matrix is exactly factored as $D_i = U_i \Sigma_i V^T$, where $V$, identical in all factorizations, is obtained from the eigensystem $SV = V\Lambda$ of the arithmetic mean $S$ of all pairwise quotients $A_i A_j^{-1}$ of the matrices $A_i = D_i^T D_i$, $i \ne j$. We prove that this decomposition extends to higher orders almost all of the mathematical properties of the GSVD. The matrix $S$ is nondefective with $V$ and $\Lambda$ real. Its eigenvalues satisfy $\lambda_k \ge 1$. Equality holds if and only if the corresponding eigenvector $v_k$ is a right basis vector of equal significance in all matrices $D_i$ and $D_j$, that is $\sigma_{i,k}/\sigma_{j,k} = 1$ for all $i$ and $j$, and the corresponding left basis vector $u_{i,k}$ is orthogonal to all other vectors in $U_i$ for all $i$. The eigenvalues $\lambda_k = 1$, therefore, define the "common HO GSVD subspace." We illustrate the HO GSVD with a comparison of genome-scale cell-cycle mRNA expression from S. pombe, S. cerevisiae and human. Unlike existing algorithms, a mapping among the genes of these disparate organisms is not required. We find that the approximately common HO GSVD subspace represents the cell-cycle mRNA expression oscillations, which are similar among the datasets. Simultaneous reconstruction in the common subspace, therefore, removes the experimental artifacts, which are dissimilar, from the datasets. In the simultaneous sequence-independent classification of the genes of the three organisms in this common subspace, genes of highly conserved sequences but significantly different cell-cycle peak times are correctly classified.
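The construction described in the abstract can be sketched numerically. The following is a minimal NumPy illustration, not the authors' implementation: the function names `quotient_mean` and `ho_gsvd` are hypothetical, the step that recovers each U and Σ by solving against V and normalizing columns is an assumption consistent with the exact factorization stated above, and the descending ordering of eigenvalues is an arbitrary choice.

```python
import numpy as np

def quotient_mean(matrices):
    """Arithmetic mean S of all pairwise quotients A_i A_j^{-1}, i != j,
    where A_i = D_i^T D_i. Each D_i must have full column rank."""
    A = [D.T @ D for D in matrices]
    A_inv = [np.linalg.inv(a) for a in A]
    N = len(matrices)
    n = A[0].shape[0]
    S = np.zeros((n, n))
    for i in range(N):
        for j in range(N):
            if i != j:
                S += A[i] @ A_inv[j]
    return S / (N * (N - 1))

def ho_gsvd(matrices):
    """Return (Us, sigmas, V, lam) with D_i = U_i diag(sigma_i) V^T."""
    S = quotient_mean(matrices)
    lam, V = np.linalg.eig(S)
    # The abstract states that V and Lambda are real; anything else here
    # would indicate numerical trouble (e.g. nearly repeated eigenvalues).
    if np.max(np.abs(np.imag(lam))) > 1e-8:
        raise ValueError("eigenvalues are not numerically real")
    lam, V = np.real(lam), np.real(V)
    V = V / np.linalg.norm(V, axis=0)      # unit-norm right basis vectors
    order = np.argsort(lam)[::-1]          # ordering is a free choice
    lam, V = lam[order], V[:, order]
    Us, sigmas = [], []
    for D in matrices:
        # Solve V B^T = D^T, so that D = B V^T holds exactly.
        B = np.linalg.solve(V, D.T).T
        sigma = np.linalg.norm(B, axis=0)  # column norms act as sigma_{i,k}
        Us.append(B / sigma)               # unit-norm left basis vectors
        sigmas.append(sigma)
    return Us, sigmas, V, lam
```

Passing a list of arrays that share a column count, for example three random matrices of shapes 30×5, 40×5 and 25×5, returns factors whose product reproduces each input. The columns of each U are unit-norm but not, in general, mutually orthogonal, in line with the abstract's statement that almost all, rather than all, GSVD properties carry over.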
© 2011 Ponnapalli et al.
Figures
[…] 17-arrays matrices $D_1$, $D_2$ and $D_3$. The underlying assumption is that there exists a one-to-one mapping among the 17 columns of the three matrices but not necessarily among their rows. These matrices are transformed to the reduced diagonalized matrices $\Sigma_1$, $\Sigma_2$ and $\Sigma_3$, each of 17-“arraylets,” i.e., left basis vectors × 17-“genelets,” i.e., right basis vectors, by using the organism-specific genes × 17-arraylets transformation matrices $U_1$, $U_2$ and $U_3$ and the shared 17-genelets × 17-arrays transformation matrix $V^T$. We prove that with our particular $V$ of Equations (2)–(4), this decomposition extends to higher orders all of the mathematical properties of the GSVD except for complete column-wise orthogonality of the arraylets, i.e., left basis vectors that form the matrices $U_1$, $U_2$ and $U_3$. We therefore mathematically define, in analogy with the GSVD, the “common HO GSVD subspace” of the $N = 3$ matrices to be the subspace spanned by the genelets, i.e., right basis vectors $v_k$ that correspond to higher-order generalized singular values that are equal, $\sigma_{1,k} = \sigma_{2,k} = \sigma_{3,k}$, where, as we prove, the corresponding arraylets, i.e., the left basis vectors $u_{1,k}$, $u_{2,k}$ and $u_{3,k}$, are orthonormal to all other arraylets in $U_1$, $U_2$ and $U_3$. We show that like the GSVD for two organisms, the HO GSVD provides a sequence-independent comparative mathematical framework for datasets from more than two organisms, where the mathematical variables and operations represent biological reality: Genelets of common significance in the multiple datasets, and the corresponding arraylets, represent cell-cycle checkpoints or transitions from one phase to the next, common to S. pombe, S. cerevisiae and human. Simultaneous reconstruction and classification of the three datasets in the common subspace that these patterns span outline the biological similarity in the regulation of their cell-cycle programs. Notably, genes of significantly different cell-cycle peak times but highly conserved sequences are correctly classified.
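The idea of reconstructing a dataset in the common subspace can be illustrated with a short sketch. This is a hedged example, not the paper's procedure: the helper names `common_indices` and `reconstruct` and the tolerance `tol` are hypothetical, and the selection rule (eigenvalues within `tol` of 1) is only one plausible reading of "approximately common."

```python
import numpy as np

def common_indices(eigvals, tol=0.1):
    """Indices k whose eigenvalue lies within tol of 1.
    tol is an illustrative threshold, not a value from the source."""
    eigvals = np.asarray(eigvals, dtype=float)
    return np.flatnonzero(np.abs(eigvals - 1.0) <= tol)

def reconstruct(U, sigma, V, keep):
    """Sum over k in keep of sigma_k * u_k v_k^T, i.e. the part of
    D = U diag(sigma) V^T carried by the selected basis vectors."""
    keep = np.asarray(keep, dtype=int)
    return (U[:, keep] * sigma[keep]) @ V[:, keep].T
```

Applying `reconstruct` to each organism's own U and Σ with the same `keep` indices and the shared V gives the simultaneous reconstruction: only the patterns selected as common are retained in every dataset, and the contributions of the remaining genelets are dropped.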
[…], showing that the 13th through the 17th genelets correspond to […]. (c) Line-joined graphs of the 13th (red), 14th (blue) and 15th (green) genelets in the two-dimensional subspace that approximates the five-dimensional HO GSVD subspace (Figure S4 and Section 2.4), normalized to zero average and unit variance. (d) Line-joined graphs of the projected 16th (orange) and 17th (violet) genelets in the two-dimensional subspace. The five genelets describe expression oscillations of two periods in the three time courses.
[…] arraylets of each dataset. The dashed unit and half-unit circles outline 100% and 50% of added-up (rather than canceled-out) contributions of these five arraylets to the overall projected expression. (d–f) Expression of 380, 641 and 787 cell cycle-regulated genes of S. pombe, S. cerevisiae and human, respectively, color-coded according to previous classifications. (g–i) The HO GSVD pictures of the S. pombe, S. cerevisiae and human cell-cycle programs. The arrows describe the projections of the […] shared genelets and organism-specific arraylets that span the common HO GSVD subspace and represent cell-cycle checkpoints or transitions from one phase to the next.
