Quantitative analysis in systems biology often deals with noisy and complex high-dimensional problems. In genomics,
for instance, gene expression changes are measured across various experimental conditions, and when these conditions correspond to time points, only a few are usually available. This is unfortunate, since small sample sizes make it hard to capture any form of dependence structure in
the data. Key information about gene co-expression and co-regulation dynamics may thus be missed, preventing
a reliable reconstruction of the underlying gene-gene interaction network. It is therefore often advantageous to
exploit the sparsity and intrinsic low dimensionality of the biological systems under examination.
Such noisy high-dimensional systems depend on complex latent dynamics that may be viewed as mixtures of informative
sources with unknown statistical distributions, combined through an unknown mixing mechanism. Blind source separation techniques, fuzzy rules, embedding principles, and entropic measures represent useful methodological tools for disentangling these dynamics. We report results obtained from perturbation experiments
and from gene network reconstruction and inference.
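As an illustration of the entropic measures mentioned above, the following sketch estimates the mutual information between gene expression profiles from a binned histogram. All data, variable names, and the bin count are hypothetical choices for this example, not taken from the reported experiments:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical expression profiles: gene_b depends nonlinearly on gene_a,
# while gene_c is independent noise.
gene_a = rng.normal(size=2000)
gene_b = gene_a**2 + 0.2 * rng.normal(size=2000)
gene_c = rng.normal(size=2000)

def mutual_information(x, y, bins=16):
    """Histogram estimate of I(X;Y) in nats, a simple entropic
    dependence measure that also captures nonlinear coupling."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y
    nz = p_xy > 0
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

mi_dep = mutual_information(gene_a, gene_b)  # dependent pair: large MI
mi_ind = mutual_information(gene_a, gene_c)  # independent pair: near zero
```

Unlike linear correlation, the histogram estimate above detects the quadratic coupling between `gene_a` and `gene_b`; the small positive value on the independent pair reflects the finite-sample bias of binned estimators.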
Genomics represents a challenging research field for many quantitative scientists, and recently a vast variety of statistical techniques and machine learning algorithms have been proposed, inspired by cross-disciplinary work with computational and systems biologists. In genomic applications, the researcher deals with noisy, complex, high-dimensional feature spaces: a wealth of genes whose expression levels are experimentally measured can often be observed for just a few time points, thus limiting the available samples. This unbalanced combination suggests that standard statistical inference techniques may struggle to deliver good general solutions, and that machine learning algorithms may incur heavy computational work. One therefore naturally turns to two major aspects of the problem: sparsity and intrinsic dimensionality. Both aspects are studied in this paper, where a very efficient technique, Independent Component Analysis, is used for both denoising and dimensionality reduction. The numerical results are very promising and lead to high-quality gene feature selection, due to the signal separation power of the decomposition technique. We investigate how the use of replicates can improve these results, and deal with noise through a stabilization strategy that combines the estimated components and extracts the most informative biological content from them. Exploiting the inherent level of sparsity is a key issue in genetic regulatory networks, where the connectivity matrix needs to account for the real links among genes and discard the many redundancies. Most experimental evidence suggests that real gene-gene connections indeed represent a subset of what is usually mapped onto either a huge gene vector or a typically dense and highly structured network.
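A minimal sketch of the ICA-based decomposition described above, using scikit-learn's FastICA on synthetic data (the gene counts, time points, and source signals are hypothetical stand-ins for a real expression matrix):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Hypothetical expression matrix: 50 genes observed at 8 time points.
# Each gene is modeled as a noisy mixture of 3 latent regulatory sources.
n_genes, n_times, n_sources = 50, 8, 3
t = np.linspace(0, 1, n_times)
sources = np.vstack([np.sin(2 * np.pi * (k + 1) * t) for k in range(n_sources)])
mixing = rng.normal(size=(n_genes, n_sources))
X = mixing @ sources + 0.1 * rng.normal(size=(n_genes, n_times))

# ICA estimates statistically independent components from the mixtures,
# serving both as denoising and as dimensionality reduction.
ica = FastICA(n_components=n_sources, whiten="unit-variance", random_state=0)
S_hat = ica.fit_transform(X.T).T   # estimated latent sources, shape (3, 8)
A_hat = ica.mixing_                # estimated gene loadings, shape (50, 3)

# Genes with large loadings on a component are candidate co-regulated sets,
# a simple route to the gene feature selection discussed in the text.
top_genes = np.argsort(-np.abs(A_hat), axis=0)[:5]
```

With replicates, one would run the decomposition per replicate and combine the matched components, in the spirit of the stabilization strategy mentioned above.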
Inferring gene network connectivity from expression levels represents a challenging inverse problem that is currently stimulating key research in biomedical engineering and systems biology. Several attempts have been made to describe gene networks with only limited interactions, thereby exploiting the inherent sparsity of these systems. This in turn suggests that removing the redundancy of links in gene networks, or equivalently exploiting the inherent sparsity structure of these systems, might let the essential connections be identified and give the inverse problem both a satisfactory definition and computationally efficient tractability.
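One standard way to impose the sparsity discussed above is neighborhood selection: regress each gene on all the others with an L1 penalty, so most entries of the connectivity matrix shrink exactly to zero. The sketch below is an illustrative instance of this idea, not the specific method of the paper; the ground-truth matrix, sample sizes, and the penalty `alpha` are all hypothetical:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)

# Hypothetical expression data: 10 genes over 40 perturbation samples,
# generated from a sparse ground-truth connectivity matrix W_true.
n_genes, n_samples = 10, 40
W_true = np.zeros((n_genes, n_genes))
W_true[0, 1] = 0.9   # gene 1 regulates gene 0
W_true[2, 3] = -0.8  # gene 3 represses gene 2
X = rng.normal(size=(n_samples, n_genes))
X += X @ W_true.T    # add each gene's sparse regulatory inputs

# Neighborhood selection: L1-penalized regression of each gene on the
# rest; nonzero coefficients are the inferred incoming links.
W_hat = np.zeros((n_genes, n_genes))
for g in range(n_genes):
    others = [j for j in range(n_genes) if j != g]
    model = Lasso(alpha=0.1).fit(X[:, others], X[:, g])
    W_hat[g, others] = model.coef_
```

The L1 penalty discards redundant links while retaining the essential connections, which is exactly the tractability argument made for the sparse inverse problem above.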
Through the empirical analysis of financial return generating processes one may find features common to other research fields, such as internet network traffic, physiological studies of human heart beat, recorded speech and sleep time series, and geophysical signals, to mention a few well-known cases. In particular, long-range dependence, intermittency, and heteroscedasticity clearly appear; consequently, power laws and multi-scaling behavior are typical signatures of both spectral and time-correlation diagnostics. We study these features and the dynamics underlying financial volatility, which can respectively be detected in and inferred from high-frequency realizations of stock index returns, and show that they vary with the resolution levels used for both the analysis and the synthesis of the available information. Discovering whether the volatility dynamics are subject to changes in scaling regimes requires a model embedding scale-dependent information packets, thus accounting for the possibly heterogeneous activity occurring in financial markets. Independent component analysis proves to be an important tool for reducing the dimension of the problem and for calibrating greedy approximation techniques aimed at learning the structure of the underlying volatility.
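The scaling diagnostics mentioned above can be sketched by aggregating returns over dyadic scales and fitting the power law var(scale) ∝ scale^(2H); H = 0.5 corresponds to an uncorrelated random walk, while departures from 0.5 signal long-range dependence. The series below is a synthetic i.i.d. Gaussian stand-in, so real index returns with heteroscedasticity would behave differently:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical high-frequency return series (i.i.d. Gaussian for this
# sketch; empirical returns would exhibit intermittency and long memory).
returns = rng.normal(scale=1e-3, size=2**14)

# Aggregate the returns over dyadic scales and record the variance of
# the aggregated series at each resolution level.
scales = [2**k for k in range(1, 8)]
variances = []
for s in scales:
    agg = returns[: len(returns) // s * s].reshape(-1, s).sum(axis=1)
    variances.append(agg.var())

# Log-log slope of variance vs. scale estimates 2H; multi-scaling would
# show up as a slope that changes across scaling regimes.
slope, _ = np.polyfit(np.log(scales), np.log(variances), 1)
H = slope / 2
```

For this uncorrelated input the estimated exponent stays close to 0.5; applying the same diagnostic over separate scale ranges is a simple way to probe the regime changes discussed in the text.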