Machine learning patterns for neuroimaging-genetic studies in the cloud.

Benoit Da Mota, Radu Tudoran, Alexandru Costan, Gaël Varoquaux, Goetz Brasche, Patricia Conrod, Herve Lemaitre, Tomas Paus, Marcella Rietschel, Vincent Frouin, Jean-Baptiste Poline, Gabriel Antoniu, Bertrand Thirion
Front. Neuroinform. 2014-04-08; 8:
DOI: 10.3389/fninf.2014.00031


Brain imaging is a natural intermediate phenotype for understanding the link
between genetic information and behavior or risk factors for brain pathologies. Massive
efforts have been made in the last few years to acquire high-dimensional
neuroimaging and genetic data on large cohorts of subjects. The statistical
analysis of such data is carried out with increasingly sophisticated techniques
and represents a great computational challenge. Fortunately, the increasing
computational power of distributed architectures can be harnessed, provided that
new neuroinformatics infrastructures are designed and users are trained to work
with them. Combining a MapReduce framework (TomusBLOB) with machine learning
algorithms (Scikit-learn library), we design a scalable analysis tool that can
deal with non-parametric statistics on high-dimensional data. End-users describe
the statistical procedure to perform and can then test the model on their own
computers before running the very same code in the cloud at a larger scale. We
illustrate the potential of our approach on real data with an experiment showing
how the functional signal in subcortical brain regions can be fit significantly
by genome-wide genotypes. This experiment demonstrates the scalability and the
reliability of our framework in the cloud with a two-week deployment on hundreds
of virtual machines.
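
The map/reduce pattern described above can be illustrated with a minimal sketch that runs on a single machine. It assumes a permutation test as the non-parametric statistic and a ridge regression from Scikit-learn as the model; neither choice is taken from the abstract. The function names (map_task, reduce_task, permutation_p_value) and the synthetic genotype-like data are hypothetical and do not reflect the TomusBLOB API. Each map task scores the model on a batch of shuffled targets, and the reduce step pools the batches into an empirical p-value.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def map_task(X, y, seed, n_perm):
    """Map step (hypothetical): score the model on n_perm shuffled targets."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_perm):
        y_perm = rng.permutation(y)  # break the link between X and y
        # Mean cross-validated R^2 of a ridge model (assumed model choice)
        scores.append(cross_val_score(Ridge(alpha=1.0), X, y_perm, cv=3).mean())
    return scores


def reduce_task(observed, partial_scores):
    """Reduce step (hypothetical): pool null scores into an empirical p-value."""
    null = np.concatenate(partial_scores)
    # The +1 terms count the observed score as one draw from the null
    return (1 + np.sum(null >= observed)) / (1 + null.size)


def permutation_p_value(X, y, n_tasks=4, n_perm_per_task=25):
    """Run the map tasks sequentially here; in the cloud they would be distributed."""
    observed = cross_val_score(Ridge(alpha=1.0), X, y, cv=3).mean()
    partial = [map_task(X, y, seed, n_perm_per_task) for seed in range(n_tasks)]
    return observed, reduce_task(observed, partial)


# Synthetic stand-in data: 60 subjects, 20 genotype-like features coded 0/1/2
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(60, 20)).astype(float)
y = 2.0 * X[:, 0] + rng.normal(size=60)  # signal depends on the first feature

observed, p_value = permutation_p_value(X, y)
print(f"observed R^2 = {observed:.3f}, p-value = {p_value:.3f}")
```

Because each map task depends only on its seed, the same functions could be tested locally on a small number of permutations and then dispatched over many workers, which is the workflow the abstract describes.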

