  • Grand challenges

    State of the art

    • Computational chemistry is a natural outgrowth of theoretical chemistry and originated in the early attempts of theoretical physicists to solve the Schrödinger wave equation on computers.

      The bottleneck for ab-initio studies of molecules is the extremely small number of configurations that can be explored, which puts many properties out of direct reach of these approaches. Several classic problems in numerical analysis arise, including the generation and manipulation of large numbers of six-dimensional integrals, the computation of eigenvalues and eigenvectors of large matrices, and the search of complicated functions for global and local minima and saddle points.
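      To make the eigenvalue problem mentioned above concrete, here is a minimal NumPy sketch (illustrative only, not CalsimLab code): a small random symmetric matrix stands in for the far larger Hermitian matrices that arise in ab-initio calculations.

```python
import numpy as np

# Illustrative only: a small symmetric matrix standing in for the much
# larger matrices produced by ab-initio methods.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
H = (A + A.T) / 2  # symmetrise; quantum Hamiltonian matrices are Hermitian

# Computing eigenvalues and eigenvectors is one of the core numerical
# tasks listed above; eigh exploits the symmetric structure.
eigenvalues, eigenvectors = np.linalg.eigh(H)

# eigh returns eigenvalues in ascending order; in quantum problems the
# lowest ones correspond to ground and low-lying states.
lowest = eigenvalues[0]
```

      In realistic calculations the matrices are orders of magnitude larger and often sparse, which is precisely why dense routines like this become a bottleneck.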

      The growing interest in theoretical-computational chemistry has led to the development of widely available software applications that account for much of the CPU usage at supercomputing centres worldwide.

      One major and worthwhile way to reduce this computational cost is to better exploit molecular symmetry. Doing so, however, raises the challenge of determining the proper size of the configuration space, as well as the optimal number of parameters required to yield reliable values of the properties.

    • Computational biology & bioinformatics involve the development and application of theoretical models and methods, data analysis and computer simulation techniques in the study of biological systems. In these disciplines, computational challenges arise from the exponential growth of sequence data and databases, and from the discovery of complex biomolecular interactions.

      Today, limited computing power, together with the sequential nature of the algorithms, is the main obstacle to advancing research in computational biology. For statistical genetics and genomics to benefit fully from new computer technologies, it is of paramount importance to better parallelise the codes. This can only be achieved by bringing together computing experts and biostatisticians.

    Four grand challenges

    CalsimLab's objectives are to build a coherent theoretical background, develop adapted numerical methods, and implement efficient algorithms for the following four Grand Challenges:

    1. linear scaling in computational chemistry

      covers the need to design and implement a reliable and efficient procedure that accounts for electronic correlation while scaling almost linearly with system size. This is currently a scientific-computing hurdle that prevents the use of computational chemistry in many applications in which biomolecular interactions are at play.

    2. molecule energy approximation in computational chemistry

      concerns overcoming the simplified approximations of the Schrödinger equation that chemists have introduced to estimate the energy of molecules, from ground states to excited states. This representation of energy has made the modelling of biological systems possible, but the approach suffers from two drawbacks: i) only structural types previously encountered in smaller molecules can be parameterised for larger molecules, and ii) the connection to the Schrödinger equation is unclear.

    3. sequential algorithms in computational biology

      many problems in computational biology rely on sequential algorithms that are notoriously difficult to parallelise and adapt. This is a major hurdle that prevents full benefit from the forthcoming HPC infrastructures. Some methods in use are known to be NP-hard (and therefore scale very poorly with input size). One potential roadmap to keep these analysis algorithms computationally feasible for large datasets is to take advantage of cutting-edge linear algebra libraries tuned for multi-core and GPU technologies.
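      A toy illustration of this roadmap (not a CalsimLab algorithm): the same pairwise-distance computation written first as a sequential double loop, then as array operations that NumPy dispatches to multi-threaded BLAS-style kernels.

```python
import numpy as np

def pairwise_distances_sequential(points):
    """Naive O(n^2) double loop -- the kind of sequential code that is
    hard to accelerate as written."""
    n = len(points)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            d[i, j] = np.sqrt(np.sum((points[i] - points[j]) ** 2))
    return d

def pairwise_distances_vectorised(points):
    """The same computation expressed as whole-array operations, which
    the library can execute with optimised multi-core kernels."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))
```

      The point is not this particular function but the pattern: once a computation is phrased in terms of standard linear-algebra primitives, it inherits the parallelism of the underlying library with no change to the analysis code.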

    4. algorithms for genomics

      algorithms are used to explore data and discover laws of evolution that affect genomes and molecules, and these require validation against experimental data. Integrative genomics analysis requires efficient tools for the joint analysis of various types of genomics information. The challenge is that multivariate models have to be fitted to ultra-high-dimensional data and coupled with inferential algorithms that search over vast model spaces. As the size of genomics data sets and the number of samples keep increasing, the computations become rapidly more expensive.
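      One common way to make such ultra-high-dimensional problems tractable is marginal screening: rank features by their correlation with the outcome and keep only a short list before fitting any multivariate model. The sketch below is a generic illustration of that idea on synthetic data (not CalsimLab code; all names and sizes are made up for the example).

```python
import numpy as np

# Synthetic data: n samples, p features with p >> n, as in genomics studies.
rng = np.random.default_rng(1)
n, p = 200, 2000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.5, 2.0]          # only a few features truly matter
y = X @ beta + 0.1 * rng.standard_normal(n)

# Marginal screening: score each feature by its absolute correlation
# with the (centred) outcome, then keep the top few candidates.
scores = np.abs(X.T @ (y - y.mean())) / n
top = np.argsort(scores)[::-1][:10]   # indices of the 10 best-scoring features
```

      A multivariate model fitted on the surviving ten features is then cheap, whereas searching the full 2000-dimensional model space directly would not be.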