Background and industrial relevance
Most drugs are ineffective for most patients, and developing a new drug costs on average about $3 billion. This is a result of the interplay of myriad combinations of small-effect genetic and epigenetic risk factors associated with complex diseases, which stresses the need for bioinformatics to analyze big data in medicine. Successful translational bioinformatics applications have often exploited gene networks, together with the observation that functionally related genes tend to be highly interconnected and co-localized in networks, thereby forming modules.
Figure 1. (1) Molecular disease modules are identified as densely connected groups of misexpressed genes by combining the protein interaction network with disease misexpression data. (2) We identify the most important upstream regulators of the disease modules using a gene regulatory network of T-cell differentiation. (3) Non-linear dynamical modelling of those upstream regulators and their genome-wide downstream effects is performed in T-cell differentiation.
For example, I have developed integrative methods for (1) disease modules, which prioritize the most relevant disease genes; this is currently a very active research topic (Fig. 1). Modules are effective for data mining but cannot be used for high-resolution modelling. Gene regulatory networks (GRNs) are ideal for determining causal relations, because they describe how several ‘omics layers are involved in regulating gene expression dynamics. There are two main types of GRNs for working with dynamical data. (2) Large-scale statistical GRNs are frequently employed to analyze ‘omics data. They use regression-based methods such as LASSO on difference equations that approximate the true dynamics. LASSO yields a sparse (L1-penalized) maximum-likelihood estimate of the underlying GRN, but it cannot directly be used to predict dynamical responses. (3) Dynamical mechanistic models are based on ordinary differential equations (ODEs) and are suitable for pharmacokinetics. However, these models have limited coverage because of their computational complexity, and they typically handle only 5-10 genes.
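The regression idea behind statistical GRNs can be illustrated in a few lines. The following is a minimal sketch, not the lab's pipeline. It assumes linear dynamics dx/dt = A x for a hypothetical 3-gene cascade `A_true`. It simulates several perturbation experiments with forward Euler, which plays the role of the "true" mechanistic ODE model. It then fits one LASSO per target gene by coordinate descent, regressing the finite differences (x[t+1] - x[t])/dt on x[t]. The function names, the network and the penalty `lam` are illustrative assumptions.

```python
import numpy as np

def simulate_linear_ode(A, x0, dt, n_steps):
    """Forward-Euler simulation of the toy 'true' dynamics dx/dt = A x."""
    X = np.empty((n_steps + 1, len(x0)))
    X[0] = x0
    for t in range(n_steps):
        X[t + 1] = X[t] + dt * (A @ X[t])
    return X

def soft_threshold(z, gamma):
    """Proximal operator of gamma * |z| (the LASSO shrinkage step)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=10000, tol=1e-10):
    """Coordinate descent for min_b (1/(2n)) * ||y - X b||^2 + lam * ||b||_1, no intercept."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)  # residual y - X b, with b = 0 at the start
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            r += X[:, j] * old                  # partial residual without feature j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]                 # back to the full residual
            max_step = max(max_step, abs(b[j] - old))
        if max_step < tol:
            break
    return b

def infer_grn(trajectories, dt, lam):
    """One LASSO per target gene: regress (x[t+1] - x[t]) / dt on x[t].

    Differences are taken within each trajectory, never across experiment boundaries.
    Row i of the result holds the estimated regulators of gene i.
    """
    regressors = np.vstack([X[:-1] for X in trajectories])
    rates = np.vstack([(X[1:] - X[:-1]) / dt for X in trajectories])
    return np.vstack([lasso_cd(regressors, rates[:, i], lam) for i in range(rates.shape[1])])

# Hypothetical 3-gene cascade: gene 0 activates gene 1, gene 1 represses gene 2.
A_true = np.array([[-1.0, 0.0, 0.0],
                   [0.8, -1.5, 0.0],
                   [0.0, -0.6, -2.0]])
rng = np.random.default_rng(0)
dt = 0.01
trajectories = [simulate_linear_ode(A_true, rng.normal(size=3), dt, 200) for _ in range(10)]
A_hat = infer_grn(trajectories, dt, lam=5e-4)
print(np.round(A_hat, 2))
```

With noise-free data and a small penalty, the estimate lands close to `A_true`. On real, noisy data, `lam` would have to be tuned, for example by cross-validation, to trade goodness of fit against sparsity. The result is still a network estimate rather than a predictive dynamical model, which is the limitation noted above.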
A key technological challenge in translational bioinformatics is how integrative modelling should, in practice, incorporate recent findings from network biology and generate falsifiable hypotheses for personalized medicine. For example, modules and hubs in GRNs are important constraints in the parameter identification process. My previous methods from statistics and control theory could capture these constraints separately but not jointly. This project aims to identify success cases in which data mining and modelling can be combined, and thereby build a pipeline to identify the best combination of drugs for different patients. The long-term objective is to establish a pipeline that performs drug-induced non-linear modelling of genes of interest, using modules and hubs for data mining. The methods will be tested in collaboration with my pharmaceutical partner (AstraZeneca, Mölndal), specifically to understand the systemic influence of some patented drugs in patients with asthma and COPD.
Towards a coarse-grained, network-based strategy for complex diseases
The project is centered on three recently developed concepts: (1) molecular disease modules, (2) hub regulators in gene regulatory networks, and (3) core modelling of those regulators and their genome-wide downstream effects (Fig. 1). Each concept is developed in its own sub-project, as described below:
(1) Disease modules
We have developed two new disease module methods that define disease modules from cliques and disease data in graphs. In collaboration with Dr Daniel Muthas (AstraZeneca), a hybrid of WGCNA and our previous clique-based method was developed in a master's thesis project by MSc Mattias Köpsen (Köpsen 2016). This method is currently being applied in several clinical studies (e.g. Hellberg et al. 2016). Next, through a collaboration with a German group connected to the company Biocontrol, we developed ModuleDiscoverer (Vlaic et al. 2018). It is another clique-based method that scales better with network size and complexity than our previous methods. A further effort has been to assess and combine existing state-of-the-art disease module methods, since in our experience they work for different purposes. For this purpose we created MODifieR, a publicly available R toolbox that runs ten different module methods and assesses their performance by genomic concordance. Genomic concordance is an unbiased procedure we proposed for validating a module: the module is tested for enrichment in an independent large data set. MODifieR is available on our GitLab page https://gitlab.com/Gustafsson-lab/MODifieR. We are currently writing a methods manuscript that describes the benchmark results on 24 different diseases (parts of the results were included in de Weerd 2017). During 2017-2018, Gustafsson and Muthas (AstraZeneca) supervised MSc student David Martinez in a master's thesis carried out in collaboration with AstraZeneca (Martinez 2018). He used MODifieR to infer modules for individual asthma subtypes by combining two ‘omics layers from 500 patients in the U-BIOPRED consortium. This work will continue for at least ten more months.
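The clique-based module and genomic-concordance ideas can be illustrated with a deliberately simplified, hypothetical sketch. It is not ModuleDiscoverer, the WGCNA hybrid, or MODifieR, and it makes two assumptions. First, a module is the union of maximal cliques with at least `min_size` genes in which at least a fraction `min_de_fraction` of genes are differentially expressed; the cliques are found by Bron-Kerbosch with pivoting. Second, concordance is scored with a one-sided hypergeometric test for overlap between the module and an independent gene set. The network and gene names are toy data.

```python
from collections import defaultdict
from math import comb

def bron_kerbosch(R, P, X, adj, out):
    """Collect all maximal cliques (Bron-Kerbosch with pivoting) into `out`."""
    if not P and not X:
        out.append(R)
        return
    pivot = max(P | X, key=lambda v: len(adj[v] & P))
    for v in list(P - adj[pivot]):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P = P - {v}
        X = X | {v}

def clique_module(edges, de_genes, min_size=3, min_de_fraction=0.5):
    """Hypothetical rule: union of large cliques dominated by differentially expressed genes."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    module = set()
    for c in cliques:
        if len(c) >= min_size and len(c & de_genes) / len(c) >= min_de_fraction:
            module |= c
    return module

def hypergeom_sf(k, N, K, n):
    """P(overlap >= k) when drawing n genes from N, of which K are 'hits'."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)

def concordance_pvalue(module, independent_genes, universe):
    """One-sided enrichment of an independent gene set within the module."""
    N = len(universe)
    K = len(independent_genes & universe)
    n = len(module & universe)
    k = len(module & independent_genes & universe)
    return hypergeom_sf(k, N, K, n)

edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("E", "F"), ("E", "G"), ("F", "G"), ("D", "H"), ("E", "H")]
de_genes = {"A", "B", "C", "E"}
module = clique_module(edges, de_genes)
universe = {g for edge in edges for g in edge}
p = concordance_pvalue(module, {"A", "C", "D", "H"}, universe)
print(sorted(module), round(p, 4))
```

In this toy case only the clique {A, B, C, D} passes the filter. Its overlap of three genes with the independent set {A, C, D, H} gives p = 17/70 ≈ 0.24, so there is no evidence of concordance at this tiny scale. Real benchmarks use genome-scale networks and independent data sets.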
- Martinez D, supervised by Muthas D (AstraZeneca) & Gustafsson M. Identification of personalized multi-omic disease modules in asthma. MSc thesis, Högskolan i Skövde, June 2018.
- Vlaic S, Tokarski-Schnelle C, Gustafsson M, Dahmen U, Guthke R, & Schuster S. ModuleDiscoverer: Identification of regulatory modules in protein-protein interaction networks. Scientific Reports 8 No. 433, 2018, IF 4.
- MODifieR: an R package for robust disease modules, https://gitlab.com/Gustafsson-lab/MODifieR, released 2017.
- de Weerd D, supervised by Gustafsson M. Disease modules: Comparing and integrating inference methods. Högskolan i Skövde, June 2017.
- Köpsen M, supervised by Muthas D (AstraZeneca) & Gustafsson M. A network based approach for identification of robust disease modules in complex diseases. Master's thesis LITH-x-EX.12/3131—SE, 2016.
- Hellberg S, Eklund D, Gawel D R, Köpsen M, Zhang H, Nestor C E, Kockum I, Olsson T, Skogh T, Kastbom A, Sjöwall C, Vrethem M, Håkansson I, Benson M, Jenmalm M C, Gustafsson M#*, and Ernerudh E#, #shared last and * lead corresponding author. Gene Expression Profiling of Resting and Activated CD4+ T Cells in Patients with Multiple Sclerosis, Cell Reports 16, 1-12, Sept 2016, IF 8.
(2) Upstream regulators
This sub-project aims to generalize the previous findings of Gustafsson (Science Translational Medicine 2015) into a tool for identifying upstream module regulators. For this purpose, we have initiated a collaboration with Professor R Lahesmaa (Finland). The collaboration explores ATAC-seq, a recent high-throughput technique that can be used clinically to measure transcription factor activity indirectly, and how it can be combined with transcriptomics. Here we explore how the dynamics of upstream regulators in multiple sclerosis change over time. This work has been performed by postdoc Andreas Tjärnberg and PhD student Rasmus Magnusson under Gustafsson's supervision. Several new methods are being tested, and a publication describing a general tool for working with these data is planned for 2019/2020 (Åkesson 2018 presented one such method).
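One simple way to make upstream-regulator calls robust is to score each regulator in several networks and aggregate the ranks. The sketch below is hypothetical and is not the method of Åkesson (2018). It assumes each network is a list of (regulator, target) edges, for example from different inference methods or bootstrap resamples; in practice the edges might combine ATAC-seq-supported binding with transcriptomics. Each regulator is scored by how many module genes it targets, and regulators are ordered by their mean rank across the ensemble. Ties within a network are broken alphabetically, which is an arbitrary choice. All gene and TF names are toy data.

```python
from collections import defaultdict

def module_target_counts(edges, module):
    """Count, for each regulator, how many of its targets lie in the module."""
    counts = defaultdict(int)
    for tf, target in edges:
        if target in module:
            counts[tf] += 1
    return counts

def consensus_upstream_ranking(networks, module):
    """Order regulators by mean rank across networks (rank 1 = most module targets)."""
    regulators = {tf for edges in networks for tf, _ in edges}
    rank_sums = defaultdict(float)
    for edges in networks:
        counts = module_target_counts(edges, module)
        ordered = sorted(regulators, key=lambda tf: (-counts[tf], tf))
        for rank, tf in enumerate(ordered, start=1):
            rank_sums[tf] += rank
    mean_rank = {tf: rank_sums[tf] / len(networks) for tf in regulators}
    return sorted(regulators, key=lambda tf: (mean_rank[tf], tf))

module = {"g1", "g2", "g3", "g4"}
networks = [
    [("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"), ("TF2", "g4"), ("TF2", "g5"),
     ("TF3", "g5"), ("TF3", "g6")],
    [("TF1", "g1"), ("TF2", "g3"), ("TF2", "g4"), ("TF2", "g5"), ("TF3", "g6")],
    [("TF1", "g1"), ("TF1", "g2"), ("TF1", "g4"), ("TF2", "g5"), ("TF3", "g3"),
     ("TF3", "g6")],
]
ranking = consensus_upstream_ranking(networks, module)
print(ranking)  # → ['TF1', 'TF2', 'TF3']
```

TF2 ranks first in the second network only. Averaging over the ensemble places TF1 first, which is the kind of stability a single network cannot provide.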
- Åkesson J, supervised by Magnusson R & Tjärnberg A. Robust Community Predictions of Hubs in Gene Regulatory Networks. MSc thesis, Linköping University, Oct 2018.
(3) Core modelling
During the first years of the CENIT project, we published LASSIM, a high-performance Python modelling toolbox available at https://gitlab.com/Gustafsson-lab/lassim. We used it to infer a mathematical model of Th2 differentiation. Under Gustafsson's supervision, R Magnusson and J Åkesson are developing methods for automatically inferring a core model using the ideas of sub-project (2).
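The core-then-periphery idea behind genome-wide mechanistic modelling can be sketched as follows. This is a minimal sketch under stated assumptions, not LASSIM's code, API or exact equations. It assumes a sigmoid-coupled ODE, dx/dt = -lam * x + vmax / (1 + exp(-(W x) + beta)), for a hypothetical three-TF core, and integrates it with classical RK4. It then fits one peripheral gene, driven only by the fixed core trajectories, with an exhaustive grid search that stands in for a real optimizer. Because each peripheral gene in this sketch depends only on the core, such fits are independent and can be run gene by gene. All parameter values are illustrative.

```python
import numpy as np

def sigmoid_drive(x, weights, beta, vmax):
    """vmax / (1 + exp(-(weights @ x) + beta)); weights is a matrix (core) or a vector (peripheral)."""
    return vmax / (1.0 + np.exp(-(weights @ x) + beta))

def core_rhs(x, W, beta, vmax, lam):
    return -lam * x + sigmoid_drive(x, W, beta, vmax)

def simulate_core(x0, W, beta, vmax, lam, dt, n_steps):
    """Classical RK4 integration of the nonlinear core ODE."""
    X = np.empty((n_steps + 1, len(x0)))
    X[0] = x0
    for t in range(n_steps):
        x = X[t]
        k1 = core_rhs(x, W, beta, vmax, lam)
        k2 = core_rhs(x + 0.5 * dt * k1, W, beta, vmax, lam)
        k3 = core_rhs(x + 0.5 * dt * k2, W, beta, vmax, lam)
        k4 = core_rhs(x + dt * k3, W, beta, vmax, lam)
        X[t + 1] = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return X

def simulate_peripheral(core_traj, w, beta, vmax, lam, dt):
    """Forward Euler for one peripheral gene driven only by the (fixed) core trajectories."""
    y = np.zeros(len(core_traj))
    for t in range(len(core_traj) - 1):
        y[t + 1] = y[t] + dt * (-lam * y[t] + sigmoid_drive(core_traj[t], w, beta, vmax))
    return y

# Hypothetical core: TF2 represses TF1, TF1 activates TF2, TF2 activates TF3.
W_core = np.array([[0.0, -2.0, 0.0],
                   [3.0, 0.0, 0.0],
                   [0.0, 2.5, 0.0]])
beta_core = np.array([-1.0, 1.0, 1.0])
dt = 0.05
core = simulate_core(np.zeros(3), W_core, beta_core, np.ones(3), np.ones(3), dt, 200)

# "Observed" peripheral gene generated with known weights on TF1 and TF3.
true_w = np.array([1.5, 0.0, -1.0])
y_obs = simulate_peripheral(core, true_w, 0.0, 1.0, 1.0, dt)

# Step 2: fit the peripheral gene alone, with the core trajectories held fixed.
grid = np.linspace(-2.0, 2.0, 9)
best_w, best_sse = None, np.inf
for a in grid:
    for c in grid:
        w = np.array([a, 0.0, c])
        sse = float(np.sum((simulate_peripheral(core, w, 0.0, 1.0, 1.0, dt) - y_obs) ** 2))
        if sse < best_sse:
            best_w, best_sse = w, sse
print(best_w, best_sse)
```

The grid search keeps the example transparent. In practice each peripheral fit would use a proper optimizer and real expression data, but the decoupling from the core is what makes genome-wide coverage computationally feasible.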
- Magnusson R, Köpsén M, Lövfors W, Gawel D R, Nordling T, Schulze S, Nestor C, Cedersund G, Benson M, Tjärnberg A, and Gustafsson M. LASSIM: a network inference toolbox for genome-wide mechanistic modelling. (a) Accepted as a poster at the ICSB conference in Barcelona, September 2016; (b) PLoS Computational Biology, June 22, 2017, IF 5; (c) https://gitlab.com/Gustafsson-lab/lassim.