Research Description
Residual Networks [4] (ResNets) enable successful optimization of very deep neural network architectures with hundreds of layers. Its representational power has led to improvements in a range of high performance computer vision applications, such as classification, object detection, segmentation, etc. In the seminal paper [5], it was observed that the structure of a residual network is similar to the Euler discretization of an ODE. By parameterizing the derivative of the hidden state of a neural network, NODEs allow for defining residual networks with continuous depth, where during inference precision can be traded off for speed. It has further been demonstrated how many of the networks commonly used in deep learning can be interpreted as different discretization schemes of differential equations, but in this case each layer is parameterized independently. NODEs can also be further stabilized by injecting noise during training to increase robustness to noise perturbations and adversarial examples.In our work [1,2,3], it is demonstrated how Standalone NODEs can be used in isolation from conventional network layers, so that the end-to-end network is formulated as a NODE. The benefit of this formulation is that the mathematical properties of the model holds true, from input data points to predictions. This makes it possible to, e.g., analyze the behavior of predictions under different perturbations of data points or weights, and enables general sensitivity analysis of the mapping. However, as the NODE is described only by fully-connected layers, there are limitations in terms of applicability.The group is working on several natural continuations of the initiated research, where we list two of them:
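Before turning to the projects, the ResNet–Euler correspondence mentioned above can be made concrete with a minimal sketch. This is our own illustration in PyTorch, not code from [1,2,3]; the two-layer vector field ODEFunc and the step counts are hypothetical choices.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Parameterized derivative f(t, h) of the hidden state (hypothetical two-layer choice)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, t, h):
        return self.net(h)

def euler_forward(f, h0, t0=0.0, t1=1.0, n_steps=10):
    """Explicit Euler: h_{k+1} = h_k + dt * f(t_k, h_k)."""
    dt = (t1 - t0) / n_steps
    h, t = h0, t0
    for _ in range(n_steps):
        h = h + dt * f(t, h)
        t = t + dt
    return h

f = ODEFunc(dim=16)
h0 = torch.randn(8, 16)                       # batch of hidden states
residual_out = h0 + f(0.0, h0)                # one residual block: h + f(h)
node_out = euler_forward(f, h0, n_steps=50)   # finer discretization of the same vector field
```

With n_steps = 1 the update coincides with a standard residual block; increasing n_steps at inference time refines the depth-continuous model without changing the trained weights, which is the precision-versus-speed trade-off referred to above.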
Science-Activated Standalone NODEs for Inverse and Ill-posed ODEs, PDEs and Variational Problems
We develop a framework for effective inclusion of science-based knowledge into the training of the Standalone NODEs proposed in [1,2], in order to solve direct as well as inverse and ill-posed problems for ODEs and PDEs. Unlike PINNs [6], we incorporate the “physics” not in the loss function but in the core of the Standalone NODEs. This is possible due to the unique design of Standalone NODEs, where the activation function is general and covers a broad class of functions. Moreover, our approach will allow us to use (a) a more general loss functional which does not necessarily correspond to the PDE or ODE we want to solve, and (b) the sensitivity problem to perform a robustness analysis. A hypothetical sketch of this idea is given after the list of research challenges below.
This setup encapsulates a wide range of problems in mathematical physics and mathematical biology, including the Navier-Stokes equations, conservation laws, diffusion processes, advection-diffusion-reaction systems, and kinetic equations.
Within this project, the following research challenges are addressed:
- Incorporate the “Science” in the core of the Standalone NODEs to solve
(i) well-posed (direct) problems for nonlinear PDEs and ODEs;
(ii) ill-posed and inverse problems for nonlinear PDEs and ODEs (such as parameter identification problems) as well as variational problems.
- Study the convergence, stability, and accuracy of “Science-Activated” Standalone NODE solutions.
- Implement “Science-Activated” Standalone NODEs both in isolation and in combination with traditional deep learning architectures.
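As a hypothetical illustration of placing the “physics” in the core rather than in the loss, the sketch below embeds a known ODE model, logistic growth with unknown rate r and carrying capacity K, directly in the vector field and identifies the parameters from observations using a plain data-misfit loss. The model, parameter names, and training setup are our own illustrative choices, not the formulation of [1,2].

```python
import torch
import torch.nn as nn

# Hypothetical example: the known physics, logistic growth du/dt = r*u*(1 - u/K),
# is placed in the vector field itself rather than in a PINN-style residual loss;
# the scalar parameters r and K are trainable.
class ScienceActivatedField(nn.Module):
    def __init__(self):
        super().__init__()
        self.r = nn.Parameter(torch.tensor(0.5))   # unknown growth rate
        self.K = nn.Parameter(torch.tensor(2.0))   # unknown carrying capacity

    def forward(self, t, u):
        return self.r * u * (1.0 - u / self.K)

def euler_solve(field, u0, ts):
    """Explicit Euler trajectory evaluated at the time stamps ts."""
    us, u = [u0], u0
    for t0, t1 in zip(ts[:-1], ts[1:]):
        u = u + (t1 - t0) * field(t0, u)
        us.append(u)
    return torch.stack(us)

# Parameter identification with a general data-misfit loss on observations;
# the loss need not correspond to the differential equation itself.
field = ScienceActivatedField()
ts = torch.linspace(0.0, 5.0, 50)
u_obs = 1.0 / (1.0 + 9.0 * torch.exp(-ts))      # synthetic observed trajectory (r=1, K=1)
opt = torch.optim.Adam(field.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = ((euler_solve(field, u_obs[0], ts) - u_obs) ** 2).mean()
    loss.backward()
    opt.step()
print(float(field.r), float(field.K))           # identified parameters
```

A PINN [6] would instead penalize the residual of the differential equation in the loss; here the loss is free to be any functional of the predictions, in line with point (a) above.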
Convolutional Neural ODEs (CNODEs)
Convolutional neural networks (CNNs) [7] are a cornerstone of deep learning applied to image analysis, and for data with spatial or temporal structure (images, sound, etc.) they are the de facto standard. Convolutional layers learn convolution kernels that act as general feature extractors, which makes CNNs both more efficient and easier to optimize while using fewer trainable weights. In combination with max pooling, CNNs extract compact features that represent the information needed to solve the task; these can subsequently be utilized by fully-connected layers to produce the desired output of the network. Incorporating convolutional and pooling operations in the NODE framework is a considerable research challenge, but it also represents an important step towards widely applicable NODEs. It moreover opens up interesting research directions, such as convolutional layers that are continuous in both depth and spatial extent; a minimal sketch of a convolutional ODE block is given after the list of goals below. The goals of this project are:
- Derive and analyze convolutional NODEs (CNODEs) continuous in the time and spatial domains.
- Generalize the Nonlinear Conjugate Gradient method in [1,2] as an optimizer subject to CNODE-constraints.
- Implement CNODEs both in isolation and in combination with traditional deep learning architectures.
- Derive and analyze the sensitivity problem for CNODEs in order to make qualitative statements about how errors propagate in the learning process under the influence of noise.
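A minimal, hypothetical sketch of the starting point, a convolutional ODE block whose depth is continuous while the spatial structure is handled by ordinary convolutions, could look as follows. This is our illustration in PyTorch, not the CNODE formulation under development; channel counts, layer sizes, and step counts are arbitrary choices.

```python
import torch
import torch.nn as nn

class ConvField(nn.Module):
    """Convolutional vector field dh/dt = field(t, h) for feature maps (hypothetical choice)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, t, h):
        return self.conv2(torch.tanh(self.conv1(h)))

def euler_solve(field, h0, t0=0.0, t1=1.0, n_steps=8):
    """Explicit Euler integration of the feature maps through continuous depth."""
    dt = (t1 - t0) / n_steps
    h, t = h0, t0
    for _ in range(n_steps):
        h = h + dt * field(t, h)
        t = t + dt
    return h

field = ConvField(channels=16)
h0 = torch.randn(4, 16, 32, 32)   # batch of feature maps
h1 = euler_solve(field, h0)       # feature maps evolved through continuous depth
```

Continuity in the spatial domain, the nonlinear conjugate gradient optimizer of [1,2] under CNODE constraints, and the sensitivity analysis are the open parts of the project and are not captured by this sketch.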
PhD students
Rym Jaroudi is the first PhD student to defend her doctoral thesis [3] within the MML group. The thesis has the title:
Inverse Problems for Tumour Growth Models and Neural ODEs.
From the abstract:
This thesis concerns the application of methods and techniques from the theory of inverse problems and differential equations to study models arising in the areas of mathematical oncology and deep learning.
References
- [1] George Baravdish, Gabriel Eilertsen, Rym Jaroudi, B. Tomas Johansson, Lukáš Malý, and Jonas Unger. Learning via nonlinear conjugate gradients and depth-varying neural ODEs. arXiv preprint arXiv:2202.05766, 2022.
- [2] Rym Jaroudi, Lukáš Malý, Gabriel Eilertsen, B. Tomas Johansson, Jonas Unger, and George Baravdish. Standalone neural ODEs with sensitivity analysis. arXiv preprint arXiv:2205.13933, 2022.
- [3] Rym Jaroudi. Inverse Problems for Tumour Growth Models and Neural ODEs. Doctoral thesis, Linköping University, 2023.
- [4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
- [5] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differential equations. In S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, and N. Cesa-Bianchi, editors, Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 6572–6583. Curran Associates Inc., Red Hook, NY, USA, 2018.
- [6] Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
- [7] Yann LeCun and Yoshua Bengio. Convolutional networks for images, speech, and time series. In The Handbook of Brain Theory and Neural Networks. MIT Press, 1995.