Each Lecturer will hold up to four lectures on one or more research topics.
Topics: Machine Learning, Artificial Intelligence
Matej Balog is a Senior Research Scientist at Google DeepMind, London. He’s working in the Science team on applications of AI to Mathematics and Computer Science. His most recent work has been on algorithm discovery for matrix multiplication, published last year in Nature. Prior to joining DeepMind he worked on program synthesis and understanding (with Microsoft Research and Google Brain). He received his PhD from the University of Cambridge (a joint programme with the Max-Planck-Institute in Tübingen) and his Masters from the University of Oxford.
Topics: Foundation Models, Transformers, Representation Learning, Reinforcement Learning
Topics: Foundation Models, Large Language Models, PaLM
Topics: Graph Neural Networks, Machine Learning, Deep Learning
Abstract. Graph Neural Networks (GNNs) are an essential model class in the modern deep learning toolbox. They excel not only in classical machine learning tasks on graphs such as node classification, graph classification, and link prediction, but are becoming increasingly important for algorithmic reasoning tasks and for modeling various complex, interacting and dynamical systems – from predicting dynamics in social networks to learning accurate physical simulators.
This lecture will introduce GNNs from a message passing perspective, discuss the main representative GNN variants in use today, and give an overview of how GNNs are used in various graph representation learning tasks.
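The message passing view can be illustrated with a minimal sketch. It assumes an undirected graph, mean aggregation over neighbours, and a fixed mixing weight in place of the learned weight matrices and nonlinearities that real GNN layers use. The function name `message_passing_layer` and the parameter `self_weight` are hypothetical and chosen only for this illustration.

```python
def message_passing_layer(features, edges, self_weight=0.5):
    """One round of mean-aggregation message passing on an undirected graph.

    features: dict mapping node -> list of floats (the node embedding)
    edges:    list of (u, v) pairs
    Assumed update rule (illustrative, not a specific published GNN):
        h_v' = self_weight * h_v + (1 - self_weight) * mean(h_u for neighbours u)
    """
    neighbours = {node: [] for node in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    updated = {}
    for node, h in features.items():
        nbrs = neighbours[node]
        if nbrs:
            # Aggregate: average the incoming neighbour embeddings.
            agg = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(len(h))]
        else:
            # Isolated nodes keep their own embedding.
            agg = list(h)
        # Update: mix the node's own state with the aggregated message.
        updated[node] = [self_weight * a + (1 - self_weight) * b for a, b in zip(h, agg)]
    return updated


features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
edges = [("a", "b"), ("b", "c")]
layer1 = message_passing_layer(features, edges)
print(layer1)
```

Stacking several such rounds lets information travel further along the graph: after one round node "c" only sees "b", after two rounds it is also influenced by "a".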
Abstract. The second part of the lecture will focus on how GNNs can be used for modeling complex, dynamic interacting systems. We will cover how to learn to simulate the dynamics of complex interacting systems with GNNs and how to use GNNs to discover relations or interactions.
Abstract. The world around us is highly structured: our everyday environments contain a myriad of objects and other components that can be independently interacted with or reasoned about. A core challenge in perception is to learn to infer such a structured understanding of everyday scenes. This is reflected in computer vision tasks such as object detection, instance segmentation, or pose estimation.
This lecture will introduce methods for structured and object-centric scene understanding: we will discuss how object representations can be integrated into end-to-end deep learning architectures, giving rise to object-centric architectures such as the Detection Transformer (DETR). We will further cover how object-centric models such as Slot Attention can be trained without supervised object labels to discover objects in raw image data.
Abstract. The second part of this lecture will cover extensions of object-centric models to learn about 3D scenes, enabling use cases such as scene editing and novel view synthesis. Finally, we will discuss how this class of models can be used to learn about dynamics in scenes, to consistently track objects in scenes and to learn to simulate their dynamics forward in time.
Topics: Machine Learning, Learning Theory, Reinforcement Learning
Topics: Computer Vision, Compressed Sensing, Machine Learning, Signal Processing, Robotics
Yi Ma received his B.S. degree in Automation and Applied Mathematics from Tsinghua University, China in 1995, an M.S. degree in EECS in 1997, an M.A. degree in Mathematics in 2000, and a Ph.D. in EECS in 2000, all from UC Berkeley. He was on the faculty of the ECE Department of the University of Illinois at Urbana-Champaign from 2000 to 2011. He was the manager of the Visual Computing Group and a principal researcher of Microsoft Research in Asia from 2009 to 2013. He was then a founding professor and the executive dean of the School of Information Science and Technology of ShanghaiTech University from 2014 to 2017. He joined the faculty of EECS of UC Berkeley in 2018. You may find a more detailed biography from the website at:
Topics: Foundation Models, Large Language Models
Dr. Gerhard Paaß founded the text mining group at the Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS. He worked in the context of many research stays at universities abroad (China, USA, Australia, Japan). He is the author of numerous publications and has received several best paper awards in the field of AI. In addition, he has been active as a lecturer for many years and, within the framework of the Fraunhofer Big Data and Artificial Intelligence Alliance, has played a very significant role in defining the new job description of the Data Scientist and successfully establishing it in Germany as well. He recently wrote a book on “Foundation Models for Natural Language Processing – Pre-trained Language Models Integrating Media” which will be published by Springer Nature. As Lead Scientist at Fraunhofer IAIS, Dr. Paaß is part of the team developing the OpenGPT-X model and actively involved in establishing a comprehensive computing infrastructure for Foundation Models in the LEAM project.
Foundation Models for Natural Language Processing – Pre-trained Language Models Integrating Media, Gerhard Paaß, Sven Giesselbach, Springer, May, 2023
Abstract. The training data of language models may contain misogynistic, racist, or anti-religious texts, which are then reproduced by the model. Especially for dialog applications the output should be meaningful, specific and interesting, avoiding harmful suggestions and unfair bias, as well as false claims. The first step is a targeted preprocessing of the training data including deduplication and filtering of harmful content, which requires a lot of effort. After pre-training, the model has to be fine-tuned on controlled dialog data, possibly taking into account the documents retrieved by parallel retrieval operations. Explicit filters can be used in postprocessing to avoid unwanted content. In addition, the history of a dialog has to be saved and retrieved later to be taken into account during answer generation. Reinforcement learning with human feedback is used to generate text that is targeted to users’ prompts and produces the desired content. To prevent text-to-image models from delivering sexist or offensive depictions, the approaches must be extended to multimedia and multilingual domains. A final aspect is the explainability of the generated content, which increases the acceptance of the returned information. We discuss the level of trustworthiness achieved by the current approaches including our own OpenGPT-X model, and compare this with the proposed EU AI Act and other planned regulations.
Abstract. Language models such as GPT-4 have the ability to capture a lot of information and world knowledge contained in their training data. However, if the model’s prompt concerns very recent or very specific topics, there is often no information in the training data. To avoid costly retraining of the model with current data, external information can be provided that the model should cover in the generated text. The retriever-reader scheme follows this path. The retriever employs dense retrieval to find texts matching the query. The reader is a pre-trained language model which is fine-tuned to combine the internal knowledge of the model with the retrieved texts and to generate a suitable answer. It has been shown that this approach improves the fraction of correct answers. In addition, retrieved documents can be added to the text as references. Similarly, other types of information, such as the contents of tables or databases, can be incorporated into a language model. We discuss the current accuracy improvements achieved by these models and new approaches for enhancement.
Abstract. Traditional search engines rely on the matching of terms between the query and the documents. However, term-based retrieval systems have several limitations such as lack of robustness with respect to polysemy, synonymy, and paraphrasing between the query and the documents. Recently, Foundation Model techniques have been used to improve the representation of textual data and to enhance the ability of information retrieval systems to understand natural language queries. One approach is dense retrieval, where query and documents are expressed as embeddings and matched by nearest neighbor search. In addition, attention mechanisms have been employed to improve the ability of search engines to attend to important parts of the query and documents for matching. In the talk, we also discuss how to incorporate external knowledge during retrieval, such as knowledge graphs and information from different media like images. As the results for many benchmarks show, dense retrieval has significantly improved the performance of search engines. However, even with approximate nearest neighbor search, the cost of dense retrieval is higher than term-based retrieval and is an obstacle to widespread use. Nevertheless, all major commercial search engines claim to use language technology today.
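As a rough illustration of dense retrieval, the sketch below ranks documents by cosine similarity between a query embedding and document embeddings. The embedding vectors are hand-made toy values (in practice they would come from a trained encoder), the search is exact brute force rather than approximate nearest neighbor search, and the names `cosine`, `retrieve`, and the document identifiers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents whose embeddings are closest to the query."""
    ranked = sorted(doc_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy embeddings: "car" and "automobile" share no terms but lie close together.
doc_vecs = {
    "doc_car": [0.9, 0.1, 0.0],
    "doc_automobile": [0.8, 0.2, 0.1],
    "doc_banana": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]
top = retrieve(query_vec, doc_vecs, k=2)
print(top)
```

Because matching happens in embedding space, a synonym document can rank highly even when it shares no terms with the query, which is the robustness that term-based retrieval lacks.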
Abstract. The Transformer established self-attention as its core mechanism, representing the meaning of tokens in a text by context-sensitive embedding vectors. Based on the correlation of embeddings of input tokens, each layer of the network generates more expressive embeddings, taking into account the relation to all tokens of the input text. These models are pre-trained on large collections of text documents with the task of predicting omitted tokens or the next token in the sentence. The models achieve unprecedented accuracy in generating new text and can be adapted to new tasks by fine-tuning. If the models have a sufficient number of parameters, they can simply be prompted to perform a task without any fine-tuning. It turned out that the models can also be applied to other media, such as images, sound, and video, by partitioning these media into tokens and applying self-attention to capture their contents. They are called Foundation Models because they can be used as a basic architecture for a wide range of AI tasks, superseding prior models such as RNNs and CNNs. In this lecture we describe the basic architecture of BERT, GPT, and the Transformer and discuss the concept of transfer learning. We then explain token representations of various media and models that simultaneously process tokens from different media. Finally, we summarize the properties and potential impact of Foundation Models.
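The way self-attention turns input embeddings into context-sensitive embeddings can be sketched in a few lines. This is a simplified, single-head version that assumes identity query, key, and value projections (real Transformer layers learn these projections and add multiple heads, residual connections, and feed-forward blocks). The function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    tokens: list of embedding vectors (lists of floats), one per input token.
    Returns the contextual embeddings and the attention weight matrix.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # New embedding: attention-weighted average of all token embeddings.
        outputs.append([sum(w_j * v[i] for w_j, v in zip(w, tokens)) for i in range(d)])
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual, attn = self_attention(tokens)
print(attn[0])
```

Each output vector depends on all input tokens, with more weight on tokens whose embeddings correlate with the query token.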
Topics: Data Science, Global Optimization, Mathematical Modeling, Financial Applications
Panos Pardalos was born in Drosato (Mezilo) Argitheas in 1954 and graduated from Athens University (Department of Mathematics). He received his PhD (Computer and Information Sciences) from the University of Minnesota. He is a Distinguished Emeritus Professor in the Department of Industrial and Systems Engineering at the University of Florida, and an affiliated faculty of Biomedical Engineering and Computer Science & Information & Engineering departments.
Panos Pardalos is a world-renowned leader in Global Optimization, Mathematical Modeling, Energy Systems, Financial applications, and Data Sciences. He is a Fellow of AAAS, AAIA, AIMBE, EUROPT, and INFORMS and was awarded the 2013 Constantin Caratheodory Prize of the International Society of Global Optimization. In addition, Panos Pardalos has been awarded the 2013 EURO Gold Medal prize bestowed by the Association for European Operational Research Societies. This medal is the preeminent European award given to Operations Research (OR) professionals for “scientific contributions that stand the test of time.”
Panos Pardalos has been awarded a prestigious Humboldt Research Award (2018-2019). The Humboldt Research Award is granted in recognition of a researcher’s entire achievements to date – fundamental discoveries, new theories, insights that have had significant impact on their discipline.
Panos Pardalos is also a Member of several Academies of Sciences, and he holds several honorary PhD degrees and affiliations. He is the Founding Editor of Optimization Letters, Energy Systems, and Co-Founder of the International Journal of Global Optimization, Computational Management Science, and Springer Nature Operations Research Forum. He has published over 600 journal papers, and edited/authored over 200 books. He is one of the most cited authors and has graduated 71 PhD students so far. Details can be found at www.ise.ufl.edu/pardalos
Panos Pardalos has lectured and given invited keynote addresses worldwide in countries including Austria, Australia, Azerbaijan, Belgium, Brazil, Canada, Chile, China, Czech Republic, Denmark, Egypt, England, France, Finland, Germany, Greece, Holland, Hong Kong, Hungary, Iceland, Ireland, Italy, Japan, Lithuania, Mexico, Mongolia, Montenegro, New Zealand, Norway, Peru, Portugal, Russia, South Korea, Singapore, Serbia, South Africa, Spain, Sweden, Switzerland, Taiwan, Turkey, Ukraine, United Arab Emirates, and the USA.
Topics: Machine Learning, High Dimensional Data Analysis, Deep Learning
Qing Qu is an assistant professor in the EECS department at the University of Michigan. Prior to that, he was a Moore-Sloan data science fellow at the Center for Data Science, New York University, from 2018 to 2020. He received his Ph.D. from Columbia University in Electrical Engineering in Oct. 2018. He received his B.Eng. from Tsinghua University in Jul. 2011, and an M.Sc. from the Johns Hopkins University in Dec. 2012, both in Electrical and Computer Engineering. He interned at the U.S. Army Research Laboratory in 2012 and Microsoft Research in 2016. His research interest lies at the intersection of the foundations of data science, machine learning, numerical optimization, and signal/image processing, with a focus on developing efficient nonconvex methods and global optimality guarantees for solving representation learning and nonlinear inverse problems in engineering and imaging sciences. He is the recipient of the Best Student Paper Award at SPARS’15 (with Ju Sun, John Wright), and the recipient of a Microsoft PhD Fellowship in machine learning. He is the recipient of the NSF Career Award in 2022, and an Amazon Research Award (AWS AI) in 2023.
Abstract: Machine learning is transforming every field of science and engineering. However, as data is increasing in volume and dimension, the performance of modern machine learning methods is critically dependent on the choice of data representation. In the past decade, although we witnessed the revolutionary empirical success of many representation learning methods, from (convolutional) dictionary learning to deep learning, the underlying principles behind their success still largely remain a mystery, which hinders their further development and adoption in broader applications. One of the major challenges originates from the nonlinearity of the data representation models, which often results in complicated, highly nonconvex optimization problems; in the worst case, solving nonconvex problems could be NP-hard. Nonetheless, various empirical evidence suggests that the symmetric properties of the problem and intrinsic low-dimensional structures of the data often alleviate the hardness of these problems, so that simple heuristic nonconvex methods often work surprisingly well for learning succinct representations.
Abstract: This lecture focuses on the study of the low-dimensional structures appearing in the last-layer of deep networks. Recently, an intriguing phenomenon has been discovered in the final stages of network training for many classification problems. This phenomenon, known as Neural Collapse, has generated significant interest. It involves the collapse of the last-layer features and classifiers into elegant and simple mathematical structures, where all training inputs are mapped to class-specific points in feature space, and the last-layer classifier converges to the dual of the features’ class means while achieving the maximum possible margin. This phenomenon persists across various network architectures, datasets, and even data domains. The lecture explores the symmetry and geometry of Neural Collapse and develops a rigorous mathematical theory that explains when and why this low-dimensional structure of the last-layer representation occurs under the unconstrained feature model, and justifies its ubiquity across different network architectures, training losses, and problem formulations.
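The collapse of last-layer features onto class-specific points can be made concrete with a small measurement sketch. It computes a simplified variability ratio, the trace of the within-class scatter divided by the trace of the between-class scatter, which is an assumption made here for brevity and not necessarily the exact metric used in the lecture. The function names and the toy feature values are hypothetical.

```python
def class_means(features, labels):
    """Mean feature vector for each class label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def variability_ratio(features, labels):
    """Within-class scatter divided by between-class scatter (traces only).

    A value near zero means every feature sits on its class mean,
    i.e. the features have collapsed to class-specific points.
    """
    means = class_means(features, labels)
    n, dim = len(features), len(features[0])
    global_mean = [sum(x[d] for x in features) / n for d in range(dim)]
    within = sum(
        sum((xi - mi) ** 2 for xi, mi in zip(x, means[y]))
        for x, y in zip(features, labels)
    ) / n
    between = sum(
        sum((mi - gi) ** 2 for mi, gi in zip(m, global_mean))
        for m in means.values()
    ) / len(means)
    return within / between


labels = [0, 0, 1, 1]
spread = [[1.0, 0.2], [0.8, -0.2], [-1.0, 0.3], [-0.9, -0.3]]
collapsed = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
print(variability_ratio(spread, labels), variability_ratio(collapsed, labels))
```

Tracking such a ratio over training epochs is one simple way to observe whether the features of a trained network move toward the collapsed configuration.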
Abstract: In the second lecture, we delve deeper into the low-dimensional structures of representation in intermediate layers, building on the concepts covered in the previous lecture. Our findings indicate that as we move from shallow to deep layers of a learned deep network, there is a gradual collapse in feature variability often with a linear decay ratio. We established a theoretical explanation for this phenomenon using a multi-layer deep linear network. Our analysis shows that if a deep linear network is trained via gradient descent using small and orthogonal weights, the within-class variability measure undergoes linear decay as we go from shallow to deep layers. Moreover, we demonstrate that the rate of linear decay is determined by the weight initialization scale. Finally, we demonstrate how our study can be leveraged to provide guidelines for improving the generalizability and transferability of deep representations, leading to more efficient fine-tuning strategies for classification problems in vision.
Abstract: In recent years, over-parameterized models with a higher number of parameters than the amount of available data have become dominant in the field of machine learning, leading to improved performances. However, when the training data is corrupted, over-parameterized models tend to overfit and fail to generalize. The third part of the lecture aims to tackle this issue through low-dimensional modeling. The approach involves leveraging the implicit regularization of gradient descent on overparameterized models and exploiting the incoherence between sparse corruption and low-rank structures to prevent overfitting during training. This is achieved by accurately separating noise from data using a method called Double Over-Parameterization (DOP). Contrary to classical wisdom, which suggests that more parameters exacerbate overfitting, DOP uses a specific choice of learning rates on different sets of model parameters to prevent overfitting. Empirical results show that DOP outperforms traditional methods when applied to tasks such as image recovery from corrupted measurements and image classification under label noise.
Topics: Kernel Methods, Statistical Machine Learning, Information Theory
Zoltan Szabo is a Professor of Data Science at the Department of Statistics, LSE. Zoltan’s research interest is statistical machine learning with focus on kernel methods, information theory (ITE), scalable computation, and their applications. These applications include safety-critical learning, style transfer, shape-constrained prediction, hypothesis testing, distribution regression, dictionary learning, structured sparsity, independent subspace analysis and its extensions, Bayesian inference, finance, economics, analysis of climate data, criminal data analysis, collaborative filtering, emotion recognition, face tracking, remote sensing, natural language processing, and gene analysis. Zoltan enjoys helping and interacting with the machine learning (ML) and statistics community in various forms. He serves/served as (i) an Area Chair of the most prestigious ML conferences including ICML, NeurIPS, COLT, AISTATS, UAI, IJCAI, ICLR, (ii) the moderator of statistical machine learning (stat.ML) on arXiv, (iii) a DSI Management Committee Member, (iv) the Programme Director of MSc Data Science, (v) the Program Chair of the Data Science Summer School, (vi) an editorial board member of JMLR and associate editor of the journal Mathematical Foundations of Computing, (vii) a reviewer of various journals (such as Annals of Statistics, Journal of the American Statistical Association, Journal of Multivariate Analysis, Statistics and Computing, Electronic Journal of Statistics, Annals of Applied Probability, IEEE Transactions on Information Theory, Information and Inference: A Journal of the IMA, Foundations of Data Science, Foundations of Computational Mathematics, or Machine Learning), (viii) a reviewer of European (ERC), Israeli (ISF) and Swiss (SNSF) grant applications, (ix) a mentor of newcomers (NeurIPS, ICML). For further details, please see Zoltan’s website.
Topics: Foundation Models, Fine-Tuning Large Language Models, Reinforcement Learning with Human Feedback, Deep Reinforcement Learning
Each Tutorial Speaker will hold more than four lessons on one or more research topics.
Topics: Theory of Machine Learning, Theory of Deep Neural Networks
Bruno Loureiro is currently a research scientist at the Centre for Data Science at the École Normale Supérieure in Paris, working at the crossroads between machine learning and statistical mechanics. Before moving to ENS, he was a researcher at EPFL and a postdoctoral researcher at the Institut de Physique Théorique (IPhT) in Paris, and he received his PhD from the University of Cambridge. He is interested in Bayesian inference, theoretical machine learning and high-dimensional statistics more broadly. His research aims at understanding how data structure, optimisation algorithms and architecture design come together in successful learning.
Wonders of high-dimensions: the maths and physics of Machine Learning
Topics: Deep Learning, Data Science
Dr Varun Ojha is Associate Professor (Senior Lecturer) in Computing Sciences at the School of Computing, Newcastle University, UK. Previously Dr Ojha was Assistant Professor (Lecturer) at the University of Reading. He was Postdoctoral Fellow at ETH Zurich, Switzerland. Before this, Dr Ojha was a Marie-Curie Fellow at the Technical University of Ostrava, Czech Republic. Dr Ojha received a PhD in Computer Science from the Technical University of Ostrava, the Czech Republic. Earlier, Dr Ojha received a research fellowship position funded by the Govt of India’s Dept of Science and Technology at Visvabharati University, India. Dr Ojha has 60+ research publications in international peer-reviewed journals and conferences. More on Dr Ojha’s work is available at ojhavk.github.io.
Simpler models generalize better. This research presents a class of neural-inspired algorithms that are highly sparse in their architectural construction but perform highly accurately. In addition, they perform function approximation and feature selection simultaneously when solving machine learning tasks: classification, regression, and pattern recognition. This class of algorithms comprises the Neural Tree Algorithms: Heterogeneous Neural Tree, Multi-Output Neural Tree, and Backpropagation Neural Tree. This research found that any such arbitrarily constructed neural tree, which is like an arbitrarily “thinned” neural network, has the potential to solve machine learning tasks with an equivalent or better degree of accuracy than a fully connected symmetric and systematic neural network architecture. The algorithm takes random repeated inputs through its leaves and imposes dendritic nonlinearities through its internal connections like a biological dendritic tree would do. The algorithm produces an ad hoc neural tree which is trained using a stochastic gradient descent optimizer. The algorithms produce high-performing and parsimonious models balancing the complexity with descriptive ability on a wide variety of machine
Ojha, V., & Nicosia, G. (2022). Backpropagation neural tree. Neural Networks, 149, 66-
Ojha, V., & Nicosia, G. (2020). Multiobjective optimization of multi-output neural trees. In 2020 IEEE Congress on Evolutionary Computation (CEC) (pp. 1-8). IEEE
Sensitivity analysis offers the opportunity to explore the sensitivity (influence) of parameters on a model. This work applies global sensitivity analysis to deep learning and optimization algorithms to analyze the influence of their hyperparameters. For deep learning, we analyzed hyperparameters such as the type of optimizer, learning rate, batch size, etc. We analyzed these hyperparameters for deep neural networks such as ResNet18, AlexNet, and GoogleNet. For the optimization algorithms, we analyzed hyperparameters of two single-objective and two multi-objective state-of-the-art global optimization evolutionary algorithms as an algorithm configuration problem. We investigate the quality of influence hyperparameters have on the performance of algorithms in terms of their direct effect and interaction effect with other hyperparameters. Using three sensitivity analysis methods, Morris LHS, Morris, and Sobol, to systematically analyze tuneable hyperparameters, the framework reveals the behaviours of hyperparameters with respect to sampling methods and performance metrics. That is, it answers questions such as what the hyperparameters’ influence patterns are, how they interact, how much they interact, and how much their direct influence is. Consequently, the ranking of hyperparameters suggests their order of tuning, and the pattern of influence reveals the stability of the algorithms.
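How a sensitivity analysis yields a ranking of hyperparameters can be illustrated with a minimal screening sketch. It uses a simplified one-at-a-time design in the spirit of the Morris method (random base points, one step per parameter, mean absolute effect), not the full trajectory design or the Sobol indices mentioned in the abstract. The response function `toy_score`, its coefficients, and the hyperparameter ranges are invented for illustration and stand in for an expensive training run.

```python
import random


def elementary_effects(model, bounds, n_base=50, delta=0.1, seed=0):
    """Mean absolute one-at-a-time effect per parameter.

    model:  callable taking the parameters as keyword arguments
    bounds: dict mapping parameter name -> (low, high)
    delta:  step size as a fraction of each parameter's range
    """
    rng = random.Random(seed)
    totals = {name: 0.0 for name in bounds}
    for _ in range(n_base):
        base = {}
        for name, (low, high) in bounds.items():
            # Leave room for one step of size delta inside the range.
            base[name] = low + rng.uniform(0.0, 1.0 - delta) * (high - low)
        y0 = model(**base)
        for name, (low, high) in bounds.items():
            moved = dict(base)
            moved[name] = base[name] + delta * (high - low)
            # Effect of moving only this parameter, per unit of normalised step.
            totals[name] += abs(model(**moved) - y0) / delta
    return {name: total / n_base for name, total in totals.items()}


def toy_score(learning_rate, batch_size):
    # Hypothetical response surface, not a real training result.
    return 10.0 * learning_rate + 0.01 * batch_size


bounds = {"learning_rate": (0.0, 1.0), "batch_size": (16.0, 128.0)}
effects = elementary_effects(toy_score, bounds)
ranking = sorted(effects, key=effects.get, reverse=True)
print(ranking, effects)
```

In this toy setting the learning rate dominates, so it would be tuned first. Because `toy_score` is additive, the sketch shows only direct effects; detecting interaction effects requires looking at how the per-point effects vary, as the full methods do.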
Assessing Ranking and Effectiveness of Evolutionary Algorithm Hyperparameters Using Global Sensitivity Analysis Methodologies, Swarm and Evolutionary
Sensitivity Analysis for Deep Learning: Ranking Hyper-parameter