
[Research Seminar] Foundation of Mixture of Experts in Complex and Massive AI Models

About the Talk:

Since the release of the original Transformer model, extensive effort has been devoted to scaling up model complexity to take advantage of massive datasets and advanced computing resources. To go beyond simply increasing network depth and width, the Sparse Mixture-of-Experts (SMoE) has emerged as an appealing solution for scaling Large Language Models. By modularizing the network and activating only a subset of experts per input, SMoE keeps the computational cost roughly constant while increasing model capacity, which often results in improved performance. Despite this initial success, effective SMoE training is notoriously challenging because of representation collapse, where all experts converge to similar representations or all tokens are routed to only a few experts. As a result, SMoE often suffers from limited representation capability and wasteful parameter usage. In this talk, to address this core challenge of representation collapse, we first propose a novel competition mechanism for training SMoE, which enjoys the same convergence rate as the optimal estimator in hindsight. Second, we develop CompeteSMoE, a scalable and effective strategy for training SMoE via competition. CompeteSMoE employs a router trained, on a schedule, to predict the competition outcome, so the router learns a high-quality routing policy that is relevant to the current task. Lastly, we conduct extensive experiments that demonstrate the strong learning capability of CompeteSMoE and its promising scalability to large-scale architectures.
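For readers who want a concrete picture, the toy PyTorch sketch below illustrates the competition idea: every expert responds to a token, experts are scored by the strength of their response, the top-scoring experts are activated, and a lightweight router is trained with an auxiliary loss to predict that competition outcome so it can be used alone at inference. The scoring rule (output norm), the top-k selection, and the auxiliary loss are illustrative assumptions, not the exact CompeteSMoE formulation or training schedule.

```python
# Illustrative sketch only: a toy competition-based SMoE layer in the spirit of
# the competition mechanism described above; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CompetitionSMoELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.router = nn.Linear(dim, num_experts)  # learns to predict the competition outcome
        self.top_k = top_k

    def forward(self, x: torch.Tensor, use_competition: bool = True):
        # x: (tokens, dim). In this toy version every expert processes every token;
        # a real sparse implementation would dispatch tokens to experts instead.
        expert_outs = torch.stack([e(x) for e in self.experts], dim=1)  # (tokens, E, dim)
        if use_competition:
            # Competition: experts are scored by how strongly they respond.
            scores = expert_outs.norm(dim=-1)                 # (tokens, E)
        else:
            # Inference: the trained router replaces the expensive competition.
            scores = self.router(x)                           # (tokens, E)
        weights = F.softmax(scores, dim=-1)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)
        gathered = expert_outs.gather(
            1, topk_idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
        out = (topk_w.unsqueeze(-1) * gathered).sum(dim=1)
        # Auxiliary loss: teach the router to mimic the competition distribution.
        router_loss = F.kl_div(
            F.log_softmax(self.router(x), dim=-1), weights.detach(),
            reduction="batchmean")
        return out, router_loss
```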

In the second part of the talk, we introduce a novel mixture-of-experts (MoE) framework, FuseMoE, for handling a variable number of input modalities, an open challenge in multimodal fusion because of limited scalability and the lack of a unified approach for dealing with missing modalities. FuseMoE incorporates sparsely gated MoE layers in its fusion component, which are adept at managing distinct tasks and learning an effective modality partitioning. FuseMoE also surpasses previous Transformer-based methods in scalability, accommodating an arbitrary number of input modalities. Furthermore, FuseMoE routes each modality to designated experts that specialize in that data type, so it can handle missing modalities by dynamically adjusting the influence of the experts responsible for the absent data while still exploiting the available modalities. Lastly, another key innovation in FuseMoE is a novel Laplace gating function, which theoretically ensures a better convergence rate than the softmax gating function and also yields better predictive performance. We demonstrate that, compared to existing methods, our approach is superior at integrating diverse input modalities with varying missingness and irregular sampling on three challenging ICU prediction tasks.
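To make the gating idea concrete, the minimal PyTorch sketch below contrasts a distance-based "Laplace-style" gate with the usual inner-product softmax gate, and wraps it in a toy multimodal MoE that simply skips absent modalities. The class names, the L2-distance logits, and the averaging fusion step are illustrative assumptions rather than FuseMoE's actual gating function or architecture.

```python
# Illustrative sketch only: a plausible distance-based gate and a toy multimodal
# MoE with missing-modality handling; assumptions, not the FuseMoE design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LaplaceGate(nn.Module):
    """Gate whose logits are negative L2 distances to learnable expert keys,
    in contrast to the inner-product logits of a standard softmax gate."""
    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.expert_keys = nn.Parameter(torch.randn(num_experts, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        dists = torch.cdist(x, self.expert_keys)   # (tokens, E)
        return F.softmax(-dists, dim=-1)


class ToyMultimodalMoE(nn.Module):
    """Hypothetical fusion layer: one gate per modality over a shared expert pool.
    A missing modality (None) is skipped, so no expert weight comes from it."""
    def __init__(self, dim: int, modalities: list, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.gates = nn.ModuleDict({m: LaplaceGate(dim, num_experts) for m in modalities})

    def forward(self, inputs: dict) -> torch.Tensor:
        fused = []
        for name, x in inputs.items():
            if x is None:               # missing modality: contributes nothing
                continue
            w = self.gates[name](x)                                    # (tokens, E)
            outs = torch.stack([e(x) for e in self.experts], dim=1)    # (tokens, E, dim)
            fused.append((w.unsqueeze(-1) * outs).sum(dim=1))
        return torch.stack(fused, dim=0).mean(dim=0)   # simple average over available modalities


if __name__ == "__main__":
    layer = ToyMultimodalMoE(dim=16, modalities=["vitals", "notes", "labs"])
    batch = {"vitals": torch.randn(8, 16), "notes": None, "labs": torch.randn(8, 16)}
    print(layer(batch).shape)  # torch.Size([8, 16]) despite the missing "notes" modality
```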

About the Speaker: 

Nhat Ho is currently an Assistant Professor of Data Science and Statistics at the University of Texas at Austin. He is a core member of the University of Texas at Austin Machine Learning Laboratory and senior personnel of the Institute for Foundations of Machine Learning. He currently serves as an associate editor and area chair for several prestigious journals and conferences in machine learning and statistics. His current research focuses on the interplay of four principles of machine learning and data science: interpretability of models (deep generative models, convolutional neural networks, Transformers, model misspecification); stability and scalability of optimization and sampling algorithms (computational optimal transport, (non-)convex optimization in statistical settings, sampling and variational inference, federated learning); and heterogeneity of data (Bayesian nonparametrics, mixture and hierarchical models).
