Title: What Makes Looped Transformers Perform Better Than Non-Recursive Ones (Provably)
Speaker: 滕佳烨 (Jiaye Teng)
Affiliation: 上海财经大学 (Shanghai University of Finance and Economics)
Date: 2025-11-13
Time: 14:00-15:00
Venue: Room 1513, Guanghua East Main Building
Abstract:

While looped transformers (termed Looped-Attn) often outperform standard transformers (termed Single-Attn) on complex reasoning tasks, the theoretical basis for this advantage remains underexplored. In this paper, we explain this phenomenon through the lens of loss landscape geometry, inspired by empirical observations of their distinct dynamics at both the sample and Hessian levels. To formalize this, we extend the River-Valley landscape model by distinguishing between U-shaped valleys (flat) and V-shaped valleys (steep). Based on empirical observations, we conjecture that the recursive architecture of Looped-Attn induces a landscape-level inductive bias towards River-V-Valley. Theoretical derivations based on this inductive bias guarantee better loss convergence along the river via valley hopping, and further encourage the learning of complex patterns, compared to the River-U-Valley induced by Single-Attn. Building on this insight, we propose SHIFT (Staged HIerarchical Framework for Progressive Training), a staged training framework that accelerates the training of Looped-Attn while achieving comparable performance.
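
For readers unfamiliar with the term, a looped (recursive) transformer reuses the same transformer block for several iterations instead of stacking distinct layers; setting the loop count to one recovers the standard single-pass architecture. The following minimal PyTorch sketch illustrates only this weight-tied recursion; it is not the speaker's implementation, and all names (Block, LoopedTransformer, n_loops, d_model) are illustrative assumptions.

# Minimal sketch of a weight-tied "looped" transformer block (illustrative only).
import torch
import torch.nn as nn

class Block(nn.Module):
    """One standard pre-norm transformer block (attention + MLP)."""
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual attention
        x = x + self.mlp(self.norm2(x))                    # residual MLP
        return x

class LoopedTransformer(nn.Module):
    """Applies the *same* block n_loops times (weight tying); n_loops = 1
    corresponds to the single-pass, non-recursive case."""
    def __init__(self, d_model: int = 64, n_heads: int = 4, n_loops: int = 8):
        super().__init__()
        self.block = Block(d_model, n_heads)
        self.n_loops = n_loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.n_loops):
            x = self.block(x)
        return x

if __name__ == "__main__":
    x = torch.randn(2, 16, 64)                    # (batch, sequence, d_model)
    print(LoopedTransformer(n_loops=8)(x).shape)  # torch.Size([2, 16, 64])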

滕佳烨20251113(3).pdf

   
Department seminar serial number for this year: 1282
