
Keynote Speakers


  • Prof. Song-Chun Zhu (朱松纯)
  • Director, Beijing Institute for General Artificial Intelligence; Chair Professor, Peking University; Chair Professor of Basic Science, Tsinghua University
  • Bio: Prof. Song-Chun Zhu, born in Ezhou, Hubei Province, is a world-renowned expert in computer vision, a statistician and applied mathematician, and an artificial intelligence researcher. He graduated from the University of Science and Technology of China in 1991, went to the United States in 1992, and received his Ph.D. in computer science from Harvard University in 1996. From 2002 to 2020 he was a professor in the Departments of Statistics and Computer Science at the University of California, Los Angeles (UCLA), and Director of the UCLA Center for Vision, Cognition, Learning and Autonomy. He has published over 300 papers in top international journals and conferences and has received numerous international awards in computer vision, pattern recognition, and cognitive science, including three Marr Prizes (the highest honor in computer vision) and the Helmholtz Prize. He twice served as General Chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012 and CVPR 2019), and between 2010 and 2020 he twice served as principal investigator of multi-university, interdisciplinary MURI projects in vision, cognitive science, and artificial intelligence in the United States. Prof. Zhu has long been devoted to building a unified mathematical framework spanning computer vision, cognitive science, and artificial intelligence. After 28 years in the United States, he returned to China in September 2020 to serve as Director of the Beijing Institute for General Artificial Intelligence, Chair Professor at Peking University, and Chair Professor of Basic Science at Tsinghua University.

  • Talk Title: Computer Vision: A Task-oriented and Agent-based Perspective

  • Abstract: In the past 40+ years, computer vision has been studied from two popular perspectives: i) geometry-based and object-centered representations in the 1970s-90s, and ii) appearance-based and view-centered representations in the 2000s-2020s. In this talk, I will argue for a third perspective: iii) agent-based and task-centered representations, which will lead to general-purpose vision systems as integrated parts of AI agents. From this perspective, vision is driven by the large number of daily tasks that AI agents need to perform, including searching, reconstruction, recognition, grasping, social communication, tool use, etc. Vision is thus viewed as a continuous computational process in service of these tasks. The key concepts in this perspective are physical and social representations of functionality, physics, intentionality, causality, and utility.

  • Prof. Bo Xu (徐波)
  • Director, Institute of Automation, Chinese Academy of Sciences; Dean, School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Bio: Prof. Bo Xu is Director of the Institute of Automation, Chinese Academy of Sciences, Dean of the School of Artificial Intelligence at the University of Chinese Academy of Sciences, and a member of the National New Generation Artificial Intelligence Strategy Advisory Committee, among other roles. He has long been engaged in research and applications of intelligent speech processing and artificial intelligence. His honors include the Chinese Academy of Sciences Outstanding Young Scientist Award and the first prize of the Wang Xuan News Science and Technology Progress Award. He has led numerous projects under the National Science and Technology Support Program, the 863 Program, and the 973 Program, as well as key projects of the National Natural Science Foundation of China. His research results have been deployed at scale in education, broadcasting, security, and other fields. In recent years, his research has focused on auditory models, brain-inspired intelligence, cognitive computing, and game-theoretic intelligence.

  • Talk Title: A Tri-Modal Large Model: Exploring a Path toward Artificial General Intelligence

  • Abstract: Since the introduction of text models such as GPT-3 and BERT, pretrained models have been developing rapidly, and bi-modal models for joint image-text learning have also emerged in succession, demonstrating a strong ability to learn different tasks without supervision and to transfer quickly to data in different domains. However, current pretrained models ignore sound. Our surroundings are filled with sound, and speech in particular is not only a means of human communication but also carries emotion and affect. This talk introduces "Zidong Taichu" (紫东太初), the first image-text-audio tri-modal large model to incorporate speech. The model maps the visual, text, and speech modalities into a unified semantic space through separate encoders, and then uses multi-head self-attention to learn cross-modal semantic associations and feature alignment, forming a unified multimodal knowledge representation. It supports both cross-modal understanding and cross-modal generation, achieving a balance between the cognitive abilities of understanding and generation. We propose a unified multi-level, multi-task self-supervised learning framework operating at the token level, modality level, and sample level, which provides foundational model support for a broader and more diverse range of downstream tasks, and in particular enables generating audio from images and images from audio via the semantic network. The tri-modal large model is an important step toward general-purpose artificial intelligence with artistic creativity, strong interaction abilities, and task generalization.
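The alignment step described in the abstract — separate per-modality encoders projecting into one shared semantic space, followed by multi-head self-attention over the concatenated token sequence — can be sketched as follows. This is a minimal NumPy illustration of the general mechanism, not the actual Zidong Taichu architecture; all dimensions, names, and the use of plain linear projections as stand-in "encoders" are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax along the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(tokens, wq, wk, wv, num_heads):
    """Scaled dot-product self-attention with num_heads heads.
    tokens: (n, d) sequence of modality tokens in the shared space."""
    n, d = tokens.shape
    hd = d // num_heads  # per-head dimension
    q = (tokens @ wq).reshape(n, num_heads, hd).transpose(1, 0, 2)
    k = (tokens @ wk).reshape(n, num_heads, hd).transpose(1, 0, 2)
    v = (tokens @ wv).reshape(n, num_heads, hd).transpose(1, 0, 2)
    # (heads, n, n) attention weights link tokens across all modalities.
    scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hd))
    out = (scores @ v).transpose(1, 0, 2).reshape(n, d)
    return out

d = 8  # shared semantic dimension (illustrative)
# Per-modality "encoders": here just random linear maps into the shared space.
img_tokens = rng.normal(size=(4, 16)) @ rng.normal(size=(16, d))
txt_tokens = rng.normal(size=(6, 32)) @ rng.normal(size=(32, d))
aud_tokens = rng.normal(size=(5, 24)) @ rng.normal(size=(24, d))

# Concatenate tokens from all three modalities; self-attention then learns
# cross-modal associations jointly over the whole sequence.
seq = np.concatenate([img_tokens, txt_tokens, aud_tokens], axis=0)
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
fused = multi_head_self_attention(seq, wq, wk, wv, num_heads=2)
print(fused.shape)  # (15, 8)
```

Because the attention weights span the concatenated sequence, every image token can attend to text and audio tokens (and vice versa), which is what makes a single unified representation possible.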

  • Prof. Jingyi Yu (虞晶怡)
  • Vice Provost, ShanghaiTech University; Executive Dean, School of Information Science and Technology
  • Bio: Jingyi Yu is Vice Provost of ShanghaiTech University and Executive Dean of its School of Information Science and Technology. Before joining ShanghaiTech, he was a full professor in the Department of Computer and Information Sciences at the University of Delaware. He received his B.S. in applied mathematics and computer science from Caltech in 2000, and his M.S. (2003) and Ph.D. (2005) in electrical engineering and computer science from MIT. He has long conducted research in computer vision, computational imaging, computer graphics, and bioinformatics, and has published more than 120 papers, over 70 of which appeared at the international conferences CVPR/ICCV/ECCV or in the journal TPAMI. He holds more than 20 U.S. patents and received an NSF young investigator award in 2009 and an AFOSR young investigator award in 2010. He serves on the editorial boards of IEEE TPAMI, IEEE TIP, and Elsevier CVIU, and has served as Program Chair of ICPR 2020, IEEE CVPR 2021, IEEE WACV 2021, and ICCV 2025. He was elected IEEE Fellow for his contributions to computer vision and computational imaging.

  • Talk Title: Neural Human Reconstruction: From Rendering to Modeling

  • Abstract: Recent advances in deep learning, in particular neural modeling and rendering, have renewed interest in developing effective 3D imaging solutions. Such techniques aim to overcome the limitations of traditional 3D reconstruction techniques such as structure-from-motion (SfM) and photometric stereo (PS) by reducing reconstruction noise, tackling texture-less regions, and synthesizing high-quality free-view renderings. In this talk, I present recent efforts from my group at ShanghaiTech on neural human modeling techniques. Specifically, I demonstrate our latest neural human body reconstructor, deep 3D face synthesizer, anatomically correct 3D hand tracker, and ultra-realistic hair modeler. These solutions can produce dynamic virtual humans at an unprecedented visual quality and lead to profound changes in Metaverse creation technologies. Finally, I will discuss extensions of these techniques to Non-Line-of-Sight (NLOS) imaging systems for hidden object recovery.

  • Prof. Lei Zhang (张磊)
  • Dept. of Computing, The Hong Kong Polytechnic University
  • Bio: Prof. Lei Zhang joined the Department of Computing, The Hong Kong Polytechnic University, as an Assistant Professor in 2006. Since July 2017, he has been a Chair Professor in the same department. His research interests include computer vision, image and video analysis, and pattern recognition. Prof. Zhang has published more than 200 papers in these areas. He is an IEEE Fellow, a Senior Associate Editor of IEEE Trans. on Image Processing, and is/was an Associate Editor of IEEE Trans. on Pattern Analysis and Machine Intelligence, SIAM Journal on Imaging Sciences, IEEE Trans. on CSVT, and Image and Vision Computing. He has been consecutively selected as a "Clarivate Analytics Highly Cited Researcher" from 2015 to 2021.

  • Talk Title: Gradient Centralization and Feature Gradient Descent for Deep Neural Network Optimization

  • Abstract: Normalization methods are very important for the effective and efficient training of deep neural networks (DNNs). Many popular normalization methods operate on weights, such as weight normalization and weight standardization. We propose a very simple yet effective DNN optimization technique, namely gradient centralization (GC), which operates directly on the gradients of weights. GC simply centralizes the gradient vectors to have zero mean. It can be easily embedded into current gradient-based optimization algorithms with just one line of code. GC demonstrates various desirable properties, such as accelerating the training process, improving generalization performance, and compatibility with fine-tuning pre-trained models. On the other hand, existing DNN optimizers such as stochastic gradient descent (SGD) mostly perform gradient descent on weights to minimize the loss, while the ultimate goal of DNN model learning is to obtain a good feature space for data representation. Instead of performing gradient descent on weights, we propose a method, namely feature SGD (FSGD), that approximates the output feature with one-step gradient descent for linear layers. FSGD only needs to store an additional second-order statistic matrix of input features, and uses its inverse to adjust the gradient descent of the weights. FSGD demonstrates much better generalization performance than SGD in classification tasks.
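The "one line of code" claim for GC can be illustrated with a hedged NumPy sketch: before the optimizer update, subtract from each output unit's weight gradient its mean over the input dimensions. The function names and the bare SGD loop below are assumptions for illustration, not the authors' released code.

```python
import numpy as np

def centralize_gradient(grad):
    """Gradient Centralization (GC): give each output unit's gradient
    zero mean. For a matrix/tensor gradient, average over all axes
    except the output-channel axis 0; 1-D gradients (biases) pass through."""
    if grad.ndim > 1:
        axes = tuple(range(1, grad.ndim))
        grad = grad - grad.mean(axis=axes, keepdims=True)
    return grad

def sgd_step_with_gc(weight, grad, lr=0.1):
    """Plain SGD update with GC embedded as the single extra line."""
    grad = centralize_gradient(grad)  # the "one line of code"
    return weight - lr * grad

w = np.ones((3, 5))
g = np.arange(15.0).reshape(3, 5)
w_new = sgd_step_with_gc(w, g)
# Each row of the centralized gradient sums to zero, so the update
# preserves the mean of each row of the weight matrix.
print(np.allclose(w_new.mean(axis=1), w.mean(axis=1)))  # True
```

The same projection drops into Adam or momentum SGD unchanged, since it only transforms the raw gradient before the optimizer consumes it.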

  • Prof. Yoichi Sato
  • University of Tokyo, Japan
  • Bio: Yoichi Sato is a professor at the Institute of Industrial Science, the University of Tokyo. He received his B.S. degree from the University of Tokyo in 1990, and his M.S. and Ph.D. degrees in robotics from the School of Computer Science, Carnegie Mellon University, in 1993 and 1997. His research interests include first-person vision, gaze sensing and analysis, physics-based vision, and reflectance analysis. He has served or is serving in several journal editorial and conference organization roles, including IEEE Transactions on Pattern Analysis and Machine Intelligence, International Journal of Computer Vision, and Computer Vision and Image Understanding, as well as CVPR 2023 General Co-Chair, ICCV 2021 Program Co-Chair, ACCV 2018 General Co-Chair, ACCV 2016 Program Co-Chair, and ECCV 2012 Program Co-Chair.

  • Talk Title: Understanding Human Activities from First-Person Perspectives

  • Abstract: Wearable cameras have become widely available as off-the-shelf products. First-person videos captured by wearable cameras provide close-up views of fine-grained human behavior, such as interaction with objects using hands, interaction with people, and interaction with the environment. First-person videos also provide an important clue to the intention of the person wearing the camera, such as what they are trying to do or what they are attending to. These advantages are unique to first-person videos and distinguish them from videos captured by fixed cameras such as surveillance cameras. As a result, there has been increasing interest in developing various computer vision methods that take first-person videos as input. On the other hand, first-person videos pose a major challenge to computer vision due to factors such as continuous and often violent camera movements, a limited field of view, and rapid illumination changes. In this talk, I will present our attempts to develop first-person vision methods for different tasks, including action recognition, future person localization, and gaze estimation.