I am a third-year Ph.D. student at the Gaoling School of Artificial Intelligence (GSAI), Renmin University of China, supervised by Prof. Rui Yan. My research focuses on the long-context capabilities and mechanistic interpretability of LLMs, with the goal of building more powerful foundation models.
News Within a Year
- 2024.11: We propose a novel attention mechanism that enables negative attention weights for enhanced expressiveness.
- 2024.11: I was awarded the 2024 CIE-Tencent Doctoral Student Research Incentive Program (HunYuan Large Language Model Special Project).
- 2024.09: One paper was accepted by NeurIPS 2024.
- 2024.09: Two papers were accepted by the EMNLP 2024 main conference.
- 2024.09: We propose that the emergence of context-copying capabilities in LLMs is a special case of grokking.
- 2024.05: I was selected for the 2024 CCF-Tencent Rhino-Bird Elite Talent Program, mentored by Ruobing Xie.
- 2024.05: Two papers were accepted by the ACL 2024 main conference, and one paper was accepted to ACL 2024 Findings.
- 2024.04: One paper was accepted by IJCAI 2024.
- 2024.03: We thoroughly studied the mechanisms of factual recall in Transformer-based language models; I hope you find the results engaging!
Publications (First Author and First Co-author)
- An Analysis and Mitigation of the Reversal Curse. Ang Lv*, Kaiyi Zhang*, Shufang Xie, Quan Tu, Yuhan Chen, Ji-Rong Wen, Rui Yan. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP’24). Link
- Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules. Ang Lv*, Zhuocheng Gong*, Jian Guan, Junxi Yan, Wei Wu, Huishuai Zhang, Minlie Huang, Dongyan Zhao, Rui Yan. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP’24). Link
- Mixture of In-Context Experts Enhance LLMs’ Long Context Awareness. Ang Lv*, Hongzhan Lin*, Yuhan Chen*, Chen Zhu, Yang Song, Hengshu Zhu, Rui Yan. Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS’24).
- Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use. Ang Lv*, Yuhan Chen*, Ting-En Lin, Changyu Chen, Yuchuan Wu, Fei Huang, Yongbin Li, Rui Yan. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL’24). Link
- Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning. Ang Lv*, Kaiyi Zhang*, Yuhan Chen, Hansen Ha, Tao Xu, Rui Yan. Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL’24 Findings). Link
- Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation. Ang Lv, Xu Tan, Tao Qin, Tie-Yan Liu, Rui Yan. The 33rd International Joint Conference on Artificial Intelligence (IJCAI’24). Link
- DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations. Ang Lv*, Jinpeng Li*, Yuhan Chen, Gao Xing, Ji Zhang, Rui Yan. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23). Link
- Envisioning Future from the Past: Hierarchical Duality Learning for Multi-Turn Dialogue Generation. Ang Lv*, Jinpeng Li*, Shufang Xie, Rui Yan. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23 oral). Link
- Target-Side Input Augmentation for Sequence to Sequence Generation. Ang Lv*, Shufang Xie*, Yingce Xia, Lijun Wu, Tao Qin, Tie-Yan Liu, Rui Yan. The 10th International Conference on Learning Representations (ICLR’22). Link
Other Publications
- Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models. Changyu Chen, Xiting Wang, Ting-En Lin, Ang Lv, Yuchuan Wu, Xin Gao, Ji-Rong Wen, Rui Yan, Yongbin Li. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL’24). Link
Preprint Papers
- More Expressive Attention with Negative Weights. Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, Xingwu Sun, Zhanhui Kang, Rui Yan. arXiv. Link
- PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead. Ang Lv*, Tao Tan*, Yining Qian*, Hongzhan Lin, Songhao Wu, Yongbo Wang, Feng Wang, Jingtong Wu, Xin Lu, Rui Yan. arXiv. Link
- Language Models “Grok” to Copy. Ang Lv, Ruobing Xie, Xingwu Sun, Zhanhui Kang, Rui Yan. arXiv. Link
- Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models. Ang Lv*, Yuhan Chen*, Kaiyi Zhang*, Yulong Wang, Lifeng Liu, Ji-Rong Wen, Jian Xie, Rui Yan. arXiv. Link
- GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework. Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan. arXiv. Link
- HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation. Yuhan Chen, Ang Lv, Jian Luan, Bin Wang, Wei Liu. arXiv. Link
Honors and Awards
- CIE-Tencent Doctoral Student Research Incentive Program (HunYuan Large Language Model Special Project), 2024, 1 of 17 selected individuals nationwide (中国电子学会-腾讯博士生科研激励计划 混元大模型专项, 全国17人)
- CCF-Tencent Rhino-Bird Elite Talent Program, 2024, 1 of 50 selected individuals nationwide (中国计算机学会-腾讯犀牛鸟精英人才计划, 全国50人)
- Outstanding Innovative Talents Cultivation Funded Program of Renmin University of China, 2023 (中国人民大学拔尖创新人才)
Internships
- 2022.09 - 2023.03, Alibaba Damo Academy, Hangzhou.
- 2023.03 - 2023.09, Microsoft Research, Machine Learning Area, mentored by Xu Tan. We collaborated on the Muzic project, which has over 4k stars on GitHub.
- 2023.09 - 2024.05, Alibaba Damo Academy, Beijing.
- 2024.05 - present, Tencent, Beijing.