Lei Xie

Professor · Director, Audio, Speech and Language Processing Lab (ASLP)


Lei Xie is a Professor at Northwestern Polytechnical University, where he leads the Audio, Speech and Language Processing Lab (ASLP@NPU). His research focuses on speech processing, conversational AI, and neural models for speech and language technologies, with work spanning speech enhancement, automatic speech recognition, and speech synthesis.

He is also committed to building open-source tools and data resources for the research community, including the widely used WeNet toolkit and the WenetSpeech open-data series.

Professor Xie has published over 400 papers, received more than 17,000 Google Scholar citations, and has an H-index of 62. His work has received multiple best paper awards, won international challenge championships, and has been translated into industrial applications. He currently serves as Vice Chairperson of ISCA SIG-CSLP and Senior Area Editor for IEEE/ACM TASLP and IEEE SPL.

Email: lxie@nwpu.edu.cn
Address: Room 207, School of Computer Science, Chang'an Campus, Northwestern Polytechnical University, Chang'an District, Xi'an 710129, China
Full Biography

Lei Xie is a Professor at the School of Computer Science, Northwestern Polytechnical University (NPU), where he leads the Audio, Speech and Language Processing Lab (ASLP@NPU). His research focuses on speech processing, conversational AI, advanced neural models for speech and language technologies, and large audio/speech language models, with contributions spanning speech enhancement, automatic speech recognition, speech synthesis, and spoken dialogue systems.

He is also committed to advancing open-source research infrastructure for the community, leading projects such as the widely used WeNet speech recognition toolkit and the WenetSpeech open-data series.

Dr. Xie received his Ph.D. in Computer Engineering from NPU, where his doctoral research focused on speech recognition. Before joining NPU as a faculty member, he held research positions at Vrije Universiteit Brussel, City University of Hong Kong, and The Chinese University of Hong Kong.

He has received several honors and recognitions, including the New Century Excellent Talents Program of the Ministry of Education of China, the Shaanxi Young Science and Technology Star Award, recognition as one of the World’s Top 2% Scientists (Stanford University & Elsevier), and the title of Huawei Cloud AI Distinguished Teacher.

Professor Xie has published over 400 peer-reviewed papers in audio, speech, and language processing, with more than 17,000 citations on Google Scholar and an H-index of 62. His work has received multiple best paper awards at international conferences and won several international challenge championships. A number of his research outcomes have also been successfully translated into real-world industrial applications.

At ASLP@NPU, he mentors a diverse group of students and researchers working at the intersection of speech, audio, and language intelligence. He is also an active contributor to the research community, serving in leadership and editorial roles. He currently serves as Vice Chairperson of the ISCA Special Interest Group on Chinese Spoken Language Processing (SIG-CSLP) and as Senior Area Editor for both IEEE/ACM Transactions on Audio, Speech, and Language Processing and IEEE Signal Processing Letters.


News

Apr 10, 2026 We are pleased to announce the IEEE SLT 2026 SmartGlasses Challenge on Egocentric Speech Interaction for AI Glasses.
Apr 10, 2026 The 2026 master’s cohort graduated successfully and joined top companies such as Alibaba, Tencent, and JD.com. Congratulations!
Apr 07, 2026 WenetSpeech-Wu, the largest Wu Chinese dataset to date, accepted by ACL 2026.
Apr 07, 2026 LLM-forced Aligner, the technology behind Qwen3-Qwen/Qwen3-ForcedAligner, accepted by ACL 2026.
Apr 05, 2026 Congratulations to Dr. Jijun Yao on receiving the Tencent Qingyun Program offer and joining Tencent!
Mar 17, 2026 4 papers accepted by ICME 2026.
Jan 18, 2026 8 papers accepted by ICASSP 2026.
Jan 08, 2026 VoiceSculptor, a voice design model, is now open-sourced.

Lab

The Audio, Speech and Language Processing Lab (ASLP@NPU), led by Prof. Lei Xie at Northwestern Polytechnical University, is widely recognized as one of the leading research groups in speech, audio, and language technologies. The lab conducts cutting-edge research spanning speech recognition, speech synthesis, speech enhancement, spoken dialogue systems, and emerging audio language models, with a strong commitment to both scientific innovation and real-world impact.

ASLP@NPU places equal emphasis on research excellence and practical deployment, and has maintained close and long-term collaborations with industry. Many of its research outcomes have been successfully translated into real applications, while its open-source platforms and data resources — including WeNet and WenetSpeech — have been widely adopted by both academia and industry.

The lab has also played an important role in cultivating talent for the broader AI and speech community, with many alumni becoming technical leaders, senior researchers, and key engineering contributors in leading technology companies and research institutions.

By combining academic depth, engineering strength, and industrial relevance, ASLP@NPU continues to advance the frontier of speech intelligence and next-generation human–machine communication.

Recent Popular Open-source Projects
  • SoulX-Podcast — Inference codebase for generating high-fidelity podcasts from text with multi-speaker multi-dialect support
  • DiffRhythm — End-to-end full-length song generation via latent diffusion
  • OSUM — Open speech understanding model designed for resource-limited academic settings
  • SongEval — Aesthetic evaluation toolkit for generated songs
  • WenetSpeech-Yue — Large-scale Cantonese speech corpus with multi-dimensional annotation
  • MeanVC — Lightweight and streaming zero-shot voice conversion via mean flows
  • VoiceSculptor — Instruct text-to-speech solution based on LLaSA and CosyVoice2
  • WenetSpeech-Chuan — Large-scale Sichuanese dialect speech corpus
  • DiffRhythm2 — Efficient high-fidelity song generation via block flow matching
  • WenetSpeech-Wu-Repo — Large-scale Wu dialect speech corpus with multi-dimensional annotation
  • SongFormer — Ultra-fast and ultra-accurate music structure analysis tool

Recent Publications

  1. ICASSP
    Summary on The Multilingual Conversational Speech Language Model Challenge: Datasets, Tasks, Baselines, and Methods
    Bingshen Mu, Pengcheng Guo, Zhaokai Sun, Shuai Wang, Hexin Liu, Mingchen Shao, and 5 more authors
    In ICASSP, 2026
  2. ICASSP
    WenetSpeech-Chuan: A Large-Scale Sichuanese Corpus with Rich Annotation for Dialectal Speech Processing
    Yuhang Dai, Ziyu Zhang, Shuai Wang, Longhao Li, Zhao Guo, Tianlun Zuo, and 10 more authors
    In ICASSP, 2026
  3. ICASSP
    Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages
    Mingchen Shao, Bingshen Mu, Chengyou Wang, Hai Li, Ying Yan, Zhonghua Fu, and 1 more author
    In ICASSP, 2026
  4. ICASSP
    MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
    Guobin Ma, Jixun Yao, Ziqian Ning, Yuepeng Jiang, Lingxin Xiong, Lei Xie, and 1 more author
    In ICASSP, 2026
  5. ICASSP
    S²Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion
    Ziqian Wang, Xianjun Xia, Chuanzeng Huang, and Lei Xie
    In ICASSP, 2026
  6. ICASSP
    The ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge
    Guobin Ma, Yuxuan Xia, Jixun Yao, Huixin Xue, Hexin Liu, Shuai Wang, and 2 more authors
    In ICASSP, 2026
  7. ICASSP
    The ICASSP 2026 HumDial Challenge: Benchmarking Human-like Spoken Dialogue Systems in the LLM Era
    Zhixian Zhao, Shuiyuan Wang, Guojian Li, Hongfei Xue, Chengyou Wang, Shuai Wang, and 10 more authors
    In ICASSP, 2026
  8. ICASSP
    Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems
    Guojian Li, Chengyou Wang, Hongfei Xue, Shuiyuan Wang, Dehui Gao, Zihan Zhang, and 5 more authors
    In ICASSP, 2026
  9. AAAI
    KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction
    Kangxiang Xia, Xinfa Zhu, Jixun Yao, Wenjie Tian, Wenhao Li, and Lei Xie
    In AAAI, 2026
  10. AAAI
    Hearing More with Less: Multi-Modal Retrieval-and-Selection Augmented Conversational LLM-Based ASR
    Bingshen Mu, Hexin Liu, Hongfei Xue, Kun Wei, and Lei Xie
    In AAAI, 2026
  11. AAAI
    WenetSpeech-Yue: A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation
    Longhao Li, Zhao Guo, Hongjie Chen, Yuhang Dai, Ziyu Zhang, Hongfei Xue, and 12 more authors
    In AAAI, 2026
  12. TASLP
    Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought
    Zhixian Zhao, Xinfa Zhu, Xinsheng Wang, Shuiyuan Wang, Xuelong Geng, Wenjie Tian, and 1 more author
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
  13. TASLP
    FPO: Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech
    Jixun Yao, Yuguang Yang, Yu Pan, Yuan Feng, Ziqian Ning, Jianhao Ye, and 2 more authors
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
  14. ASRU
    EchoFree: Towards Ultra Lightweight and Efficient Neural Acoustic Echo Cancellation
    Xingchen Li, Boyi Kang, Ziqian Wang, Zihan Zhang, Mingshuai Liu, Zhonghua Fu, and 1 more author
    In ASRU, 2025
  15. ASRU
    XEmoRAG: Cross-Lingual Emotion Transfer with Controllable Intensity Using Retrieval-Augmented Generation
    Tianlun Zuo, Jingbin Hu, Yuke Li, Xinfa Zhu, Hai Li, Ying Yan, and 3 more authors
    In ASRU, 2025
  16. ASRU
    Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis
    Wenjie Tian, Xinfa Zhu, Hanke Xie, Zhen Ye, Wei Xue, and Lei Xie
    In ASRU, 2025
  17. ASRU
    Efficient Scaling for LLM-based ASR
    Bingshen Mu, Yiwen Shao, Kun Wei, Dong Yu, and Lei Xie
    In ASRU, 2025
  18. ASRU
    DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization
    Huakang Chen, Yuepeng Jiang, Guobin Ma, Chunbo Hao, Shuai Wang, Jixun Yao, and 4 more authors
    In ASRU, 2025
  19. ASRU
    REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers
    Yuepeng Jiang, Ziqian Ning, Shuai Wang, Chengjia Wang, Mengxiao Bi, Pengcheng Zhu, and 2 more authors
    In ASRU, 2025
  20. ACM MM
    Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning
    Hongfei Xue, Yufeng Tang, Hexin Liu, Jun Zhang, Xuelong Geng, and Lei Xie
    In ACM MM, 2025
Full Publications →

Awards

  • 3rd Place, Single Track, Interspeech 2026 Audio Reasoning Challenge
  • 1st Place, In-Domain Singing Style Conversion Track, ASRU 2025 Singing Voice Conversion Challenge
  • 1st Place, Zero-Shot Singing Style Conversion Track, ASRU 2025 Singing Voice Conversion Challenge
  • 1st Place, General Audio Source Separation Track, NCMMSC 2025 CCF Advanced Audio Technology Competition
  • 2nd Place, Target Speaker Lipreading Track, ICME 2024 Chat-scenario Chinese Lipreading (ChatCLR) Challenge
  • 1st Place, Source Speaker Verification Against Voice Conversion Track, SLT 2024 Source Speaker Tracing Challenge (SSTC)
  • 1st Place, ICASSP 2024 Packet Loss Concealment (PLC) Challenge
  • 2nd Place, Real-time Track, ICASSP 2024 Speech Signal Improvement Challenge
  • 3rd Place, Non-real-time Track, ICASSP 2024 Speech Signal Improvement Challenge
  • 2nd Place, ICASSP 2024 Multimodal Information based Speech Processing (MISP) Challenge
  • 1st Place, 2024 Shenghua Cup Acoustic Technology Competition
  • 1st Place, Single-Speaker VSR Track, NCMMSC 2024 Chinese Continuous Visual Speech Recognition Challenge (CNVSRC)
  • 1st Place, Multi-Speaker VSR Track, NCMMSC 2024 Chinese Continuous Visual Speech Recognition Challenge (CNVSRC)
  • 1st Place, SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting (LRDWWS) Challenge
  • 1st Place, Speech-to-Speech Translation (Offline) Track, ACL 2023 Speech-to-Speech Translation (S2ST)
  • 1st Place, Any-to-one, In-domain Singing Voice Conversion Track, ASRU 2023 Singing Voice Conversion Challenge
  • 2nd Place, Any-to-one, Cross-domain Singing Voice Conversion Track, ASRU 2023 Singing Voice Conversion Challenge
  • 2nd Place, Audio-Visual Target Speaker Extraction (AVTSE) Track, ICASSP 2023 Multi-modal Information based Speech Processing (MISP) Challenge
  • 1st Place, UDASE (Unsupervised Domain Adaptation for Speech Enhancement) Track, Interspeech 2023 CHiME Speech Separation and Recognition Challenge (CHiME-7)
  • 1st Place, Non-personalized AEC Track, ICASSP 2023 Acoustic Echo Cancellation Challenge (AEC Challenge)
  • 2nd Place, Personalized AEC Track, ICASSP 2023 Acoustic Echo Cancellation Challenge (AEC Challenge)
  • 2nd Place, Audio-Visual Diarization & Recognition Track, ICASSP 2023 Multimodal Information based Speech Processing (MISP) Challenge
  • 3rd Place, Audio-Visual Speaker Diarization Track, ICASSP 2023 Multimodal Information based Speech Processing (MISP) Challenge
  • 1st Place, Headset Speech Enhancement Track, ICASSP 2023 Deep Noise Suppression Challenge
  • 1st Place, Speakerphone Speech Enhancement Track, ICASSP 2023 Deep Noise Suppression Challenge
  • 1st Place, Speech Enhancement Track, 2023 Shenghua Cup Acoustic Technology Competition
  • 1st Place, ASRU 2023 Multilingual Speech Universal PERformance Benchmark (ML-SUPERB) Challenge
  • 1st Place, Single-Speaker VSR Track, NCMMSC 2023 Chinese Continuous Visual Speech Recognition Challenge (CNVSRC)
  • 1st Place, Multi-Speaker VSR Track, NCMMSC 2023 Chinese Continuous Visual Speech Recognition Challenge (CNVSRC)
  • 1st Place, Speaker Anonymization Track, Interspeech 2022 VoicePrivacy Challenge (VPC 2022)
  • 2nd Place, Fully-supervised Track, Interspeech 2022 Far-field Speaker Verification Challenge (FFSVC)
  • 2nd Place, Semi-supervised Track, Interspeech 2022 Far-field Speaker Verification Challenge (FFSVC)
  • 2nd Place, ISCSLP 2022 Magichub Code-Switching ASR Challenge
  • 3rd Place, ISCSLP 2022 Conversational Short-phrase Speaker Diarization Challenge
  • 1st Place, Constrained Track, O-COCOSDA 2022 Indic Multilingual Speaker Verification Challenge (I-MSV)
  • 3rd Place, Unconstrained Track, O-COCOSDA 2022 Indic Multilingual Speaker Verification Challenge (I-MSV)
  • 3rd Place, NCMMSC 2022 Low-resource Mongolian Text-to-Speech Challenge
  • 2nd Place, Training with VoxCeleb 1/2 Only Track, VoxSRC Workshop 2021 VoxCeleb Speaker Recognition Challenge (VoxSRC)
  • 2nd Place, Additional Public Data Allowed (e.g., MUSAN, RIR) Track, VoxSRC Workshop 2021 VoxCeleb Speaker Recognition Challenge (VoxSRC)
  • 3rd Place, Real-Time Wideband Speech Enhancement Track, Interspeech 2021 Deep Noise Suppression Challenge (DNS Challenge)
  • 3rd Place, Real-Time AEC & Speech Enhancement Track, Interspeech 2021 Acoustic Echo Cancellation Challenge (AEC Challenge)
  • 1st Place, Close-talking Single-channel Track, ISCSLP 2021 Personalized Voice Trigger Challenge (PVTC)
  • 1st Place, Real-Time Wideband Speech Enhancement Track, Interspeech 2020 Deep Noise Suppression Challenge (DNS Challenge)
  • 2nd Place, Non-Real-Time Wideband Speech Enhancement Track, Interspeech 2020 Deep Noise Suppression Challenge (DNS Challenge)
  • 1st Place, Closed-set Word-level Audio-Visual Speech Recognition Track, ICMI 2019 Mandarin Audio-Visual Speech Recognition Challenge
  • 3rd Place, Interspeech 2018 CHiME Speech Separation and Recognition Challenge (CHiME-5)
  • 2nd Place, Unsupervised Subword Unit Modeling Track, Interspeech 2017 Zero Resource Speech Challenge
  • 1st Place, Spoken Term Discovery Track, Interspeech 2015 Zero Resource Speech Challenge
  • 1st Place, MediaEval Multimedia Benchmark Workshop 2015 Query-by-Example Search on Speech Task (QUESST)
  • 2nd Place, MediaEval Multimedia Benchmark Workshop 2014 Query-by-Example Search on Speech Task (QUESST)