Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
- Website
- Community
- Solutions
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2507.15375

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 22

Energy-Based Transformers are Scalable Learners and Thinkers

Paper • 2507.02092 • Published Jul 2, 2025 • 70
MOSPA: Human Motion Generation Driven by Spatial Audio

Paper • 2507.11949 • Published Jul 16, 2025 • 25
Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations

Paper • 2507.09751 • Published Jul 13, 2025 • 2
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Paper • 2507.07982 • Published Jul 10, 2025 • 34

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 22

Awesome papers from 臺大李宏毅 (Hung-yi Lee)

Recent papers authored by Hung-yi Lee. Sorted by ID

SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models

Paper • 2510.16917 • Published Oct 19, 2025 • 20
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations

Paper • 2510.16893 • Published Oct 19, 2025 • 18
Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition

Paper • 2510.08047 • Published Oct 9, 2025 • 8
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

Paper • 2510.06917 • Published Oct 8, 2025 • 35

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

Paper • 2503.13434 • Published Mar 17, 2025 • 28
Edit Transfer: Learning Image Editing via Vision In-Context Relations

Paper • 2503.13327 • Published Mar 17, 2025 • 29
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes

Paper • 2503.13435 • Published Mar 17, 2025 • 18
MediaTek-Research/Llama-Breeze2-8B-Instruct

8B • Updated Mar 2, 2025 • 797 • 54

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 22

Awesome papers from 臺大李宏毅 (Hung-yi Lee)

Recent papers authored by Hung-yi Lee. Sorted by ID

SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models

Paper • 2510.16917 • Published Oct 19, 2025 • 20
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations

Paper • 2510.16893 • Published Oct 19, 2025 • 18
Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition

Paper • 2510.08047 • Published Oct 9, 2025 • 8
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

Paper • 2510.06917 • Published Oct 8, 2025 • 35

Energy-Based Transformers are Scalable Learners and Thinkers

Paper • 2507.02092 • Published Jul 2, 2025 • 70
MOSPA: Human Motion Generation Driven by Spatial Audio

Paper • 2507.11949 • Published Jul 16, 2025 • 25
Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations

Paper • 2507.09751 • Published Jul 13, 2025 • 2
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Paper • 2507.07982 • Published Jul 10, 2025 • 34

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

Paper • 2503.13434 • Published Mar 17, 2025 • 28
Edit Transfer: Learning Image Editing via Vision In-Context Relations

Paper • 2503.13327 • Published Mar 17, 2025 • 29
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes

Paper • 2503.13435 • Published Mar 17, 2025 • 18
MediaTek-Research/Llama-Breeze2-8B-Instruct

8B • Updated Mar 2, 2025 • 797 • 54

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 22

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs