AudioRAG Demo - Semantic Audio Search

This demo builds on the work of the ColQwen team, extending retrieval beyond images to audio and video.

Unlike pipelines that first transcribe audio to text, this model searches the raw audio directly. It captures semantic meaning in sound, speech, and audio patterns, which makes "AudioRAG" (retrieval-augmented generation over audio) a real possibility.
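To make the idea concrete, here is a minimal sketch of how chunked audio retrieval could work. It is not the demo's actual code. It assumes the audio is cut into fixed-length chunks (the "Chunk Length" input), that each chunk and the text query are embedded as a small set of vectors, and that chunks are ranked with ColBERT-style late interaction (MaxSim), as ColQwen-family retrievers typically do. The function names `chunk_spans`, `maxsim_score`, and `rank_chunks` are hypothetical, and the embedding step itself is left out.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def chunk_spans(duration_s: float, chunk_length_s: float) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) spans, in seconds.

    The last span is shorter when the duration is not a multiple of the chunk length.
    """
    if chunk_length_s <= 0:
        raise ValueError("chunk_length_s must be positive")
    n_chunks = math.ceil(duration_s / chunk_length_s)
    return [
        (i * chunk_length_s, min((i + 1) * chunk_length_s, duration_s))
        for i in range(n_chunks)
    ]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], chunk_vecs: Sequence[Vector]) -> float:
    """Late-interaction score (assumed scoring rule, not confirmed by the demo).

    For each query vector, take its best match among the chunk vectors,
    then sum those best matches.
    """
    return sum(max(dot(q, c) for c in chunk_vecs) for q in query_vecs)


def rank_chunks(
    query_vecs: Sequence[Vector], chunks: Sequence[Sequence[Vector]]
) -> List[int]:
    """Return chunk indices ordered from best to worst match for the query."""
    scores = [maxsim_score(query_vecs, chunk) for chunk in chunks]
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
```

In this sketch, a 25-second file with a 10-second chunk length yields three spans, and the index returned first by `rank_chunks` maps back to the time span a player would jump to. Shorter chunks give finer localization at the cost of more embeddings to compute and compare.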

📖 Blog post | 🤗 Model on Hugging Face | 📓 Colab Notebook | 🎙️ Sample from Newsroom Robots

Examples

Upload Audio File	Search Query	Chunk Length (seconds)