Late last year, we sped up Whisper (https://github.com/Vaibhavs10/insanely-fast-whisper) by 10x. It involved a lot of hacking (Flash Attention, batching, torch.compile, and more) and tricks that can be applied to other models, too. Join me as I walk through those optimisations and tricks.
In this talk, we’ll touch on optimisations that can help you speed up your PyTorch models. We’ll primarily look at Flash Attention, batching, and torch.compile; a minimal sketch of what these look like in practice follows.
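To make that concrete, here is a minimal sketch of the kind of setup the repo above uses: half precision, Flash Attention 2, and batched chunked inference through the Transformers pipeline. The checkpoint, audio file name, and batch size here are illustrative, and Flash Attention 2 assumes the flash-attn package and a supported GPU.

```python
import torch
from transformers import pipeline

# Load Whisper in fp16 with Flash Attention 2 for faster attention kernels.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # illustrative; other Whisper checkpoints work too
    torch_dtype=torch.float16,
    device="cuda:0",
    model_kwargs={"attn_implementation": "flash_attention_2"},
)

# Split long audio into 30-second chunks and transcribe them in batches.
outputs = pipe(
    "audio.mp3",        # illustrative file name
    chunk_length_s=30,
    batch_size=24,      # tune to your GPU memory
    return_timestamps=True,
)
print(outputs["text"])
```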
By the end of this talk, you’ll know more about how to make your existing LLMs and PyTorch models faster and better.
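As a small teaser of one of those optimisations, here is a self-contained sketch of torch.compile applied to a toy module; the model and mode flag are illustrative, not taken from the Whisper work itself.

```python
import torch

# A toy model to illustrate; any nn.Module can be compiled the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda().half()

# torch.compile traces the model and fuses ops into optimised kernels.
# mode="reduce-overhead" additionally uses CUDA graphs to cut launch overhead;
# the first call is slow (compilation), subsequent calls are fast.
compiled_model = torch.compile(model, mode="reduce-overhead")

x = torch.randn(8, 1024, device="cuda", dtype=torch.float16)
with torch.no_grad():
    out = compiled_model(x)
```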
I’m an ML Developer Advocate at Hugging Face focusing on extracting better insights from Text + Audio.
I’ve been a freelancer, tax analyst, consultant, tech speaker, and advisor for five years. In the past three years, I’ve invested significant time volunteering for open-source and open-science organisations like Hugging Face, EuroPython, PyCons across APAC, Google Cloud Delhi, and Facebook Developer Circles.
A Delhi native, I now live in Germany!