Best TTS and STT models that support WebSocket streaming without too much overhead
As the title says, I'm looking for a good production-ready set of TTS and STT models that support real-time WebSocket streaming. Initially this will be a speech-to-speech pipeline for a production use case at a startup, but my team will eventually bridge these open-source models latently into a soft end-to-end pipeline with DeepSeek in the middle. I'm currently using OpenAI Realtime for our MVP and want to transition to an open-source pipeline and LLM for cost-effectiveness and for more control, including fine-tuning.