Hey all,
Almost as impressive as the LLMs themselves these days is the voice that ChatGPT uses, with its emphasis, dramatic pauses, umms, etc.
I would love to integrate that with a self-hosted Llama3 engine.
Is there a project that y’all have heard of?
Oh WOW! Thanks to all who commented. Next time I get a chance I’m going to check these all out! 👍🏻 I hope others find this thread helpful too!
Regarding TTS specifically, I remember looking into TorToiSe TTS back when this stuff was first coming out. You can generate ElevenLabs-quality audio with it, but it was insanely slow at the time. In fact, when I was looking into it, it seemed like ElevenLabs might have been using a version of TorToiSe TTS that was much faster at the time, given how similar the output is.
According to the linked GitHub page, they seem to have solved the speed issues now, so it might be worth looking into. Of course, the other commenters have suggested solutions that come pre-integrated with the LLM, but if you’re just looking for TTS, this could be worth checking out. Also worth noting: it requires an NVIDIA GPU.
When can I get one of these voices to read an epub on my phone? I’d love to have something like that.
Librera FD as your reader app: https://www.f-droid.org/en/packages/com.foobnix.pro.pdf.reader/
Sherpa Onnx as your TTS engine: https://github.com/k2-fsa/sherpa-onnx
I recommend the Piper TTS pretrained models, either Lessac medium or Kusal high/medium.