One of the tricky bits has been cramming a lot of phonemes into a tiny duration. Something obviously has to go.
My initial approach was to scale every phoneme's duration by the same factor… but that didn't work well with sampled consonants.
So I tried different approaches to preserve the duration of sampled consonants, and focused on shrinking the voiced phonemes instead. This worked better, but still had issues.
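A minimal sketch of that second approach, assuming each phoneme carries a duration in seconds and a flag marking whether it's a sampled consonant. `Phoneme`, `sampled`, and `shrink_voiced` are hypothetical names for illustration, not taken from the actual code. Sampled consonants keep their full length, and only the voiced phonemes are scaled to fill whatever time is left.

```python
from dataclasses import dataclass


@dataclass
class Phoneme:
    name: str
    duration: float  # seconds
    sampled: bool    # hypothetical flag: True if played back from a recorded sample


def shrink_voiced(phonemes: list[Phoneme], target: float) -> list[Phoneme]:
    """Fit phonemes into `target` seconds by scaling only the voiced (non-sampled) ones."""
    fixed = sum(p.duration for p in phonemes if p.sampled)
    voiced = sum(p.duration for p in phonemes if not p.sampled)
    # Time left over once the sampled consonants have been given their full length
    available = max(target - fixed, 0.0)
    # Only ever shrink, never stretch; if the sampled consonants alone exceed
    # the target, the voiced phonemes are scaled all the way down to zero
    scale = min(1.0, available / voiced) if voiced > 0 else 1.0
    return [
        p if p.sampled else Phoneme(p.name, p.duration * scale, p.sampled)
        for p in phonemes
    ]
```

One likely problem is visible right in the sketch: when the sampled consonants by themselves are longer than the target, the voiced phonemes get squeezed to nothing and the result still overruns.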
I’d kicked around a number of ideas on how to compress the samples, when it finally occurred to me to simply trim the end off the sampled consonants. That turned out to work remarkably well.
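The trim itself is about as simple as audio editing gets. A sketch, assuming a sampled consonant is stored as a list of PCM samples at a known sample rate (`trim_sample` is a hypothetical helper, not from the actual code): keep the beginning of the sample and drop whatever runs past the allotted duration.

```python
def trim_sample(samples: list[float], sample_rate: int, duration: float) -> list[float]:
    """Shorten a sampled consonant to `duration` seconds by dropping samples from the end.

    The beginning of the sample is left untouched; if the sample already fits,
    it is returned unchanged.
    """
    # Number of samples that fit in the allotted time, clamped to [0, len(samples)]
    keep = min(len(samples), max(0, round(duration * sample_rate)))
    return samples[:keep]
```

For example, a 100 ms consonant sample at 22,050 Hz (2,205 samples) squeezed into 60 ms keeps its first 1,323 samples and loses the remaining 882.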