Click and Buzz

The prototype voice synthesis using spectral envelopes is encouraging, but there are still plenty of issues left to resolve.

I tracked down a “click” in the output to sudden transitions in amplitude between frames. Amplitude and frequency are now interpolated over 128 samples, so the transition is smooth.
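A minimal sketch of that kind of ramp, not the actual synsinger code. It assumes a single sine generator with a running phase; the function name `render_frame`, its parameters, and the 44100 Hz sample rate are all hypothetical. Only the 128-sample ramp length comes from the text above.

```python
import math

RAMP_SAMPLES = 128  # ramp length mentioned in the post


def render_frame(prev_amp, prev_freq, amp, freq, phase, n_samples,
                 sample_rate=44100):
    """Render one frame of a single sine generator (hypothetical helper).

    Amplitude and frequency move linearly from the previous frame's values
    to the new ones over the first RAMP_SAMPLES samples, then hold steady.
    Returns the samples and the phase to carry into the next frame.
    """
    out = []
    for i in range(n_samples):
        # Ramp position: rises from 1/128 to 1.0, then stays at 1.0.
        t = min(i + 1, RAMP_SAMPLES) / RAMP_SAMPLES
        a = prev_amp + (amp - prev_amp) * t
        f = prev_freq + (freq - prev_freq) * t
        out.append(a * math.sin(phase))
        # Accumulate phase so a frequency change never causes a phase jump.
        phase = (phase + 2.0 * math.pi * f / sample_rate) % (2.0 * math.pi)
    return out, phase
```

Carrying the phase across frames matters as much as the ramp: restarting the phase at each frame boundary would reintroduce a discontinuity even with the amplitude smoothed.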

Some of the morphs were producing badly wrong results, which I eventually traced to several bugs in the warping code.

I’ve also added linear interpolation to the spectral envelope amplitude estimates. If it made any difference, I can’t hear it, but I’ll leave it in anyway.
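For illustration, here is one way linear interpolation of an envelope can look. This is a sketch under assumptions: the envelope is taken to be a list of frequencies in strictly increasing order with one amplitude per frequency, and `envelope_amp` is a hypothetical name, not a function from the project.

```python
from bisect import bisect_right


def envelope_amp(freqs, amps, f):
    """Estimate the envelope amplitude at frequency f (hypothetical helper).

    freqs must be strictly increasing, with amps[i] the amplitude at
    freqs[i]. Values outside the measured range are clamped to the ends.
    """
    if f <= freqs[0]:
        return amps[0]
    if f >= freqs[-1]:
        return amps[-1]
    hi = bisect_right(freqs, f)  # first envelope point above f
    lo = hi - 1                  # last envelope point at or below f
    frac = (f - freqs[lo]) / (freqs[hi] - freqs[lo])
    return amps[lo] + frac * (amps[hi] - amps[lo])


freqs = [0.0, 100.0, 200.0]
amps = [0.0, 1.0, 0.5]
print(envelope_amp(freqs, amps, 150.0))  # 0.75
```

Without interpolation, a harmonic that falls between two envelope points would take the amplitude of the nearest one, so its level would jump as the pitch moves. The difference is small when the envelope points are closely spaced, which may be why it is hard to hear.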

The output is also still quite “buzzy” and robotic. I’m hoping that adding some jitter to the fundamental frequency will help take care of that. I tried averaging the spectral envelope to smooth it, but that just made the output sound muffled.
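The jitter idea has not been tried yet, so the following is only a sketch of one possible approach, not a description of what the code does. It assumes a small random offset per pitch period, low-pass filtered so the pitch drifts rather than jumps. The name `jittered_f0` and the `depth` and `smoothing` values are made up for illustration.

```python
import random


def jittered_f0(f0, n_periods, depth=0.01, smoothing=0.8, seed=None):
    """Return one slightly perturbed F0 value per pitch period (hypothetical).

    depth is the maximum relative deviation (0.01 = 1 percent).
    smoothing (0..1) controls how slowly the offset drifts between periods.
    """
    rng = random.Random(seed)
    offset = 0.0
    values = []
    for _ in range(n_periods):
        target = rng.uniform(-depth, depth)
        # One-pole smoothing keeps successive periods correlated.
        offset = smoothing * offset + (1.0 - smoothing) * target
        values.append(f0 * (1.0 + offset))
    return values


# Example: 50 pitch periods around 220 Hz, within 1 percent of nominal.
periods = jittered_f0(220.0, 50, seed=1)
```

The offset is a weighted average of values in the range -depth to +depth, so the result never strays more than `depth` from the nominal F0. Whether this actually reduces the buzz is something only listening will answer.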

I also experimented with resetting the phase of every generator to zero at the start of each pulse (synchronized with F0), which made each wave pulse essentially symmetric. That code got tossed out.

The current plan is to keep adding vowel phonemes and looking for issues, then start on voiced consonants and see how well they work.


About synsinger

Developer and Musician
