Direct Synthesis

One problem I’ve been running into with digital filters is a “squeal” when the parameters change rapidly. Dennis Klatt published a solution that recalculates the stored filter coefficients, but I’ve never been able to get the code to work.

The algorithm that S.A.M. (Software Automatic Mouth) uses to generate its output doesn’t use filters. Instead, it generates the formants directly and resets the phase angle of each formant generator to zero at the start of every glottal pulse.

Here’s an example of a wave that’s the summation of only the formant frequencies:

Signal that is the sum of the formant frequencies
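
For reference, here is a minimal Python sketch of that first signal: one free-running sine generator per formant, summed, with no reset. The 16 kHz sample rate, the formant frequencies and amplitudes, and the function name `formant_sum` are illustrative assumptions, not the values behind the figure.

```python
import math

SAMPLE_RATE = 16000  # assumed sample rate (Hz); the post does not give one


def formant_sum(formants, amplitudes, n_samples, sample_rate=SAMPLE_RATE):
    """Sum of free-running sine generators, one per formant frequency.

    The phase angles are never reset, so the result is just a chord of steady tones.
    """
    phases = [0.0] * len(formants)
    out = []
    for _ in range(n_samples):
        sample = 0.0
        for i, (freq, amp) in enumerate(zip(formants, amplitudes)):
            sample += amp * math.sin(phases[i])
            # advance this generator's phase angle by one sample, wrapped to [0, 2*pi)
            phases[i] = (phases[i] + 2.0 * math.pi * freq / sample_rate) % (2.0 * math.pi)
        out.append(sample)
    return out


# Approximate textbook formants for the vowel in "father" (illustrative only)
wave = formant_sum([730, 1090, 2440], [1.0, 0.5, 0.25], 1600)
```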

This does not sound like a vocal sound. The next step is to reset the formant generators to zero at the start of each glottal pulse:

Resetting the formant waves at the start of each glottal pulse.

This gives the wave a “vocal” quality. However, instantly resetting the formant generators to zero causes discontinuities, which I’ve marked in red.
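
A minimal Python sketch of the phase reset, assuming a 16 kHz sample rate, a fixed 100 Hz pitch, and plain sine generators for every formant. It illustrates the idea only; it is not S.A.M.’s actual code, and the function names are hypothetical. `reset_jumps` measures the step at each reset point, which is the discontinuity described above.

```python
import math

SAMPLE_RATE = 16000  # assumed sample rate (Hz)


def reset_formants(formants, amplitudes, pitch_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Formant sine generators whose phase angles are reset to zero at the
    start of every glottal pulse (a sketch of the idea, not S.A.M.'s code)."""
    period = int(round(sample_rate / pitch_hz))  # samples per glottal pulse
    phases = [0.0] * len(formants)
    out = []
    for n in range(n_samples):
        if n % period == 0:
            # start of a glottal pulse: snap every generator back to angle zero
            phases = [0.0] * len(formants)
        sample = 0.0
        for i, (freq, amp) in enumerate(zip(formants, amplitudes)):
            sample += amp * math.sin(phases[i])
            phases[i] += 2.0 * math.pi * freq / sample_rate
        out.append(sample)
    return out


def reset_jumps(signal, period):
    """Size of the step between the last sample of a pulse and the first of the next."""
    return [abs(signal[n] - signal[n - 1]) for n in range(period, len(signal), period)]


wave = reset_formants([730, 1090, 2440], [1.0, 0.5, 0.25], pitch_hz=100, n_samples=1600)
jumps = reset_jumps(wave, period=160)  # 16000 Hz / 100 Hz = 160 samples per pulse
```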

One solution is to wrap each pulse in an amplitude envelope that falls to zero at the start of the pulse, the same point at which the formant generators are reset:

Using an amplitude envelope to smooth the pulse transitions.
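
A sketch of the envelope version in Python. The post doesn’t specify the envelope shape, so a raised-cosine (Hann) envelope spanning each pulse is assumed here, along with a 16 kHz sample rate and a hypothetical function name. Because the envelope is zero exactly where the phase angles are reset, the step across each pulse boundary becomes tiny.

```python
import math

SAMPLE_RATE = 16000  # assumed sample rate (Hz)


def enveloped_formants(formants, amplitudes, pitch_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Phase-reset formant generators multiplied by a per-pulse envelope that is
    zero where each pulse begins, hiding the step caused by the reset.

    Assumption: the envelope is a raised cosine (Hann window) over the pulse."""
    period = int(round(sample_rate / pitch_hz))
    out = []
    for n in range(n_samples):
        k = n % period  # position within the current glottal pulse; k == 0 is the reset
        envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * k / period))
        # phase angles restart from zero at every pulse because they depend only on k
        sample = sum(amp * math.sin(2.0 * math.pi * freq * k / sample_rate)
                     for freq, amp in zip(formants, amplitudes))
        out.append(envelope * sample)
    return out


wave = enveloped_formants([730, 1090, 2440], [1.0, 0.5, 0.25], pitch_hz=100, n_samples=1600)
# largest step across any pulse boundary (160 samples per pulse at 100 Hz)
largest_step = max(abs(wave[n] - wave[n - 1]) for n in range(160, 1600, 160))
```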

This waveform still has a very mechanical quality, but that can be mitigated (somewhat) by jittering the parameters.
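
One way to jitter, sketched in Python under stated assumptions: the post doesn’t say which parameters are jittered, so only the pulse length (pitch) is varied here, by a uniform random amount of up to ±2% per pulse. The parameter choice, the amount, the 16 kHz sample rate, and the function names are all guesses. Each pulse keeps the phase reset and a raised-cosine envelope.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed sample rate (Hz)


def jittered_periods(pitch_hz, n_pulses, jitter=0.02, seed=None, sample_rate=SAMPLE_RATE):
    """Glottal pulse lengths in samples, each perturbed at random by up to
    +/- jitter (a fraction of the nominal period). The 2% default is arbitrary."""
    rng = random.Random(seed)
    nominal = sample_rate / pitch_hz
    return [max(1, int(round(nominal * (1.0 + rng.uniform(-jitter, jitter)))))
            for _ in range(n_pulses)]


def render_pulses(formants, amplitudes, periods, sample_rate=SAMPLE_RATE):
    """Phase-reset, raised-cosine-enveloped formant pulses of varying length."""
    out = []
    for period in periods:
        for k in range(period):  # k restarts at 0 each pulse, resetting the phase angles
            envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * k / period))
            sample = sum(amp * math.sin(2.0 * math.pi * freq * k / sample_rate)
                         for freq, amp in zip(formants, amplitudes))
            out.append(envelope * sample)
    return out


periods = jittered_periods(pitch_hz=100, n_pulses=10, seed=1)
wave = render_pulses([730, 1090, 2440], [1.0, 0.5, 0.25], periods)
```

Passing a seed makes the output repeatable, which helps when comparing settings by ear.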

Adding vocal noise also helps realism. This is typically generated by running noise through filters set to the formant frequencies and mixing the result with the voiced signal.
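
For comparison, a sketch of that conventional approach: uniform white noise fed in parallel through two-pole resonators of the form Klatt used, y[n] = A·x[n] + B·y[n−1] + C·y[n−2], with the outputs summed. The parallel arrangement, the bandwidths, the 16 kHz sample rate, and the function names are assumptions; the post doesn’t say how its filters are configured.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed sample rate (Hz)


class Resonator:
    """Two-pole digital resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""

    def __init__(self, freq, bandwidth, sample_rate=SAMPLE_RATE):
        t = 1.0 / sample_rate
        self.c = -math.exp(-2.0 * math.pi * bandwidth * t)
        self.b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
        self.a = 1.0 - self.b - self.c  # normalizes the gain at 0 Hz to 1
        self.y1 = 0.0  # previous output
        self.y2 = 0.0  # output before that

    def process(self, x):
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        return y


def filtered_noise(formants, bandwidths, n_samples, seed=None, sample_rate=SAMPLE_RATE):
    """White noise sent through a parallel bank of resonators, outputs summed."""
    rng = random.Random(seed)
    resonators = [Resonator(f, bw, sample_rate) for f, bw in zip(formants, bandwidths)]
    out = []
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        # every resonator sees the same noise sample; their outputs are mixed
        out.append(sum(r.process(x) for r in resonators))
    return out


# Illustrative formants and bandwidths (Hz), not values from the post
noise = filtered_noise([730, 1090, 2440], [90, 110, 170], 1600, seed=1)
```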

But it got me wondering whether the filtered noise could also be directly generated.

I’ve got a method that works fairly well, but I’m sure it could be improved. The basic idea is much like the example above: the formants are generated directly, for a pulse of random length. At the start of the next pulse, the starting angle of each formant generator is set to a random value.
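
A minimal Python sketch of the idea. The pulse-length range (8 to 32 samples at an assumed 16 kHz), the uniform random choice of pulse length and starting angle, and the function name `direct_noise` are all assumptions; the post doesn’t give the values it uses.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed sample rate (Hz)


def direct_noise(formants, amplitudes, n_samples, min_len=8, max_len=32,
                 seed=None, sample_rate=SAMPLE_RATE):
    """Directly generated 'filtered' noise: the formant sine generators run for a
    pulse of random length, then every generator restarts at a random angle."""
    rng = random.Random(seed)
    phases = [0.0] * len(formants)
    remaining = 0  # samples left in the current pulse
    out = []
    for _ in range(n_samples):
        if remaining == 0:
            remaining = rng.randint(min_len, max_len)  # random pulse length
            phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in formants]  # random starting angles
        sample = 0.0
        for i, (freq, amp) in enumerate(zip(formants, amplitudes)):
            sample += amp * math.sin(phases[i])
            phases[i] += 2.0 * math.pi * freq / sample_rate
        out.append(sample)
        remaining -= 1
    return out


wave = direct_noise([730, 1090, 2440], [1.0, 0.5, 0.25], 1600, seed=1)
```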

If the pulses are too long, the output sounds like a signal interrupted by noise. If they are too short, the output sounds like plain noise. But there’s a “sweet spot” between the two where the output resembles pitched noise. Setting the formant frequencies to those of a vowel gives the sound of a “whispered” vowel.
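
A back-of-envelope way to reason about that sweet spot, offered as an assumption rather than a tested result: restarting a generator at a random angle every L samples smears its single frequency into a band whose first nulls sit about f_s/L Hz on either side (the spectrum of a length-L rectangular window). Long pulses leave narrow lines, heard as tones broken up by the restarts; very short pulses widen the bands until neighbouring formants merge into plain noise. The helper names and the 16 kHz sample rate are hypothetical.

```python
import math

SAMPLE_RATE = 16000  # assumed sample rate (Hz)


def approx_spread_hz(pulse_len, sample_rate=SAMPLE_RATE):
    """Rough half-width (Hz) of the band each formant is smeared into when the
    phase restarts every pulse_len samples: the first null of a rectangular
    window of that length lies sample_rate / pulse_len Hz from its centre."""
    return sample_rate / pulse_len


def shortest_distinct_pulse(formants, sample_rate=SAMPLE_RATE):
    """Shortest pulse length (samples) for which that smear stays narrower than
    the gap between the closest pair of formants (needs at least two formants)."""
    ordered = sorted(formants)
    min_gap = min(b - a for a, b in zip(ordered, ordered[1:]))
    return math.ceil(sample_rate / min_gap)


vowel = [730, 1090, 2440]  # approximate textbook formants for the vowel in "father"
min_len = shortest_distinct_pulse(vowel)  # 45 samples, about 2.8 ms at 16 kHz
```

This only bounds the short end; where the long end stops sounding like noise is still a judgment made by ear.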

I haven’t had time to generate consonants using this method.



About synsinger

Developer and Musician
This entry was posted in Development.
