Spectral Synthesis Revisited

I haven’t had a chance to get much coding done over the last couple months. However, I’ve been doing a lot of reading on various vocal synthesis technologies.

I’d read quite a bit about spectral modeling synthesis (SMS) before, and decided to put together some quick tests.

Essentially, SMS uses the short-time FFT to capture a “spectral envelope”: a curve that gives the amplitude at each frequency, and therefore the amplitude of each harmonic.

Getting this information out of Praat is straightforward: there’s an option to get a “spectral slice” that lists the level (in decibels) at given frequencies:

freq(Hz)    pow(dB/Hz)
0    -17.171071546036153
21.533203125    20.388397279697497
43.06640625    26.97257113598713
64.599609375    30.64585953310637
86.1328125    32.25338096741935
107.666015625    31.708646066090367
129.19921875    29.056762741248782
150.732421875    28.923892178490085
172.265625    32.719805559695565
... and so on
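A listing like this is easy to load for further processing. Here is a minimal Python sketch, assuming the slice was saved as plain text with a header line followed by whitespace-separated frequency/level pairs, as shown above. The name `parse_spectral_slice` is made up for this example and is not part of Praat.

```python
def parse_spectral_slice(text):
    """Parse 'freq  dB' lines into a list of (freq_hz, level_db) tuples."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        try:
            freq_hz, level_db = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # skip the header line ("freq(Hz)  pow(dB/Hz)")
        points.append((freq_hz, level_db))
    return points


# First few rows of the listing above
SAMPLE = """freq(Hz)    pow(dB/Hz)
0    -17.171071546036153
21.533203125    20.388397279697497
43.06640625    26.97257113598713
"""

slice_points = parse_spectral_slice(SAMPLE)
```

The result is a plain list of pairs sorted by frequency, which is all the later steps need.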

Converting the decibels to a linear value looks like this:

-- Convert a level in decibels to a linear amplitude ratio.
function dbToLinear( x )
  return 10 ^ (x / 20)
end
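One thing worth keeping in mind: dividing by 20 treats the decibel value as an amplitude ratio, while dividing by 10 would treat it as a power ratio. Praat labels the column as dB/Hz, so which one is appropriate depends on what the result is fed into. The Python sketch below just shows the two conversions side by side; the function names are illustrative.

```python
def db_to_amplitude(level_db):
    """Decibels to a linear amplitude ratio (20 dB per factor of 10)."""
    return 10.0 ** (level_db / 20.0)


def db_to_power(level_db):
    """Decibels to a linear power ratio (10 dB per factor of 10)."""
    return 10.0 ** (level_db / 10.0)


amp = db_to_amplitude(20.0)  # a 20 dB step is a factor of 10 in amplitude
pw = db_to_power(20.0)       # and a factor of 100 in power
```

The two are related by squaring: the power ratio is the amplitude ratio squared.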

To convert the spectral envelope back into sound, the process is reversed. You can generate a bunch of sine waves at integer multiples of the fundamental frequency (the harmonics), and use the spectral envelope to look up the amplitude of each one.
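The sine-wave approach can be sketched roughly as follows in Python. This is only an illustration under a few assumptions: the envelope is a list of (frequency, dB) pairs with strictly increasing frequencies, levels between listed frequencies are linearly interpolated, and the toy envelope values are invented rather than taken from a real slice.

```python
import math


def db_to_linear(level_db):
    return 10.0 ** (level_db / 20.0)


def envelope_db(points, freq_hz):
    """Linearly interpolate the level (dB) at freq_hz.

    points: (freq_hz, level_db) pairs with strictly increasing frequencies.
    """
    if freq_hz <= points[0][0]:
        return points[0][1]
    for (lo_f, lo_db), (hi_f, hi_db) in zip(points, points[1:]):
        if freq_hz <= hi_f:
            frac = (freq_hz - lo_f) / (hi_f - lo_f)
            return lo_db + frac * (hi_db - lo_db)
    return points[-1][1]


def synthesize(points, f0_hz, duration_s, sample_rate=44100):
    """Sum sine waves at integer multiples of f0_hz, weighted by the envelope."""
    max_freq = min(points[-1][0], sample_rate / 2.0)
    harmonics = []
    k = 1
    while k * f0_hz <= max_freq:
        freq = k * f0_hz
        harmonics.append((freq, db_to_linear(envelope_db(points, freq))))
        k += 1
    samples = []
    for i in range(int(duration_s * sample_rate)):
        t = i / sample_rate
        samples.append(sum(a * math.sin(2 * math.pi * f * t) for f, a in harmonics))
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]  # normalize to the range [-1, 1]


# Toy envelope (made-up values, not a real Praat slice)
ENVELOPE = [(0.0, -20.0), (500.0, 30.0), (1500.0, 20.0), (3000.0, 0.0)]
samples = synthesize(ENVELOPE, f0_hz=110.0, duration_s=0.05)
```

All harmonics start at zero phase here, which is the simplest choice and probably not the best-sounding one.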

Or if you’re clever, you can do an inverse FFT on each frame and stitch the overlapping frames together (overlap-add).

I’m exploring the idea of using a single spectral slice to represent each phoneme target, and morphing from one target to the next. For the morphs to work, key features need to be specified. Conveniently, this corresponds to peaks at the formant frequencies – something that Praat also calculates.
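One way such a morph could work is sketched below in Python. To be clear, this is an assumption for illustration and not necessarily how the experiments here were done: the formant frequencies act as anchor points, the frequency axis is warped piecewise-linearly so the anchors of both targets line up, and the dB levels are then cross-faded. Both targets are assumed to list the same number of formants, and the target data is invented.

```python
def interp(xs, ys, x):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + frac * (ys[i] - ys[i - 1])
    return ys[-1]


def morph_level(target_a, target_b, t, freq_hz, max_hz):
    """Level (dB) at freq_hz for a morph that is fraction t of the way from a to b."""
    # Anchor points: 0 Hz, each formant, and the top of the band.
    anchors_a = [0.0] + target_a["formants"] + [max_hz]
    anchors_b = [0.0] + target_b["formants"] + [max_hz]
    anchors_m = [(1 - t) * a + t * b for a, b in zip(anchors_a, anchors_b)]
    # Warp the output frequency back onto each target's own frequency axis.
    freq_a = interp(anchors_m, anchors_a, freq_hz)
    freq_b = interp(anchors_m, anchors_b, freq_hz)
    freqs_a, levels_a = zip(*target_a["envelope"])
    freqs_b, levels_b = zip(*target_b["envelope"])
    # Cross-fade the two levels found at the warped frequencies.
    return (1 - t) * interp(freqs_a, levels_a, freq_a) + t * interp(freqs_b, levels_b, freq_b)


# Two invented phoneme targets: formant peaks plus a coarse envelope in dB
TARGET_A = {"formants": [500.0, 1500.0],
            "envelope": [(0.0, 0.0), (500.0, 30.0), (1000.0, 10.0),
                         (1500.0, 25.0), (4000.0, 0.0)]}
TARGET_B = {"formants": [300.0, 2300.0],
            "envelope": [(0.0, 0.0), (300.0, 32.0), (1300.0, 8.0),
                         (2300.0, 24.0), (4000.0, 0.0)]}

# Halfway through the morph the first formant peak sits at 400 Hz
halfway_peak = morph_level(TARGET_A, TARGET_B, 0.5, 400.0, 4000.0)
```

The point of the warp is that a peak slides from one frequency to the other during the morph, instead of one peak fading out while another fades in.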

In initial tests of the morphs, the formants seem to move fairly naturally.

However, using a single spectral slice to represent the sound creates a mechanical-sounding voice, much like a door buzzer. Varying the amplitude and fundamental frequency over time may help solve that problem.
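As a rough idea of what that variation might look like, here is a Python sketch that builds a per-frame control track with vibrato, a little random pitch jitter, and small random amplitude changes, then renders a single sine from it. Every number in it (frame rate, vibrato rate and depth, jitter and shimmer amounts) is an arbitrary assumption chosen for illustration, not a measured value.

```python
import math
import random


def control_track(base_hz, n_frames, frame_rate=200.0, vibrato_hz=5.5,
                  vibrato_cents=30.0, jitter_cents=5.0, shimmer=0.05, seed=0):
    """Per-frame (f0, amplitude) pairs with vibrato, jitter and shimmer."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    track = []
    for i in range(n_frames):
        t = i / frame_rate
        cents = vibrato_cents * math.sin(2 * math.pi * vibrato_hz * t)
        cents += rng.gauss(0.0, jitter_cents)
        f0 = base_hz * 2.0 ** (cents / 1200.0)  # 1200 cents per octave
        amp = max(0.0, 1.0 + rng.gauss(0.0, shimmer))
        track.append((f0, amp))
    return track


def render(track, frame_rate=200.0, sample_rate=44100):
    """Render one sine that follows the track, accumulating phase to avoid clicks."""
    samples, phase = [], 0.0
    per_frame = int(sample_rate / frame_rate)
    for f0, amp in track:
        for _ in range(per_frame):
            samples.append(amp * math.sin(phase))
            phase += 2 * math.pi * f0 / sample_rate
    return samples


track = control_track(110.0, n_frames=20)
samples = render(track)
```

In a full synthesizer the same track would drive every harmonic, with each harmonic's amplitude still looked up from the spectral envelope.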

SMS generally also models the residual of the voice – the part of the sound that’s not represented by the harmonic portion. One option I’m looking into is using formant synthesis to generate the residual portion by passing white noise through filters.
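A minimal Python sketch of the noise-through-filters idea follows. It assumes a two-pole resonator with Klatt-style coefficients for each formant and sums the filter outputs in parallel; both choices, and the formant frequencies and bandwidths used, are illustrative assumptions rather than settled design decisions.

```python
import math
import random


def resonator(signal, center_hz, bandwidth_hz, sample_rate=44100):
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    c = -math.exp(-2 * math.pi * bandwidth_hz / sample_rate)
    b = (2 * math.exp(-math.pi * bandwidth_hz / sample_rate)
         * math.cos(2 * math.pi * center_hz / sample_rate))
    a = 1.0 - b - c  # gives unity gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def noise_residual(formants, n_samples, sample_rate=44100, seed=0):
    """White noise passed through parallel formant resonators and summed.

    formants: list of (center_hz, bandwidth_hz) pairs.
    """
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    bands = [resonator(noise, f, bw, sample_rate) for f, bw in formants]
    return [sum(vals) for vals in zip(*bands)]


# Invented formant frequencies and bandwidths, in Hz
residual = noise_residual([(500.0, 80.0), (1500.0, 120.0), (2500.0, 160.0)], 2000)
```

The output would still need to be scaled and mixed with the harmonic part, and how loud it should be relative to the harmonics is an open question.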


About synsinger

Developer and Musician

5 Responses to Spectral Synthesis Revisited

  1. Daniel C says:

    Spectral morphing is tricky. I tried building something like this once and never could get it right. To morph between two spectra you basically need to map out the frequency weightings on a grid, with one spectrum on the X axis and the other on the Y axis, and then walk from one corner to the opposite corner to find a path of best fit (you can use dynamic programming or pathfinding or whatever to find this path). I found a paper that describes the process very well, if you can wait until next week when I can dig it out again.

    I have a half-finished iOS app that does spectral synthesis on a range of phonemes (no morphing, just fading). Let me know if you’d like to try it out.

    • synsinger says:

      When I started out, I was more focused on an UTAU sort of approach, and tried to get spectral morphing to work when cross-fading samples. I never really had a lot of success with it.
      My approach is obviously not nearly as sophisticated as what you’re suggesting. Still, I’d love to see the paper if you can run across it.
      I’ll pass on the iOS application, though… mostly because I don’t have any iOS devices handy!

    • synsinger says:

      Sorry for the really, really late reply on this, but if you could point me to the paper, I’d appreciate it!

      • Daniel C says:

        Whoa, how did I miss this the first time around – sorry about that!

        Here’s the two papers I was referring to at the time:

        I think if you’re into this stuff and you’re not on there already, you should probably join the AudioKit Slack group. Even if you’re intending to use your own systems rather than AudioKit, there are lots of very good people involved with DSP on there (e.g. I saw you were looking at Paul Batchelor’s stuff earlier). Let me know if you’d like me to sort an invite.

      • synsinger says:

        Thanks, Daniel! I may eventually take you up on that offer.

        I’m really more of a hobbyist, and my lack of a solid background can lead to some rather bone-headed mistakes. But for the moment, I think I *need* to re-invent the wheel until I finally understand how this stuff works, or why something I thought was a good idea really doesn’t work in practice.

        I’m going to have one more go at spectral morphing. Keep in mind that I’m trying to use the spectrum of a single wave as the basis of each target, so the task is a bit easier.

        Thanks again!
