Monday, September 29, 2014

Return of the Bard

Recently, while prowling the net for GCW Zero goodies that might be willing to work for me on the zipit, I spotted some hints about a new release of Bard the Storyteller. That got me thinking that maybe it was about time for an iz2s build of Bard. I'm not sure why I never got around to it. Perhaps I was worried the performance wouldn't be up to the level of the openwrt build, and I'd be stuck with the robotic 8 kHz Kal voice instead of the surly 16 kHz version. I don't know. Anyhow, I finally decided to give it a shot.

So I applied the old openwrt-zipit Bard patches, then tweaked the battery monitor and screen blanking code for iz2s. I also adjusted the key bindings to move the page up and page down functions onto the zipit Prev and Next keys, and added the increase and reduce functions (for volume/fontsize/speed) to the period and comma keys, as well as the volume +/- rocker buttons.

It seemed to work ok, but the highlighted words seemed to jump ahead of the spoken voice quite a bit, so I dug a little deeper into the code. I noticed some floating point code sprinkled about, here and there. It seems like an odd choice to target small underpowered devices like the Nanonote, which have no native floating point, and yet control the volume with a floating point multiply for each and every audio sample. I switched that code to use an 8-bit fixed point volume multiplier. I also converted the idle time check code to use integer milliseconds, like the rest of the SDL code, instead of floating point seconds. I did the same for the audio sample position timing code. That might have made the "lip sync" a tiny bit better. But it's hard to tell.
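For the curious, here is a rough sketch of what that kind of change looks like. This is not the actual Bard code: the function names, the "256 means full volume" convention, and the clamping are all my own assumptions for illustration. The idea is just that the volume becomes an integer with 8 fractional bits, and the sample position is converted to milliseconds with integer math.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical convention: volume has 8 fractional bits,
 * so 0 = mute, 128 = half volume, 256 = unity gain. */
#define VOL_FRAC_BITS 8
#define VOL_UNITY     (1 << VOL_FRAC_BITS)

/* Scale signed 16-bit samples in place using only integer math. */
static void scale_samples(int16_t *buf, size_t n, int vol)
{
    for (size_t i = 0; i < n; i++) {
        /* Widen to 32 bits so the multiply cannot overflow.  Dividing by a
         * power-of-two constant compiles down to a shift plus a small fixup. */
        int32_t s = ((int32_t)buf[i] * vol) / VOL_UNITY;

        /* Clamp in case vol is above unity. */
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        buf[i] = (int16_t)s;
    }
}

/* Convert a sample position to integer milliseconds (no floats).
 * The 64-bit intermediate keeps long recordings from overflowing. */
static uint32_t samples_to_ms(uint32_t sample_pos, uint32_t sample_rate)
{
    return (uint32_t)(((uint64_t)sample_pos * 1000u) / sample_rate);
}
```

With this convention, scale_samples(buf, n, 128) halves the volume, and samples_to_ms(8000, 16000) says that 8000 samples of the 16 kHz voice is 500 ms into the audio.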

Here are the iz2s bard and flite executables, and a libzip that I needed to build it.

bard-iz2s.zip

I'll put together a patch file with the code changes ASAP.

But first I want to do some experiments with the rest of the voices. I built the default set of flite voices, but aside from kal they all use floating point numbers for the data. So they're an order of magnitude slower on the zipit. That's ok if you want a higher quality voice, but you can't use them in bard because the text processing is too slow for realtime. It might be nifty to fix up bard to "lip sync" the text on screen with a preprocessed .wav file, but I'd rather see about converting the voices to fixed point. I examined some of the data and so far the largest number I've seen is around 220, and all the floats have 6 digits after the decimal point. That range of values would probably fit quite nicely in a 12.20 fixed point format (12 integer bits and 20 fraction bits in a 32-bit word), which would speed things up enormously, and might even sound ok if the math doesn't overflow. We'll see...
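Here is a back-of-the-envelope sketch of what I mean by 12.20, just to check the arithmetic. None of this is flite code; the type and function names are made up for illustration. A signed 32-bit word with 20 fraction bits covers roughly -2048 to +2048 in steps of about 0.000001, so a value like 220 with 6 decimal digits fits. The catch is the multiply: the product of two values can blow past the 12 integer bits, which is the overflow I'm worried about.

```c
#include <stdint.h>

/* Hypothetical 12.20 fixed point type: 12 integer bits (including sign)
 * and 20 fraction bits, stored in a signed 32-bit word. */
typedef int32_t fx12_20;

#define FX_FRAC_BITS 20
#define FX_ONE       ((int32_t)1 << FX_FRAC_BITS)

/* Convert a voice data value to fixed point, rounding to nearest.
 * This would run once on the build host, so using a double here is fine. */
static fx12_20 fx_from_double(double v)
{
    return (fx12_20)(v * FX_ONE + (v >= 0 ? 0.5 : -0.5));
}

/* Convert back, only for checking the round trip. */
static double fx_to_double(fx12_20 f)
{
    return (double)f / FX_ONE;
}

/* Multiply two 12.20 values.  Returns 1 on success and stores the result
 * in *out, or returns 0 if the product does not fit in 12.20. */
static int fx_mul_checked(fx12_20 a, fx12_20 b, fx12_20 *out)
{
    /* The raw product has 40 fraction bits, so use a 64-bit intermediate
     * and drop 20 of them. */
    int64_t p = ((int64_t)a * (int64_t)b) / FX_ONE;

    if (p > INT32_MAX || p < INT32_MIN)
        return 0;
    *out = (fx12_20)p;
    return 1;
}
```

So 1.5 times 2.0 comes out as exactly 3.0, but 220 times 220 is 48400, which doesn't fit and gets flagged. Whether the real voice math ever multiplies two large values together is something I'd have to check in the flite code.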

By the way, this might be a good time to mention that I finally remembered to add the missing speaker setting in the /mnt/sd0/bin/setup-alsa.sh script. 

  /mnt/ffs/bin/amixer -q sset "Right Speaker Playback Invert" on

I also toggled the identical setting in /mnt/sd0/etc/asound.state. Without that setting the speaker isn't really loud enough to hear. But with it, I could almost picture myself listening to internet radio via the speaker, if I forgot my headphones. It's certainly good enough to listen to the bard speak.


Next up, maybe a beta3 of the ultra iz2s SD card image.
