# libsonic Home Page

[Download the latest tar-ball from here](download).

The source code repository can be cloned using git:

    $ git clone git://github.com/waywardgeek/sonic.git

The source code for the Android version, sonic-ndk, can be cloned with:

    $ git clone git://github.com/waywardgeek/sonic-ndk.git

There is a simple test app for Android that demos sonic's capabilities.  You can
[install the Android application from here](Sonic-NDK.apk).

There is a new native Java port, which is very fast!  Check out Sonic.java and
Main.java in the latest tar-ball, or get the code from git.

## Overview

Sonic is free software for speeding up or slowing down speech.  While similar to
other algorithms that came before, Sonic is optimized for speed-ups of over 2X.
There is a simple sonic library in ANSI C, and one in pure Java.  Both are
designed to be easily integrated into streaming voice applications, like TTS
back ends.  While a very new project, it is already integrated into:

- espeak
- Debian Sid as package libsonic
- Android Astro Player Nova
- Android Osplayer
- Multiple closed source TTS engines

The primary motivation behind sonic is to enable the blind and visually impaired
to improve their productivity with free software speech engines, like espeak.
Sonic can also be used by the sighted.  For example, sonic can improve the
experience of listening to an audio book on an Android phone.

Sonic is Copyright 2010, 2011, Bill Cox, all rights reserved.  It is released
under the Apache 2.0 license.  Feel free to contact me at
<waywardgeek@gmail.com>.  One user was concerned about patents.  I believe the
sonic algorithms do not violate any patents, as most of the technique is very
old, based on [PICOLA](http://keizai.yokkaichi-u.ac.jp/~ikeda/research/picola.html), and
the new part, for greater than 2X speed-up, is clearly a capability most
developers ignore, and would not bother to patent.

## Comparison to Other Solutions

In short, Sonic is better for speech, while WSOLA is better for music.

A popular alternative is SoundTouch.  SoundTouch uses WSOLA, an algorithm
optimized for changing the tempo of music.  No WSOLA-based program performs well
for speech (contrary to the inventor's estimate of WSOLA).  Listen to [this
soundstretch sample](soundstretch.wav), which uses SoundTouch, and compare
it to [this sonic sample](sonic.wav).  Both are sped up by 2X.  WSOLA
introduces unacceptable levels of distortion, making speech impossible to
understand at high speed (over 2.5X) by blind speed listeners.

However, there are decent free software algorithms for speeding up speech.  They
are all in the TD-PSOLA family.  For speech rates below 2X, sonic uses PICOLA,
which I find to be the best algorithm available.  A slightly buggy
implementation of PICOLA is available in the spandsp library.  I find the one in
RockBox quite good, though it's limited to 2X speed-up.  So far as I know, only
sonic is optimized for the speed factors needed by the blind, up to 6X.

Sonic does all of its CPU-intensive work with integer math, and works well on
ARM CPUs without FPUs.  It supports multiple channels (stereo), and is also able
to change the pitch of a voice.  It works well in streaming audio applications,
and can deal with sound streams in 16-bit signed integer, 32-bit floating point,
or 8-bit unsigned formats.  The source code is in plain ANSI C.  In short, it's
production ready.

## Using libsonic in your program

Sonic is still a new library, but is in Debian Sid.  It will take a while
for it to filter out into all the other distros.  For now, feel free to simply
add sonic.c and sonic.h to your application (or Sonic.java), but consider
switching to -lsonic once the library is available on your distro.

The file [main.c](main.c) is the source code for the sonic command-line application.  It
is meant to be useful as example code.  Feel free to copy directly from main.c
into your application, as main.c is in the public domain.  Dependencies listed
in debian/control, like libsndfile, are only needed to compile the sonic
command-line application.  Libsonic itself has no external dependencies.

There are basically two ways to use sonic: batch or stream mode.  The simplest
is batch mode, where you pass an entire sound sample to sonic.  All you do is
call one function, like this:

    sonicChangeShortSpeed(samples, numSamples, speed, pitch, rate, volume, useChordPitch, sampleRate, numChannels);

This will change the speed and pitch of the sound samples pointed to by samples,
which should be 16-bit signed integers.  Stereo mode is supported, as
is any arbitrary number of channels.  Samples for each channel should be
adjacent (interleaved) in the input array.  Because the samples are modified
in-place, be sure that there is room in the samples array for the speed-changed
samples.  In general, if you are speeding up, rather than slowing down, it will
be safe to have no extra padding.  If your sound samples are mono, and you don't
want to scale volume or playback rate, and if you want normal pitch scaling,
then call it like this:

    sonicChangeShortSpeed(samples, numSamples, speed, pitch, 1.0f, 1.0f, 0, sampleRate, 1);

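As a rough illustration of the two buffer requirements above, here is a minimal
sketch that does not call libsonic at all.  The helper names interleaveStereo
and estimateOutputSamples are hypothetical, not part of sonic.h.  The size
estimate assumes the output length scales as 1 / (speed * rate), which is only
an approximation, so leave some extra headroom when slowing down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pack separate left/right buffers so the samples of
 * each frame are adjacent in memory: L0 R0 L1 R1 ... */
void interleaveStereo(const short *left, const short *right,
                      short *out, size_t numFrames)
{
    size_t i;

    assert(left != NULL && right != NULL && out != NULL);
    for (i = 0; i < numFrames; i++) {
        out[2 * i] = left[i];
        out[2 * i + 1] = right[i];
    }
}

/* Hypothetical helper: estimate how many samples per channel come out.
 * Assumption: output length is about numSamples / (speed * rate), rounded
 * up.  This is an estimate, not a guarantee made by libsonic. */
size_t estimateOutputSamples(size_t numSamples, float speed, float rate)
{
    double exact;
    size_t whole;

    assert(speed > 0.0f && rate > 0.0f);
    exact = (double)numSamples / ((double)speed * (double)rate);
    whole = (size_t)exact;
    return (exact > (double)whole) ? whole + 1 : whole;
}
```
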
The other way to use libsonic is in stream mode.  This is more complex, but
allows sonic to be inserted into a sound stream with fairly low latency.  The
current maximum latency in sonic is 31 milliseconds, which is enough to process
two pitch periods of voice as low as 65 Hz.  In general, the latency is equal to
two pitch periods, which is typically closer to 20 milliseconds.

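The latency figures above follow from simple arithmetic: two periods of a 65 Hz
voice last 2 / 65 seconds, or about 30.8 ms, and 20 ms corresponds to a pitch
of 100 Hz.  A small sketch of that calculation, with hypothetical helper names
that are not part of libsonic:

```c
#include <assert.h>

/* Time in milliseconds needed to buffer two pitch periods at pitchHz. */
double twoPeriodLatencyMs(double pitchHz)
{
    assert(pitchHz > 0.0);
    return 2.0 * 1000.0 / pitchHz;
}

/* The same latency as a number of samples per channel, rounded to the
 * nearest sample. */
int twoPeriodLatencySamples(int sampleRate, double pitchHz)
{
    assert(sampleRate > 0 && pitchHz > 0.0);
    return (int)(2.0 * sampleRate / pitchHz + 0.5);
}
```
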
To process a sound stream, you must create a sonicStream object, which contains
all of the state used by sonic.  Sonic should be thread safe, and multiple
sonicStream objects can be used at the same time.  You create a sonicStream
object like this:

    sonicStream stream = sonicCreateStream(sampleRate, numChannels);

When you're done with a sonic stream, you can free its memory with:

    sonicDestroyStream(stream);

By default, a sonic stream sets the speed, pitch, rate, and volume to 1.0, which means
no change at all to the sound stream.  Sonic detects this case, and simply
copies the input to the output to reduce CPU load.  To change the speed, pitch,
rate, or volume, set the parameters using:

    sonicSetSpeed(stream, speed);
    sonicSetPitch(stream, pitch);
    sonicSetRate(stream, rate);
    sonicSetVolume(stream, volume);

These four parameters are floating point numbers.  A speed of 2.0 means to
double the speed of speech.  A pitch of 0.95 means to lower the pitch by about 5%,
and a volume of 1.4 means to multiply the sound samples by 1.4, clipping if we
exceed the maximum range of a 16-bit integer.  Rate scales how fast the sound is
played back, which changes speed and pitch together.  A 2.0 value will make you
sound like a chipmunk talking very fast.  A 0.7 value will make you sound like a
giant talking slowly.

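To make the volume behaviour concrete, here is a minimal sketch of scaling one
16-bit sample and clipping it to the valid range.  The function scaleAndClip is
hypothetical and only illustrates the described effect.  It is not how libsonic
implements volume internally, and the round-to-nearest step is an assumption.

```c
#include <assert.h>

/* Hypothetical helper: multiply a 16-bit sample by volume and clip the
 * result to [-32768, 32767]. */
short scaleAndClip(short sample, float volume)
{
    float scaled;

    assert(volume >= 0.0f);
    scaled = (float)sample * volume;
    if (scaled >= 32767.0f) {
        return 32767;
    }
    if (scaled <= -32768.0f) {
        return -32768;
    }
    /* Assumption: round to nearest rather than truncate. */
    return (short)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```
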
By default, pitch is modified by changing the rate, and then using speed
modification to bring the speed back to normal.  This allows for a wide range of
pitch changes, but changing the pitch makes the speaker sound larger or smaller,
too.  If you want to make the person sound like the same person, but talking at
a higher or lower pitch, then enable the vocal chord emulation mode for pitch
scaling, using:

    sonicSetChordPitch(stream, 1);

However, only small changes to pitch should be used in this mode, as it
introduces significant distortion otherwise.

After setting the sound parameters, you write to the stream like this:

    sonicWriteShortToStream(stream, samples, numSamples);

You read the sped-up speech samples from sonic like this:

    samplesRead = sonicReadShortFromStream(stream, outBuffer, maxBufferSize);
    if(samplesRead > 0) {
        /* Do something with the output samples in outBuffer, like send them to
         * the sound device. */
    }

You may change the speed, pitch, rate, and volume parameters at any time, without
having to flush or create a new sonic stream.

When your sound stream ends, there may be several milliseconds of sound data in
the sonic stream's buffers.  To force sonic to process those samples, use:

    sonicFlushStream(stream);

Then, read those samples as above.  That's about all there is to using libsonic.
There are some more functions as a convenience for the user, like
sonicGetSpeed.  Other sound data formats are supported: unsigned char and float.
If float, the sound data should be between -1.0 and 1.0.  Internally, all sound
data is converted to 16-bit integers for processing.

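For readers who want to picture the conversion to 16-bit integers, here is a
minimal sketch.  The helper names are hypothetical, and the scale factors
(32767 for float, a shift of 128 followed by a multiply by 256 for unsigned
8-bit) are assumptions for illustration rather than a statement of what
libsonic does internally.

```c
#include <assert.h>

/* Hypothetical helper: map a float sample in [-1.0, 1.0] to 16 bits.
 * Assumption: scale by 32767 and clamp values outside the range. */
short floatToShort(float sample)
{
    assert(sample == sample); /* reject NaN */
    if (sample > 1.0f) {
        sample = 1.0f;
    }
    if (sample < -1.0f) {
        sample = -1.0f;
    }
    return (short)(sample * 32767.0f);
}

/* Hypothetical helper: map an unsigned 8-bit sample (128 is silence) to
 * 16 bits.  Assumption: re-center on zero, then scale by 256. */
short unsignedCharToShort(unsigned char sample)
{
    return (short)(((int)sample - 128) * 256);
}
```
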