SoundFont Tutorial
Introduced in 1993, the SoundFont sample-based synthesis format became a
standard with the proliferation of the Creative Technology SoundBlaster AWE32
sound card, which uses the EMU8000 synthesizer engine, and its successors, the
Live! and Audigy.
SoundFonts, in a manner analogous to character fonts, enable the portable rendering of a
musical composition with the actual timbres intended by the performer or
composer. The SoundFont format is a portable, extensible, general interchange
standard for sample-based synthesizer sounds and associated articulation data.
A SoundFont bank is a collection of sounds in the SoundFont format. Such a bank
contains both the digital audio samples which have been captured from a sound
source and the instructions to the synthesizer on how to articulate this sound
based on the musical or sonic context as expressed by MIDI.
For example, a trumpet could be a particular sound in a SoundFont bank. The
bank might contain recordings of trumpets played at several different pitches,
information telling the synthesizer to filter or mute the sounds when notes are
played softly, loop information that allows a short recording to be stretched
into a sustained note, and instructions on how to apply vibrato or bend the
pitch of the note in response to MIDI commands.
The trumpet sound in the example above is like the letter “a” in a type font.
The different sounds produced by different keys and velocities of the trumpet
in the SoundFont bank are analogous to the different displays produced by
different sizes of the letter “a” in the type font. Different monitors display
the letter “a” in different sizes according to their resolution, memory and
other hardware capabilities, just as different synthesizers play the trumpet
according to their synthesis capabilities.
SoundFonts come in two flavours: Standard and Compressed. This page describes both types
of SoundFonts and their use.
The Musical Instrument Digital Interface (MIDI) language has become a standard in
the PC industry for the representation of musical scores.
However, as you probably know, MIDI files do not carry any sound; they are a
collection of commands for sound-producing equipment (synthesizers). A typical
command is: “play note C5 using a guitar sound”.
When it receives the command, the synthesizer must come up with the actual
guitar sound. SoundFont files provide that kind of information. SoundFonts
carry not only the actual instrument sounds, the so-called samples, but also
the so-called articulation data, that is, instructions on how to play the
sample data.
When the synthesizer receives a NOTE ON command like the one stated previously, it
looks in the SoundFont for the sample corresponding to the desired sound and
plays it.
Among the several synthesis methods in use today, this is the one that provides
the most realism, as what you hear is actually the sound recorded from a real
instrument. The drawback is that it takes a huge amount of data to do this.
Imagine that you are building a SoundFont for a piano. If we assume that each
note sounds for at most 10 seconds, you will need, for a 16-bit mono sample at
the standard rate of 44,100 samples per second (the CD standard), 882,000 bytes
of data for each note. For an 88-key piano, you would need 77,616,000 bytes for
the entire set. Now consider that when you strike a piano key with different
strength (called velocity in MIDI parlance), the sound varies not only in
loudness but also in its composition. So for more realism, you should multiply
the previous number by the number of velocity levels you sample. This is what
you need for a single instrument; now imagine what you need for the 128
instruments that make up a General MIDI set.
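The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the figures for sample rate, bit depth, note length and key count come from the text, while the choice of 4 velocity layers and the idea that every General MIDI instrument needs as much data as the piano are assumptions made purely for illustration.

```python
SAMPLE_RATE = 44_100     # samples per second (CD standard, from the text)
BYTES_PER_SAMPLE = 2     # 16-bit mono
NOTE_SECONDS = 10        # assumed maximum note length, from the text
PIANO_KEYS = 88
GM_INSTRUMENTS = 128

bytes_per_note = SAMPLE_RATE * BYTES_PER_SAMPLE * NOTE_SECONDS
bytes_per_piano = bytes_per_note * PIANO_KEYS


def bank_bytes(velocity_layers: int, instruments: int = 1) -> int:
    """Rough size of a bank where every instrument is sampled like the piano.

    Both parameters are illustrative assumptions, not values from the text.
    """
    return bytes_per_piano * velocity_layers * instruments


print(bytes_per_note)                     # 882000
print(bytes_per_piano)                    # 77616000
print(bank_bytes(4))                      # 310464000 (4 velocity layers)
print(bank_bytes(1, GM_INSTRUMENTS))      # 9934848000 (one layer, 128 instruments)
```

Even with a single velocity layer, a full set sampled this way would approach 10 billion bytes, which is why the techniques described next are used.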
One way to solve this problem is to resort to looping: instead of using a sample
for the entire duration of a note, use a smaller sample and repeat it as needed
to fulfil the note length required. Unfortunately, by doing this, the sound
quality is affected, and some artefacts are introduced, defeating to some
extent the objective of sound realism.
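As a minimal sketch of looping, the function below plays a short sample once from the start and then keeps repeating a loop region until the requested length is reached. The function name `render_looped` and the loop-point parameters are hypothetical and chosen for illustration; a real synthesizer works on audio buffers and usually smooths the loop boundary.

```python
def render_looped(sample, loop_start, loop_end, n_out):
    """Return n_out values taken from `sample`.

    The sample is played from index 0; whenever playback reaches
    `loop_end` it jumps back to `loop_start`, so the region
    [loop_start, loop_end) repeats for as long as needed.
    """
    if not 0 <= loop_start < loop_end <= len(sample):
        raise ValueError("invalid loop points")
    out = []
    pos = 0
    for _ in range(n_out):
        out.append(sample[pos])
        pos += 1
        if pos >= loop_end:
            pos = loop_start
    return out


# A 5-value "recording" stretched to 9 values by looping its last 3 values.
print(render_looped([0, 1, 2, 3, 4], 2, 5, 9))  # [0, 1, 2, 3, 4, 2, 3, 4, 2]
```

If the values at the loop end and loop start do not match up, the jump produces an audible click, which is one source of the artefacts mentioned above.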
Moreover, the sound of an instrument does not have a steady amplitude for the
duration of a note; it follows a pattern, called the envelope, which changes
almost continuously. So, if you use loops, you must supply the envelope to
modulate the sound. Other elements of the sound, like tremolo and vibrato, must
also be simulated. This pushes the quality of the sound further away from the
original.
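To make the idea of an envelope concrete, here is a deliberately simplified sketch: a linear attack, decay, sustain and release curve that gives the gain to apply at a given time. The function name, the parameter names and the default values are assumptions for illustration only; the envelopes used by real SoundFont synthesizers have more stages and are not linear.

```python
def envelope_gain(t, note_len, attack=0.01, decay=0.10, sustain=0.6, release=0.20):
    """Gain between 0 and 1 at time t (seconds) for a note held note_len seconds.

    attack, decay and release are durations in seconds; sustain is a level.
    """
    def held(x):
        # Gain while the key is still down.
        if x < attack:
            return x / attack
        if x < attack + decay:
            return 1.0 - (1.0 - sustain) * (x - attack) / decay
        return sustain

    if t < 0.0:
        return 0.0
    if t <= note_len:
        return held(t)
    # After key release: fade linearly from the level reached at note_len.
    fade = 1.0 - (t - note_len) / release
    return held(note_len) * max(fade, 0.0)


# Multiply each looped sample value by the gain at its time position.
for t in (0.0, 0.005, 0.5, 1.1, 2.0):
    print(t, envelope_gain(t, note_len=1.0))
```

Tremolo could be simulated in the same spirit by multiplying the gain by a slow periodic factor, and vibrato by varying the playback rate.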
Another way to reduce storage requirements for sample data is to use samples
for only some of the notes rather than all of them, for instance, for every
other note. The sound corresponding to a missing sample is generated by
interpolation between the sounds of the two closest notes. This again reduces
quality and introduces undesirable artefacts.
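The text does not say how a missing note is computed. One common technique, assumed here and not necessarily what any particular synthesizer does, is to replay the sample of a neighbouring note at a different rate, reading between stored values with linear interpolation. The sketch below uses hypothetical function names and the equal-temperament ratio of 2^(1/12) per semitone.

```python
def pitch_ratio(semitones):
    """Playback-rate ratio for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)


def resample(sample, ratio):
    """Read `sample` at `ratio` times normal speed, interpolating linearly.

    ratio > 1 raises the pitch (and shortens the sound);
    ratio < 1 lowers it (and lengthens the sound).
    """
    out = []
    pos = 0.0
    while pos <= len(sample) - 1:
        i = int(pos)
        frac = pos - i
        nxt = sample[i + 1] if i + 1 < len(sample) else sample[i]
        out.append(sample[i] * (1.0 - frac) + nxt * frac)
        pos += ratio
    return out


print(pitch_ratio(12))                        # 2.0 (one octave up)
print(resample([0, 10, 20, 30, 40], 2.0))     # [0.0, 20.0, 40.0]
print(resample([0, 10, 20], 0.5))             # [0.0, 5.0, 10.0, 15.0, 20.0]

# Derive the note one semitone above a sampled note.
shifted = resample([0, 10, 20, 30, 40], pitch_ratio(1))
```

Because the whole recording is sped up or slowed down, its timbre and duration change along with its pitch, which is why the result sounds less natural the further it is shifted.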
Conclusion: good sound quality and realism can be obtained using standard
SoundFonts, but it is expensive in terms of the amount of data required.
As stated above, sample size can greatly affect the quality of the sound
generated by a sample-based synthesis engine. However, standard SoundFonts
impose limits on the size of a file and on the size of each sample in a file.
Moreover, huge SoundFont files are awkward to handle: for instance, if a file
does not fit on a single CD, it becomes difficult to carry from one system to
another.
One way out of this is to compress the sample data. However, to maintain sound
quality, the compression and decompression processes must be lossless, that is,
after decompression, the sample data must be bit-for-bit identical to the
original data.
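The lossless requirement can be illustrated with a round trip through a general-purpose compressor. The sketch below uses Python's standard zlib module purely as a stand-in; it is not the compression scheme used by Compressed SoundFonts, and the synthetic sine-wave sample is an assumption made so the example is self-contained.

```python
import math
import struct
import zlib


def make_sample(n=4410, freq=440.0, rate=44100):
    """Build n frames of 16-bit mono audio (a sine tone) as raw bytes."""
    values = [int(12000 * math.sin(2 * math.pi * freq * i / rate)) for i in range(n)]
    return struct.pack(f"<{n}h", *values)  # little-endian signed 16-bit


original = make_sample()
packed = zlib.compress(original, 9)   # stand-in lossless compressor
restored = zlib.decompress(packed)

# Lossless means the restored bytes are identical to the original bytes.
print(restored == original)           # True
print(len(original), len(packed))     # sizes before and after compression
```

A lossy audio codec would fail the equality check, since it discards information to achieve smaller files.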
Compressed SoundFonts contain mostly the same information as Standard ones, but
compressed in a very efficient way. This allows the creation of very large
SoundFonts, which can lead to big improvements in sound quality.
Comments, suggestions and bug reports are welcome and should be sent to fadevelop@clix.pt
This page last modified 2002-11-20 - Copyright © 2000-2002 ACE