The MidiSyn Technical Page
Although this program is being touted as a MIDI
to WAVE file converter, it is really a full-fledged synthesizer, albeit a
non-real-time one. Being non-real-time, it has no reason to respond
to real-time MIDI commands, so this program is purely a synthesis
engine. More specifically, it is a wavetable synthesizer that uses
EMU/Creative Sound Fonts as the source of samples and control information for
the synthesis process.
There are
both advantages and drawbacks in the fact that the program doesn't work in real
time. The main advantage of being non-real-time is certainly the ability to
trade time for sound quality. Good sound sometimes requires processing a huge number
of events. Real-time synthesis engines have limited processing
resources to allocate to events, so when they are overwhelmed, the only
solution is to ignore some of them, hopefully the least important ones, so that
the sound quality does not suffer too much.
Take for
instance the case of polyphony. All real-time synthesis engines have specific
limits on the number of notes that they can play simultaneously. A
non-real-time synthesis engine can afford unlimited polyphony: more notes simply
make the synthesis process take longer.
There is
also the problem of latency: all the notes of a chord should start at exactly
the same time. As the 'note on' events arrive sequentially and each note takes
some time to process, the last notes in a complex chord are delayed
relative to the first ones, sometimes noticeably. In a
non-real-time synthesis engine this is not a problem: all the notes in a chord
are synchronised to the microsecond.
In almost
every aspect, this program follows the official Sound Font specification,
version 2.0b from May 2, 1997.
This section describes the main architectural blocks
that make up the sound generation engine.
At the
heart of the sound generation engine is the Sound Generator. As the name implies, this is the functional block
where almost all the sound generation takes place. All the other blocks either
generate parameters that control the Sound Generators or perform some
subsidiary sound processing.
A Sound
Generator consists of a digital oscillator that feeds a resonant digital low-pass
filter, which in turn feeds a digital amplifier.
There are
two envelope generators, called the Modulation
Envelope and the Volume Envelope.
The Modulation Envelope is connected
to the oscillator and to the digital filter. The Volume Envelope affects only the digital amplifier.
There are
two Low Frequency Oscillators (LFOs), called the Vibrato
LFO and the Modulation LFO. The Vibrato LFO affects only the digital
oscillator, while the Modulation LFO is
connected to all three main components: the digital oscillator, the digital low-pass
filter and the digital amplifier.
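The routing described above can be summarised as a small table plus the arithmetic that combines several modulators acting on one destination. The following is a minimal Python sketch, not MidiSyn's code: the depth names follow Sound Font 2 generator names (modEnvToPitch, vibLfoToPitch, modLfoToPitch, modEnvToFilterFc, modLfoToFilterFc), while the normalised source ranges (envelopes 0 to 1, LFOs -1 to 1), the ROUTING table and the function names are assumptions made for illustration.

```python
# Which modulation source reaches which component (per the text above).
ROUTING = {
    "mod_env": ("oscillator", "filter"),
    "vol_env": ("amplifier",),
    "vib_lfo": ("oscillator",),
    "mod_lfo": ("oscillator", "filter", "amplifier"),
}


def pitch_ratio(mod_env, vib_lfo, mod_lfo, depths):
    """Frequency ratio applied to the oscillator by its three modulators.

    mod_env is assumed to lie in [0, 1]; vib_lfo and mod_lfo in [-1, 1].
    depths maps generator-style names to full-scale excursions in cents.
    """
    cents = (mod_env * depths.get("modEnvToPitch", 0)
             + vib_lfo * depths.get("vibLfoToPitch", 0)
             + mod_lfo * depths.get("modLfoToPitch", 0))
    return 2.0 ** (cents / 1200.0)          # 1200 cents = one octave


def filter_cutoff_hz(base_hz, mod_env, mod_lfo, depths):
    """Filter cut-off after the Modulation Envelope and Modulation LFO shift it."""
    cents = (mod_env * depths.get("modEnvToFilterFc", 0)
             + mod_lfo * depths.get("modLfoToFilterFc", 0))
    return base_hz * 2.0 ** (cents / 1200.0)
```

Working in cents means that contributions from different sources simply add, and a single exponentiation turns the sum into a frequency ratio.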
Let’s take
a look at each one of these components:
• Digital oscillator. This is the component responsible
for playing back the samples contained in the Sound Font at the output
sampling rate and at the desired output frequency. It uses linear interpolation
whenever it needs an output sample that falls between two of the samples stored
in the Sound Font.
• Low-pass filter. At zero resonance, this filter has a flat
passband up to the cut-off frequency and a rolloff of 12 dB per octave above that
frequency. A non-zero resonance adds a peak at the cut-off
frequency, superimposed on that response.
• Digital amplifier. This is just a multiplier that
applies the modifiers generated by the Modulation LFO and by the Volume Envelope
to the sound samples output by the digital filter.
• Low frequency oscillators. These are low frequency (typically
a few Hz) digital oscillators that generate a triangular waveform. The LFOs
modulate the sound in the main components of the Sound Generator.
• Envelope generators. An envelope generator produces a control signal
in six phases: initial delay, attack, hold, decay, sustain and release. During each
phase the control signal ramps up or down in a specific way (linear
or concave/convex). According to the Sound Font specification, the attack ramp
should be convex and all the others linear. However, the values produced by this
generator are logarithmic in nature (dB), so a "linear" ramp is linear in dB
rather than in amplitude.
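The linear interpolation used by the digital oscillator can be sketched as follows. This is a hypothetical Python illustration, not MidiSyn's implementation: the function name and parameters are assumptions, and loop points, which a real Sound Font player must honour, are ignored.

```python
import math


def resample_linear(samples, src_rate, out_rate, pitch_ratio, n_out):
    """Play back a stored sample at another rate/pitch with linear interpolation.

    samples     : list of floats from the sample (no looping in this sketch)
    src_rate    : rate the sample was recorded at
    out_rate    : output sampling rate
    pitch_ratio : desired frequency / original frequency (2.0 = one octave up)
    """
    step = pitch_ratio * src_rate / out_rate   # source samples per output sample
    out = []
    pos = 0.0
    for _ in range(n_out):
        i = math.floor(pos)
        if i + 1 >= len(samples):              # reached the end of the sample
            break
        frac = pos - i
        # Weighted average of the two stored samples surrounding `pos`.
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += step
    return out
```

For example, a sample recorded at 22050 Hz and played at its original pitch into a 44100 Hz output advances half a source sample per output sample, so every other output sample lies halfway between two stored ones.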
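The triangular LFO and the digital amplifier are simple enough to show together. In this hypothetical sketch the LFO produces values in [-1, 1], and the amplifier converts the Volume Envelope attenuation and the Modulation LFO depth (both assumed to be expressed in dB) into a single linear multiplier. Function and parameter names are illustrative only.

```python
def triangle_lfo(freq_hz, sample_rate, n):
    """n samples of a triangular LFO in [-1, 1], starting at 0 and rising."""
    out = []
    phase = 0.0                          # phase in cycles, 0 <= phase < 1
    inc = freq_hz / sample_rate
    for _ in range(n):
        # Piecewise-linear triangle: 0 -> 1 -> -1 -> 0 over one cycle.
        if phase < 0.25:
            v = 4 * phase
        elif phase < 0.75:
            v = 2 - 4 * phase
        else:
            v = 4 * phase - 4
        out.append(v)
        phase = (phase + inc) % 1.0
    return out


def amplify(sample, vol_env_atten_db, mod_lfo, mod_lfo_to_volume_db):
    """Digital amplifier: apply Volume Envelope attenuation and Modulation LFO
    tremolo (both in dB) to one filtered sample as a single gain factor."""
    gain_db = -vol_env_atten_db + mod_lfo * mod_lfo_to_volume_db
    return sample * 10 ** (gain_db / 20)
```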
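One way to read the envelope description is sketched below. Attenuation is computed in dB. The attack ramps linearly in amplitude, which traces a curve rather than a straight line in dB, while decay and release ramp linearly in dB. This is an illustrative Python sketch under those assumptions; the 100 dB full-scale value (treated as silence) is assumed here from the Sound Font volume-envelope conventions, and all names are hypothetical.

```python
import math

FULL_SCALE_DB = 100.0   # assumed: 100 dB of attenuation is treated as silence


def envelope_atten_db(t, delay, attack, hold, decay, sustain_db,
                      release_t=None, release=0.0):
    """Attenuation in dB (0 = full level) of a six-phase envelope at time t.

    Times are in seconds; sustain_db is the sustain attenuation in dB;
    release_t is when the note was released (None while the key is held).
    """
    if release_t is not None and t >= release_t:
        # Release: linear in dB, starting from wherever the envelope was.
        start = envelope_atten_db(release_t, delay, attack, hold, decay, sustain_db)
        if release <= 0:
            return FULL_SCALE_DB
        return min(FULL_SCALE_DB, start + FULL_SCALE_DB * (t - release_t) / release)
    if t < delay:                               # initial delay: silent
        return FULL_SCALE_DB
    t -= delay
    if t < attack:                              # attack: linear in amplitude
        amplitude = t / attack
        if amplitude <= 0:
            return FULL_SCALE_DB
        return min(FULL_SCALE_DB, -20 * math.log10(amplitude))
    t -= attack
    if t < hold:                                # hold: full level
        return 0.0
    t -= hold
    if decay <= 0:                              # no decay: jump to sustain
        return sustain_db
    # Decay: FULL_SCALE_DB per `decay` seconds, stopping at the sustain level.
    return min(sustain_db, FULL_SCALE_DB * t / decay)
```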
The remaining sound processing building blocks of the rendering engine
are:
• Output Mixer. This just sums the samples
generated by all the Sound Generators active at that moment and collects them in
three buffers: the dry buffer, the chorus buffer and the reverberation buffer.
The samples in the last two buffers are then processed by the
corresponding effects unit.
• Reverberation Unit. The samples collected by the
output mixer in the reverberation buffer are processed by this unit.
• Chorus Unit. The samples collected by the
output mixer in the chorus buffer are processed by this unit. This is the most
basic type of chorus unit, composed of a single comb filter controlled by a low
frequency oscillator with a triangular waveform.
• Delay Unit. This is a very versatile unit,
which can be used to implement not only the classical delay but also
echo, phaser, flanger and even chorus effects.
• Limiter/Compressor. This is the last block that
processes the sound before it reaches the output file. It contains a dynamic
compressor and a static one. As all internal sample processing is done in 32-bit
variables, the purpose of the dynamic compressor is to bring, in an orderly
way, all the samples into the 16-bit range. To limit the output smoothly,
this block looks ahead in the buffer and starts limiting the output some
time before a peak actually occurs. The static compressor included in this unit
works on the instantaneous value of the output. It attenuates all output
values above a certain threshold by an amount that depends on the
value itself: the greater the value, the greater the attenuation. This causes a
“soft” clipping of the sound peaks so that the overall volume can be increased.
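The output mixer's job can be sketched as below. The per-voice chorus and reverb send levels are an assumption modelled on the Sound Font chorusEffectsSend and reverbEffectsSend generators. The page does not state how MidiSyn scales the sends, and this sketch feeds each voice's full signal to the dry buffer.

```python
def mix_voices(voices, n):
    """Sum the output of all active voices into dry, chorus and reverb buffers.

    voices: list of (samples, chorus_send, reverb_send) tuples, with send
    levels as fractions in [0, 1] (an assumption made for this sketch).
    """
    dry = [0.0] * n
    chorus = [0.0] * n
    reverb = [0.0] * n
    for samples, chorus_send, reverb_send in voices:
        for i, s in enumerate(samples[:n]):
            dry[i] += s                   # full signal to the dry path
            chorus[i] += s * chorus_send  # scaled copy for the chorus unit
            reverb[i] += s * reverb_send  # scaled copy for the reverb unit
    return dry, chorus, reverb
```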
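The basic chorus described above, a single comb filter whose delay is swept by a triangular LFO, can be sketched like this. The delay, depth, rate and mix values are illustrative assumptions, and the fractional delay is read with linear interpolation. Because rendering is offline, the whole input buffer is available, so the sketch simply indexes past input samples.

```python
import math


def chorus(x, sample_rate, base_delay_ms=15.0, depth_ms=5.0,
           rate_hz=0.5, mix=0.5):
    """Feed-forward comb filter y[n] = x[n] + mix * x[n - d(n)], with the delay
    d(n) swept by a triangular LFO. The minimum delay (base - depth) should be
    at least one sample so that only past input is read."""
    base = base_delay_ms * sample_rate / 1000.0   # centre delay in samples
    depth = depth_ms * sample_rate / 1000.0       # sweep depth in samples
    inc = rate_hz / sample_rate                   # LFO phase step (cycles)
    phase = 0.0
    out = []
    for n, s in enumerate(x):
        tri = 1.0 - 4.0 * abs(phase - 0.5)        # triangle wave in [-1, 1]
        pos = n - (base + depth * tri)            # fractional read position
        i = math.floor(pos)
        frac = pos - i
        # Interpolate between the two neighbouring past inputs;
        # positions before the start of the buffer read as silence.
        a = x[i] if 0 <= i < len(x) else 0.0
        b = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
        out.append(s + mix * (a * (1.0 - frac) + b * frac))
        phase = (phase + inc) % 1.0
    return out
```

Setting depth_ms to 0 turns the sketch into a fixed single echo, which is a quick way to check the delay arithmetic.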
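The look-ahead idea behind the dynamic compressor can be sketched as follows. This is a hypothetical Python illustration, not MidiSyn's algorithm. Each sample's required gain is min(1, 32767 / |x|), and the applied gain is the lowest of those values after each has been relaxed linearly with distance, so the gain starts falling `lookahead` samples before a peak and recovers over the same distance afterwards.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def lookahead_limit(samples, lookahead):
    """Bring 32-bit intermediate samples into the 16-bit range smoothly."""
    if lookahead < 1:
        raise ValueError("lookahead must be at least 1 sample")
    n = len(samples)
    # Gain each sample would need on its own to fit in 16 bits.
    need = [min(1.0, INT16_MAX / abs(s)) if s else 1.0 for s in samples]
    out = []
    for i in range(n):
        g = 1.0
        lo, hi = max(0, i - lookahead + 1), min(n, i + lookahead)
        for j in range(lo, hi):
            d = abs(i - j)
            # A peak d samples away pulls the gain part-way down, so the gain
            # ramps down before the peak and back up after it.
            g = min(g, need[j] + (1.0 - need[j]) * d / lookahead)
        y = int(round(samples[i] * g))
        out.append(max(INT16_MIN, min(INT16_MAX, y)))   # safety clamp
    return out
```

Since the distance-zero term is always included, every sample gets at most its own required gain, so the peak itself lands exactly at full scale instead of being hard-clipped.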
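The static compressor's "greater value, greater attenuation" rule can be illustrated with a soft-clipping curve on normalised sample values. The tanh shape, the threshold and the ceiling below are assumptions chosen for the sketch; the page does not give MidiSyn's actual transfer curve.

```python
import math


def soft_clip(x, threshold=0.7, ceiling=1.0):
    """Values up to `threshold` pass unchanged; above it the excess is squashed
    with tanh, so larger inputs are attenuated more and the output magnitude
    never exceeds `ceiling`."""
    mag = abs(x)
    if mag <= threshold:
        return x
    span = ceiling - threshold
    squashed = threshold + span * math.tanh((mag - threshold) / span)
    return math.copysign(squashed, x)
```

Because tanh has unit slope at zero, the curve joins the unchanged region without a kink, which is what makes the clipping "soft".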
This program tries to emulate as closely as
possible the inner workings of the commercial synthesis engines (usually
hardware-based) that use Sound Font technology as described in the
official Sound Font specification. Of course, many aspects of the
synthesis process are engine-specific and are therefore left out of the
specification.
MidiSyn
does not process NRPN parameters; therefore, CC98 and CC99 are ignored. The
only RPN processed is the Pitch-wheel Sensitivity (RPN 0). By default the pitch
wheel excursion is set to 2 semitones (200 cents).
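The RPN handling described above can be sketched per MIDI channel. Controller numbers are the standard MIDI ones: CC101/CC100 select an RPN, CC6/CC38 carry Data Entry MSB/LSB, and the 14-bit pitch wheel is centred at 8192. The class and method names are hypothetical. Treating an NRPN selection as deselecting the RPN, so that NRPN data entry cannot change the range, is an assumption about behaviour the page does not describe.

```python
class PitchBendState:
    """Per-channel pitch wheel with RPN 0 (Pitch Bend Sensitivity) support."""

    RPN_NULL = (127, 127)

    def __init__(self):
        self.rpn = list(self.RPN_NULL)   # [MSB, LSB] of the selected RPN
        self.semitones = 2               # default excursion: 2 semitones
        self.cents = 0
        self.bend = 8192                 # 14-bit wheel value, 8192 = centre

    @property
    def range_cents(self):
        return self.semitones * 100 + self.cents

    def control_change(self, cc, value):
        if cc == 101:                    # RPN MSB
            self.rpn[0] = value
        elif cc == 100:                  # RPN LSB
            self.rpn[1] = value
        elif cc in (98, 99):             # NRPN select: not processed; drop RPN
            self.rpn = list(self.RPN_NULL)
        elif self.rpn == [0, 0]:         # only RPN 0 is acted upon
            if cc == 6:                  # Data Entry MSB: semitones
                self.semitones = value
            elif cc == 38:               # Data Entry LSB: cents
                self.cents = value

    def pitch_wheel(self, lsb, msb):
        self.bend = (msb << 7) | lsb

    def bend_cents(self):
        """Pitch offset in cents, from -range up to just under +range."""
        return (self.bend - 8192) * self.range_cents / 8192
```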
For a list
of the modulators and controllers processed by MidiSyn and their effect on the
sound generation process, see The MidiSyn Help Page.
Comments, suggestions and bug reports are welcome and should be sent to fadevelop@clix.pt
This page last modified 2002-11-20 - Copyright
© 2000-2002 ACE