Back to the MidiSyn Home page


Although this program is presented as a MIDI to WAVE file converter, it is really a full-fledged synthesizer, albeit a non-real-time one. Being non-real-time, it has no reason to respond to real-time MIDI commands, so the program is purely a synthesis engine. More specifically, it is a wavetable synthesizer that uses EMU/Creative Sound Fonts as the source of the samples and control information used in the synthesis process.

There are both advantages and drawbacks to the program not working in real time. The main advantage is the ability to trade time for sound quality. Good sound sometimes requires processing a huge number of events. Real-time synthesis engines have limited processing resources to allocate to events, so when they are overwhelmed the only solution is to ignore some of them, ideally the least important ones, so that the sound quality does not suffer too much.

Take polyphony, for instance. All real-time synthesis engines have specific limits on the number of notes they can play simultaneously. A non-real-time synthesis engine can afford unlimited polyphony: extra notes simply make the synthesis process take more time.

There is also the problem of latency. All the notes of a chord should start at exactly the same time. Because the 'note on' events arrive sequentially and each note takes some time to process, a real-time engine starts the last notes of a complex chord slightly later than the first ones, sometimes noticeably so. In a non-real-time synthesis engine this is not a problem: all the notes in a chord are synchronised to the microsecond.

In almost every aspect, this program follows the official Sound Font specification, version 2.0b of May 2, 1997, available at this location.


The Sound Generation Engine

This section describes the main architectural blocks that make up the sound generation engine.

At the heart of the sound generation engine is the Sound Generator. As the name implies, this is the functional block where almost all the sound generation takes place. All the other blocks either generate parameters that control the Sound Generators or perform some subsidiary sound processing.

A sound generator includes a digital oscillator that feeds a digital low pass resonant filter that feeds a digital amplifier.

There are two ADSR generators called Modulation Envelope and Volume Envelope. The Modulation Envelope is connected to the oscillator and to the digital filter. The Volume Envelope affects only the digital amplifier.

There are two Low Frequency Oscillators called Vibrato LFO and Modulation LFO. The Vibrato LFO affects only the digital oscillator; the Modulation LFO is connected to all three main components: digital oscillator, digital low pass filter and digital amplifier.
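As a rough illustration of how an LFO can drive the digital oscillator, the sketch below generates a triangular control signal and turns it into a pitch ratio. It is not MidiSyn's code: the function names, the starting phase and the `depth_cents` parameter are assumptions made for this example.

```python
def triangle_lfo(freq_hz, sample_rate, num_samples):
    """Return num_samples of a triangle wave in [-1.0, 1.0], starting at 0 and rising."""
    out = []
    phase = 0.0                      # position within one cycle, in [0, 1)
    step = freq_hz / sample_rate     # cycle fraction advanced per output sample
    for _ in range(num_samples):
        if phase < 0.25:
            value = 4.0 * phase              # rising from 0 to +1
        elif phase < 0.75:
            value = 2.0 - 4.0 * phase        # falling from +1 to -1
        else:
            value = 4.0 * phase - 4.0        # rising from -1 back to 0
        out.append(value)
        phase = (phase + step) % 1.0
    return out


def vibrato_ratio(lfo_value, depth_cents):
    """Convert an LFO value in [-1, 1] into a frequency ratio.

    depth_cents is a hypothetical vibrato depth; 1200 cents equal one octave.
    """
    return 2.0 ** (lfo_value * depth_cents / 1200.0)
```

Multiplying the oscillator's playback rate by `vibrato_ratio(...)` for each output sample gives a periodic pitch wobble. The same control signal could be scaled differently to modulate a filter cut-off or an amplifier gain.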

Let's take a look at each of these components:

       Digital oscillator. This is the component responsible for the play-back of the samples contained in the Sound Font at the output sampling-rate and at the desired output frequency. It uses linear interpolation whenever there is a need to generate more samples than the ones included in the Sound Font.

       Low pass filter. At zero resonance, this filter is characterised by having a flat passband to the cut-off frequency, then a rolloff at 12dB per octave above that frequency. The resonance, when non-zero, comprises a peak at the cut-off frequency, superimposed on the above response.

       Digital amplifier. This is just a multiplier that applies the modifiers generated by the Modulation LFO and by the Volume Envelope to the sound samples output by the digital filter.

       Low frequency oscillators. These are low frequency (typically a few Hz) digital oscillators that generate a triangular waveform. The LFO oscillators modulate the sound in the main components of the Sound Generator.

       Envelope generators. An envelope generates a control signal in six phases: Initial delay, attack, hold, decay, sustain and release. In each phase transition the control signal ramps up or down in a specific way (linear or concave/convex). According to the Sound Font specification, the attack ramp should be convex, all the others linear. However, the units generated by this generator are logarithmic in nature (dB).
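The linear interpolation mentioned for the digital oscillator can be sketched as follows. This is an illustrative reading of the description, not the program's actual code: `play_sample`, `playback_ratio` and their parameters are hypothetical names, and sample looping is deliberately left out.

```python
def playback_ratio(source_rate, output_rate, pitch_cents):
    """Input samples consumed per output sample.

    Combines the sampling-rate conversion with a pitch shift in cents
    (1200 cents equal one octave).
    """
    return (source_rate / output_rate) * 2.0 ** (pitch_cents / 1200.0)


def play_sample(samples, ratio, num_out):
    """Read `samples` at `ratio` input samples per output sample.

    Positions that fall between two stored samples are filled in by
    linear interpolation. Playback stops at the end of the data;
    looping is not handled in this sketch.
    """
    out = []
    pos = 0.0                                # fractional read position
    for _ in range(num_out):
        i = int(pos)
        if i >= len(samples) - 1:
            break                            # no right-hand neighbour left
        frac = pos - i                       # distance past sample i, in [0, 1)
        out.append(samples[i] + frac * (samples[i + 1] - samples[i]))
        pos += ratio
    return out
```

With a ratio below 1 the oscillator has to produce more output samples than the Sound Font contains, which is the case where interpolation matters most. For example, `play_sample([0.0, 1.0, 0.0], 0.5, 4)` returns `[0.0, 0.5, 1.0, 0.5]`.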



Of the sound processing building blocks that the rendering engine is composed of, the ones that remain to be described are:

       Output Mixer. This just sums up the samples generated by all the Sound Generators active at the moment and collects them in three buffers: the dry buffer, the chorus buffer and the reverberation buffer. The samples in the last two buffers are then processed by the corresponding effects unit.

       Reverberation Unit. The samples collected by the output mixer in the reverberation buffer are processed by this unit.

       Chorus Unit. The samples collected by the output mixer in the chorus buffer are processed by this unit. This is the most basic type of chorus unit, composed of a single comb filter controlled by a low frequency oscillator with triangular wave shape.

       Delay Unit. This is a very versatile unit, which can be used to implement not only the classical Delay unit, but also Echo, Phaser, Flanger and even Chorus.

       Limiter/Compressor. This is the last block that processes the sound before it reaches the output file. It contains a dynamic compressor and a static one. As all the internal sample processing is done in 32-bit variables, the purpose of the dynamic compressor is to limit, in an orderly way, all the samples to 16-bit values. In order to limit the output smoothly, this block looks ahead in the buffer and starts limiting the output some time before a peak actually occurs. The static compressor included in this unit works on the instantaneous value of the output. It attenuates all the output values that are over a certain threshold by an amount that depends on the value itself: the greater the value, the greater the attenuation. This causes a soft clipping of the sound peaks so that the overall volume can be increased.
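The behaviour described for the static compressor can be illustrated with a simple soft-clipping curve. The text does not give the curve or the threshold that MidiSyn actually uses, so the rational knee below and the `threshold` and `limit` defaults are assumptions chosen only to show the idea.

```python
def soft_clip(x, threshold=24000.0, limit=32767.0):
    """Pass values below `threshold` unchanged and compress the rest.

    Above the threshold the excess is squeezed into the remaining headroom,
    so the output approaches `limit` but never reaches it. The larger the
    input, the larger the attenuation.
    """
    mag = abs(x)
    if mag <= threshold:
        return x
    headroom = limit - threshold
    excess = mag - threshold
    # Slope is 1 at the threshold and flattens as the excess grows.
    shaped = threshold + headroom * excess / (excess + headroom)
    return shaped if x >= 0 else -shaped
```

Because the curve works on one sample at a time it needs no look-ahead, in contrast to the dynamic compressor, which has to inspect upcoming samples to begin limiting before a peak arrives.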


Modulators & Controllers

This program tries to emulate as closely as possible the inner workings of the commercial synthesis engines (usually hardware based) that are based on Sound Font technology as described in the official Sound Font specification. Of course there are many aspects of the synthesis process that are engine specific and therefore are left out of the specification.

MidiSyn does not process NRPN parameters; therefore, CC98 and CC99 are ignored. The only RPN processed is the Pitch-wheel Sensitivity (RPN 0). By default the pitch wheel excursion is set to 2 semitones (200 cents).
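To make the pitch-wheel arithmetic concrete, here is a minimal sketch using the standard MIDI conventions: a 14-bit pitch-bend value centred on 8192, and RPN 0 data entry giving semitones in the MSB and cents in the LSB. The function names are hypothetical, and how MidiSyn performs the conversion internally is not stated in the text.

```python
def sensitivity_from_rpn0(data_msb, data_lsb=0):
    """Pitch-wheel sensitivity in cents from RPN 0 data entry bytes.

    By MIDI convention the MSB carries semitones and the LSB carries cents.
    """
    return data_msb * 100.0 + data_lsb


def pitch_bend_cents(bend_value, sensitivity_cents=200.0):
    """Map a 14-bit pitch-bend value (0..16383, centre 8192) to cents.

    The default of 200 cents matches the 2-semitone default excursion.
    """
    return (bend_value - 8192) / 8192.0 * sensitivity_cents


def cents_to_ratio(cents):
    """Frequency ratio for a pitch offset in cents (1200 cents per octave)."""
    return 2.0 ** (cents / 1200.0)
```

With the default sensitivity, a wheel pushed fully down (value 0) gives -200 cents, and the centre position gives 0 cents, that is, a ratio of exactly 1.0.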

For a list of the modulators and controllers processed by MidiSyn and their effect on the sound generation process, see The MidiSyn Help Page.



Comments, suggestions and bug reports are welcome and should be sent to



This page last modified 2002-11-20 - Copyright 2000-2002 ACE
