This document was created by editing together a series of emails and IRC logs.
This means that the language might seem a little weird in places, but it should
outline most of the thinking and design behind adding MIDI support to GStreamer
so far.

Authors of this document include:

Steve Baker 		<steve@stevebaker.org>
Leif Johnson 		<leif@ambient.2y.net>
Andy Wingo 		<wingo@pobox.com>
Christian Schaller 	<Uraeus@gnome.org>

About MIDI
----------

MIDI (Musical Instrument Digital Interface) is used mainly as a communications
protocol for devices in a music studio. These devices could be physical entities
(e.g. synthesizers or sequencers) or purely logical (e.g. sequencers or
filter banks implemented as software applications).

The MIDI specification essentially consists of a list of MIDI messages that can
be passed among devices; these messages (also referred to as "events") are
usually things like NoteOn (start playing a sound), NoteOff (stop playing a
sound), Clock (for synchronization), ProgramChange (for signaling an instrument
or program change), etc.
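
To make the message list above concrete, here is a minimal Python sketch that
names a raw message from its status byte. The status values are the ones
assigned by the MIDI 1.0 specification; the describe() helper and the two
lookup tables are made up for this illustration and are not part of any
GStreamer API.

```python
# High nibble of the status byte for a few channel messages.
CHANNEL_MESSAGES = {
    0x80: "NoteOff",
    0x90: "NoteOn",
    0xC0: "ProgramChange",
}

# System real-time messages are a single byte and carry no channel.
SYSTEM_MESSAGES = {
    0xF8: "Clock",
}


def describe(message):
    """Return (name, channel) for a raw MIDI message given as bytes.

    channel is None for system messages.
    """
    status = message[0]
    if status in SYSTEM_MESSAGES:
        return SYSTEM_MESSAGES[status], None
    name = CHANNEL_MESSAGES.get(status & 0xF0, "Unknown")
    return name, status & 0x0F


note_on = bytes([0x90, 60, 100])  # NoteOn, channel 0, note 60, velocity 100
print(describe(note_on))          # ('NoteOn', 0)
```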

MIDI is different from other cross-device or inter-process streaming methods
(e.g. JACK and possibly some CORBA implementations) because MIDI messages are
discrete and usually only exchanged a few times per second; the devices
involved are supposed to interpret the MIDI messages and produce sounds or
signals. The devices in a MIDI chain typically send their audio signals out on
separate (physical) cables or (logical) channels that have nothing to do with
the MIDI chain itself.

We want to have MIDI messages available in GStreamer pipelines because MIDI is a
common protocol in many existing studios, and MIDI is more or less a standard
for inter-device communications. With MIDI support in GStreamer we can look
forward to (a) controlling and being controlled by external devices like
keyboards and sequencers, and (b) synchronizing and communicating among multiple
applications on a studio computer.

GStreamer and MIDI
------------------

MIDI could be thought of in terms of dataflow as a sparse, non-constant flow of
bytes. GStreamer works best with near-constant data flow, so a MIDI stream would
probably have to consist mostly of filler events, sent at a constant tick rate.
It makes the most sense at this point to distribute MIDI events in a GStreamer
pipeline as a sequence of subclasses of GstEvent (GstMidiEvent or suchlike).

On-the-wire hardware MIDI connections run at a fixed data rate:

  The MIDI data stream is a unidirectional asynchronous bit stream at 31.25
  Kbits/sec. with 10 bits transmitted per byte (a start bit, 8 data bits, and
  one stop bit).

Which is to say, 3125 bytes/sec. Presumably the rawmidi interface already
filters out the start and stop bits, but we haven't checked. The diagram at [1]
is a useful overview. The MIDI specification is also available (though I
can't find it online at the moment ... might have to buy a copy), and there are
several tutorial and help pages (just google for MIDI tutorial).
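
The arithmetic behind that figure can be checked with a few lines of Python.
This is only a worked example of the numbers quoted above; the constant and
function names are invented for the illustration.

```python
BIT_RATE = 31250      # bits per second on a hardware MIDI cable
BITS_ON_WIRE = 10     # start bit + 8 data bits + stop bit per byte


def bytes_per_second():
    """Maximum number of data bytes the wire can carry per second."""
    return BIT_RATE // BITS_ON_WIRE


def transmit_time_ms(num_bytes):
    """Time needed to send num_bytes data bytes, in milliseconds."""
    return num_bytes * BITS_ON_WIRE * 1000 / BIT_RATE


print(bytes_per_second())    # 3125
print(transmit_time_ms(3))   # 0.96
```

So a three-byte message such as NoteOn occupies the wire for just under a
millisecond, which puts a lower bound on the latency of a hardware MIDI link.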

There's another form of MIDI (the common usage?), "Standard MIDI files," which
essentially specify how to save and restore MIDI events in a file. We'll talk
about that in a bit.

[1] http://www.philrees.co.uk/#midi

MIDI and current Linux/Unix audio systems
-----------------------------------------

We don't know very much about the OSS MIDI interface; apparently there exists an
evil /dev/sequencer interface, and maybe a better /dev/midi* one. I only know
this from hearsay. For latency reasons, the ALSA MIDI interface will be much
more solid than using these devices; however, the /dev/midi* devices might be
more of a cross-platform standard.

ALSA has a couple of ways to access MIDI devices. One way is the sequencer API.
There's a tutorial[1], and some example code[2] -- the paradigm is 'wait on some
event fd's until you get an event, then process the event'. Not very
GStreamer-like. This API timestamps the events, much like Standard MIDI files.

The other way to use MIDI with ALSA is with the rawmidi interface. There is a
canonical reference[3] and example code, too[4]. This is much more like
GStreamer. I do wonder about the ability to connect to other sequencer clients,
though...

[1] http://www.suse.de/~mana/alsa090_howto.html#sect04
[2] http://www.suse.de/~mana/seqdemo.c
[3] http://www.alsa-project.org/alsa-doc/alsa-lib/rawmidi.html#rawmidi
[4] http://www.alsa-project.org/alsa-doc/alsa-lib/_2test_2rawmidi_8c-example.html#example_test_rawmidi

Getting MIDI into GStreamer
---------------------------

All buffers are timestamped, and MIDI buffers should be no exception. A buffer
with MIDI data will have a timestamp which says exactly when the data should be
played. In some cases this would mean a buffer contains just a couple of bytes
(e.g. NoteOn). If this turns out to be inefficient we can deal with that later.

In addition to integrating more tightly with GStreamer audio pipelines (see the
"MIDI to PCM" and "MIDI to dparams" sections below), there are several elements
that we will need for basic MIDI interaction in GStreamer. These basics include
file parsing and encoding (is that the opposite of parsing?), and direct
hardware input and output. We'll also probably need a more generic sequencer
interface for defining elements that are capable of sending and receiving this
type of nonlinear stream information.

For these tasks, we need to define some MIME types, some general properties, and
some MIDI elements for GStreamer.

Types:

- MIDI being passed to/from a file: audio/midi (This is in my midi.types
  file, associated with a .midi or .mid extension. It seems analogous to a .wav
  file, which contains "audio/x-wav" type information.)

- MIDI in a pipeline: audio/x-gst-midi?

Properties:

- tick rate: (default to 96.0, or something like that) This is measured in
  ticks per quarter note (or "pulses per quarter note" (ppqn) to be picky). We
  should use float for this so we can support nonstandard (fractional)
  tempos.

- tempo: (default to 120 bpm) This can be measured in bpm (beats per minute,
  the musician's viewpoint), or mpq (microseconds per quarter note, the unit
  used in a MIDI file[1]). Seems like we might want a query format for these
  different units? Or maybe we should just use MPQ and leave it to apps to do
  conversions?
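
The conversions between these units are small enough to sketch here. The
following Python is only an illustration of the arithmetic, assuming that one
beat is one quarter note and that the tempo is constant; the function names are
hypothetical and do not correspond to any existing GStreamer query format.

```python
MICROSECONDS_PER_MINUTE = 60_000_000


def bpm_to_mpq(bpm):
    """Beats per minute -> microseconds per quarter note."""
    return MICROSECONDS_PER_MINUTE / bpm


def mpq_to_bpm(mpq):
    """Microseconds per quarter note -> beats per minute."""
    return MICROSECONDS_PER_MINUTE / mpq


def ticks_to_ns(ticks, mpq, ppqn):
    """Convert a tick count to nanoseconds at a constant tempo.

    ppqn may be fractional, as suggested for the tick rate property.
    """
    return round(ticks * mpq * 1000 / ppqn)


print(bpm_to_mpq(120))                # 500000.0
print(ticks_to_ns(96, 500000, 96.0))  # 500000000
```

With the suggested defaults (96 ticks per quarter note, 120 bpm), one quarter
note lasts half a second, so one tick is a little over 5 milliseconds.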

Elements:

- midiparse: audio/midi -> audio/x-gst-midi

  This element converts MIDI information from a file into MIDI signals that can
  be passed around in GStreamer pipelines. This would parse so-called Standard
  MIDI files (and XML format MIDI files?). Standard MIDI files are just
  timestamped MIDI data; they don't run at a constant bitrate, and for that
  reason you need this element.

  The timestamps that this element produces would be based on the tempo
  property, and the time deltas of the MIDI file data. If no data exists for a
  given tick, the element can just send a filler event.

  The element should support both globbing (reading the whole file up front)
  and streaming the file. Streaming it is the most GStreamerish way of handling
  it, but there are MIDI file formats which are by definition unstreamable,
  therefore a MIDI plugin needs to support streaming and globbing - and
  globbing might be easiest to implement first. The modplug plugin also reads
  an entire file before playing so it's a valid technique.

- ossmidisink: audio/x-gst-midi -> hardware

  Could be added to the existing OSS plugin dir; sends MIDI data to the OSS MIDI
  sequencer device (/dev/midi). Makes extensive use of GstClock to send out data
  only when the buffer/event timestamp says it should. (Could instead use the
  raw MIDI device for clocking, doesn't matter which.)

- alsamidisink: audio/x-gst-midi -> ALSA rawmidi API

  Guess what this does. Don't know whether ALSA's sequencer interface would be
  better than its rawmidi one. Probably rawmidi?

- ossmidisrc, alsamidisrc: hardware -> audio/x-gst-midi

  Real-time MIDI input. This needs to be from the rawmidi APIs.
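
As a rough illustration of how midiparse could turn the time deltas of a
Standard MIDI file into timestamps, here is a Python sketch. Delta times in
Standard MIDI files are stored as variable-length quantities (7 bits per byte,
with the high bit set on every byte except the last). Everything else is a
simplifying assumption: a constant tempo, track data consisting only of
three-byte channel messages with an explicit status byte (no running status,
meta events or sysex), and helper names invented for this example.

```python
def read_vlq(data, pos):
    """Decode one variable-length quantity starting at data[pos].

    Returns (value, next_pos).
    """
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos


def timestamp_events(track, ppqn, mpq):
    """Return a list of (timestamp_ns, message_bytes) for a simplified track.

    ppqn is ticks per quarter note, mpq is microseconds per quarter note.
    """
    events = []
    pos = 0
    ticks = 0
    while pos < len(track):
        delta, pos = read_vlq(track, pos)
        ticks += delta                   # deltas accumulate into absolute ticks
        message = track[pos:pos + 3]     # assumption: always 3 bytes
        pos += 3
        events.append((ticks * mpq * 1000 // ppqn, message))
    return events


track = bytes([0x00, 0x90, 60, 100,        # delta 0:   NoteOn
               0x81, 0x40, 0x80, 60, 0])   # delta 192: NoteOff
events = timestamp_events(track, ppqn=96, mpq=500000)
print([ts for ts, _ in events])            # [0, 1000000000]
```

A real parser would also have to follow tempo changes stored in the file, which
alter the tick-to-time mapping partway through a track.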

It seems like we could implement a class hierarchy around these elements. We
could use a GstMidiElement superclass, which would include the above properties
and contain utility functions for things like reading from the clock and
converting between different time measurement units. From this class we could
derive GstMidiSource, GstMidiFilter, and GstMidiSink parent classes. (Maybe
that's overkill?) Each of the MIDI elements listed above could then inherit
from an appropriate superclass.

We also need an interface (GstSequencer) to allow multiple implementations for
one of the most common MIDI tasks (duh, sequencing). The midisinks could
implement this interface, as could other soft-sequencer elements like
playondemand. The sequencer interface needs to be able to support MIDI
sequencing tasks, but it should also support more generic sequencing concepts.

As you might have guessed, getting MIDI support into GStreamer is thus a matter
of (a) creating a series of elements that handle MIDI data, and (b) creating a
sort of MIDI library (like Timidity?) that basically includes #defines for MIDI
message codes and stuff like that. This stuff should be coded in the gst-plugins
module, under gst-libs/gst/sequencer (for the interface) and
gst-libs/gst/audio/midi/ (for the defines and superclasses).

Of course, this is just the basics ... read on for the really gory future stuff.
:)

[1] http://www.borg.com/~jglatt/tech/midifile/ppqn.htm

Looking ahead
-------------

- MIDI to PCM

It would be nice to be able to transform MIDI (audio/midi) to audio
(audio/x-raw-{int|float}), which could be further processed in a GStreamer
pipeline. In effect this would be using GStreamer as some kind of softsynth.

The first way to do this would be to send MIDI data to softsynths and get audio
data out. There's a very, very nice way of doing this in ALSA (the sequencer
API). Timidity can already register itself as a sequencer client, as can
amSynth, AlsaModularSynth, SpiralSynth, etc... and these latter ones are *much*
more interesting. This is the proper, IMHO, way of doing things.

But the other question is getting that data back for use by GStreamer. In that
sense a librafied Timidity would be useful, I guess... the thing is that all
of these sequencer clients probably want to output to the sound card directly,
although they are configurable. In this, the musician's only hope is JACK. If
the synth is jacked up, we can get its output back into GStreamer. If not, oh
well, it's gone ...

- MIDI to dparams

Once we have MIDI streams, we can start doing fun things like writing a
midi2dparams element which would map MIDI data to control the dynamic parameters
of other elements, but let's not get ahead of ourselves.

Which brings us back to MIDI. MIDI is a representation of control signals, so
all you need are elements to convert that representation to control signals. In
addition, you'd probably want something like SuperCollider's Voicer element --
see [1] for more information on that.

All of this is pretty specific to a synthesizer system, and rightly so: if
multiple projects use it, it could go in some kind of library, but otherwise it
can stay in individual projects.

[1] http://www.audiosynth.com/schtmldocs/Help/Unit_Generators/Spawners/Voicer.help.html

On using dparams for MIDI
-------------------------

You might want to look into using dparams if:

- you wanted your control parameters to change at a higher rate than your buffer
  rate (think zipper noise and sample-granularity interpolation)
- you wanted a better way to store and represent control data than MIDI files

We wrote a linear-interpolation, time-aware dparam so that we could really
demonstrate what they're good for.

It was always the intention for dparams to be able to send values to and get
values from pads. All we need are some simple elements to do the forwarding.
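
To show what sample-granularity interpolation buys you, here is a small Python
sketch that computes one control value per sample between two timestamped
control points, instead of one value per buffer. It is not the dparams API;
the function names and the representation of control points are assumptions
made for this example.

```python
def interpolate(t0, v0, t1, v1, t):
    """Linearly interpolate between (t0, v0) and (t1, v1) at time t.

    Times outside the interval are clamped to the nearest end point.
    """
    if t <= t0:
        return v0
    if t >= t1:
        return v1
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def control_values(t0, v0, t1, v1, start, num_samples, rate):
    """Return one control value per sample for a buffer starting at `start`.

    Times are in seconds, rate is the sample rate in Hz.
    """
    return [interpolate(t0, v0, t1, v1, start + i / rate)
            for i in range(num_samples)]


# A control ramp from 0.0 to 1.0 over one second, sampled at 4 Hz.
print(control_values(0.0, 0.0, 1.0, 1.0, 0.0, 4, 4))  # [0.0, 0.25, 0.5, 0.75]
```

Applying a single value per buffer would turn the same ramp into a staircase,
which is what is heard as zipper noise.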

Possible inefficiency remedy: GstControlPad
-------------------------------------------

If it turns out that sending MIDI events spaced out with filler (blank) events
isn't efficient enough, we'll need to look into implementing something new; for
now, though, we'll just try the simple approach and hope our CPUs are fast
enough. But read on for a little brainstorming.

It seems like GStreamer could benefit from a different subclass of GstPad,
something like GstControlPad. Pads of this type could contain control data like
parameters for oscillators/filters, MIDI events, text information for subtitles,
etc. The defining characteristic of this type of data is that it operates at a
much lower sample rate than the multimedia data that GStreamer currently
handles. (A counterpoint: I think that control data can be sent down existing
pads without making any changes.)

GstControlPad instances could also contain a default value like Wingo has been
pondering, so apps wouldn't need to connect actual data to the pads if the
default value sufficed. There could also be some sweet integration with dparams,
it seems like. (Alternatively, if you want a default value on a control pad,
just make the source element send the value when the state changes.) Elements
that have control pads could also have standard GstPads, and I'd imagine there
would need to be some scheduler modifications to enable the lower processing
demands of control pads.

An example: integrating amSynth[1]
----------------------------------

We would want to be able to write amSynth as a plugin. This would require that
when the process function is called, we have a MIDI buffer as input, containing
however many MIDI events occurred in, say, 1/100 sec, and then we generate an
audio buffer of the same time duration.

Maybe this will indicate the kind of problems to be faced. GStreamer has solved
this problem for audio/video syncing, so you should probably do it the same way.
The first task would be to make this pipeline work:

  filesrc location=foo.mid ! midiparse ! amSynth ! osssink

midiparse will take MIDI file data as an input, and produce timestamped MIDI
buffers as output. It could have a beats-per-minute property as mentioned above
to specify how the MIDI beat offsets are converted to timestamps.

An amSynth element should be a loop element. It would read MIDI buffers until it
has more than enough to produce audio for the duration of one audio buffer. It
knows it has enough MIDI buffers by looking at the timestamps. Because amSynth
is setting the timestamps on the audio buffers going out, the audio sink element
would know when to play them. Once this is working, a more challenging pipeline
might be:

  alsamidisrc ! amSynth ! alsasink

This would be a real-time pipeline: any MIDI input should instantly be
transformed into audio. You would have small audio buffers for low latency (64
samples seems to be typical).

This is a problem for amSynth because it can't sit there waiting for more MIDI
just in case there is more than one MIDI event per audio buffer. In this case
you could either:

- listen to the clock so you know when it's time to output the buffer
- have some kind of real-time mode for amSynth which doesn't wait for MIDI events
  which may never come
- have alsamidisrc produce empty timestamped MIDI buffers so that amSynth knows
  that it is time to spit out some audio.
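
The third option can be sketched in a few lines of Python. This is only a model
of the decision amSynth would have to make, not GStreamer code: MIDI buffers
are represented as (timestamp_ns, message) pairs, an empty filler buffer has
message None, buffers are assumed to arrive in timestamp order, and all names
are invented for the example.

```python
NS_PER_SECOND = 1_000_000_000


def buffer_duration_ns(samples, rate):
    """Duration of an audio buffer in nanoseconds (rounded down)."""
    return samples * NS_PER_SECOND // rate


def can_render(queue, buffer_end_ns):
    """True once some queued MIDI buffer, filler or not, is stamped at or
    after the end of the audio buffer, so nothing earlier can still arrive."""
    return any(ts >= buffer_end_ns for ts, _ in queue)


def take_due(queue, buffer_end_ns):
    """Split the queue into real events that belong in this audio buffer and
    entries that must wait for a later one. Fillers inside the buffer are
    dropped."""
    due = [(ts, msg) for ts, msg in queue
           if ts < buffer_end_ns and msg is not None]
    rest = [(ts, msg) for ts, msg in queue if ts >= buffer_end_ns]
    return due, rest


end_ns = buffer_duration_ns(64, 44100)   # end of the first 64-sample buffer
queue = [(0, b"\x90\x3c\x64")]           # one NoteOn at t=0
print(can_render(queue, end_ns))         # False: more MIDI might still come
queue.append((end_ns, None))             # empty buffer from alsamidisrc
print(can_render(queue, end_ns))         # True
due, queue = take_due(queue, end_ns)
print(len(due), len(queue))              # 1 1
```

At 44100 Hz a 64-sample buffer lasts about 1.45 ms, so alsamidisrc would have
to emit a filler at least that often for this scheme to keep latency low.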

[1] http://amsynthe.sourceforge.net/amSynth/index.html

Extended MIDI files: .kar and karaoke
-------------------------------------

KAR files are standard MIDI files that also contain a stream of lyrics for
karaoke, synchronised with the music. MIDI players play them without any
problem, ignoring the additional data.

It is the most widespread karaoke file format (another being .kok files, for
MP3).

KAR files are based on standard MIDI files with the following additional events:

The KAR text meta events start with an @ followed by a character indicating
the type of KAR text meta event, then followed by text for that event.  The
following text meta events occur embedded in regular MIDI text events:

FileType:     @KMIDI KARAOKE FILE
Version:      @V0100
Information:  @I<text>
Language:     @LENGL
Title 1:      @T<title>
Title 2:      @T<author>
Title 3:      @T<copyright>

The following lyric text indicators are defined.  A \ (backslash) in the
text clears the screen. A / (forward slash) in the text is a line feed
(next line).
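
A parser for these text events is short. The Python below is a sketch based
only on the conventions listed above; the function names and the returned
labels are invented for the example, and it assumes the text events have
already been extracted from the MIDI file as strings.

```python
KAR_TAGS = {
    "K": "file_type",
    "V": "version",
    "I": "information",
    "L": "language",
    "T": "title",
}


def parse_kar_text(text):
    """Classify one text event: returns (kind, value).

    Events starting with @ are KAR meta events; anything else is a lyric
    fragment.
    """
    if text.startswith("@") and len(text) >= 2:
        return KAR_TAGS.get(text[1], "unknown"), text[2:]
    return "lyric", text


def render_lyrics(fragments):
    """Build the lines currently on screen from a sequence of lyric fragments.

    A backslash clears the screen, a forward slash starts a new line.
    """
    lines = [""]
    for fragment in fragments:
        for ch in fragment:
            if ch == "\\":
                lines = [""]
            elif ch == "/":
                lines.append("")
            else:
                lines[-1] += ch
    return lines


print(parse_kar_text("@KMIDI KARAOKE FILE"))  # ('file_type', 'MIDI KARAOKE FILE')
print(render_lyrics(["\\Hel", "lo ", "/wor", "ld"]))  # ['Hello ', 'world']
```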

Some more info on the data format can be found at these locations:

http://www.krazykats-karaoke.co.uk/file_formats.html
http://www.wotsit.org/download.asp?f=kar
http://filext.com/detaillist.php?extdetail=KAR

Some Linux players that handle this format:

http://lmuse.sourceforge.net/files.php
http://sourceforge.net/projects/gkaraoke/