ELEMENTS (v4lsrc, alsasrc, osssrc)
--------
- capturing elements should not do fps/sample rate correction themselves;
  they should timestamp buffers according to "a clock", period.

- if the element is the clock provider:
  - timestamp buffers based on the internals of the clock it's providing,
    without calling the exposed clock functions
  - do this by getting a measure of elapsed time based on the internal clock
    that is being wrapped, i.e. count the number of samples the *device*
    has processed/dropped/...
    If there are no underruns, the produced buffers form a contiguous data
    stream.
  - possibilities:
    - the device has a method to query for the absolute time related to
      a buffer you're about to capture or have just captured:
      use that time as the timestamp on the capture buffer
      (it's important that this time is related to the capture buffer;
      i.e. it's a time that "stands still" if you're not capturing)
    - if you're providing the clocking but don't have the previous method,
      you should open the device with a given rate and continuously read
      samples from it, even in PAUSED.  This allows you to update an
      internal clock.
      You use this internal clock as well to timestamp the buffers going out,
      so you again form a contiguous set of buffers.
      The only acceptable way to continuously read samples then is in a
      private thread.
  - as long as no underruns happen, the flow being output is a perfect stream:
    the flow is data-contiguous and time-contiguous.
  - underruns should be handled like this:
    - if the code can detect how many samples it dropped, it should just
      send the next buffer with the new, correct offset.  I.e. it produces
      a data gap, and since it provides the clock, it produces a perfect
      data gap (the timestamp will be correctly updated too).
    - if it cannot detect how many samples it dropped, there is a fallback
      algorithm.  The element uses another GstClock (for example, the system
      clock), against which it continuously corrects skew and drift as long
      as it doesn't drop.  When it detects a drop, it takes the delta on that
      other GstClock between the last capture and now, and uses that delta to
      guesstimate the number of samples dropped.

- if the element is not the clock provider:
  - the element should always respect the clock it is given.
  - the element should timestamp outgoing buffers based on the time given by
    the provided clock, by querying that clock for the current time and
    comparing it to the base time.
  - the element should NOT drop/add frames. Rather, it should just
    - timestamp the buffers with the current time according to the provided
      clock
    - set the duration according to the *theoretical/nominal* framerate
    (both the clock-provider and the slaved case are sketched in code below)
  - when underruns happen (the device has lost capture data because our
    element is not servicing it quickly enough), this should be detectable
    by the element through the device. On underrun, the offset of your
    next buffer will not match the offset_end of your previous one
    (i.e. the data flow is no longer contiguous).
    If the exact number of samples dropped is detectable, it is the
    difference between the new offset and the old offset_end.
    If it's not detectable, it should be guessed based on the elapsed time
    between now and the last capture.
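- a minimal C sketch of the two timestamping cases above (the helper names
  are made up for illustration; the buffer field macros and
  gst_util_uint64_scale() are the current GStreamer API; this is a sketch of
  the idea, not a prescribed implementation):

    #include <gst/gst.h>

    /* Clock-provider case: timestamp from the running count of samples the
     * device has processed, so the output stays data- and time-contiguous. */
    static void
    stamp_from_sample_count (GstBuffer * buf, guint64 samples_done,
        guint n_samples, gint rate)
    {
      GST_BUFFER_OFFSET (buf) = samples_done;
      GST_BUFFER_OFFSET_END (buf) = samples_done + n_samples;
      GST_BUFFER_PTS (buf) =
          gst_util_uint64_scale (samples_done, GST_SECOND, rate);
      GST_BUFFER_DURATION (buf) =
          gst_util_uint64_scale (n_samples, GST_SECOND, rate);
    }

    /* Slaved case: timestamp with the running time of the provided clock and
     * set the duration from the nominal framerate (fps_n/fps_d). */
    static void
    stamp_from_provided_clock (GstBuffer * buf, GstClock * clock,
        GstClockTime base_time, gint fps_n, gint fps_d)
    {
      GstClockTime now = gst_clock_get_time (clock);

      GST_BUFFER_PTS (buf) = now - base_time;
      GST_BUFFER_DURATION (buf) =
          gst_util_uint64_scale (GST_SECOND, fps_d, fps_n);
    }

    /* Fallback from the underrun discussion: estimate the number of dropped
     * samples from the time elapsed on another GstClock. */
    static guint64
    estimate_dropped_samples (GstClockTime elapsed_on_other_clock, gint rate)
    {
      return gst_util_uint64_scale (elapsed_on_other_clock, rate, GST_SECOND);
    }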
- a second element can be responsible for making the stream time-contiguous
  (i.e. T1 + D1 = T2 for all buffers). This way the buffers are made
  acceptable for gapless presentation (which is useful for audio).
  - The element treats the incoming stream as data-contiguous but not
    necessarily time-contiguous (both contiguity checks are sketched in code
    below).
  - If the timestamps are contiguous as well, then everything is fine and
    nothing needs to be done. This is the case when a file is being read
    from disk, or when capturing was done by an element that provided the
    clock.
  - If they are not contiguous, then this element must make them so.
    Since it should respect the nominal framerate, it has to stretch or
    shorten the incoming data to match the timestamps set on the data.
    For audio and video, this means it could interpolate or add/drop samples.
    For audio, resampling/interpolation is preferred.
    For video, a simple mechanism that chooses the frame with a timestamp as
    close as possible to the theoretical timestamp could be used.
  - When it receives a new buffer that is not data-contiguous with the
    previous one, the capture element dropped samples/frames.
    The adjuster can correct this by sending out as much "no-signal" data
    (for audio, e.g. silence or background noise; for video, black frames)
    as it wants, since a data discontinuity is unrepairable anyway.
    So it can use these to catch up more aggressively.
    It should just make sure that the next buffer it sends out goes back to
    respecting the nominal framerate.
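- a minimal sketch of the two contiguity checks the adjuster relies on
  (helper names are made up; the macros are the current GStreamer buffer
  fields):

    #include <gst/gst.h>

    /* contiguous data flow: offset_end of the old buffer matches the offset
     * of the new buffer */
    static gboolean
    is_data_contiguous (GstBuffer * prev, GstBuffer * cur)
    {
      return GST_BUFFER_OFFSET_END (prev) == GST_BUFFER_OFFSET (cur);
    }

    /* contiguous time flow: T1 + D1 = T2 */
    static gboolean
    is_time_contiguous (GstBuffer * prev, GstBuffer * cur)
    {
      return GST_BUFFER_PTS (prev) + GST_BUFFER_DURATION (prev) ==
          GST_BUFFER_PTS (cur);
    }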
- To achieve the best possible long-time capture, the following can be done:
  - audiosrc captures audio and provides the clock. It does contiguous
    timestamping by default.
  - videosrc captures video timestamped against the audiosrc's clock. This
    data feed doesn't match the nominal framerate. If there is an encoding
    format that supports storing the actual timestamps, instead of pretending
    the data flow respects the nominal framerate, this can be corrected after
    recording.
  - at the end of recording, the absolute length in time of both streams,
    measured against a common clock, is the same, or can be made the same by
    chopping off data.
  - the nominal rate of both audio and video is also known.
  - given the length and the nominal rate, we have an evenly spaced list of
    theoretical sampling points.
  - video frames can now be matched to these theoretical sampling points by
    interpolating or by reusing/dropping frames. The corrector can choose the
    best possible algorithm for this to reduce the visible effects
    (interpolating results in blur, adding/dropping frames results in
    jerkiness).
  - with the video resampled at the theoretical framerate, and the audio
    already correct, the recording can now be muxed correctly into a format
    that implicitly assumes a data rate matching the nominal framerate.
  - One possibility is to use GDP to store the recording, because it retains
    all of the timestamping information.
  - The process is symmetrical; if you want to use the clock provided by the
    video capturer instead, you can stretch/shrink the audio at the end of
    recording to match.

TERMINOLOGY
-----------
- nominal rate
  the framerate/samplerate exposed in the caps, i.e. the theoretical rate of
  the data flow. This is the fps reported by the device or set on the
  encoder, or the sampling rate of the audio device.
- contiguous data flow
  offset_end of the old buffer matches the offset of the new buffer.
  For audio this is the more important requirement, since output devices are
  configured for a contiguous data flow.
- contiguous time flow
  T1 + D1 = T2.
  For video this is the more important requirement, because the sampling
  period is bigger, so matching the presentation time matters more.
- "perfect stream"
  data and time are contiguous and match the nominal rate.
  videotestsrc, sinesrc, and filesrc ! decoder produce this.

NETWORK
-------
- elements can be synchronized by writing an NTP clock subclass that listens
  to an NTP server and tries to match its own clock against it by doing
  gradual rate adjustment, compared with its own system clock.
- sending audio and video over the network using tcpserversink is possible
  when the streams are made to be perfect streams and are synchronized.
  Since the streams are perfect and synchronized, the timestamps transmitted
  along with the buffers can be trusted. The client just has to make sure
  that it respects those timestamps.
- One good way of doing that is to make an element that provides a clock
  based on the timestamps of the data stream, interpolating with another
  GstClock in between those time points. This allows you to create a perfect
  network stream player: one that neither lags behind (with ever-growing
  buffers) nor plays too fast (with an empty network queue).
- On the client side, a GStreamer-ish way to do that is to cut the playback
  pipeline in half, and have a decoupled element that converts
  timestamps/durations (by resampling/interpolating/...) so that the sinks
  consume data at the same rate the tcp sources provide it:
    tcpclientsrc ! theoradec ! clocker name=clocker { clocker. ! xvimagesink }

SYNCHRONISATION
---------------
- low rate source with high rate source:
  the high rate source can drop samples so that it starts with the same phase
  as the low rate source. This could be done in a synchronizer element.
  example:
  - audio at 8000 Hz, and video at 5 fps (a 200 ms frame period)
  - the pipeline goes to PLAYING
  - the video src captures and receives its first frame 50 ms after playing
    -> the phase is -90 or 270 degrees
  - to compensate, the equivalent of 150 ms of audio could be dropped so that
    the first video frame's timestamp coincides with the timestamp of the
    first audio buffer
  - this should be done in the raw audio domain, since it is typically not
    possible to chop off samples in the encoded domain

- two low rate sources:
  it is not possible to do this correctly; maybe something in the middle can
  be found?

IMPROVING QUALITY
-----------------
- the video src can capture at a higher framerate than will be encoded
- this gives the corrector more frames to choose from, or to interpolate
  with, to match the target framerate, reducing jerkiness;
  e.g. capturing at 15 fps for a 5 fps target framerate.

LIVE CHANGES IN PIPELINE
------------------------
- case 1: video has been recording for some time; the user wants to add audio
  recording on the fly (see the sketch below)
  - the user sets the complete pipeline to PAUSED
  - the user adds the element for audio recording
  - the new element gets the same base time as the video element
  - on PLAYING, the new element will be in sync, and the first buffer it
    produces will have a non-zero timestamp that is the same as that of the
    first new video buffer
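- a minimal sketch of case 1 (the function and variable names are
  illustrative only; note that a current GStreamer pipeline also distributes
  the base time to its children itself when going to PLAYING, so the explicit
  call mostly spells out the step described above):

    #include <gst/gst.h>

    static void
    add_audio_branch (GstElement * pipeline, GstElement * audio_branch)
    {
      /* pause the running pipeline */
      gst_element_set_state (pipeline, GST_STATE_PAUSED);

      /* add the new branch and give it the base time the other elements are
       * already using, so it does not start counting from zero */
      gst_bin_add (GST_BIN (pipeline), audio_branch);
      gst_element_set_base_time (audio_branch,
          gst_element_get_base_time (pipeline));
      gst_element_sync_state_with_parent (audio_branch);

      /* back to PLAYING; the new branch timestamps against the same clock
       * and base time, so its first buffers line up with the video */
      gst_element_set_state (pipeline, GST_STATE_PLAYING);
    }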
- case 2: video has been recording for some time; the user wants to add in an
  audio file from disk.
  - two possible expectations:
    A) the user expects the audio file to "start playing now" and be muxed
       together with the current video frames
    B) the user expects the audio file to "start playing from the point where
       the video currently is" (i.e. the video is at 10 seconds, so mux with
       audio starting from 10 seconds)
  - case A):
    - the complete pipeline gets paused
    - filesrc ! dec is added
    - both get the same base_time as the video element
    - pipeline to PLAYING
    - all elements receive the new "now" as base_time, so timestamps are reset
    - the muxer will receive synchronized data from both
  - case B):
    - nothing gets paused
    - filesrc ! dec is added
    - both get a base_time that is the current clock time
    - pipeline to PLAYING
    - core sets things up; then either
      1) - the new audio part starts sending out data with timestamp 0 from
           the start of the file
         - the muxer receives a whole set of buffers from the audio side that
           are late (since their timestamps start at 0), so it keeps dropping
           them until it has caught up with the current set
      OR
      2) - the audio part does a clock query

THINGS TO DIG UP
----------------
- is there a better way to get at "when was this frame captured" than doing a
  clock query after capturing?
  Imagine a video device with a hardware buffer of four frames. If you
  haven't asked it for a frame in a while, three frames could be queued up.
  Three consecutive frame reads then return immediately, with pretty much the
  same clock query result for each of them.
  So we should find a way to get "a comparable clock time" corresponding to
  the captured frame.

- the v4l2 API returns a gettimeofday() timestamp with each buffer (see the
  sketch below).
  Given that, you can timestamp the buffer by taking the current time
  reported by the provided clock and subtracting from it the delta between
  the current system time and the buffer's gettimeofday() timestamp.
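- a sketch of that correction (the helper name is made up; it assumes the
  v4l2 gettimeofday() timestamp has already been converted to a GstClockTime):

    #include <gst/gst.h>
    #include <sys/time.h>

    static GstClockTime
    capture_time_from_v4l2_stamp (GstClock * provided_clock,
        GstClockTime base_time, GstClockTime buf_sys_time)
    {
      struct timeval tv;
      GstClockTime now_sys, age, running_time;

      /* current system time, in the same time base as the v4l2 timestamp */
      gettimeofday (&tv, NULL);
      now_sys = GST_TIMEVAL_TO_TIME (tv);

      /* how long ago the frame was actually captured */
      age = now_sys - buf_sys_time;

      /* current running time of the clock the element was given */
      running_time = gst_clock_get_time (provided_clock) - base_time;

      /* the frame's timestamp is that running time minus the frame's age */
      return (running_time > age) ? running_time - age : 0;
    }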