1Fixing Threading 2---------------- 3 41) Observations 5 6 The following observations are made when considering the current (17/11/2004) 7 problems in gstreamer. 8 9 - Bin state changes. 10 Currently the state of a bin is determined by the highest state of the 11 children, This is in particular a problem for GstThread because a thread 12 should start/stop spinning at any time depending on the state of a child. 13 14 ex 1: 15 16 +-------------------------------------+ 17 | GstThread | 18 | +--------+ +---------+ +------+ | 19 | | src | | decoder | | sink | | 20 | | src-sink src-sink | | 21 | +--------+ +---------+ +------+ | 22 +-------------------------------------+ 23 24 When performing the state change on the GstThread to PLAYING, one of the 25 children (at random) will go to PLAYING first, this will trigger a method 26 in GstThread that will start spinning the thread. Some elements are not yet 27 in the PLAYING state when the scheduler starts iterating elements. This 28 is not a clean way to start the data passing. 29 30 State changes also trigger negotiation and scheduling (in the other thread) 31 can do too. This creates races in negotiation. 32 33 - ERROR and EOS conditions triggering a state change 34 35 A typical problem is also that since scheduling starts while the state change 36 happens, it is possible that the elements go to EOS or ERROR before the 37 state change completes. Currently this makes the elements go to PAUSED again, 38 creating races with the state change in progress. This also gives the 39 impression to the core that the state change failed. 40 41 - no locking whatsoever 42 43 When an element does a state change, it is possible for another thread to 44 perform a conflicting state change. 45 46 - negotiation is not designed to work over multithread boundaries. 47 48 negotiation over a queue is not possible. There is no method or policy of 49 discovering a media type and then committing it. It is also not possible to 50 tie the negotiated media to the relevant buffer. 51 52 ex1: 53 it Should be possible to queue the old and the new formats in a queue. 54 The element connected to the sinkpad of the queue should be able to 55 find out that the new format will be accepted by the element connected 56 on the srcpad of the queue, even if that element is streaming the old 57 format. 58 59 +------------------------------+ 60 | GstQueue | 61 | +++++++++++++++++++++++++ | 62 -sink |B|B|B|B|B|B|A|A|A|A|A|A| src- 63 | +++++++++++++++++++++++++ | 64 +------------------------------+ 65 +----------+ +----------+ 66 buffers in buffers in 67 new format old format 68 69 - element properties are not threadsafe 70 71 When setting an element property while streaming, the element does no 72 locking whatsoever to guarantee its internal consistency. 73 74 - No control over streaming. 75 76 When some GstThread is iterating and you want to reconnect a pad, there 77 is no way to block the pad, perform the actions and then unblock it 78 again. This leads to thread problems where a pad is negotiation at the 79 same time that it is passing data. 80 81 This is currently solved by PAUSING the pipeline or performing the actions 82 in the same threadcontext as the iterate loop. 83 84 - race conditions in synchronizing the clocks and spinning up the pipeline. 85 Currently the clock is started as soon as the pipeline is set to playing. 86 Because some time elaspes before the elements are negotiated, autoplugged 87 and streaming, the first frame/sample almost always arrives late at the 88 sinks. Hacks exist to adjust the element base time to compensate for the 89 delay but this clearly is not clean. 90 91 - race conditions when performing seeks in the pipeline. Since the elements 92 have no control over the streaming threads, they cannot block them or 93 resync them to the new seek position. It is also hard to synchronize them 94 with the clock. 95 96 - race conditions when sending tags and error messages up the pipeline 97 hierarchy. These races are either caused by glib refcounting problems and 98 by not properly locking. 99 100 - more as changes are implemented and testcases are written 101 1022) possible solutions 103 104 - not allowing threading at all 105 106 Run the complete pipeline in a single thread. Complications to solve include 107 handling of blocking operations like source elements blocking in kernel 108 space, sink elements blocking on the clock or kernel space, etc.. In practice, 109 all operations should be made non-blocking because a blocking element can 110 cause the rest of the pipeline to block as well and cause it to miss a deadline. 111 A non-blocking model needs cooperation from the kernel (with callbacks) or 112 requires the use of a polling mechanism, both of which are either impractical 113 or too CPU intensive and in general not achievable for a general purpose 114 Multimedia framework. For this reason we will not go further with this 115 solution. 116 117 - Allow threading. 118 119 To make this work, We propose the following changes: 120 121 - Remove GstThread, it does not add anything useful in a sense that you cannot 122 arbitrarily place the thread element, it needs decoupled elements around the 123 borders. 124 125 - Simplify the state changes of bins elements. A bin or element never changes 126 state automatically on EOS and ERROR. 127 128 - Introduce the concept of the application and the streaming thread. All data 129 passing is done in the streaming thread. This also means that all operations 130 either are performed in the application thread or streaming thread and that 131 they should be protected against competing operations in other threads. 132 This would define a policy for adding appropriate locking. 133 134 - Move the creation of threads into source and loop-based elements. This will 135 make it possible for the elements in control of the threads to perform the 136 locking when needed. One particular instance is for example the state changes, 137 by creating the threads in the element, it is possible to sync the streaming 138 and the application thread (which does the state change). 139 140 - Remove negotiation from state changes. This will remove the conflict between 141 streaming and negotiating elements. 142 143 - add locks around pad operations like negotiation, streaming, linking, etc. This 144 will remove races between these conflicting operations. This will also make it 145 possible to un/block dataflow. 146 147 - add locks around bin operations like add/removing elements. 148 149 - add locks around element operations like state changes and property changes. 150 151 - add a 2-phase directed negotiation process. The source pad queries and figures 152 out what the sinkpad can take in the first phase. In the second phase it sends 153 the new format change as an event to the peer element. This event can be 154 interleaved with the buffers and can travel over queues inbetween the buffers. 155 Need to rethink this wrt bufferpools (see DShow and old bufferpool implementation) 156 157 - add a preroll phase that will be used to spin up the pipeline and align frames/samples 158 in the sinks. This phase will happen in the PAUSED state. This also means that 159 dataflow will happen in the PAUSED state. Sinks will not sink samples in the PAUSED 160 state but will complete their state change asynchronously. This will allow 161 us to have perfect synchronisation with the clock. 162 163 - a two phase seek policy. First the event travels upstream, putting all elements in 164 the seeking phase and making them synchronize to the new position. In the 165 second phase the DISCONT event signals the end of the seek and all filters can 166 continue with the new position. 167 168 - Error messages, EOS, tags and other events in the pipeline should be sent to a 169 mainloop. The app then has an in-thread mechanism for getting information about 170 the pipeline. It should also be possible to get the messages directly from the 171 elements itself, like signals. The application programmer has to know that 172 working these events come from another thread and should handle them accordingly. 173 174 - Add return values to push/pull so that errors upstream or downstream can be noted 175 by other elements so that they can disable themselves or propagate the error. 176 177 1783) detailed explanation 179 180 a) Pipeline construction 181 182 Pipeline construction includes: 183 184 - adding/removing elements to the bin 185 - finding elements in a bin by name and interface 186 - setting the clock on a pipeline. 187 - setting properties on objects 188 - linking/unlinking pads 189 190 These operations should take the object lock to make sure it can be 191 executed from different threads. 192 193 When connecting pads to other pads from elements inside another bin, 194 we require that the bin has a ghostpad for the pad. This is needed so 195 that the bin looks like a self-contained element. 196 197 not allowed: 198 +---------------------+ 199 | GstBin | 200 +---------+ | +--------+ | 201 | element | | | src | | 202 sink src------sink src- ... | 203 +---------+ | +--------+ | 204 +---------------------+ 205 206 allowed: 207 +-----------------------+ 208 | GstBin | 209 | +--------+ | 210 +---------+ | | src | | 211 | element | | sink src- ... | 212 sink src---sink/ +--------+ | 213 +---------+ +-----------------------+ 214 215 This requirement is important when we need to sort the elements in the 216 bin to perform the state change. 217 218 219 testcases: 220 221 - create a bin, add/remove elements from it 222 - add/remove from different threads and check the bin integrity. 223 224 b) state changes 225 226 An element can be in one of the four states NULL, READY, PAUSED, PLAYING. 227 228 NULL: starting state of the element 229 READY: element is ready to start running. 230 PAUSED: element is streaming data, has opened devices etc. 231 PLAYING: element is streaming data and clock is running 232 233 Note that data starts streaming even in the PAUSED state. The only difference 234 between the PAUSED and PLAYING state is that the clock is running in the 235 PLAYING state. This mostly has an effect on the renderers which will block on 236 the first sample they receive when in PAUSED mode. The transition from 237 READY->PAUSED is called the preroll state. During that transition, media is 238 queued in the pipeline and autoplugging is done. 239 240 Elements are put in a new state using the _set_state function. This function 241 can return the following return values: 242 243 typedef enum { 244 GST_STATE_FAILURE = 0, 245 GST_STATE_PARTIAL = 1, 246 GST_STATE_SUCCESS = 2, 247 GST_STATE_ASYNC = 3 248 } GstElementStateReturn; 249 250 GST_STATE_FAILURE is returned when the element failed to go to the 251 required state. When dealing with a bin, this is returned when one 252 of the elements failed to go to the required state. The other elements 253 in the bin might have changed their states successfully. This return 254 value means that the element did _not_ change state, for bins this 255 means that not all children have changed their state. 256 257 GST_STATE_PARTIAL is returned when some elements in a bin where in the 258 locked state and therefore did not change their state. Note that the 259 state of the bin will be changed regardless of this PARTIAL return value. 260 261 GST_STATE_SUCCES is returned when all the elements successfully changed their 262 states. 263 264 GST_STATE_ASYNC is returned when an element is going to report the success 265 or failure of a state change later. 266 267 The state of a bin is not related to the state of its children but only to 268 the last state change directly performed on the bin or on a parent bin. This 269 means that changing the state of an element inside the bin does not affect 270 the state of the bin. 271 272 Setting the state on a bin that is already in the correct state will 273 perform the requested state change on the children. 274 275 Elements are not allowed to change their own state. For bins, it is allowed 276 to change the state of its children. This means that the application 277 can only know about the states of the elements it has explicitly set. 278 279 There is a difference in the way a pipeline and a bin handles the state 280 change of its children: 281 282 - a bin returns GST_STATE_ASYNC when one of its children returns an 283 ASYNC reply. 284 285 - a pipeline never returns GST_STATE_ASYNC but returns from the state 286 change function after all ASYNC elements completed the state change. 287 This is done by polling the ASYNC elements until they return their 288 final state. 289 290 The state change function must be fast an cannot block. If a blocking behaviour 291 is unavoidable, the state change function must perform an async state change. 292 Sink elements therefore always use async state changes since they need to 293 wait before the first buffer arrives at the sink. 294 295 A bin has to change the state of its children elements from the sink to the 296 source elements. This makes sure that a sink element is always ready to 297 receive data from a source element in the case of a READY->PAUSED state change. 298 In the case of a PAUSED->READY state, the sink element will be set to READY 299 first so that the source element will receive an error when it tries to push 300 data to this element so that it will shut down as well. 301 302 For loop based elements we have to be careful since they can pull a buffer 303 from the peer element before it has been put in the right state. 304 The state of a loop based element is therefore only changed after the source 305 element has been put in the new state. 306 307 c) Element state change functions 308 309 The core will call the change_state function of an element with the element 310 lock held. The element is responsible for starting any streaming tasks/threads 311 and making sure that it synchronizes them to the state change function if 312 needed. 313 314 This means that no other thread is allowed to change the state of the element 315 at that time and for bins, it is not possible to add/remove elements. 316 317 When an element is busy doing the ASYNC state change, it is possible that another 318 state change happens. The elements should be prepared for this. 319 320 An element can receive a state change for the same state it is in. This 321 is not a problem, some elements (like bins) use this to resynchronize their 322 children. Other elements should ignore this state change and return SUCCESS. 323 324 When performing a state change on an element that returns ASYNC on one of 325 the state changes, ASYNC is returned and you can only proceed to the next 326 state change when this ASYNC state change completed. Use the 327 gst_element_get_state function to know when the state change completed. 328 An example of this behaviour is setting a videosink to PLAYING, it will 329 return ASYNC in the state change from READY->PAUSED. You can only set 330 it to PLAYING when this state change completes. 331 332 Bins will perform the state change code listed in d). 333 334 For performing the state change, two variables are used: the current state 335 of the element and the pending state. When the element is not performing a 336 state change, the pending state == None. The state change variables are 337 protected by the element lock. The pending state != None as long as the 338 state change is performed or when an ASYNC state change is running. 339 340 The core provides the following function for applications and bins to 341 get the current state of an element: 342 343 bool gst_element_get_state(&state, &pending, timeout); 344 345 This function will block while the state change function is running inside 346 the element because it grabs the element lock. 347 When the element did not perform an async state change, this function returns 348 TRUE immediately with the state updated to reflect the current state of the 349 element and pending set to None. 350 When the element performed an async state change, this function will block 351 for the value of timeout and will return TRUE if the element completed the 352 async state change within that timeout, otherwise it returns FALSE, with 353 the current and pending state filled in. 354 355 The algorithm is like this: 356 357 bool gst_element_get_state(elem, &state, &pending, timeout) 358 { 359 g_mutex_lock (ELEMENT_LOCK); 360 if (elem->pending != none) { 361 if (!g_mutex_cond_wait(STATE, ELEMENT_LOCK, timeout) { 362 /* timeout triggered */ 363 *state = elem->state; 364 *pending = elem->pending; 365 ret = FALSE; 366 } 367 } 368 if (elem->pending == none) { 369 *state = elem->state; 370 *pending = none; 371 ret = TRUE; 372 } 373 g_mutex_unlock (ELEMENT_LOCK); 374 375 return ret; 376 } 377 378 For plugins the following function is provided to commit the pending state, 379 the ELEMENT_LOCK should be held when calling this function: 380 381 gst_element_commit_state(element) 382 { 383 if (pending != none) { 384 state = pending; 385 pending = none; 386 } 387 g_cond_broadcast (STATE); 388 } 389 390 For bins the gst_element_get_state() works slightly different. It will run 391 the function on all of its children, as soon as one of the children returns 392 FALSE, the method returns FALSE with the state set to the current bin state 393 and the pending set to pending state. 394 395 For bins with elements that did an ASYNC state change, the _commit_state() 396 is only executed when actively calling _get_state(). The reason for this is 397 that when a child of the bin commits its state, this is not automatically 398 reported to the bin. This is not a problem since the _get_state() function 399 is the only way to get the current and pending state of the bin and is always 400 consistent. 401 402 d) bin state change algorithm 403 404 In order to perform the sink to source state change a bin must be able to sort 405 the elements. To make this easier we require that elements are connected to 406 bins using ghostpads on the bin. 407 408 The algorithm goes like this: 409 410 d = [ ] # list of delayed elements 411 p = [ ] # list of pending async elements 412 q = [ elements without srcpads ] # sinks 413 while q not empty do 414 e = dequeue q 415 s = [ all elements connected to e on the sinkpads ] 416 q = q append s 417 if e is entry point 418 d = d append e 419 else 420 r = state change e 421 if r is ASYNC 422 p = p append e 423 done 424 while d not empty do 425 e = dequeue d 426 r = state change e 427 if r is ASYNC 428 p = p append e 429 done 430 # a bin would return ASYNC here if p is not empty 431 432 # this last part is only performed by a pipeline 433 while p not empty do 434 e = peek p 435 if state completed e 436 dequeue e from p 437 done 438 439 The algorithm first tries to find the sink elements, ie. ones without 440 sinkpads. Then it changes the state of each sink elements and queues 441 the elements connected to the sinkpads. 442 443 The entry points (loopbased and getbased elements) are delayed as we 444 first need to change the state of the other elements before we can activate 445 the entry points in the pipeline. 446 447 The pipeline will poll the async children before returning. 448 449 e) The GstTask infrastructure 450 451 A new component: GstTask is added to the core. A task is created by 452 an instance of the abstract GstScheduler class. 453 454 Each schedulable element (when added to a pipeline) is handed a 455 reference to a GstScheduler. It can use this object to create 456 a GstTask, which is basically a managed wrapper around a threading 457 library like GThread. It should be possible to write a GstScheduler 458 instance that uses other means of scheduling, like one that does not 459 use threads but implements task switching based on mutex locking. 460 461 When source and loopbased elements want to create the streaming thread 462 they create an instance of a GstTask, which they pass a pointer to 463 a loop-function. This function will be called as soon as the element 464 performs GstTask.start(). The element can stop and uses mutexes to 465 pause the GstTask from, for example, the state change function or the 466 event functions. 467 468 The GstTasks implement the streaming threads. 469 470 f) the preroll phase 471 472 Element start the streaming threads in the READY->PAUSED state. Since 473 the elements that start the threads are put in the PAUSED state last, 474 after their connected elements, they will be able to deliver data to 475 their peers without problems. 476 477 Sink elements like audio and videosinks will return an async state change 478 reply and will only commit the state change after receiving the first 479 buffer. This will implement the preroll phase. 480 481 The following pseudo code shows an algorithm for committing the state 482 change in the streaming method. 483 484 GST_OBJECT_LOCK (element); 485 /* if we are going to PAUSED, we can commit the state change */ 486 if (GST_STATE_TRANSITION (element) == GST_STATE_READY_TO_PAUSED) { 487 gst_element_commit_state (element); 488 } 489 /* if we are paused we need to wait for playing to continue */ 490 if (GST_STATE (element) == GST_STATE_PAUSED) { 491 492 /* here we wait for the next state change */ 493 do { 494 g_cond_wait (element->state_cond, GST_OBJECT_GET_LOCK (element)); 495 } while (GST_STATE (element) == GST_STATE_PAUSED); 496 497 /* check if we got playing */ 498 if (GST_STATE (element) != GST_STATE_PLAYING) { 499 /* not playing, we can't accept the buffer */ 500 GST_OBJECT_UNLOCK (element); 501 gst_buffer_unref (buf); 502 return GST_FLOW_WRONG_STATE; 503 } 504 } 505 GST_OBJECT_UNLOCK (element); 506 507 508 509 g) return values for push/pull 510 511 To recover from pipeline errors in a more elegant manner than just 512 shutting down the pipeline, we need more finegrained error messages 513 in the data transport. The plugins should be able to know what goes 514 wrong when interacting with their outside environment. This means 515 that gst_pad_push/gst_pad_pull and gst_event_send should return a 516 result code. 517 518 Possible return values include: 519 520 - GST_OK 521 - GST_ERROR 522 - GST_NOT_CONNECTED 523 - GST_NOT_NEGOTIATED 524 - GST_WRONG_STATE 525 - GST_UNEXPECTED 526 - GST_NOT_SUPPORTED 527 528 GST_OK 529 Data transport was successful 530 531 GST_ERROR 532 An error occurred during transport, such as a fatal decoding error, 533 the pad should not be used again. 534 535 GST_NOT_CONNECTED 536 The pad was not connected 537 538 GST_NOT_NEGOTIATED 539 The peer does not know what datatype is going over the pipeline. 540 541 GST_WRONG_STATE 542 The peer pad is not in the correct state. 543 544 GST_UNEXPECTED 545 The peer pad did not expect the data because it was flushing or 546 received an eos. 547 548 GST_NOT_SUPPORTED 549 The operation is not supported. 550 551 The signatures of the functions will become: 552 553 GstFlowReturn gst_pad_push (GstPad *pad, GstBuffer *buffer); 554 GstFlowReturn gst_pad_pull (GstPad *pad, GstBuffer **buffer); 555 556 GstResult gst_pad_push_event (GstPad *pad, GstEvent *event); 557 558 - push_event will send the event to the connected pad. 559 560 For sending events from the application: 561 562 GstResult gst_pad_send_event (GstPad *pad, GstEvent *event); 563 564 h) Negotiation 565 566 Implement a simple two phase negotiation. First the source queries the 567 sink if it accepts a certain format, then it sends the new format 568 as an event. Sink pads can also trigger a state change by requesting 569 a renegotiation. 570 571 i) Mainloop integration/GstBus 572 573 All error, warning and EOS messages from the plugins are sent to an event 574 queue. The pipeline reads the messages from the queue and will either 575 handle them or forward them to the main event queue that is read by the 576 application. 577 578 Specific pipelines can be written that deal with negotiation messages and 579 errors in the pipeline intelligently. The basic pipeline will stop the 580 pipeline when an error occurs. 581 582 Whenever an element posts a message on the event queue, a signal is also 583 fired that can be caught by the application. When dealing with those 584 signals the application has to be aware that they come from the streaming 585 threads and need to make sure they use proper locking to protect their 586 own data structures. 587 588 The messages will be implemented using a GstBus object that allows 589 plugins to post messages and allows the application to read messages either 590 synchronous or asynchronous. It is also possible to integrate the bus in 591 the mainloop. 592 593 The messages will derive from GstData to make them a lightweight refcounted 594 object. Need to figure out how we can extend this method to encapsulate 595 generic signals in messages too. 596 597 This decouples the streaming thread from the application thread and should 598 avoid race conditions and pipeline stalling due to application interaction. 599 600 It is still possible to receive the messages in the streaming thread context 601 if an application wants to. When doing this, special care has to be taken 602 when performing state changes. 603 604 j) EOS 605 606 When an element goes to EOS, it sends the EOS event to the peer plugin 607 and stops sending data on that pad. The peer element that received an EOS 608 event on a pad can refuse any buffers on that pad. 609 610 All elements without source pads must post the EOS message on the message 611 queue. When the pipeline receives an EOS event from all sinks, it will 612 post the EOS message on the application message queue so that the application 613 knows the pipeline is in EOS. Elements without any connected sourcepads 614 should also post the EOS message. This makes sure that all "dead-ends" 615 signalled the EOS. 616 617 No state change happens when elements go to EOS but the elements with the 618 GstTask will stop their tasks and so stop producing data. 619 620 An application can issue a seek operation which makes all tasks running 621 again so that they can start streaming from the new location. 622 623 624 625A) threads and lowlatency 626 627 People often think it is a sin to use threads in low latency applications. This is true 628 when using the data has to pass thread boundaries but false when it doesn't. Since 629 source and loop based elements create a thread, it is possible to construct a pipeline 630 where data passing has to cross thread boundaries, consider this case: 631 632 +-----------------------------------+ 633 | +--------+ +--------+ | 634 | |element1| |element2| | 635 | .. -sink src-sink src- .. | 636 | +--------+ +--------+ | 637 +-----------------------------------+ 638 639 The two elements are loop base and thus create a thread to drive the pipeline. At the 640 border between the two elements there is a mutex to pass the data between the two 641 threads. When using these kinds of element in a pipeline, low-latency will not be 642 possible. For low-latency apps, don't use these constructs! 643 644 Note that in a typical pipeline with one get-based element and two chain-based 645 elements (decoder/sink) there is only one thread, no data is crossing thread 646 boundaries and thus this pipeline can be low-latency. Also note that while this 647 pipeline is streaming no interaction or locking is done between it and the main 648 application. 649 650 +-------------------------------------+ 651 | +--------+ +---------+ +------+ | 652 | | src | | decoder | | sink | | 653 | | src-sink src-sink | | 654 | +--------+ +---------+ +------+ | 655 +-------------------------------------+ 656 657 658B) howto make non-threaded pipelines 659 660 For low latency it is required to not have datapassing cross any thread 661 borders. Here are some pointers for making sure this requirement is met: 662 663 - never connect a loop or chain based element to a loop based element, this 664 will create a new thread for the sink loop element. 665 666 - do not use queues or any other decoupled element, as they implicitly 667 create a thread boundary. 668 669 - At least one thread will be created for any source element (either in the 670 connected loop-based element or in the source itself) unless the source 671 elements are connected to the same loop based element. 672 673 - when designing sinks, make them non-blocking, use the async clock callbacks 674 to schedule media rendering in the same thread (if any) as the clock. Sinks that 675 provide the clock can be made blocking. 676 677