## Threadpool

### Overview

![overview](/doc-assets/threadpool.svg)

An API that lets you create a pool of worker threads, and a queue of tasks that
are bound to a wsi. Tasks in their own thread synchronize communication to the
lws service thread of the wsi via `LWS_CALLBACK_SERVER_WRITEABLE` and friends.

Tasks can produce some output, then return that they want to "sync" with the
service thread. That causes a `LWS_CALLBACK_SERVER_WRITEABLE` in the service
thread context, where the output can be consumed, and the task told to continue,
or completed tasks be reaped.

ALL of the details related to thread synchronization and an associated wsi in
the lws service thread context are handled by the threadpool API, without needing
any pthreads in user code.

### Example

https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/ws-server/minimal-ws-server-threadpool

### Lifecycle considerations

#### Tasks vs wsi

Although all tasks start out associated with a wsi, the lifetime of a task and
that of the wsi are not necessarily linked.

You may start a long task, eg, one that runs atomically in its thread for 30s,
and at any time the client may close the connection, eg, close a browser window.

There are arrangements that let a task "check in" periodically with lws to see
if it has been asked to stop, allowing the task lifetime to be related to the
wsi lifetime somewhat, but some tasks are going to be atomic and long-lived.

For that reason, at wsi close an ongoing task can detach from the wsi and
continue until it ends or understands it has been asked to stop. To make
that work, the task is created with a `cleanup` callback that performs any
freeing independently of whether a wsi is still around to do it. From the moment
the task is created, the task takes over responsibility for freeing the user
pointer on destruction.

![Threadpool States](/doc-assets/threadpool-states.svg)

#### Reaping completed tasks

Although tasks may run asynchronously once created, a task is not destroyed
on completion but is instead added to a "done queue". Only when the lws
service thread context queries the task state with `lws_threadpool_task_status()`
may the task be reaped and its memory freed.

This is analogous to unix processes and `wait()`.

If a task became detached from its wsi, then joining the done queue is enough
to get the task reaped, since there's nobody left to synchronize the reaping
with.

### User interface

The API is declared at https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-threadpool.h

#### Threadpool creation / destruction

The threadpool should be created at program or vhost init using
`lws_threadpool_create()` and destroyed on exit or vhost destruction using
first `lws_threadpool_finish()` and then `lws_threadpool_destroy()`.

Threadpools should be named; varargs are provided on the create function
to facilitate, eg, naming the threadpool by the vhost it's associated with.

Threadpool creation takes an args struct with the following members:

Member|function
---|---
threads|The maximum number of independent threads in the pool
max_queue_depth|The maximum number of tasks allowed to wait for a place in the pool

#### Task creation / destruction

Tasks are created and queued using `lws_threadpool_enqueue()`, which takes an
args struct with the following members:

Member|function
---|---
wsi|The wsi the task is initially associated with
user|An opaque user-private pointer used for communication with the lws service thread and private state / data
task|A pointer to the function that will run in the pool thread
cleanup|A pointer to a function that will clean up finished or stopped tasks (perhaps freeing user)

Tasks should also have a name; the creation function again provides varargs
to simplify naming the task with string elements related to who started it
and why.

#### The task function itself

The task function receives the task user pointer and the task state. The
possible task states are:

State|Meaning
---|---
LWS_TP_STATUS_QUEUED|Task is still waiting for a pool thread
LWS_TP_STATUS_RUNNING|Task is supposed to do its work
LWS_TP_STATUS_SYNCING|Task is blocked waiting for sync from lws service thread
LWS_TP_STATUS_STOPPING|Task has been asked to stop but didn't stop yet
LWS_TP_STATUS_FINISHED|Task has reported it has completed
LWS_TP_STATUS_STOPPED|Task has aborted

The task function will only be told `LWS_TP_STATUS_RUNNING` or
`LWS_TP_STATUS_STOPPING` in its status argument. RUNNING means continue with the
user task and STOPPING means clean up and return `LWS_TP_RETURN_STOPPED`.

If possible, every 100ms or so the task should return `LWS_TP_RETURN_CHECKING_IN`
to allow lws to inform it reasonably quickly that it has been asked to stop
(eg, because the related wsi has closed), or if it can continue.
If that is not possible, it's okay, but eg, exiting the application may be
delayed until the running task finishes, and since the wsi may have gone, the
work is wasted.

The task function may return one of:

Return|Meaning
---|---
LWS_TP_RETURN_CHECKING_IN|Still wants to run, but confirming nobody asked it to stop. Will be called again immediately with `LWS_TP_STATUS_RUNNING` or `LWS_TP_STATUS_STOPPING`
LWS_TP_RETURN_SYNC|Task wants to trigger a WRITABLE callback and block until lws service thread restarts it with `lws_threadpool_task_sync()`
LWS_TP_RETURN_FINISHED|Task has finished, successfully as far as it goes
LWS_TP_RETURN_STOPPED|Task has finished, aborting in response to a request to stop

The SYNC or CHECKING_IN return may also have a flag `LWS_TP_RETURN_FLAG_OUTLIVE`
applied to it, which indicates to threadpool that this task wishes to remain
unstopped after the wsi closes. This is useful in the case where the task
understands it will take a long time to complete, and wants to return a
complete status and maybe close the connection, perhaps with a token identifying
the task. The task can then be monitored separately by using the token.

#### Synchronizing

The task can choose to "SYNC" with the lws service thread, in other words
cause a WRITABLE callback on the associated wsi in the lws service thread
context and block itself until it hears back from there via
`lws_threadpool_task_sync()` to resume the task.

This is typically used when, eg, the task has filled its buffer, or ringbuffer,
and needs to pause operations until what's done has been sent and some buffer
space is open again.

In the WRITABLE callback, in lws service thread context, the buffer can be
sent with `lws_write()`, followed by a call to `lws_threadpool_task_sync()` to
allow the task to fill another buffer and continue that way.
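
The handshake described above can be modelled in plain C without libwebsockets, which may help when reasoning about who owns the buffer at each step. This is only a sketch under stated assumptions: every `demo_` / `DEMO_` name is a hypothetical stand-in (for the task function, the WRITABLE callback and the return / status values), the two sides are alternated in a single thread rather than run in a real pool, and "sending" is just counting bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the task return values and statuses */
enum demo_return { DEMO_RETURN_SYNC, DEMO_RETURN_FINISHED, DEMO_RETURN_STOPPED };
enum demo_status { DEMO_STATUS_RUNNING, DEMO_STATUS_STOPPING };

/* Plays the role of the task's user-private pointer */
struct demo_priv {
	char buf[32];	/* output prepared by the task */
	size_t len;	/* bytes pending in buf; 0 means nothing to send */
	int chunk;	/* next chunk the task will produce */
	int chunks;	/* total chunks to produce */
};

/* Task side: fill the buffer, then ask to sync with the service side */
static enum demo_return
demo_task(void *user, enum demo_status s)
{
	struct demo_priv *p = (struct demo_priv *)user;

	if (s == DEMO_STATUS_STOPPING)
		return DEMO_RETURN_STOPPED;
	if (p->chunk == p->chunks)
		return DEMO_RETURN_FINISHED;

	assert(!p->len); /* the previous buffer must have been consumed */
	p->len = (size_t)snprintf(p->buf, sizeof(p->buf), "chunk %d",
				  p->chunk++);

	return DEMO_RETURN_SYNC;
}

/* Service side: "send" the pending bytes once; returns how many were sent */
static size_t
demo_writable(struct demo_priv *p)
{
	size_t n = p->len;

	p->len = 0; /* a duplicate callback now finds nothing to send */

	return n;
}

/*
 * Alternate task and service side, as the sync handshake does.
 * stop_after > 0 models resuming the task with a request to stop
 * after that many sends. Returns the number of chunks sent.
 */
static int
demo_run(int chunks, int stop_after)
{
	struct demo_priv p = { .chunks = chunks };
	enum demo_status s = DEMO_STATUS_RUNNING;
	int sent = 0;

	while (demo_task(&p, s) == DEMO_RETURN_SYNC) {
		if (demo_writable(&p))
			sent++;
		if (stop_after && sent == stop_after)
			s = DEMO_STATUS_STOPPING;
	}

	return sent;
}
```

With these definitions, `demo_run(3, 0)` sends three chunks and the task finishes by itself, while `demo_run(3, 2)` sends two and the task then returns the stopped value. In real code the alternation is enforced by the threadpool, not by a loop like `demo_run()`.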

If the WRITABLE callback determines that the task should stop, it can just call
`lws_threadpool_task_sync()` with the second argument as 1, to force the task
to stop immediately after it resumes.

#### The cleanup function

When a finished task is reaped, or a task that has become detached from its
initial wsi completes or is stopped, it calls the `.cleanup` function defined in
the task creation args struct to free anything related to the user pointer.

With threadpool, responsibility for freeing allocations used by the task belongs
strictly with the task, via the `.cleanup` function, once the task has been
enqueued. That's different from a typical non-threadpool protocol where the
wsi lifecycle controls deallocation. This reflects the fact that the task
may outlive the wsi.

#### Protecting against WRITABLE and / or SYNC duplication

Care should be taken that data prepared by the task thread in the user priv
memory is only sent once. For example, after sending data from a user priv
buffer whose length is stored in the priv, zero down the length.

Task execution and the SYNC writable callbacks are mutually exclusive, so there
is no danger of collision between the task thread and the lws service thread if
the reason for the callback is a SYNC operation from the task thread.

### Thread overcommit

If the tasks running on the threads are ultimately network-bound for all or some
of their processing (via the SYNC with the WRITEABLE callback), it's possible
to overcommit the number of threads in the pool compared to the number of
threads the processor has in hardware to get better occupancy in the CPU.
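
One way to pick an overcommitted value for the `threads` member is to scale the detected hardware thread count by a factor. The sketch below is an illustration only: `demo_pool_threads()` and `demo_hw_threads()` are hypothetical helpers, the factor is an assumption that should be tuned by measurement for the actual workload, and `_SC_NPROCESSORS_ONLN` is a widely available `sysconf()` name (Linux, macOS, the BSDs) that POSIX itself does not require.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: scale the hardware thread count by an overcommit
 * factor. A count below 1 (eg, detection failed) is treated as 1.
 */
static int
demo_pool_threads(long hw_threads, int overcommit)
{
	assert(overcommit >= 1);

	if (hw_threads < 1)
		hw_threads = 1;

	return (int)hw_threads * overcommit;
}

/* Hypothetical helper: online processor count, or -1 if unavailable */
static long
demo_hw_threads(void)
{
	return sysconf(_SC_NPROCESSORS_ONLN);
}
```

For example, `demo_pool_threads(demo_hw_threads(), 2)` gives twice the hardware thread count, which could then be used as the pool's `threads` value. A factor of 1 means no overcommit, which is the safer choice for tasks that are CPU-bound rather than network-bound.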