# Transport Explainer

@vjpai

## Existing Transports

[gRPC
transports](https://github.com/grpc/grpc/tree/master/src/core/ext/transport)
plug in below the core API (one level below the C++ or other wrapped-language
API). You can write your transport in C or C++; currently (Nov 2017) all
the transports are nominally written in C++, though they are idiomatically C. The
existing transports are:

* [HTTP/2](https://github.com/grpc/grpc/tree/master/src/core/ext/transport/chttp2)
* [Cronet](https://github.com/grpc/grpc/tree/master/src/core/ext/transport/cronet)
* [In-process](https://github.com/grpc/grpc/tree/master/src/core/ext/transport/inproc)

Among these, the in-process transport is likely the easiest to understand, though
arguably also the least similar to a "real" sockets-based transport since it is
only used in a single process.

## Transport stream ops

In the gRPC core implementation, a fundamental struct is the
`grpc_transport_stream_op_batch`, which represents a collection of stream
operations sent to a transport. (Note that in gRPC, _stream_ and _RPC_ are used
synonymously since all RPCs are actually streams internally.)
The ops in a batch can include:

* send\_initial\_metadata
  - Client: initiate an RPC
  - Server: supply response headers
* recv\_initial\_metadata
  - Client: get response headers
  - Server: accept an RPC
* send\_message (zero or more): send a data buffer
* recv\_message (zero or more): receive a data buffer
* send\_trailing\_metadata
  - Client: half-close indicating that no more messages will be coming
  - Server: full-close providing final status for the RPC
* recv\_trailing\_metadata: get final status for the RPC
  - Server extra: This op shouldn't actually be considered complete until the
    server has also sent trailing metadata to provide the other side with final
    status
* cancel\_stream: Attempt to cancel an RPC
* collect\_stats: Get stats

The fundamental responsibility of the transport is to transform between this
internal format and an actual wire format, so the processing of these operations
is largely transport-specific.

One or more of these ops are grouped into a batch. Applications can start all of
a call's ops in a single batch, or they can split them up into multiple
batches. Results of each batch are returned asynchronously via a completion
queue.

Internally, we use callbacks to indicate completion. The surface layer creates a
callback when starting a new batch and sends it down the filter stack along with
the batch. The transport must invoke this callback when the batch is complete,
and then the surface layer returns an event to the application via the
completion queue.
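
The callback flow described above can be sketched with a toy model. Everything here is hypothetical and heavily simplified: `ToyBatch`, `ToyTransport`, `ToyCompletionQueue`, and `StartBatch` are illustrative names, not gRPC core types, and the callback runs inline rather than being scheduled asynchronously as a real transport would do.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for grpc_transport_stream_op_batch.
struct ToyBatch {
  bool send_initial_metadata = false;
  bool send_message = false;
  bool send_trailing_metadata = false;
  std::string payload;                // only meaningful when send_message is set
  std::function<void()> on_complete;  // filled in by the surface layer
};

// Toy completion queue: a FIFO of application-supplied tags.
struct ToyCompletionQueue {
  std::deque<int> events;
};

// Toy transport: turns each op into text appended to a string that stands in
// for the wire, then invokes the batch's completion callback.
class ToyTransport {
 public:
  void PerformStreamOpBatch(ToyBatch batch) {
    assert(batch.on_complete && "surface layer must supply a callback");
    if (batch.send_initial_metadata) wire_ += "[headers]";
    if (batch.send_message) wire_ += "[msg:" + batch.payload + "]";
    if (batch.send_trailing_metadata) wire_ += "[trailers]";
    // Assumption of this sketch: completion is signalled inline.
    batch.on_complete();
  }

  const std::string& wire() const { return wire_; }

 private:
  std::string wire_;
};

// Toy surface layer: attaches a callback that posts `tag` to the completion
// queue, then hands the batch down to the transport.
void StartBatch(ToyTransport& transport, ToyCompletionQueue& cq,
                ToyBatch batch, int tag) {
  batch.on_complete = [&cq, tag] { cq.events.push_back(tag); };
  transport.PerformStreamOpBatch(std::move(batch));
}
```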
Each batch can have up to 3 callbacks:

* recv\_initial\_metadata\_ready (called by the transport when the
  recv\_initial\_metadata op is complete)
* recv\_message\_ready (called by the transport when the recv\_message op is
  complete)
* on\_complete (called by the transport when the entire batch is complete)

## Timelines of transport stream op batches

The transport's job is to sequence and interpret various possible interleavings
of the basic stream ops. For example, a sample timeline of batches would be:

1. Client send\_initial\_metadata: Initiate an RPC with a path (method) and authority
1. Server recv\_initial\_metadata: accept an RPC
1. Client send\_message: Supply the input proto for the RPC
1. Server recv\_message: Get the input proto from the RPC
1. Client send\_trailing\_metadata: This is a half-close indicating that the
   client will not be sending any more messages
1. Server recv\_trailing\_metadata: The server sees this from the client and
   knows that it will not get any more messages. This won't complete yet though,
   as described above.
1. Server send\_initial\_metadata, send\_message, send\_trailing\_metadata: A
   batch can contain multiple ops, and this batch provides the RPC response
   headers, response content, and status. Note that sending the trailing
   metadata will also complete the server's receive of trailing metadata.
1. Client recv\_initial\_metadata: The number of ops in a batch on one side
   has no relation to the number of ops in a batch on the other side. In
   this case, the client is just collecting the response headers.
1. Client recv\_message, recv\_trailing\_metadata: Get the data response and
   status

There are other possible sample timelines. For example, for client-side streaming, a "typical" sequence would be:

1. Server: recv\_initial\_metadata
   - At API-level, that would be the server requesting an RPC
1. Server: recv\_trailing\_metadata
   - This is for when the server wants to know the final completion of the RPC
     through an `AsyncNotifyWhenDone` API in C++
1. Client: send\_initial\_metadata, recv\_message, recv\_trailing\_metadata
   - At API-level, that's a client invoking a client-side streaming call. The
     send\_initial\_metadata is the call invocation, the recv\_message collects
     the final response from the server, and the recv\_trailing\_metadata gets
     the `grpc::Status` value that will be returned from the call
1. Client: send\_message / Server: recv\_message
   - Repeat this step numerous times; these correspond to a client issuing
     `Write` in a loop and a server doing `Read` in a loop until `Read` fails
1. Client: send\_trailing\_metadata / Server: recv\_message that indicates doneness (NULL)
   - These correspond to a client issuing `WritesDone` which causes the server's
     `Read` to fail
1. Server: send\_message, send\_trailing\_metadata
   - These correspond to the server doing `Finish`

The sends on one side will call their own callbacks when complete, and they will
in turn trigger actions that cause the other side's recv operations to
complete. In some transports, a send can sometimes complete before the recv on
the other side (e.g., in HTTP/2 if there is sufficient flow-control buffer space
available).

## Other transport duties

In addition to these basic stream ops, the transport must handle cancellations
of a stream at any time and pass their effects to the other side. For example,
in HTTP/2, this triggers a `RST_STREAM` being sent on the wire. The transport
must also perform operations like pings and statistics that are used to shape
transport-level characteristics like flow control (see, for example, their use
in the HTTP/2 transport).
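
As a rough illustration of the cancellation duty described above, the following sketch models a stream that fails its pending receives and records one `RST_STREAM` for the peer when cancelled. `ToyStream` and its methods are hypothetical names invented for this example, and the "wire" is just a vector of frame names; a real transport's cancellation path is considerably more involved.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy stream; not a gRPC core type.
class ToyStream {
 public:
  // Registers a receive op that would complete later. The callback gets
  // ok == false if the stream is cancelled before data arrives.
  void StartRecv(std::function<void(bool ok)> on_ready) {
    if (cancelled_) {
      on_ready(false);  // already cancelled: fail immediately
      return;
    }
    pending_recvs_.push_back(std::move(on_ready));
  }

  // Cancels the stream: notify the peer once, then fail everything pending.
  void Cancel() {
    if (cancelled_) return;  // cancellation is idempotent in this sketch
    cancelled_ = true;
    frames_sent_.push_back("RST_STREAM");  // stand-in for the HTTP/2 frame
    // Swap out the pending list first so callbacks that re-enter the stream
    // do not observe a half-processed list.
    std::vector<std::function<void(bool)>> pending;
    pending.swap(pending_recvs_);
    for (auto& cb : pending) cb(false);
  }

  bool cancelled() const { return cancelled_; }
  const std::vector<std::string>& frames_sent() const { return frames_sent_; }

 private:
  bool cancelled_ = false;
  std::vector<std::function<void(bool)>> pending_recvs_;
  std::vector<std::string> frames_sent_;
};
```

The design point worth noting is that cancellation can arrive at any time, so every op that is still outstanding needs a path to completion with a failure result rather than being silently dropped.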

## Putting things together with detail: Sending Metadata

* API layer: `map<string, string>` that is specific to this RPC
* Core surface layer: array of `{slice, slice}` pairs where each slice
  references an underlying string
* [Core transport
  layer](https://github.com/grpc/grpc/tree/master/src/core/lib/transport): list
  of `{slice, slice}` pairs that includes the above plus possibly some general
  metadata (e.g., Method and Authority for initial metadata)
* [Specific transport
  layer](https://github.com/grpc/grpc/tree/master/src/core/ext/transport):
  - Either send it to the other side using transport-specific API (e.g., Cronet)
  - Or have it sent through the [iomgr/endpoint
    layer](https://github.com/grpc/grpc/tree/master/src/core/lib/iomgr) (e.g.,
    HTTP/2)
  - Or just manipulate pointers to get it from one side to the other (e.g.,
    In-process)

## Requirements for any transport

Each transport implements several operations in a vtbl (may change to actual
virtual functions as transport moves to idiomatic C++).

The most important and common one is `perform_stream_op`. This function
processes a single stream op batch on a specific stream that is associated with
a specific transport:

* Get the 6 ops/cancel passed down from the surface
* Pass metadata from one side to the other as described above
* Transform messages between slice buffer structure and stream of bytes to pass
  to other side
  - May require insertion of extra bytes (e.g., per-message headers in HTTP/2)
* React to metadata to preserve expected orderings (*)
* Schedule invocation of completion callbacks

There are other functions in the vtbl as well.

* `perform_transport_op`
  - Configure the transport instance for the connectivity state change notifier
    or the server-side accept callback
  - Disconnect transport or set up a goaway for later streams
* `init_stream`
  - Starts a stream from the client-side
  - (*) Server-side of the transport must call `accept_stream_cb` when a new
    stream is available
    * Triggers request-matcher
* `destroy_stream`, `destroy_transport`
  - Free up data related to a stream or transport
* `set_pollset`, `set_pollset_set`, `get_endpoint`
  - Map each specific instance of the transport to FDs being used by iomgr (for
    HTTP/2)
  - Get a pointer to the endpoint structure that actually moves the data
    (wrapper around a socket for HTTP/2)

## Book-keeping responsibilities of the transport layer

A given transport must keep the transport itself and all of its streams
ref-counted. This is essential to make sure that no struct disappears before it
is done being used.

A transport must also preserve relevant orders for the different categories of
ops on a stream, as described above. A transport must also make sure that all
relevant batch operations have completed before scheduling the `on_complete`
closure for a batch. For example, the server logic expects
recv\_trailing\_metadata not to complete until after it actually sends trailing
metadata, since it would have already learned that the client is done sending by
seeing a NULL’ed recv\_message. This is considered part of the transport's
duties in preserving orders.
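
The two ordering rules above can be sketched as follows. `PendingBatch` and `ToyServerStream` are hypothetical names made up for this illustration, not gRPC core types. The sketch also assumes that the server's recv\_trailing\_metadata waits for both the client's half-close and the server's own trailing metadata; a real transport may track more conditions than these.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: runs on_complete only after every op in the batch
// has reported completion.
class PendingBatch {
 public:
  PendingBatch(int num_ops, std::function<void()> on_complete)
      : remaining_(num_ops), on_complete_(std::move(on_complete)) {
    assert(num_ops > 0);
  }

  // Called once per finished op; the last call triggers on_complete.
  void OpDone() {
    assert(remaining_ > 0);
    if (--remaining_ == 0) on_complete_();
  }

 private:
  int remaining_;
  std::function<void()> on_complete_;
};

// Hypothetical model of the server side of one stream.
class ToyServerStream {
 public:
  // The server asks to be told the final outcome of the RPC.
  void StartRecvTrailingMetadata(std::function<void()> on_done) {
    recv_trailing_cb_ = std::move(on_done);
    MaybeCompleteRecvTrailing();
  }

  // The client's half-close has arrived from the other side.
  void OnClientHalfClose() {
    client_half_closed_ = true;
    MaybeCompleteRecvTrailing();
  }

  // The server sends its final status.
  void SendTrailingMetadata() {
    server_trailers_sent_ = true;
    MaybeCompleteRecvTrailing();
  }

 private:
  // Completes recv_trailing_metadata only once all conditions hold.
  void MaybeCompleteRecvTrailing() {
    if (!recv_trailing_cb_ || !client_half_closed_ || !server_trailers_sent_) {
      return;
    }
    // Clear the stored callback before running it so it fires at most once.
    auto cb = std::move(recv_trailing_cb_);
    recv_trailing_cb_ = nullptr;
    cb();
  }

  bool client_half_closed_ = false;
  bool server_trailers_sent_ = false;
  std::function<void()> recv_trailing_cb_;
};
```

In this model the order in which the three events arrive does not matter; the callback runs when the last of them happens, which is the property the transport is expected to preserve.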