Lines Matching +full:frame +full:- +full:buffer
1 .. SPDX-License-Identifier: GPL-2.0
22 - Ulisses Alonso Camaró <uaca@i.hate.spam.alumni.uv.es>
23 - Johann Baudy
34 configurable circular buffer mapped in user space that can be used to either
38 highest bandwidth. By using a shared buffer between the kernel and the user
67 [setup] socket() -------> creation of the capture socket
68 setsockopt() ---> allocation of the circular buffer (ring)
70 mmap() ---------> mapping of the allocated buffer to the
73 [capture] poll() ---------> to wait for incoming packets
75 [shutdown] close() --------> destruction of the capture socket and
88 supported and a link level pseudo-header is provided
96 allocated RX and TX buffer ring with a single mmap() call.
97 See "Mapping and use of the circular buffer (ring)".
100 also the mapping of the circular buffer in the user process and
101 the use of this buffer.
107 [setup] socket() -------> creation of the transmission socket
108 setsockopt() ---> allocation of the circular buffer (ring)
110 bind() ---------> bind transmission socket with a network interface
111 mmap() ---------> mapping of the allocated buffer to the
114 [transmission] poll() ---------> wait for free packets (optional)
115 send() ---------> send all packets that are set as ready in
120 [shutdown] close() --------> destruction of the transmission socket and
134 know the header size of frames used in the circular buffer.
136 As with capture, each frame contains two parts::
138 --------------------
140 | | of this frame
141 |--------------------|
142 | data buffer |
145 --------------------
159 ioctl(this->socket, SIOCGIFINDEX, &s_ifr);
167 bind(this->socket, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));
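The address passed to bind() above is a struct sockaddr_ll. A minimal sketch of how it might be filled in follows; the helper name fill_ll_addr is hypothetical, and the choice of ETH_P_ALL as protocol is an assumption rather than something these lines state. The interface index is expected to come from the SIOCGIFINDEX ioctl shown above.

```c
#include <arpa/inet.h>        /* htons */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>       /* AF_PACKET */
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* struct sockaddr_ll */

/* Hypothetical helper: prepare the link-level address handed to bind(). */
void fill_ll_addr(struct sockaddr_ll *addr, int ifindex)
{
    memset(addr, 0, sizeof(*addr));
    addr->sll_family = AF_PACKET;
    addr->sll_protocol = htons(ETH_P_ALL);  /* assumption: all protocols */
    addr->sll_ifindex = ifindex;            /* e.g. s_ifr.ifr_ifindex */
}
```

The result would then be cast to struct sockaddr * and passed to bind() together with sizeof(struct sockaddr_ll).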
174 frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll)
179 frame base + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
182 the frame (for payload alignment with SOCK_RAW mode for instance) you
192 - Capture process::
196 - Transmission process::
207 unsigned int tp_frame_size; /* Size of frame */
212 circular buffer (ring) of unswappable memory.
214 related meta-information like timestamps without requiring a system call.
233 we will get the following buffer structure::
236 +---------+---------+ +---------+---------+
237 | frame 1 | frame 2 | | frame 3 | frame 4 |
238 +---------+---------+ +---------+---------+
241 +---------+---------+ +---------+---------+
242 | frame 5 | frame 6 | | frame 7 | frame 8 |
243 +---------+---------+ +---------+---------+
245 A frame can be of any size, provided that it fits in a block. A block
246 can only hold an integer number of frames, or in other words, a frame cannot
249 buffer (ring)".
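Because a block holds a whole number of frames, locating frame N in the mapping takes a division and a remainder, not a plain multiplication. The sketch below assumes fixed-size frames (as in TPACKET_V1 and TPACKET_V2); the helper name frame_offset is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: byte offset of frame `idx` from the start of the
 * mapping. Frames never straddle a block, so any space left over at the
 * end of a block is skipped.
 */
size_t frame_offset(size_t idx, size_t block_size, size_t frame_size)
{
    size_t frames_per_block = block_size / frame_size;

    assert(frames_per_block > 0);   /* a frame must fit in a block */
    return (idx / frames_per_block) * block_size
         + (idx % frames_per_block) * frame_size;
}
```

With a 4096-byte block and 1536-byte frames, for instance, only two frames fit per block, so frame 2 starts at offset 4096 and the last 1024 bytes of each block stay unused.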
255 the PACKET_MMAP buffer could hold only 32768 frames in a 32 bit architecture or
259 ----------------
286 ------------------
294 +---+---+---+---+
296 +---+---+---+---+
305 a pool of pre-determined sizes. This pool of memory is maintained by the slab
310 predetermined sizes that kmalloc uses can be checked in the "size-<bytes>"
318 PACKET_MMAP buffer size calculator
324 <size-max> is the maximum size allocable with kmalloc
326 <pointer size> depends on the architecture -- ``sizeof(void *)``
327 <page size> depends on the architecture -- PAGE_SIZE or getpagesize (2)
328 <max-order> is the value defined with MAX_PAGE_ORDER
329 <frame size> is an upper bound on the frame's capture size (more on this later)
334 <block number> = <size-max>/<pointer size>
335 <block size> = <pagesize> << <max-order>
337 so, the max buffer size is::
343 <block number> * <block size> / <frame size>
348 <size-max> = 131072 bytes
351 <max-order> = 11
353 and a value for <frame size> of 2048 bytes. These parameters will yield::
358 and hence the buffer will have a 262144 MiB size. So it can hold
361 Actually, this buffer size is not possible with an i386 architecture.
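The arithmetic above can be checked with a few lines of C. This is only a sketch: the function names are hypothetical, and the 4-byte pointer and 4096-byte page used in the usage note are assumptions chosen to match the 262144 MiB figure.

```c
#include <assert.h>
#include <stdint.h>

/* Parameter names follow the placeholders used in the text above. */
uint64_t max_block_number(uint64_t size_max, uint64_t pointer_size)
{
    return size_max / pointer_size;
}

uint64_t max_block_size(uint64_t page_size, unsigned max_order)
{
    return page_size << max_order;
}

/* Upper bound on the number of frames the whole ring could hold. */
uint64_t max_frames(uint64_t size_max, uint64_t pointer_size,
                    uint64_t page_size, unsigned max_order,
                    uint64_t frame_size)
{
    return max_block_number(size_max, pointer_size)
         * max_block_size(page_size, max_order) / frame_size;
}
```

Under those assumptions, max_frames(131072, 4, 4096, 11, 2048) gives 32768 blocks of 8 MiB each, that is 262144 MiB in total, or 134217728 frames of 2048 bytes.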
371 -----------------
373 If you check the source code you will see that what is drawn here as a frame
374 is not only the link level frame. At the beginning of each frame there is a
375 header called struct tpacket_hdr used in PACKET_MMAP to hold the link level frame's
376 meta information like timestamp. So what is drawn here as a frame is really
380 Frame structure:
382 - Start. Frame must be aligned to TPACKET_ALIGNMENT=16
383 - struct tpacket_hdr
384 - pad to TPACKET_ALIGNMENT=16
385 - struct sockaddr_ll
386 - Gap, chosen so that packet data (Start+tp_net) aligns to
388 - Start+tp_mac: [ Optional MAC header ]
389 - Start+tp_net: Packet data, aligned to TPACKET_ALIGNMENT=16.
390 - Pad to align to TPACKET_ALIGNMENT=16
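The alignment steps in this layout map onto the TPACKET_ALIGN macro from linux/if_packet.h. The sketch below shows where struct sockaddr_ll lands in a TPACKET_V1 frame and how a length is padded; both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/if_packet.h>  /* TPACKET_ALIGN, TPACKET_ALIGNMENT, struct tpacket_hdr */

/* Offset of struct sockaddr_ll from the start of a TPACKET_V1 frame. */
size_t sockaddr_ll_offset(void)
{
    return TPACKET_ALIGN(sizeof(struct tpacket_hdr));
}

/* Hypothetical helper: round a length up to the next TPACKET_ALIGNMENT boundary. */
size_t pad_to_alignment(size_t len)
{
    return TPACKET_ALIGN(len);
}
```

TPACKET_HDRLEN is this offset plus sizeof(struct sockaddr_ll), which is why it appears as the lower bound on tp_frame_size.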
395 - tp_block_size must be a multiple of PAGE_SIZE (1)
396 - tp_frame_size must be greater than TPACKET_HDRLEN (obvious)
397 - tp_frame_size must be a multiple of TPACKET_ALIGNMENT
398 - tp_frame_nr must be exactly frames_per_block*tp_block_nr
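These conditions can be checked in user space before calling setsockopt(), which makes a failing request easier to diagnose. The following is a sketch, not the kernel's own check; the function name tpacket_req_ok is hypothetical, and the extra test that a frame fits in a block comes from the block layout rules, not from this list.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>           /* sysconf */
#include <linux/if_packet.h>  /* struct tpacket_req, TPACKET_HDRLEN, TPACKET_ALIGNMENT */

/* Hypothetical pre-flight check mirroring the constraints listed above. */
bool tpacket_req_ok(const struct tpacket_req *req)
{
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_size <= 0 || req->tp_block_size == 0 || req->tp_frame_size == 0)
        return false;
    if (req->tp_block_size % (unsigned long)page_size != 0)
        return false;               /* not a multiple of PAGE_SIZE */
    if (req->tp_frame_size <= TPACKET_HDRLEN)
        return false;               /* no room for the headers */
    if (req->tp_frame_size % TPACKET_ALIGNMENT != 0)
        return false;               /* frame size not aligned */
    if (req->tp_frame_size > req->tp_block_size)
        return false;               /* a frame must fit in a block */

    /* frames_per_block * tp_block_nr must equal tp_frame_nr exactly */
    unsigned int frames_per_block = req->tp_block_size / req->tp_frame_size;
    return req->tp_frame_nr == frames_per_block * req->tp_block_nr;
}
```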
403 Mapping and use of the circular buffer (ring)
404 ---------------------------------------------
406 The mapping of the buffer in the user process is done with the conventional
407 mmap function. Even though the circular buffer is composed of several physically
416 the frames. This is because a frame cannot span two
420 RX and TX buffer ring has to be done with one call to mmap::
432 At the beginning of each frame there is a status field (see
433 struct tpacket_hdr). If this field is 0, the frame is ready
434 to be used by the kernel. If not, there is a frame the user can read
448 TP_STATUS_COPY This flag indicates that the frame (and associated
483 receives a packet, it puts it in the buffer and updates the status with
486 can use that frame buffer again.
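The ownership handshake on the RX side comes down to testing and then clearing tp_status. The sketch below assumes a TPACKET_V1 ring; the helper name consume_rx_frame and its callback parameter are hypothetical, and memory-ordering concerns are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/if_packet.h>  /* struct tpacket_hdr, TP_STATUS_* */

/*
 * Hypothetical helper: if the frame at `hdr` belongs to user space, let the
 * caller-supplied callback look at it, then hand it back to the kernel.
 * Returns true when a frame was consumed.
 */
bool consume_rx_frame(struct tpacket_hdr *hdr,
                      void (*handle)(const struct tpacket_hdr *))
{
    if ((hdr->tp_status & TP_STATUS_USER) == 0)
        return false;                   /* still owned by the kernel */

    if (handle)
        handle(hdr);

    hdr->tp_status = TP_STATUS_KERNEL;  /* zero the status: frame is free again */
    return true;
}
```

A capture loop would typically call this on the current frame, advance to the next frame on success, and fall back to poll() when it returns false.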
508 #define TP_STATUS_AVAILABLE 0 // Frame is available
509 #define TP_STATUS_SEND_REQUEST 1 // Frame will be sent on next send()
510 #define TP_STATUS_SENDING 2 // Frame is currently in transmission
511 #define TP_STATUS_WRONG_FORMAT 4 // Frame format is not correct
514 packet, the user fills a data buffer of an available frame, sets tp_len to
515 current data buffer size and sets its status field to TP_STATUS_SEND_REQUEST.
521 At the end of each transfer, buffer status returns to TP_STATUS_AVAILABLE.
525 header->tp_len = in_i_size;
526 header->tp_status = TP_STATUS_SEND_REQUEST;
527 retval = send(this->socket, NULL, 0, 0);
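The step before send() is filling a free frame and flagging it. Here is a sketch of that step for a TPACKET_V1 TX ring, placing the data at frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll) as described earlier; the helper name queue_tx_frame is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <linux/if_packet.h>  /* struct tpacket_hdr, TPACKET_HDRLEN, TP_STATUS_* */

/*
 * Hypothetical helper: copy `len` bytes of payload into an available TX
 * frame and mark it for transmission. The caller still has to call send()
 * on the socket. Returns false if the frame is busy or the payload does
 * not fit.
 */
bool queue_tx_frame(void *frame, size_t frame_size,
                    const void *payload, size_t len)
{
    struct tpacket_hdr *hdr = frame;
    size_t data_off = TPACKET_HDRLEN - sizeof(struct sockaddr_ll);

    if (hdr->tp_status != TP_STATUS_AVAILABLE)
        return false;
    if (frame_size < data_off || len > frame_size - data_off)
        return false;

    memcpy((char *)frame + data_off, payload, len);
    hdr->tp_len = (unsigned int)len;
    hdr->tp_status = TP_STATUS_SEND_REQUEST;
    return true;
}
```

Several frames can be queued this way before a single send() call flushes all of them.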
529 The user can also use poll() to check if a buffer is available:
553 - Default if not otherwise specified by setsockopt(2)
554 - RX_RING, TX_RING available
556 TPACKET_V1 --> TPACKET_V2:
557 - Made 64 bit clean due to unsigned long usage in TPACKET_V1
560 - Timestamp resolution in nanoseconds instead of microseconds
561 - RX_RING, TX_RING available
562 - VLAN metadata information available for packets
566 - TP_STATUS_VLAN_VALID bit being set into the tp_status field indicates
568 - TP_STATUS_VLAN_TPID_VALID bit being set into the tp_status field
571 - How to switch to TPACKET_V2:
580 TPACKET_V2 --> TPACKET_V3:
581 - Flexible buffer implementation for RX_RING:
582 1. Blocks can be configured with non-static frame-size
583 2. Read/poll is at a block-level (as opposed to packet-level)
584 3. Added poll timeout to avoid indefinite user-space wait
586 4. Added user-configurable knobs:
591 - RX Hash data available in user space
592 - TX_RING semantics are conceptually similar to TPACKET_V2;
597 Packets with non-zero values of tp_next_offset will be dropped.
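Selecting one of these ring formats is a single setsockopt() call with the PACKET_VERSION option. The sketch below wraps it in a hypothetical helper, set_tpacket_version; it assumes the call is made before the ring itself is requested.

```c
#include <assert.h>
#include <sys/socket.h>       /* setsockopt, SOL_PACKET */
#include <linux/if_packet.h>  /* PACKET_VERSION, TPACKET_V1, TPACKET_V2, TPACKET_V3 */

/*
 * Hypothetical helper: select the ring format on a packet socket.
 * Returns 0 on success, -1 on error with errno set by setsockopt().
 */
int set_tpacket_version(int fd, int version)
{
    return setsockopt(fd, SOL_PACKET, PACKET_VERSION,
                      &version, sizeof(version));
}
```

For example, set_tpacket_version(fd, TPACKET_V3) would precede a PACKET_RX_RING request that uses the TPACKET_V3 request structure.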
607 - PACKET_FANOUT_HASH: schedule to socket by skb's packet hash
608 - PACKET_FANOUT_LB: schedule to socket by round-robin
609 - PACKET_FANOUT_CPU: schedule to socket by CPU packet arrives on
610 - PACKET_FANOUT_RND: schedule to socket by random selection
611 - PACKET_FANOUT_ROLLOVER: if one socket is full, rollover to another
612 - PACKET_FANOUT_QM: schedule to socket by skb's recorded queue_mapping
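A socket joins a fanout group by passing one integer to the PACKET_FANOUT socket option: the group id goes in the low 16 bits and one of the modes above in the high 16 bits. The helper names below are hypothetical; this is a sketch of the call, not a complete fanout program.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>       /* setsockopt, SOL_PACKET */
#include <linux/if_packet.h>  /* PACKET_FANOUT, PACKET_FANOUT_* */

/* Pack a 16-bit group id (low half) and a fanout mode (high half). */
int make_fanout_arg(uint16_t group_id, int mode)
{
    return (int)group_id | (mode << 16);
}

/* Hypothetical helper: join `fd` to the fanout group with the given mode. */
int join_fanout(int fd, uint16_t group_id, int mode)
{
    int arg = make_fanout_arg(group_id, mode);

    return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg));
}
```

Each worker process or thread would open its own packet socket and call join_fanout() with the same group id and mode.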
692 while (limit-- > 0) {
740 case -1:
758 AF_PACKET's TPACKET_V3 ring buffer can be configured to use non-static frame
764 * ~15% - 20% reduction in CPU-usage
768 * Non-static frame size to capture entire packet payload
773 it with gcc -Wall -O2 blob.c, and try things like "./a.out eth0", etc.)::
775 /* Written from scratch, but kernel-to-user space API usage
845 memset(&ring->req, 0, sizeof(ring->req));
846 ring->req.tp_block_size = blocksiz;
847 ring->req.tp_frame_size = framesiz;
848 ring->req.tp_block_nr = blocknum;
849 ring->req.tp_frame_nr = (blocksiz * blocknum) / framesiz;
850 ring->req.tp_retire_blk_tov = 60;
851 ring->req.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
853 err = setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &ring->req,
854 sizeof(ring->req));
860 ring->map = mmap(NULL, ring->req.tp_block_size * ring->req.tp_block_nr,
862 if (ring->map == MAP_FAILED) {
867 ring->rd = malloc(ring->req.tp_block_nr * sizeof(*ring->rd));
868 assert(ring->rd);
869 for (i = 0; i < ring->req.tp_block_nr; ++i) {
870 ring->rd[i].iov_base = ring->map + (i * ring->req.tp_block_size);
871 ring->rd[i].iov_len = ring->req.tp_block_size;
893 struct ethhdr *eth = (struct ethhdr *) ((uint8_t *) ppd + ppd->tp_mac);
896 if (eth->h_proto == htons(ETH_P_IP)) {
902 ss.sin_addr.s_addr = ip->saddr;
908 sd.sin_addr.s_addr = ip->daddr;
912 printf("%s -> %s, ", sbuff, dbuff);
915 printf("rxhash: 0x%x\n", ppd->hv1.tp_rxhash);
920 int num_pkts = pbd->h1.num_pkts, i;
925 pbd->h1.offset_to_first_pkt);
927 bytes += ppd->tp_snaplen;
931 ppd->tp_next_offset);
940 pbd->h1.block_status = TP_STATUS_KERNEL;
945 munmap(ring->map, ring->req.tp_block_size * ring->req.tp_block_nr);
946 free(ring->rd);
968 fd = setup_socket(&ring, argp[argc - 1]);
979 if ((pbd->h1.block_status & TP_STATUS_USER) == 0) {
980 poll(&pfd, 1, -1);
1015 This has the side effect that packets sent through PF_PACKET will bypass the
1057 frames to be updated or, respectively, the frame handed over to the application, iv) walk
1063 in a first step to see if the frame belongs to the application, and then
1078 - Packet sockets work well together with Linux socket filters, thus you also