////
   vim.syntax: asciidoc

   Copyright (c) 2011 Thomas Graf <tgraf@suug.ch>
////

Routing Family Netlink Library (libnl-route)
============================================
Thomas Graf <tgraf@suug.ch>
3.1, Aug 11 2011:

== Introduction

This library provides APIs to the kernel interfaces of the routing family.


NOTE: Work in progress.

== Addresses

[[route_link]]
== Links (Network Devices)

The link configuration interface is part of the +NETLINK_ROUTE+ protocol
family and provides the following functionality:

- View and modify the configuration of physical and virtual network devices.
- Create and delete virtual network devices (e.g. dummy devices, VLAN devices,
  tun devices, bridging devices, ...)
- View and modify per link network configuration settings (e.g.
  +net.ipv6.conf.eth0.accept_ra+, +net.ipv4.conf.eth1.forwarding+, ...)

.Naming Convention (network device, link, interface)

In networking several terms are commonly used to refer to network devices.
While they have distinct meanings they have been used interchangeably in
the past. Within the Linux kernel, the term _network device_ or _netdev_ is
commonly used. In user space the term _network interface_ is very common.
The routing netlink protocol uses the term _link_ and so does the _iproute2_
utility and most routing daemons.

=== Netlink Protocol

This section describes the protocol semantics of the netlink based link
configuration interface.
The following messages are defined:

[options="header", cols="1,2,2"]
|==============================================================================
| Message Type    | User -> Kernel | Kernel -> User
| +RTM_NEWLINK+   | Create or update virtual network device
| Reply to +RTM_GETLINK+ request or notification of link added or updated
| +RTM_DELLINK+   | Delete virtual network device
| Notification of link deleted or disappeared
| +RTM_GETLINK+   | Retrieve link configuration and statistics |
| +RTM_SETLINK+   | Modify link configuration |
|==============================================================================

See link:core.html#core_msg_types[Netlink Library - Message Types] for more
information on common semantics of these message types.

==== Link Message Format

All netlink link messages share a common header (+struct ifinfomsg+) which
is appended after the netlink header (+struct nlmsghdr+).

image:ifinfomsg.png["Link Message Header"]

The meaning of each field may differ depending on the message type. A
+struct ifinfomsg+ is defined in +<linux/rtnetlink.h>+ to represent the
header.

Address Family (8bit)::
The address family is usually set to +AF_UNSPEC+ but may be specified in
+RTM_GETLINK+ requests to limit the returned links to a specific address
family.

Link Layer Type (16bit)::
Currently only used in kernel->user messages to report the link layer type
of a link. The value corresponds to the +ARPHRD_*+ defines found in
+<linux/if_arp.h>+. Translation from/to strings can be done using the
functions nl_llproto2str()/nl_str2llproto().

Link Index (32bit)::
Carries the interface index and is used to identify existing links.

Flags (32bit)::
In kernel->user messages the value of this field represents the current
state of the link flags. In user->kernel messages this field is used to
change flags or set the initial flag state of new links.
Note that in order
to change a flag, the flag must also be set in the _Flags Change Mask_ field.

Flags Change Mask (32bit)::
The primary use of this field is to specify a mask of flags that should be
changed based on the value of the _Flags_ field. A special meaning is given
to this field when present in link notifications, see TODO.

Attributes (variable)::
All link message types may carry netlink attributes. They are defined in the
header file <linux/if_link.h> and share the prefix +IFLA_+.

==== Link Message Types

.RTM_GETLINK (user->kernel)

Lookup link by 1. interface index or 2. link name (+IFLA_IFNAME+) and return
a single +RTM_NEWLINK+ message containing the link configuration and statistics
or a netlink error message if no such link was found.

*Parameters:*

* *Address family*
** If the address family is set to +PF_BRIDGE+, only bridging devices will be
   returned.
** If the address family is set to +PF_INET6+, only IPv6 enabled devices will
   be returned.

*Flags:*

* +NLM_F_DUMP+ If set, all links will be returned in form of a multipart
  message.

*Returns:*

* +EINVAL+ if neither interface index nor link name is set
* +ENODEV+ if no link was found
* +ENOBUFS+ if allocation failed

.RTM_NEWLINK (user->kernel)

Creates a new or updates an existing link. Only virtual links may be created
but all links may be updated.

*Flags:*

- +NLM_F_CREATE+ Create link if it does not exist
- +NLM_F_EXCL+ Return +EEXIST+ if link already exists

*Returns:*

- +EINVAL+ malformed message or invalid configuration parameters
- +EAFNOSUPPORT+ if an address family specific configuration (+IFLA_AF_SPEC+)
  is not supported.
- +EOPNOTSUPP+ if the link does not support modification of parameters
- +EEXIST+ if +NLM_F_EXCL+ was set and the link exists already
- +ENODEV+ if the link does not exist and +NLM_F_CREATE+ is not set

.RTM_NEWLINK (kernel->user)

This message type is used in reply to a +RTM_GETLINK+ request and carries
the configuration and statistics of a link. If multiple links need to
be sent, the messages will be sent in form of a multipart message.

The message type is also used for notifications sent by the kernel to the
multicast group +RTNLGRP_LINK+ to inform about various link events. It is
therefore recommended to always use a separate link socket for link
notifications in order to keep the two uses of this message type apart.

TODO: document how to detect different notifications

.RTM_DELLINK (user->kernel)

Lookup link by 1. interface index or 2. link name (+IFLA_IFNAME+) and delete
the virtual link.

*Returns:*

* +EINVAL+ if neither interface index nor link name is set
* +ENODEV+ if no link was found
* +EOPNOTSUPP+ if the operation is not supported (not a virtual link)

.RTM_DELLINK (kernel->user)

Notification sent by the kernel to the multicast group +RTNLGRP_LINK+ when

a. a network device was unregistered (change == ~0)
b. a bridging device was deleted (address family will be +PF_BRIDGE+)

=== Get / List

[[link_list]]
==== Get list of links

To retrieve the list of links in the kernel, allocate a new link cache
using +rtnl_link_alloc_cache()+ to hold the links. It will automatically
construct and send a +RTM_GETLINK+ message requesting a dump of all links
from the kernel and feed the returned +RTM_NEWLINK+ messages to the internal
link message parser which adds the returned links to the cache.
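
As a rough illustration of the dump request mentioned above, the following
sketch fills in a +RTM_GETLINK+ request by hand using only the kernel headers.
The type +struct link_dump_req+ and the helper +link_dump_req_init()+ are
hypothetical names introduced for this example and are not part of libnl; the
library is assumed to build an equivalent message internally.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout (not a libnl type): netlink header
 * immediately followed by the link message header. */
struct link_dump_req {
	struct nlmsghdr  hdr;
	struct ifinfomsg ifi;
};

/* Fill in a RTM_GETLINK request asking for a dump of all links.
 * "family" limits the dump to one address family, AF_UNSPEC means all. */
static void link_dump_req_init(struct link_dump_req *req, int family,
			       unsigned int seq)
{
	memset(req, 0, sizeof(*req));

	req->hdr.nlmsg_len   = NLMSG_LENGTH(sizeof(req->ifi));
	req->hdr.nlmsg_type  = RTM_GETLINK;
	req->hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->hdr.nlmsg_seq   = seq;

	req->ifi.ifi_family  = family;
}
```

The kernel is expected to answer such a request with a multipart sequence of
+RTM_NEWLINK+ messages, one per link.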

[source,c]
-----
#include <netlink/route/link.h>

int rtnl_link_alloc_cache(struct nl_sock *sk, int family, struct nl_cache **result);
-----

The cache will contain link objects (+struct rtnl_link+, see <<link_object>>)
and can be accessed using the standard cache functions. By setting the
+family+ parameter to an address family other than +AF_UNSPEC+, the resulting
cache will only contain links supporting the specified address family.

The following direct search functions are provided to search by interface
index and by link name:

[source,c]
-----
#include <netlink/route/link.h>

struct rtnl_link *rtnl_link_get(struct nl_cache *cache, int ifindex);
struct rtnl_link *rtnl_link_get_by_name(struct nl_cache *cache, const char *name);
-----

.Example: Link Cache

[source,c]
-----
struct nl_cache *cache;
struct rtnl_link *link;

if (rtnl_link_alloc_cache(sock, AF_UNSPEC, &cache) < 0)
	/* error */

if (!(link = rtnl_link_get_by_name(cache, "eth1")))
	/* link does not exist */

/* do something with link */

rtnl_link_put(link);
nl_cache_put(cache);
-----

[[link_direct_lookup]]
==== Lookup Single Link (Direct Lookup)

If only a single link is of interest, the link can be looked up directly
without the use of a link cache using the function +rtnl_link_get_kernel()+.

[source,c]
-----
#include <netlink/route/link.h>

int rtnl_link_get_kernel(struct nl_sock *sk, int ifindex, const char *name, struct rtnl_link **result);
-----

It will construct and send a +RTM_GETLINK+ request using the parameters
provided and wait for a +RTM_NEWLINK+ or netlink error message sent in
return. If the link exists, the link is returned as link object
(see <<link_object>>).

.Example: Direct link lookup
[source,c]
-----
struct rtnl_link *link;

if (rtnl_link_get_kernel(sock, 0, "eth1", &link) < 0)
	/* error */

/* do something with link */

rtnl_link_put(link);
-----

NOTE: While using this function can save a substantial amount of bandwidth
      on the netlink socket, the result will not be cached; subsequent calls
      to rtnl_link_get_kernel() will always trigger sending a +RTM_GETLINK+
      request.

[[link_translate_ifindex]]
==== Translating interface index to link name

Applications that need to translate an interface index to a link name or
vice versa may use the following functions to do so. Both functions require
a filled link cache to work with.

[source,c]
-----
char *rtnl_link_i2name(struct nl_cache *cache, int ifindex, char *dst, size_t len);
int rtnl_link_name2i(struct nl_cache *cache, const char *name);
-----

=== Add / Modify

Several types of virtual link can be added on the fly using the function
+rtnl_link_add()+.

[source,c]
-----
#include <netlink/route/link.h>

int rtnl_link_add(struct nl_sock *sk, struct rtnl_link *link, int flags);
-----

=== Delete

The deletion of virtual links such as VLAN devices or dummy devices is done
using the function +rtnl_link_delete()+. The link passed on to the function
can be a link from a link cache or it can be constructed with the minimal
attributes needed to identify the link.

[source,c]
-----
#include <netlink/route/link.h>

int rtnl_link_delete(struct nl_sock *sk, const struct rtnl_link *link);
-----

The function will construct and send a +RTM_DELLINK+ request message and
returns any errors returned by the kernel.
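
To give an idea of what such a request looks like on the wire, the sketch
below builds a +RTM_DELLINK+ message that identifies the link by name through
an +IFLA_IFNAME+ attribute. The names +struct link_del_req+,
+link_del_req_init()+ and +LINK_NAME_BUFSZ+ are hypothetical and not part of
libnl. The buffer size assumes link names of at most 15 characters plus the
terminating NUL byte.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Assumption: link names are limited to 15 characters plus NUL. */
#define LINK_NAME_BUFSZ 16

/* Hypothetical request layout (not a libnl type). */
struct link_del_req {
	struct nlmsghdr  hdr;
	struct ifinfomsg ifi;
	struct rtattr    rta;			/* IFLA_IFNAME attribute header */
	char             name[LINK_NAME_BUFSZ];	/* attribute payload */
};

/* Returns 0 on success or -1 if the name does not fit. */
static int link_del_req_init(struct link_del_req *req, const char *name)
{
	size_t len = strlen(name) + 1;	/* include the terminating NUL */

	if (len > LINK_NAME_BUFSZ)
		return -1;

	memset(req, 0, sizeof(*req));

	req->hdr.nlmsg_type  = RTM_DELLINK;
	req->hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req->ifi.ifi_family  = AF_UNSPEC;

	/* no interface index set, the link is identified by name only */
	req->rta.rta_type = IFLA_IFNAME;
	req->rta.rta_len  = RTA_LENGTH(len);
	memcpy(req->name, name, len);

	req->hdr.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifi)) +
			     RTA_ALIGN(req->rta.rta_len);
	return 0;
}
```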

.Example: Delete link by name
[source,c]
-----
struct rtnl_link *link;

if (!(link = rtnl_link_alloc()))
	/* error */

rtnl_link_set_name(link, "my_vlan");

if (rtnl_link_delete(sock, link) < 0)
	/* error */

rtnl_link_put(link);
-----

[[link_object]]
=== Link Object

A link is represented by the structure +struct rtnl_link+. Instances may be
created with the function +rtnl_link_alloc()+ or via a link cache (see
<<link_list>>) and are freed again using the function +rtnl_link_put()+.

[source,c]
-----
#include <netlink/route/link.h>

struct rtnl_link *rtnl_link_alloc(void);
void rtnl_link_put(struct rtnl_link *link);
-----

[[link_attr_name]]
==== Name
The name serves as unique, human readable description of the link. By
default, links are named based on their type and then enumerated, e.g.
eth0, eth1, ethn but they may be renamed at any time.

Kernels >= 2.6.11 support identification by link name.

[source,c]
-----
#include <netlink/route/link.h>

void rtnl_link_set_name(struct rtnl_link *link, const char *name);
char *rtnl_link_get_name(struct rtnl_link *link);
-----

*Accepted link name format:* +[^ /]*+ (maximum length: 15 characters)

[[link_attr_ifindex]]
==== Interface Index (Identifier)
The interface index is an integer uniquely identifying a link. If present
in any link message, it will be used to identify an existing link.

[source,c]
-----
#include <netlink/route/link.h>

void rtnl_link_set_ifindex(struct rtnl_link *link, int ifindex);
int rtnl_link_get_ifindex(struct rtnl_link *link);
-----

[[link_attr_group]]
==== Group
Each link can be assigned a numeric group identifier to group a bunch of links
together and apply a set of changes to a group instead of just a single link.


[source,c]
-----
#include <netlink/route/link.h>

void rtnl_link_set_group(struct rtnl_link *link, uint32_t group);
uint32_t rtnl_link_get_group(struct rtnl_link *link);
-----

[[link_attr_address]]
==== Link Layer Address
The link layer address (e.g. MAC address).

[source,c]
-----
#include <netlink/route/link.h>

void rtnl_link_set_addr(struct rtnl_link *link, struct nl_addr *addr);
struct nl_addr *rtnl_link_get_addr(struct rtnl_link *link);
-----

[[link_attr_broadcast]]
==== Broadcast Address
The link layer broadcast address.

[source,c]
-----
#include <netlink/route/link.h>

void rtnl_link_set_broadcast(struct rtnl_link *link, struct nl_addr *addr);
struct nl_addr *rtnl_link_get_broadcast(struct rtnl_link *link);
-----

[[link_attr_mtu]]
==== MTU (Maximum Transmission Unit)
The maximum transmission unit specifies the maximum packet size a network
device can transmit or receive. This value may be lower than the capability
of the physical network device.

[source,c]
-----
#include <netlink/route/link.h>

void rtnl_link_set_mtu(struct rtnl_link *link, unsigned int mtu);
unsigned int rtnl_link_get_mtu(struct rtnl_link *link);
-----

[[link_attr_flags]]
==== Flags
The flags of a link enable or disable various link features or inform about
the state of the link.

[source,c]
-----
#include <netlink/route/link.h>

void rtnl_link_set_flags(struct rtnl_link *link, unsigned int flags);
void rtnl_link_unset_flags(struct rtnl_link *link, unsigned int flags);
unsigned int rtnl_link_get_flags(struct rtnl_link *link);
-----

[options="compact"]
[horizontal]
IFF_UP::          Link is up (administratively)
IFF_RUNNING::     Link is up and carrier is OK (RFC2863 OPER_UP)
IFF_LOWER_UP::    Link layer is operational
IFF_DORMANT::     Driver signals dormant
IFF_BROADCAST::   Link supports broadcasting
IFF_MULTICAST::   Link supports multicasting
IFF_ALLMULTI::    Link supports multicast routing
IFF_DEBUG::       Tell driver to do debugging (currently unused)
IFF_LOOPBACK::    Link is a loopback device
IFF_POINTOPOINT:: Point-to-point link
IFF_NOARP::       ARP is not supported
IFF_PROMISC::     Status of promiscuous mode
IFF_MASTER::      Master of a load balancer (bonding)
IFF_SLAVE::       Slave to a master link
IFF_PORTSEL::     Driver supports setting media type (only used by ARM ethernet)
IFF_AUTOMEDIA::   Link selects port automatically (only used by ARM ethernet)
IFF_ECHO::        Echo sent packets (testing feature, CAN only)
IFF_DYNAMIC::     Unused (BSD compatibility)
IFF_NOTRAILERS::  Unused (BSD compatibility)

To translate a link flag to a link flag name or vice versa:

[source,c]
-----
#include <netlink/route/link.h>

char *rtnl_link_flags2str(int flags, char *buf, size_t size);
int rtnl_link_str2flags(const char *flag_name);
-----

[[link_attr_txqlen]]
==== Transmission Queue Length

The transmission queue holds packets before they are delivered to
the driver for transmission. Its length is usually specified in number of
packets but the unit may be specific to the link type.

[source,c]
-----
#include <netlink/route/link.h>

void rtnl_link_set_txqlen(struct rtnl_link *link, unsigned int txqlen);
unsigned int rtnl_link_get_txqlen(struct rtnl_link *link);
-----

[[link_attr_operstate]]
==== Operational Status
The operational status has been introduced to provide extended information
on the link status. Traditionally the link state has been described using
the link flags +IFF_UP, IFF_RUNNING, IFF_LOWER_UP+, and +IFF_DORMANT+ which
were no longer sufficient for some link types.

[source,c]
-----
#include <netlink/route/link.h>

void rtnl_link_set_operstate(struct rtnl_link *link, uint8_t state);
uint8_t rtnl_link_get_operstate(struct rtnl_link *link);
-----

[options="compact"]
[horizontal]
IF_OPER_UNKNOWN::        Unknown state
IF_OPER_NOTPRESENT::     Link not present
IF_OPER_DOWN::           Link down
IF_OPER_LOWERLAYERDOWN:: L1 down
IF_OPER_TESTING::        Testing
IF_OPER_DORMANT::        Dormant
IF_OPER_UP::             Link up

Translation of operational status code to string and vice versa:

[source,c]
-----
#include <netlink/route/link.h>

char *rtnl_link_operstate2str(uint8_t state, char *buf, size_t size);
int rtnl_link_str2operstate(const char *name);
-----

[[link_attr_mode]]
==== Mode
Currently known link modes are:

[options="compact"]
[horizontal]
IF_LINK_MODE_DEFAULT:: Default link mode
IF_LINK_MODE_DORMANT:: Limit upward transition to dormant

[source,c]
-----
#include <netlink/route/link.h>

void rtnl_link_set_linkmode(struct rtnl_link *link, uint8_t mode);
uint8_t rtnl_link_get_linkmode(struct rtnl_link *link);
-----

Translation of link mode to string and vice versa:

[source,c]
-----
char *rtnl_link_mode2str(uint8_t mode, char *buf, size_t len);
uint8_t rtnl_link_str2mode(const char *name);
-----

[[link_attr_alias]]
==== IfAlias
Alternative name for
the link, primarily used for SNMP IfAlias.

[source,c]
-----
#include <netlink/route/link.h>

const char *rtnl_link_get_ifalias(struct rtnl_link *link);
void rtnl_link_set_ifalias(struct rtnl_link *link, const char *alias);
-----

*Length limit:* 256

[[link_attr_arptype]]
==== Hardware Type

[source,c]
-----
#include <netlink/route/link.h>
#include <linux/if_arp.h>

void rtnl_link_set_arptype(struct rtnl_link *link, unsigned int arptype);
unsigned int rtnl_link_get_arptype(struct rtnl_link *link);
-----

Translation of hardware type to character string and vice versa:

[source,c]
-----
#include <netlink/utils.h>

char *nl_llproto2str(int arptype, char *buf, size_t len);
int nl_str2llproto(const char *name);
-----

[[link_attr_qdisc]]
==== Qdisc
The name of the queueing discipline used by the link is of informational
nature only. It is a read-only attribute provided by the kernel and cannot
be modified. The set function is provided solely for the purpose of creating
link objects to be used for comparison.

For more information on how to modify the qdisc of a link, see section
<<route_tc>>.

[source,c]
-----
#include <netlink/route/link.h>

void rtnl_link_set_qdisc(struct rtnl_link *link, const char *name);
char *rtnl_link_get_qdisc(struct rtnl_link *link);
-----

[[link_attr_promiscuity]]
==== Promiscuity
The number of subsystems currently depending on the link being in promiscuous
mode. A value of 0 indicates that the link is not in promiscuous mode. It is a
read-only attribute provided by the kernel and cannot be modified. The set
function is provided solely for the purpose of creating link objects to be
used for comparison.

[source,c]
-----
#include <netlink/route/link.h>

void rtnl_link_set_promiscuity(struct rtnl_link *link, uint32_t count);
uint32_t rtnl_link_get_promiscuity(struct rtnl_link *link);
-----

[[link_num_rxtx_queues]]
==== RX/TX Queues
The number of RX/TX queues the link provides. The attribute is writable but
will only be considered when creating a new network device via netlink.

[source,c]
-----
#include <netlink/route/link.h>

void rtnl_link_set_num_tx_queues(struct rtnl_link *link, uint32_t nqueues);
uint32_t rtnl_link_get_num_tx_queues(struct rtnl_link *link);

void rtnl_link_set_num_rx_queues(struct rtnl_link *link, uint32_t nqueues);
uint32_t rtnl_link_get_num_rx_queues(struct rtnl_link *link);
-----

[[link_attr_weight]]
==== Weight
This attribute is unused and obsoleted in all recent kernels.


[[link_modules]]
=== Modules

[[link_bonding]]
==== Bonding

.Example: Add bonding link
[source,c]
-----
#include <netlink/route/link.h>

struct rtnl_link *link;

link = rtnl_link_bond_alloc();
rtnl_link_set_name(link, "my_bond");

/* requires admin privileges */
if (rtnl_link_add(sk, link, NLM_F_CREATE) < 0)
	/* error */

rtnl_link_put(link);
-----

[[link_vlan]]
==== VLAN

[source,c]
-----
extern char *rtnl_link_vlan_flags2str(int, char *, size_t);
extern int rtnl_link_vlan_str2flags(const char *);

extern int rtnl_link_vlan_set_id(struct rtnl_link *, int);
extern int rtnl_link_vlan_get_id(struct rtnl_link *);

extern int rtnl_link_vlan_set_flags(struct rtnl_link *,
                                    unsigned int);
extern int rtnl_link_vlan_unset_flags(struct rtnl_link *,
                                      unsigned int);
extern unsigned int rtnl_link_vlan_get_flags(struct rtnl_link *);

extern int rtnl_link_vlan_set_ingress_map(struct rtnl_link *,
                                          int, uint32_t);
extern uint32_t *
rtnl_link_vlan_get_ingress_map(struct rtnl_link *);

extern int rtnl_link_vlan_set_egress_map(struct rtnl_link *,
                                         uint32_t, int);
extern struct vlan_map *rtnl_link_vlan_get_egress_map(struct rtnl_link *,
                                                      int *);
-----

.Example: Add a VLAN device
[source,c]
-----
struct rtnl_link *link;
int master_index, err;

/* lookup interface index of eth0 */
if (!(master_index = rtnl_link_name2i(link_cache, "eth0")))
	/* error */

/* allocate new link object of type vlan */
link = rtnl_link_vlan_alloc();

/* set eth0 to be our master device */
rtnl_link_set_link(link, master_index);

rtnl_link_vlan_set_id(link, 10);

if ((err = rtnl_link_add(sk, link, NLM_F_CREATE)) < 0)
	/* error */

rtnl_link_put(link);
-----

[[link_macvlan]]
==== MACVLAN

[source,c]
-----
extern struct rtnl_link *rtnl_link_macvlan_alloc(void);

extern int rtnl_link_is_macvlan(struct rtnl_link *);

extern char *rtnl_link_macvlan_mode2str(int, char *, size_t);
extern int rtnl_link_macvlan_str2mode(const char *);

extern char *rtnl_link_macvlan_flags2str(int, char *, size_t);
extern int rtnl_link_macvlan_str2flags(const char *);

extern int rtnl_link_macvlan_set_mode(struct rtnl_link *,
                                      uint32_t);
extern uint32_t rtnl_link_macvlan_get_mode(struct rtnl_link *);

extern int rtnl_link_macvlan_set_flags(struct rtnl_link *,
                                       uint16_t);
extern int rtnl_link_macvlan_unset_flags(struct rtnl_link *,
                                         uint16_t);
extern uint16_t rtnl_link_macvlan_get_flags(struct rtnl_link *);
-----

.Example: Add a MACVLAN device
[source,c]
-----
struct rtnl_link *link;
int master_index, err;
struct nl_addr *addr;

/* lookup interface index of eth0 */
if (!(master_index = rtnl_link_name2i(link_cache, "eth0")))
	/* error */

/* allocate new link object of type macvlan */
link = rtnl_link_macvlan_alloc();

/* set eth0 to be our
master device */
rtnl_link_set_link(link, master_index);

/* set address of virtual interface */
addr = nl_addr_build(AF_LLC, ether_aton("00:11:22:33:44:55"), ETH_ALEN);
rtnl_link_set_addr(link, addr);
nl_addr_put(addr);

/* set mode of virtual interface */
rtnl_link_macvlan_set_mode(link, rtnl_link_macvlan_str2mode("bridge"));

if ((err = rtnl_link_add(sk, link, NLM_F_CREATE)) < 0)
	/* error */

rtnl_link_put(link);
-----

[[link_vxlan]]
==== VXLAN

[source,c]
-----
extern struct rtnl_link *rtnl_link_vxlan_alloc(void);

extern int rtnl_link_is_vxlan(struct rtnl_link *);

extern int rtnl_link_vxlan_set_id(struct rtnl_link *, uint32_t);
extern int rtnl_link_vxlan_get_id(struct rtnl_link *, uint32_t *);

extern int rtnl_link_vxlan_set_group(struct rtnl_link *, struct nl_addr *);
extern int rtnl_link_vxlan_get_group(struct rtnl_link *, struct nl_addr **);

extern int rtnl_link_vxlan_set_link(struct rtnl_link *, uint32_t);
extern int rtnl_link_vxlan_get_link(struct rtnl_link *, uint32_t *);

extern int rtnl_link_vxlan_set_local(struct rtnl_link *, struct nl_addr *);
extern int rtnl_link_vxlan_get_local(struct rtnl_link *, struct nl_addr **);

extern int rtnl_link_vxlan_set_ttl(struct rtnl_link *, uint8_t);
extern int rtnl_link_vxlan_get_ttl(struct rtnl_link *);

extern int rtnl_link_vxlan_set_tos(struct rtnl_link *, uint8_t);
extern int rtnl_link_vxlan_get_tos(struct rtnl_link *);

extern int rtnl_link_vxlan_set_learning(struct rtnl_link *, uint8_t);
extern int rtnl_link_vxlan_get_learning(struct rtnl_link *);
extern int rtnl_link_vxlan_enable_learning(struct rtnl_link *);
extern int rtnl_link_vxlan_disable_learning(struct rtnl_link *);

extern int rtnl_link_vxlan_set_ageing(struct rtnl_link *, uint32_t);
extern int rtnl_link_vxlan_get_ageing(struct rtnl_link *, uint32_t *);

extern int
rtnl_link_vxlan_set_limit(struct rtnl_link *, uint32_t);
extern int rtnl_link_vxlan_get_limit(struct rtnl_link *, uint32_t *);

extern int rtnl_link_vxlan_set_port_range(struct rtnl_link *,
                                          struct ifla_vxlan_port_range *);
extern int rtnl_link_vxlan_get_port_range(struct rtnl_link *,
                                          struct ifla_vxlan_port_range *);

extern int rtnl_link_vxlan_set_proxy(struct rtnl_link *, uint8_t);
extern int rtnl_link_vxlan_get_proxy(struct rtnl_link *);
extern int rtnl_link_vxlan_enable_proxy(struct rtnl_link *);
extern int rtnl_link_vxlan_disable_proxy(struct rtnl_link *);

extern int rtnl_link_vxlan_set_rsc(struct rtnl_link *, uint8_t);
extern int rtnl_link_vxlan_get_rsc(struct rtnl_link *);
extern int rtnl_link_vxlan_enable_rsc(struct rtnl_link *);
extern int rtnl_link_vxlan_disable_rsc(struct rtnl_link *);

extern int rtnl_link_vxlan_set_l2miss(struct rtnl_link *, uint8_t);
extern int rtnl_link_vxlan_get_l2miss(struct rtnl_link *);
extern int rtnl_link_vxlan_enable_l2miss(struct rtnl_link *);
extern int rtnl_link_vxlan_disable_l2miss(struct rtnl_link *);

extern int rtnl_link_vxlan_set_l3miss(struct rtnl_link *, uint8_t);
extern int rtnl_link_vxlan_get_l3miss(struct rtnl_link *);
extern int rtnl_link_vxlan_enable_l3miss(struct rtnl_link *);
extern int rtnl_link_vxlan_disable_l3miss(struct rtnl_link *);
-----

.Example: Add a VXLAN device
[source,c]
-----
struct rtnl_link *link;
struct nl_addr *addr;
int err;

/* allocate new link object of type vxlan */
link = rtnl_link_vxlan_alloc();

/* set interface name */
rtnl_link_set_name(link, "vxlan128");

/* set VXLAN network identifier */
if ((err = rtnl_link_vxlan_set_id(link, 128)) < 0)
	/* error */

/* set multicast address to join */
if ((err = nl_addr_parse("239.0.0.1", AF_INET, &addr)) < 0)
	/* error */

if ((err = rtnl_link_vxlan_set_group(link, addr)) < 0)
	/* error */

nl_addr_put(addr);

if ((err = rtnl_link_add(sk, link, NLM_F_CREATE)) < 0)
	/* error */

rtnl_link_put(link);
-----

[[link_ipip]]
==== IPIP

[source,c]
-----
extern struct rtnl_link *rtnl_link_ipip_alloc(void);
extern int rtnl_link_ipip_add(struct nl_sock *sk, const char *name);

extern int rtnl_link_ipip_set_link(struct rtnl_link *link, uint32_t index);
extern uint32_t rtnl_link_ipip_get_link(struct rtnl_link *link);

extern int rtnl_link_ipip_set_local(struct rtnl_link *link, uint32_t addr);
extern uint32_t rtnl_link_ipip_get_local(struct rtnl_link *link);

extern int rtnl_link_ipip_set_remote(struct rtnl_link *link, uint32_t addr);
extern uint32_t rtnl_link_ipip_get_remote(struct rtnl_link *link);

extern int rtnl_link_ipip_set_ttl(struct rtnl_link *link, uint8_t ttl);
extern uint8_t rtnl_link_ipip_get_ttl(struct rtnl_link *link);

extern int rtnl_link_ipip_set_tos(struct rtnl_link *link, uint8_t tos);
extern uint8_t rtnl_link_ipip_get_tos(struct rtnl_link *link);

extern int rtnl_link_ipip_set_pmtudisc(struct rtnl_link *link, uint8_t pmtudisc);
extern uint8_t rtnl_link_ipip_get_pmtudisc(struct rtnl_link *link);

-----

.Example: Add an ipip tunnel device
[source,c]
-----
struct rtnl_link *link;
struct in_addr addr;
int err;

/* allocate new link object of type ipip */
if (!(link = rtnl_link_ipip_alloc()))
	/* error */

/* set ipip tunnel name (rtnl_link_set_name() returns void) */
rtnl_link_set_name(link, "ipip-tun");

/* set link index */
if ((err = rtnl_link_ipip_set_link(link, if_index)) < 0)
	/* error */

/* set local address */
inet_pton(AF_INET, "192.168.254.12", &addr.s_addr);
if ((err = rtnl_link_ipip_set_local(link, addr.s_addr)) < 0)
	/* error */

/* set remote address */
inet_pton(AF_INET, "192.168.254.13", &addr.s_addr);
if ((err = rtnl_link_ipip_set_remote(link, addr.s_addr)) < 0)
	/* error */

/* set tunnel ttl */
if ((err = rtnl_link_ipip_set_ttl(link, 64)) < 0)
	/* error */

if ((err = rtnl_link_add(sk, link, NLM_F_CREATE)) < 0)
	/* error */

rtnl_link_put(link);
-----

[[link_ipgre]]
==== IPGRE

[source,c]
-----
extern struct rtnl_link *rtnl_link_ipgre_alloc(void);
extern int rtnl_link_ipgre_add(struct nl_sock *sk, const char *name);

extern int rtnl_link_ipgre_set_link(struct rtnl_link *link, uint32_t index);
extern uint32_t rtnl_link_ipgre_get_link(struct rtnl_link *link);

extern int rtnl_link_ipgre_set_iflags(struct rtnl_link *link, uint16_t iflags);
extern uint16_t rtnl_link_get_iflags(struct rtnl_link *link);

extern int rtnl_link_ipgre_set_oflags(struct rtnl_link *link, uint16_t oflags);
extern uint16_t rtnl_link_get_oflags(struct rtnl_link *link);

extern int rtnl_link_ipgre_set_ikey(struct rtnl_link *link, uint32_t ikey);
extern uint32_t rtnl_link_get_ikey(struct rtnl_link *link);

extern int rtnl_link_ipgre_set_okey(struct rtnl_link *link, uint32_t okey);
extern uint32_t rtnl_link_get_okey(struct rtnl_link *link);

extern int rtnl_link_ipgre_set_local(struct rtnl_link *link, uint32_t addr);
extern uint32_t rtnl_link_ipgre_get_local(struct rtnl_link *link);

extern int rtnl_link_ipgre_set_remote(struct rtnl_link *link, uint32_t addr);
extern uint32_t rtnl_link_ipgre_get_remote(struct rtnl_link *link);

extern int rtnl_link_ipgre_set_ttl(struct rtnl_link *link, uint8_t ttl);
extern uint8_t rtnl_link_ipgre_get_ttl(struct rtnl_link *link);

extern int rtnl_link_ipgre_set_tos(struct rtnl_link *link, uint8_t tos);
extern uint8_t rtnl_link_ipgre_get_tos(struct rtnl_link *link);

extern int rtnl_link_ipgre_set_pmtudisc(struct rtnl_link *link, uint8_t pmtudisc);
extern uint8_t rtnl_link_ipgre_get_pmtudisc(struct rtnl_link *link);

-----

.Example: Add an ipgre tunnel device
[source,c]
-----
struct rtnl_link *link;
struct in_addr addr;
int err;

/* allocate new link object of type ipgre */
if (!(link = rtnl_link_ipgre_alloc()))
	/* error */

/* set ipgre tunnel name (rtnl_link_set_name() returns void) */
rtnl_link_set_name(link, "ipgre-tun");

/* set link index */
if ((err = rtnl_link_ipgre_set_link(link, if_index)) < 0)
	/* error */

/* set local address */
inet_pton(AF_INET, "192.168.254.12", &addr.s_addr);
if ((err = rtnl_link_ipgre_set_local(link, addr.s_addr)) < 0)
	/* error */

/* set remote address */
inet_pton(AF_INET, "192.168.254.13", &addr.s_addr);
if ((err = rtnl_link_ipgre_set_remote(link, addr.s_addr)) < 0)
	/* error */

/* set tunnel ttl */
if ((err = rtnl_link_ipgre_set_ttl(link, 64)) < 0)
	/* error */

if ((err = rtnl_link_add(sk, link, NLM_F_CREATE)) < 0)
	/* error */

rtnl_link_put(link);
-----

[[link_sit]]
==== SIT

[source,c]
-----
extern struct rtnl_link *rtnl_link_sit_alloc(void);
extern int rtnl_link_sit_add(struct nl_sock *sk, const char *name);

extern int rtnl_link_sit_set_link(struct rtnl_link *link, uint32_t index);
extern uint32_t rtnl_link_sit_get_link(struct rtnl_link *link);

extern int rtnl_link_sit_set_iflags(struct rtnl_link *link, uint16_t iflags);
extern uint16_t rtnl_link_get_iflags(struct rtnl_link *link);

extern int rtnl_link_sit_set_oflags(struct rtnl_link *link, uint16_t oflags);
extern uint16_t rtnl_link_get_oflags(struct rtnl_link *link);

extern int rtnl_link_sit_set_ikey(struct rtnl_link *link, uint32_t ikey);
extern uint32_t rtnl_link_get_ikey(struct rtnl_link *link);

extern int rtnl_link_sit_set_okey(struct rtnl_link *link, uint32_t okey);
extern uint32_t rtnl_link_get_okey(struct rtnl_link *link);

extern int rtnl_link_sit_set_local(struct rtnl_link *link, uint32_t addr);
extern uint32_t
rtnl_link_sit_get_local(struct rtnl_link *link);

extern int rtnl_link_sit_set_remote(struct rtnl_link *link, uint32_t addr);
extern uint32_t rtnl_link_sit_get_remote(struct rtnl_link *link);

extern int rtnl_link_sit_set_ttl(struct rtnl_link *link, uint8_t ttl);
extern uint8_t rtnl_link_sit_get_ttl(struct rtnl_link *link);

extern int rtnl_link_sit_set_tos(struct rtnl_link *link, uint8_t tos);
extern uint8_t rtnl_link_sit_get_tos(struct rtnl_link *link);

extern int rtnl_link_sit_set_pmtudisc(struct rtnl_link *link, uint8_t pmtudisc);
extern uint8_t rtnl_link_sit_get_pmtudisc(struct rtnl_link *link);
-----

.Example: Add a sit tunnel device
[source,c]
-----
struct rtnl_link *link;
struct in_addr addr;
int err;

/* allocate new link object of type sit */
if (!(link = rtnl_link_sit_alloc()))
	/* error */

/* set sit tunnel name */
if ((err = rtnl_link_set_name(link, "sit-tun")) < 0)
	/* error */

/* set link index */
if ((err = rtnl_link_sit_set_link(link, if_index)) < 0)
	/* error */

/* set local address */
inet_pton(AF_INET, "192.168.254.12", &addr.s_addr);
if ((err = rtnl_link_sit_set_local(link, addr.s_addr)) < 0)
	/* error */

/* set remote address */
inet_pton(AF_INET, "192.168.254.13", &addr.s_addr);
if ((err = rtnl_link_sit_set_remote(link, addr.s_addr)) < 0)
	/* error */

/* set tunnel ttl */
if ((err = rtnl_link_sit_set_ttl(link, 64)) < 0)
	/* error */

if ((err = rtnl_link_add(sk, link, NLM_F_CREATE)) < 0)
	/* error */

rtnl_link_put(link);
-----


[[link_ipvti]]
==== IPVTI

[source,c]
-----
extern struct rtnl_link *rtnl_link_ipvti_alloc(void);
extern int rtnl_link_ipvti_add(struct nl_sock *sk, const char *name);

extern int rtnl_link_ipvti_set_link(struct rtnl_link *link, uint32_t index);
extern uint32_t
rtnl_link_ipvti_get_link(struct rtnl_link *link);

extern int rtnl_link_ipvti_set_ikey(struct rtnl_link *link, uint32_t ikey);
extern uint32_t rtnl_link_ipvti_get_ikey(struct rtnl_link *link);

extern int rtnl_link_ipvti_set_okey(struct rtnl_link *link, uint32_t okey);
extern uint32_t rtnl_link_ipvti_get_okey(struct rtnl_link *link);

extern int rtnl_link_ipvti_set_local(struct rtnl_link *link, uint32_t addr);
extern uint32_t rtnl_link_ipvti_get_local(struct rtnl_link *link);

extern int rtnl_link_ipvti_set_remote(struct rtnl_link *link, uint32_t addr);
extern uint32_t rtnl_link_ipvti_get_remote(struct rtnl_link *link);
-----

.Example: Add an ipvti tunnel device
[source,c]
-----
struct rtnl_link *link;
struct in_addr addr;
int err;

/* allocate new link object of type ipvti */
if (!(link = rtnl_link_ipvti_alloc()))
	/* error */

/* set ipvti tunnel name */
if ((err = rtnl_link_set_name(link, "ipvti-tun")) < 0)
	/* error */

/* set link index */
if ((err = rtnl_link_ipvti_set_link(link, if_index)) < 0)
	/* error */

/* set local address */
inet_pton(AF_INET, "192.168.254.12", &addr.s_addr);
if ((err = rtnl_link_ipvti_set_local(link, addr.s_addr)) < 0)
	/* error */

/* set remote address */
inet_pton(AF_INET, "192.168.254.13", &addr.s_addr);
if ((err = rtnl_link_ipvti_set_remote(link, addr.s_addr)) < 0)
	/* error */

if ((err = rtnl_link_add(sk, link, NLM_F_CREATE)) < 0)
	/* error */

rtnl_link_put(link);
-----

[[link_ip6tnl]]
==== IP6TNL

[source,c]
-----
extern struct rtnl_link *rtnl_link_ip6_tnl_alloc(void);
extern int rtnl_link_ip6_tnl_add(struct nl_sock *sk, const char *name);

extern int rtnl_link_ip6_tnl_set_link(struct rtnl_link *link, uint32_t index);
extern uint32_t rtnl_link_ip6_tnl_get_link(struct rtnl_link *link);

extern int
rtnl_link_ip6_tnl_set_local(struct rtnl_link *link, struct in6_addr *);
extern int rtnl_link_ip6_tnl_get_local(struct rtnl_link *link, struct in6_addr *);

extern int rtnl_link_ip6_tnl_set_remote(struct rtnl_link *link, struct in6_addr *);
extern int rtnl_link_ip6_tnl_get_remote(struct rtnl_link *link, struct in6_addr *);

extern int rtnl_link_ip6_tnl_set_ttl(struct rtnl_link *link, uint8_t ttl);
extern uint8_t rtnl_link_ip6_tnl_get_ttl(struct rtnl_link *link);

extern int rtnl_link_ip6_tnl_set_tos(struct rtnl_link *link, uint8_t tos);
extern uint8_t rtnl_link_ip6_tnl_get_tos(struct rtnl_link *link);

extern int rtnl_link_ip6_tnl_set_encaplimit(struct rtnl_link *link, uint8_t encap_limit);
extern uint8_t rtnl_link_ip6_tnl_get_encaplimit(struct rtnl_link *link);

extern int rtnl_link_ip6_tnl_set_flags(struct rtnl_link *link, uint32_t flags);
extern uint32_t rtnl_link_ip6_tnl_get_flags(struct rtnl_link *link);

extern uint32_t rtnl_link_ip6_tnl_get_flowinfo(struct rtnl_link *link);
extern int rtnl_link_ip6_tnl_set_flowinfo(struct rtnl_link *link, uint32_t flowinfo);

extern int rtnl_link_ip6_tnl_set_proto(struct rtnl_link *link, uint8_t proto);
extern uint8_t rtnl_link_ip6_tnl_get_proto(struct rtnl_link *link);
-----

.Example: Add an ip6tnl tunnel device
[source,c]
-----
struct rtnl_link *link;
struct in6_addr addr;

/* allocate new link object of type ip6tnl */
link = rtnl_link_ip6_tnl_alloc();

rtnl_link_set_name(link, "ip6tnl-tun");
rtnl_link_ip6_tnl_set_link(link, if_index);

/* set local address */
inet_pton(AF_INET6, "2607:f0d0:1002:51::4", &addr);
rtnl_link_ip6_tnl_set_local(link, &addr);

/* set remote address */
inet_pton(AF_INET6, "2607:f0d0:1002:52::5", &addr);
rtnl_link_ip6_tnl_set_remote(link, &addr);

rtnl_link_add(sk, link, NLM_F_CREATE);
rtnl_link_put(link);
-----


== Neighbouring

== Routing

[[route_tc]]
== Traffic Control

The traffic control architecture allows the queueing and
prioritization of packets before they are enqueued to the network
driver. To a limited degree it is also possible to take control of
network traffic as it enters the network stack.

The architecture consists of three different types of modules:

- *Queueing disciplines (qdisc)* provide a mechanism to enqueue packets
  in different forms. They may be used to implement fair queueing,
  prioritization of differentiated services, enforce bandwidth
  limitations, or even to simulate network behaviour such as packet
  loss and packet delay. Qdiscs can be classful in which case they
  allow traffic classes described in the next paragraph to be attached
  to them.

- *Traffic classes (class)* are supported by several qdiscs to build
  a tree structure for different types of traffic. Each class may be
  assigned its own set of attributes such as bandwidth limits or
  queueing priorities. Some qdiscs even allow borrowing of bandwidth
  between classes.

- *Classifiers (cls)* are used to decide which qdisc/class the packet
  should be enqueued to. Different types of classifiers exist,
  ranging from classification based on protocol header values to
  classification based on packet priority or firewall marks.
  Additionally most classifiers support *extended matches (ematch)*
  which allow extending classifiers by a set of matcher modules, and
  *actions* which allow classifiers to take actions such as mangling,
  mirroring, or even rerouting of packets.

.Default Qdisc

The default qdisc used on all network devices is `pfifo_fast`.
Network devices which do not require a transmit queue such as the
loopback device do not have a default qdisc attached. The `pfifo_fast`
qdisc provides three bands to prioritize interactive traffic over bulk
traffic.
Classification is based on the packet priority (diffserv).

image:qdisc_default.png["Default Qdisc"]

.Multiqueue Default Qdisc

If the network device provides multiple transmit queues the `mq`
qdisc is used by default. It will automatically create a separate
class for each transmit queue available and will also replace
the single per device tx lock with a per queue lock.

image:qdisc_mq.png["Multiqueue default Qdisc"]

.Example of a customized classful qdisc setup

The following figure illustrates a possible combination of different
queueing and classification modules to implement quality of service
needs.

image:tc_overview.png["Classful Qdisc diagram"]

=== Traffic Control Object

Each type of traffic control module (qdisc, class, classifier) is
represented by its own structure. All of them are based on the traffic
control object represented by `struct rtnl_tc` which itself is based
on the generic object `struct nl_object` to make it cacheable. The
traffic control object contains all attributes, implementation details
and statistics that are shared by all of the traffic control object
types.

image:tc_obj.png["struct rtnl_tc hierarchy"]

It is not possible to allocate a `struct rtnl_tc` object directly.
Instead, the actual tc object types must be allocated using
`rtnl_qdisc_alloc()`, `rtnl_class_alloc()`, `rtnl_cls_alloc()` and
then cast to `struct rtnl_tc` using the `TC_CAST()` macro.

.Usage Example: Allocation, Casting, Freeing
[source,c]
-----
#include <netlink/route/tc.h>
#include <netlink/route/qdisc.h>

struct rtnl_qdisc *qdisc;

/* Allocation of a qdisc object */
qdisc = rtnl_qdisc_alloc();

/* Cast the qdisc to a tc object using TC_CAST() to use rtnl_tc_ functions.
*/
rtnl_tc_set_mpu(TC_CAST(qdisc), 64);

/* Free the qdisc object */
rtnl_qdisc_put(qdisc);
-----

[[tc_attr]]
==== Attributes

Handle::
The handle uniquely identifies a tc object and is used to refer
to other tc objects when constructing tc trees.
+
[source,c]
-----
void rtnl_tc_set_handle(struct rtnl_tc *tc, uint32_t handle);
uint32_t rtnl_tc_get_handle(struct rtnl_tc *tc);
-----

Interface Index::
The interface index specifies the network device the traffic object
is attached to. The function `rtnl_tc_set_link()` should be preferred
when setting the interface index. It stores the reference to the link
object in the tc object and allows retrieving the `mtu` and `linktype`
automatically.
+
[source,c]
-----
void rtnl_tc_set_ifindex(struct rtnl_tc *tc, int ifindex);
void rtnl_tc_set_link(struct rtnl_tc *tc, struct rtnl_link *link);
int rtnl_tc_get_ifindex(struct rtnl_tc *tc);
-----

Link Type::
The link type specifies the kind of link that is used by the network
device (e.g. ethernet, ATM, ...). It is derived automatically when
the network device is specified with `rtnl_tc_set_link()`.
The default fallback is `ARPHRD_ETHER` (ethernet).
+
[source,c]
-----
void rtnl_tc_set_linktype(struct rtnl_tc *tc, uint32_t type);
uint32_t rtnl_tc_get_linktype(struct rtnl_tc *tc);
-----

Kind::
The kind character string specifies the type of qdisc, class,
classifier. Setting the kind results in the module specific
structure being allocated. Therefore it is imperative to call
`rtnl_tc_set_kind()` before using any type specific API functions
such as `rtnl_htb_set_rate()`.
+
[source,c]
-----
int rtnl_tc_set_kind(struct rtnl_tc *tc, const char *kind);
char *rtnl_tc_get_kind(struct rtnl_tc *tc);
-----

MPU::
The Minimum Packet Unit specifies the minimum packet size which will
ever be seen by this traffic control object. This value is used for
rate calculations. Not all object implementations will make use of
this value. The default value is 0.
+
[source,c]
-----
void rtnl_tc_set_mpu(struct rtnl_tc *tc, uint32_t mpu);
uint32_t rtnl_tc_get_mpu(struct rtnl_tc *tc);
-----

MTU::
The Maximum Transmission Unit specifies the maximum packet size which
will be transmitted. The value is derived from the link specified
with `rtnl_tc_set_link()` if not overwritten with `rtnl_tc_set_mtu()`.
If neither a link nor an MTU is specified, the value defaults to 1500
(ethernet).
+
[source,c]
-----
void rtnl_tc_set_mtu(struct rtnl_tc *tc, uint32_t mtu);
uint32_t rtnl_tc_get_mtu(struct rtnl_tc *tc);
-----

Overhead::
The overhead specifies the additional overhead per packet caused by
the network layer. This value can be used to correct packet size
calculations if the packet size on the wire does not match the packet
size seen by the kernel. The default value is 0.
+
[source,c]
-----
void rtnl_tc_set_overhead(struct rtnl_tc *tc, uint32_t overhead);
uint32_t rtnl_tc_get_overhead(struct rtnl_tc *tc);
-----

Parent::
Specifies the parent traffic control object. The parent is identified
by its handle. Special values are:
- `TC_H_ROOT`: attach tc object directly to network device (root
  qdisc, root classifier)
- `TC_H_INGRESS`: same as `TC_H_ROOT` but on the ingress side of the
  network stack.
+
[source,c]
-----
void rtnl_tc_set_parent(struct rtnl_tc *tc, uint32_t parent);
uint32_t rtnl_tc_get_parent(struct rtnl_tc *tc);
-----

Statistics::
Generic statistics, see <<tc_stats>> for additional information.
+
[source,c]
-----
uint64_t rtnl_tc_get_stat(struct rtnl_tc *tc, enum rtnl_tc_stat id);
-----

[[tc_stats]]
==== Accessing Statistics

The traffic control object holds a set of generic statistics. Not all
traffic control modules will make use of all of these statistics. Some
modules may provide additional statistics via their own APIs.

.Statistic identifiers `(enum rtnl_tc_stat)`
[cols="m,,", options="header", frame="topbot"]
|====================================================================
| ID                 | Type    | Description
| RTNL_TC_PACKETS    | Counter | Total # of packets transmitted
| RTNL_TC_BYTES      | Counter | Total # of bytes transmitted
| RTNL_TC_RATE_BPS   | Rate    | Current bytes/s rate
| RTNL_TC_RATE_PPS   | Rate    | Current packets/s rate
| RTNL_TC_QLEN       | Rate    | Current length of the queue
| RTNL_TC_BACKLOG    | Rate    | # of packets currently backlogged
| RTNL_TC_DROPS      | Counter | # of packets dropped
| RTNL_TC_REQUEUES   | Counter | # of packets requeued
| RTNL_TC_OVERLIMITS | Counter | # of packets that exceeded the limit
|====================================================================

NOTE: `RTNL_TC_RATE_BPS` and `RTNL_TC_RATE_PPS` only return meaningful
      values if a rate estimator has been configured.
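
If no rate estimator is configured, an application can approximate a
rate itself by sampling a counter such as `RTNL_TC_BYTES` twice and
dividing the difference by the elapsed time. The following is a minimal
sketch of that arithmetic only. The helper `tc_counter_rate()` is a
hypothetical name and not part of libnl; the two samples are assumed to
come from `rtnl_tc_get_stat()` calls on a refreshed object.

[source,c]
-----
#include <stdint.h>

/*
 * Hypothetical helper, not part of libnl: approximate a per second rate
 * from two samples of a monotonically increasing 64 bit counter.
 * Returns 0.0 if the interval is not positive.
 */
static double tc_counter_rate(uint64_t prev, uint64_t cur, double interval_sec)
{
	if (interval_sec <= 0.0)
		return 0.0;

	/* Unsigned subtraction stays correct if the counter wrapped once. */
	return (double)(cur - prev) / interval_sec;
}
-----

For example, two `RTNL_TC_BYTES` samples of 1000 and 4000 taken two
seconds apart correspond to 1500 bytes/s.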

.Usage Example: Retrieving tc statistics
[source,c]
-------
#include <netlink/route/tc.h>

uint64_t drops, qlen;

drops = rtnl_tc_get_stat(TC_CAST(qdisc), RTNL_TC_DROPS);
qlen  = rtnl_tc_get_stat(TC_CAST(qdisc), RTNL_TC_QLEN);
-------

==== Rate Table Calculations

[[tc_qdisc]]
=== Queueing Discipline (qdisc)

.Classless Qdisc

The queueing discipline (qdisc) is used to implement fair queueing,
prioritization or rate control. It provides an _enqueue()_ and a
_dequeue()_ operation. Whenever a network packet leaves the networking
stack over a network device, be it a physical or virtual device, it
will be enqueued to a qdisc unless the device is queueless. The
_enqueue()_ operation is followed by an immediate call to _dequeue()_
for the same qdisc to eventually retrieve a packet which can be
scheduled for transmission by the driver. Additionally, the networking
stack runs a watchdog which polls the qdisc regularly to dequeue and
send packets even if no new packets are being enqueued.

This additional watchdog is required because qdiscs may hold on to
packets and not return any packets upon _dequeue()_ in order to
enforce bandwidth restrictions.

image:classless_qdisc_nbands.png[alt="Multiband Qdisc", float="right"]

The figure illustrates a trivial example of a classless qdisc
consisting of three bands (queues). Use of multiple bands is a common
technique in qdiscs to implement fair queueing between flows or
prioritize differentiated services.

Classless qdiscs can be regarded as a black box: their inner workings
can only be steered using the configuration parameters provided by the
qdisc. There is no way to influence the structure of the internal
queues themselves.
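
The _enqueue()_ and _dequeue()_ behaviour of a three band classless
qdisc can be modelled in a few lines of plain C. The sketch below is a
toy model for illustration and not kernel code: the names `toy_qdisc`,
`toy_enqueue()` and `toy_dequeue()`, the fixed band depth, the use of
integers as packets and the strict priority between bands are all
assumptions made for this example.

[source,c]
-----
#include <stddef.h>

#define TOY_BANDS 3	/* number of bands, as in the figure */
#define TOY_DEPTH 8	/* assumed per band queue limit */

/* One FIFO ring buffer per band; a "packet" is just an int id here. */
struct toy_qdisc {
	int    pkt[TOY_BANDS][TOY_DEPTH];
	size_t head[TOY_BANDS];
	size_t len[TOY_BANDS];
};

/* Enqueue to the given band. Returns 0 on success, -1 if dropped. */
static int toy_enqueue(struct toy_qdisc *q, unsigned int band, int pkt)
{
	if (band >= TOY_BANDS || q->len[band] == TOY_DEPTH)
		return -1;

	q->pkt[band][(q->head[band] + q->len[band]) % TOY_DEPTH] = pkt;
	q->len[band]++;
	return 0;
}

/* Dequeue from the lowest numbered non-empty band. Returns -1 if empty. */
static int toy_dequeue(struct toy_qdisc *q, int *pkt)
{
	for (unsigned int band = 0; band < TOY_BANDS; band++) {
		if (q->len[band] == 0)
			continue;

		*pkt = q->pkt[band][q->head[band]];
		q->head[band] = (q->head[band] + 1) % TOY_DEPTH;
		q->len[band]--;
		return 0;
	}

	return -1;
}
-----

Because band 0 is always served first, a packet enqueued to band 2 is
only returned once bands 0 and 1 are empty. The caller has no way to
change this structure, which mirrors the black box nature of classless
qdiscs.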

.Classful Qdisc

Classful qdiscs allow for the queueing structure and classification
process to be created by the user.

image:classful_qdisc.png["Classful Qdisc"]

The figure above shows a classful qdisc with a classifier attached to
it which will make the decision whether to enqueue a packet to traffic
class +1:1+ or +1:2+. Unlike classless qdiscs, classful qdiscs
allow the classification process and the structure of the queues to be
defined by the user. This allows for complex traffic class rules to
be applied.

.List of Qdisc Implementations
[options="header", frame="topbot", cols="2,1^,8"]
|======================================================================
| Qdisc     | Classful | Description
| ATM       | Yes      | FIXME
| Blackhole | No       | This qdisc will drop all packets passed to it.
| CBQ       | Yes      |
The CBQ (Class Based Queueing) is a classful qdisc which allows
creating traffic classes and enforcing bandwidth limitations for each
class.
| DRR       | Yes      |
The DRR (Deficit Round Robin) scheduler is a classful qdisc
implementing fair queueing. Each class is assigned a quantum specifying
the maximum number of bytes that can be served per round. Unused
quantum at the end of the round is carried over to the next round.
| DSMARK    | Yes      | FIXME
| FIFO      | No       | FIXME
| GRED      | No       | FIXME
| HFSC      | Yes      | FIXME
| HTB       | Yes      | FIXME
| mq        | Yes      | FIXME
| multiq    | Yes      | FIXME
| netem     | No       | FIXME
| Prio      | Yes      | FIXME
| RED       | Yes      | FIXME
| SFQ       | Yes      | FIXME
| TBF       | Yes      | FIXME
| teql      | No       | FIXME
|======================================================================


.QDisc API Overview
[cols="a,a", options="header", frame="topbot"]
|====================================================================
| Attribute | C Interface
|
Allocation / Freeing::
|
[source,c]
-----
struct rtnl_qdisc *rtnl_qdisc_alloc(void);
void rtnl_qdisc_put(struct rtnl_qdisc *qdisc);
-----
|
Addition::
|
[source,c]
-----
int rtnl_qdisc_build_add_request(struct rtnl_qdisc *qdisc, int flags,
				 struct nl_msg **result);
int rtnl_qdisc_add(struct nl_sock *sock, struct rtnl_qdisc *qdisc,
		   int flags);
-----
|
Modification::
|
[source,c]
-----
int rtnl_qdisc_build_change_request(struct rtnl_qdisc *old,
				    struct rtnl_qdisc *new,
				    struct nl_msg **result);
int rtnl_qdisc_change(struct nl_sock *sock, struct rtnl_qdisc *old,
		      struct rtnl_qdisc *new);
-----
|
Deletion::
|
[source,c]
-----
int rtnl_qdisc_build_delete_request(struct rtnl_qdisc *qdisc,
				    struct nl_msg **result);
int rtnl_qdisc_delete(struct nl_sock *sock, struct rtnl_qdisc *qdisc);
-----
|
Cache::
|
[source,c]
-----
int rtnl_qdisc_alloc_cache(struct nl_sock *sock,
			   struct nl_cache **cache);
struct rtnl_qdisc *rtnl_qdisc_get(struct nl_cache *cache, int, uint32_t);

struct rtnl_qdisc *rtnl_qdisc_get_by_parent(struct nl_cache *, int, uint32_t);
-----
|====================================================================

[[qdisc_get]]
==== Retrieving Qdisc Configuration

The
function rtnl_qdisc_alloc_cache() is used to retrieve the current
qdisc configuration in the kernel. It will construct a +RTM_GETQDISC+
netlink message, requesting the complete list of qdiscs configured in
the kernel.

[source,c]
-------
#include <netlink/route/qdisc.h>

struct nl_cache *all_qdiscs;

if (rtnl_qdisc_alloc_cache(sock, &all_qdiscs) < 0)
	/* error while retrieving qdisc cfg */
-------

The cache can be accessed using the following functions:

- Search qdisc with matching ifindex and handle:
+
[source,c]
--------
struct rtnl_qdisc *rtnl_qdisc_get(struct nl_cache *cache, int ifindex, uint32_t handle);
--------
- Search qdisc with matching ifindex and parent:
+
[source,c]
--------
struct rtnl_qdisc *rtnl_qdisc_get_by_parent(struct nl_cache *cache, int ifindex, uint32_t parent);
--------
- Or any of the generic cache functions (e.g. nl_cache_search(), nl_cache_dump(), etc.)

.Example: Search and print qdisc
[source,c]
-------
struct rtnl_qdisc *qdisc;
int ifindex;

ifindex = rtnl_link_get_ifindex(eth0_obj);

/* search for qdisc on eth0 with handle 1:0 */
if (!(qdisc = rtnl_qdisc_get(all_qdiscs, ifindex, TC_HANDLE(1, 0))))
	/* no such qdisc found */

nl_object_dump(OBJ_CAST(qdisc), NULL);

rtnl_qdisc_put(qdisc);
-------

[[qdisc_add]]
==== Adding a Qdisc

In order to add a new qdisc to the kernel, a qdisc object needs to be
allocated. It will hold all attributes of the new qdisc.

[source,c]
-----
#include <netlink/route/qdisc.h>

struct rtnl_qdisc *qdisc;

if (!(qdisc = rtnl_qdisc_alloc()))
	/* OOM error */
-----

The next step is to specify all generic qdisc attributes using the tc
object interface described in the section <<tc_attr>>.

The following attributes must be specified:
- IfIndex
- Parent
- Kind

[source,c]
-----
/* Attach qdisc to device eth0 */
rtnl_tc_set_link(TC_CAST(qdisc), eth0_obj);

/* Make this the root qdisc */
rtnl_tc_set_parent(TC_CAST(qdisc), TC_H_ROOT);

/* Set qdisc identifier to 1:0, if left unspecified, a handle will be generated by the kernel. */
rtnl_tc_set_handle(TC_CAST(qdisc), TC_HANDLE(1, 0));

/* Make this a HTB qdisc */
rtnl_tc_set_kind(TC_CAST(qdisc), "htb");
-----

After specifying the qdisc kind (rtnl_tc_set_kind()) the qdisc type
specific interface can be used to set attributes which are specific
to the respective qdisc implementations:

[source,c]
------
/* HTB feature: Make unclassified packets go to traffic class 1:5 */
rtnl_htb_set_defcls(qdisc, TC_HANDLE(1, 5));
------

Finally, the qdisc is ready to be added and can be passed on to the
function rtnl_qdisc_add() which takes care of constructing a netlink
message requesting the addition of the new qdisc, sends the message to
the kernel and waits for the response by the kernel. The function
returns 0 if the qdisc has been added or updated successfully or a
negative error code if an error occurred.

CAUTION: The kernel operation for updating and adding a qdisc is the
         same. Therefore when calling rtnl_qdisc_add() any existing
         qdisc with matching handle will be updated unless the flag
         NLM_F_EXCL is specified.

The following flags may be specified:
[horizontal]
NLM_F_CREATE::  Create qdisc if it does not exist, otherwise
                -NLE_OBJ_NOTFOUND is returned.
NLM_F_REPLACE:: If another qdisc is already attached to the same
                parent and their handles mismatch, replace the qdisc
                instead of returning -EEXIST.
NLM_F_EXCL::    Return -NLE_EXISTS if a qdisc with matching handles
                exists already.
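
The interaction of these flags with the presence of an existing qdisc
can be summarised as a small decision function. This is only a model of
the behaviour described above, not the implementation used by the
kernel or by libnl. The function `toy_add_result()`, the `TOY_F_*`
constants and the outcome enum are hypothetical names introduced for
this sketch, and `NLM_F_REPLACE` is left out for brevity.

[source,c]
-----
#include <stdbool.h>

/* Hypothetical stand-ins for NLM_F_CREATE and NLM_F_EXCL. */
#define TOY_F_CREATE	0x1
#define TOY_F_EXCL	0x2

enum toy_outcome {
	TOY_CREATED,	/* new qdisc added */
	TOY_UPDATED,	/* existing qdisc with matching handle updated */
	TOY_NOTFOUND,	/* corresponds to -NLE_OBJ_NOTFOUND */
	TOY_EXISTS,	/* corresponds to -NLE_EXISTS */
};

/* Model the documented outcome of an add request. */
static enum toy_outcome toy_add_result(bool exists, int flags)
{
	if (exists)
		return (flags & TOY_F_EXCL) ? TOY_EXISTS : TOY_UPDATED;

	return (flags & TOY_F_CREATE) ? TOY_CREATED : TOY_NOTFOUND;
}
-----

In this model, passing both create and exclusive flags gives "add only"
semantics: the request either creates a new qdisc or fails because one
already exists.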

WARNING: The function rtnl_qdisc_add() requires administrator
         privileges.

[source,c]
------
/* Submit request to kernel and wait for response */
err = rtnl_qdisc_add(sock, qdisc, NLM_F_CREATE);

/* Return the qdisc object to free memory resources */
rtnl_qdisc_put(qdisc);

if (err < 0) {
	fprintf(stderr, "Unable to add qdisc: %s\n", nl_geterror(err));
	return err;
}
------

==== Deleting a qdisc

[source,c]
------
#include <netlink/route/qdisc.h>

struct rtnl_qdisc *qdisc;

qdisc = rtnl_qdisc_alloc();

/* Identify the qdisc to delete: the root qdisc of eth0 */
rtnl_tc_set_link(TC_CAST(qdisc), eth0_obj);
rtnl_tc_set_parent(TC_CAST(qdisc), TC_H_ROOT);

rtnl_qdisc_delete(sock, qdisc);

rtnl_qdisc_put(qdisc);
------

WARNING: The function rtnl_qdisc_delete() requires administrator
         privileges.


[[qdisc_htb]]
==== HTB - Hierarchical Token Bucket

.HTB Qdisc Attributes

Default Class::
The default class is the fallback class to which all traffic that
remains unclassified is directed. If no default class or an
invalid default class is specified, packets are transmitted directly
to the next layer (direct transmissions).
+
[source,c]
-----
uint32_t rtnl_htb_get_defcls(struct rtnl_qdisc *qdisc);
int rtnl_htb_set_defcls(struct rtnl_qdisc *qdisc, uint32_t defcls);
-----

Rate to Quantum (r2q)::
TODO
+
[source,c]
-----
uint32_t rtnl_htb_get_rate2quantum(struct rtnl_qdisc *qdisc);
int rtnl_htb_set_rate2quantum(struct rtnl_qdisc *qdisc, uint32_t rate2quantum);
-----


.HTB Class Attributes

Priority::
+
[source,c]
-----
uint32_t rtnl_htb_get_prio(struct rtnl_class *class);
int rtnl_htb_set_prio(struct rtnl_class *class, uint32_t prio);
-----

Rate::
The rate (bytes/s) specifies the maximum bandwidth an individual class
can use without borrowing. The rate of a class should always be greater
than or equal to the rate of its children.
+
[source,c]
-----
uint32_t rtnl_htb_get_rate(struct rtnl_class *class);
int rtnl_htb_set_rate(struct rtnl_class *class, uint32_t rate);
-----

Ceil Rate::
The ceil rate specifies the maximum bandwidth an individual class
can use. This includes bandwidth that is being borrowed from other
classes. Ceil defaults to the class rate implying that by default
the class will not borrow. The ceil rate of a class should always
be greater than or equal to the ceil rate of its children.
+
[source,c]
-----
uint32_t rtnl_htb_get_ceil(struct rtnl_class *class);
int rtnl_htb_set_ceil(struct rtnl_class *class, uint32_t ceil);
-----

Burst::
TODO
+
[source,c]
-----
uint32_t rtnl_htb_get_rbuffer(struct rtnl_class *class);
int rtnl_htb_set_rbuffer(struct rtnl_class *class, uint32_t burst);
-----

Ceil Burst::
TODO
+
[source,c]
-----
uint32_t rtnl_htb_get_cbuffer(struct rtnl_class *class);
int rtnl_htb_set_cbuffer(struct rtnl_class *class, uint32_t burst);
-----

Quantum::
TODO
+
[source,c]
-----
int rtnl_htb_set_quantum(struct rtnl_class *class, uint32_t quantum);
-----


[[tc_class]]
=== Class

[options="header", cols="s,a,a,a,a"]
|=======================================================================
|        | UNSPEC | TC_H_ROOT | 0:pY | pX:pY
| UNSPEC 3+^|
[horizontal]
qdisc =:: root-qdisc
class =:: root-qdisc:0
|
[horizontal]
qdisc =:: pX:0
class =:: pX:0
| 0:hY 3+^|
[horizontal]
qdisc =:: root-qdisc
class =:: root-qdisc:hY
|
[horizontal]
qdisc =:: pX:0
class =:: pX:hY
| hX:hY 3+^|
[horizontal]
qdisc =:: hX:
class =:: hX:hY
|
if pX != hX
    return -EINVAL
[horizontal]
qdisc =:: hX:
class =:: hX:hY
|=======================================================================

[[tc_cls]]
=== Classifier (cls)

TODO

[[tc_classid_mngt]]
=== ClassID Management

TODO

[[tc_pktloc]]
=== Packet Location Aliasing (pktloc)

TODO

[[tc_api]]
=== Traffic Control Module API

TODO