This documented is slightly dated but should give you idea of how things work. What is it? ----------- An extension to the filtering/classification architecture of Linux Traffic Control. Up to 2.6.8 the only action that could be "attached" to a filter was policing. i.e you could say something like: ----- tc filter add dev lo parent ffff: protocol ip prio 10 u32 match ip src \ 127.0.0.1/32 flowid 1:1 police mtu 4000 rate 1500kbit burst 90k ----- which implies "if a packet is seen on the ingress of the lo device with a source IP address of 127.0.0.1/32 we give it a classification id of 1:1 and we execute a policing action which rate limits its bandwidth utilization to 1.5Mbps". The new extensions allow for more than just policing actions to be added. They are also fully backward compatible. If you have a kernel that doesnt understand them, then the effect is null i.e if you have a newer tc but older kernel, the actions are not installed. Likewise if you have a newer kernel but older tc, obviously the tc will use current syntax which will work fine. Of course to get the required effect you need both newer tc and kernel. If you are reading this you have the right tc ;-> A side effect is that we can now get stateless firewalling to work with tc. Essentially this is now an alternative to iptables. I wont go into details of my dislike for iptables at times, but scalability is one of the main issues; however, if you need stateful classification - use netfilter (for now). This stuff works on both ingress and egress qdiscs. Features -------- 1) new additional syntax and actions enabled. Note old syntax is still valid. Essentially this is still the same syntax as tc with a new construct "action". The syntax is of the form: tc filter add parent 1:0 protocol ip prio 10 flowid 1:1 action * You can have as many actions as you want (within sensible reasoning). In the past the only real action was the policer; i.e you could do something along the lines of: tc filter add dev lo parent ffff: protocol ip prio 10 u32 \ match ip src 127.0.0.1/32 flowid 1:1 \ police mtu 4000 rate 1500kbit burst 90k Although you can still use the same syntax, now you can say: tc filter add dev lo parent 1:0 protocol ip prio 10 u32 \ match ip src 127.0.0.1/32 flowid 1:1 \ action police mtu 4000 rate 1500kbit burst 90k " generic Actions" (gact) at the moment are: { drop, pass, reclassify, continue} (If you have others, no listed here give me a reason and we will add them) +drop says to drop the packet +pass and ok (are equivalent) says to accept it +reclassify requests for reclassification of the packet +continue requests for next lookup to match 2)In order to take advantage of some of the targets written by the iptables people, a classifier can have a packet being massaged by an iptable target. I have only tested with mangler targets up to now. (infact anything that is not in the mangling table is disabled right now) In terms of hooks: *ingress is mapped to pre-routing hook *egress is mapped to post-routing hook I dont see much value in the other hooks, if you see it and email me good reasons, the addition is trivial. Example syntax for iptables targets usage becomes: tc filter add ..... u32 action ipt -j example: tc filter add dev lo parent ffff: protocol ip prio 8 u32 \ match ip dst 127.0.0.8/32 flowid 1:12 \ action ipt -j mark --set-mark 2 NOTE: flowid 1:12 is parsed flowid 0x1:0x12. Make sure if you want flowid decimal 12, then use flowid 1:c. 3) A feature i call pipe The motivation is derived from Unix pipe mechanism but applied to packets. Essentially take a matching packet and pass it through action1 | action2 | action3 etc. You could do something similar to this with the tc policer and the "continue" operator but this rather restricts it to just the policer and requires multiple rules (and lookups, hence quiet inefficient); as an example -- and please note that this is just an example _not_ The Word Youve Been Waiting For (yes i have had problems giving examples which ended becoming dogma in documents and people modifying them a little to look clever); i selected the metering rates to be small so that i can show better how things work. The script below does the following: - an incoming packet from 10.0.0.21 is first given a firewall mark of 1. - It is then metered to make sure it does not exceed its allocated rate of 1Kbps. If it doesnt exceed rate, this is where we terminate action execution. - If it does exceed its rate, its "color" changes to a mark of 2 and it is then passed through a second meter. -The second meter is shared across all flows on that device [i am suprised that this seems to be not a well know feature of the policer; Bert was telling me that someone was writing a qdisc just to do sharing across multiple devices; it must be the summer heat again; weve had someone doing that every year around summer -- the key to sharing is to use a operator "index" in your policer rules (example "index 20"). All your rules have to use the same index to share.] -If the second meter is exceeded the color of the flow changes further to 3. -We then pass the packet to another meter which is shared across all devices in the system. If this meter is exceeded we drop the packet. Note the mark can be used further up the system to do things like policy or more interesting things on the egress. ------------------ cut here ------------------------------- # # Add an ingress qdisc on eth0 tc qdisc add dev eth0 ingress # #if you see an incoming packet from 10.0.0.21 tc filter add dev eth0 parent ffff: protocol ip prio 1 \ u32 match ip src 10.0.0.21/32 flowid 1:15 \ # # first give it a mark of 1 action ipt -j mark --set-mark 1 index 2 \ # # then pass it through a policer which allows 1kbps; if the flow # doesnt exceed that rate, this is where we stop, if it exceeds we # pipe the packet to the next action action police rate 1kbit burst 9k pipe \ # # which marks the packet fwmark as 2 and pipes action ipt -j mark --set-mark 2 \ # # next attempt to borrow b/width from a meter # used across all flows incoming on eth0("index 30") # and if that is exceeded we pipe to the next action action police index 30 mtu 5000 rate 1kbit burst 10k pipe \ # mark it as fwmark 3 if exceeded action ipt -j mark --set-mark 3 \ # and then attempt to borrow from a meter used by all devices in the # system. Should this be exceeded, drop the packet on the floor. action police index 20 mtu 5000 rate 1kbit burst 90k drop --------------------------------- Now lets see the actions installed with "tc filter show parent ffff: dev eth0" -------- output ----------- jroot# tc filter show parent ffff: dev eth0 filter protocol ip pref 1 u32 filter protocol ip pref 1 u32 fh 800: ht divisor 1 filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15 action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING target MARK set 0x1 index 2 action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING target MARK set 0x2 index 1 action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING target MARK set 0x3 index 3 action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b match 0a000015/ffffffff at 12 ------------------------------- Note the ordering of the actions is based on the order in which we entered them. In the future i will add explicit priorities. Now lets run a ping -f from 10.0.0.21 to this host; stop the ping after you see a few lines of dots ---- [root@jzny hadi]# ping -f 10.0.0.22 PING 10.0.0.22 (10.0.0.22): 56 data bytes .................................................................................................................................................................................................................................................................................................................................................................................................................................................... --- 10.0.0.22 ping statistics --- 2248 packets transmitted, 1811 packets received, 19% packet loss round-trip min/avg/max = 0.7/9.3/20.1 ms ----------------------------- Now lets take a look at the stats with "tc -s filter show parent ffff: dev eth0" -------------- jroot# tc -s filter show parent ffff: dev eth0 filter protocol ip pref 1 u32 filter protocol ip pref 1 u32 fh 800: ht divisor 1 filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 5 action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING target MARK set 0x1 index 2 Sent 188832 bytes 2248 pkts (dropped 0, overlimits 0) action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb Sent 188832 bytes 2248 pkts (dropped 0, overlimits 2122) action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING target MARK set 0x2 index 1 Sent 178248 bytes 2122 pkts (dropped 0, overlimits 0) action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b Sent 178248 bytes 2122 pkts (dropped 0, overlimits 1945) action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING target MARK set 0x3 index 3 Sent 163380 bytes 1945 pkts (dropped 0, overlimits 0) action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b Sent 163380 bytes 1945 pkts (dropped 0, overlimits 437) match 0a000015/ffffffff at 12 ------------------------------- Neat, eh? Wanna write an action module? ------------------------------ Its easy. Either look at the code or send me email. I will document at some point; will also accept documentation. TODO ---- Lotsa goodies/features coming. Requests also being accepted. At the moment the focus has been on getting the architecture in place. Expect new things in the spurious time i have to work on this (particularly around end of year when i have typically get time off from work).