History log of /freebsd-current/sys/netpfil/ipfw/ip_fw_dynamic.c
Revision Date Author Comments
# 938918a9 20-Jan-2024 Gordon Bergling <gbe@FreeBSD.org>

netpfil: Fix two typos in source code comments

- s/withing/within/

MFC after: 3 days


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 4d89e201 03-Sep-2022 Gordon Bergling <gbe@FreeBSD.org>

netpfil: Correct some typos in source code comments

- s/occured/occurred/
- s/the the/the/

MFC after: 3 days


# 0dff875f 18-Nov-2021 Gleb Smirnoff <glebius@FreeBSD.org>

ipfw: remove unnecessary TCP related includes


# 9bacbf1a 16-Apr-2021 Andrey V. Elsukov <ae@FreeBSD.org>

ipfw: do not use sleepable malloc in callout context.

Use M_NOWAIT flag when hash growing is called from callout.

PR: 255041
Reviewed by: kevans
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D29772


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# b4426a71 11-Feb-2020 Hans Petter Selasky <hselasky@FreeBSD.org>

Add missing EPOCH(9) wrapper in ipfw(8).

Backtrace:
panic()
ip_output()
dyn_tick()
softclock_call_cc()
softclock()
ithread_loop()

Differential Revision: https://reviews.freebsd.org/D23599
Reviewed by: glebius@ and ae@
Found by: mmacy@
Reported by: jmd@
Sponsored by: Mellanox Technologies


# b7795b67 14-Mar-2019 Gleb Smirnoff <glebius@FreeBSD.org>

- Add more flags to ip_fw_args. At this changeset only IPFW_ARGS_IN and
IPFW_ARGS_OUT are utilized. They are intented to substitute the "dir"
parameter that is often passes together with args.
- Rename ip_fw_args.oif to ifp and now it is set to either input or
output interface, depending on IPFW_ARGS_IN/OUT bit set.


# 83354acf 06-Mar-2019 Andrey V. Elsukov <ae@FreeBSD.org>

Fix the problem with O_LIMIT states introduced in r344018.

dyn_install_state() uses `rule` pointer when it creates state.
For O_LIMIT states this pointer actually is not struct ip_fw,
it is pointer to O_LIMIT_PARENT state, that keeps actual pointer
to ip_fw parent rule. Thus we need to cache rule id and number
before calling dyn_get_parent_state(), so we can use them later
when the `rule` pointer is overrided.

PR: 236292
MFC after: 3 days


# 804a6541 11-Feb-2019 Andrey V. Elsukov <ae@FreeBSD.org>

Remove `set' field from state structure and use set from parent rule.

Initially it was introduced because parent rule pointer could be freed,
and rule's information could become inaccessible. In r341471 this was
changed. And now we don't need this information, and also it can become
stale. E.g. rule can be moved from one set to another. This can lead
to parent's set and state's set will not match. In this case it is
possible that static rule will be freed, but dynamic state will not.
This can happen when `ipfw delete set N` command is used to delete
rules, that were moved to another set.
To fix the problem we will use the set number from parent rule.

Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC


# f712b161 31-Jan-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Revert r316461: Remove "IPFW static rules" rmlock, and use pfil's global lock.

The pfil(9) system is about to be converted to epoch(9) synchronization, so
we need [temporarily] go back with ipfw internal locking.

Discussed with: ae


# d66f9c86 04-Dec-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Add ability to request listing and deleting only for dynamic states.

This can be useful, when net.inet.ip.fw.dyn_keep_states is enabled, but
after rules reloading some state must be deleted. Added new flag '-D'
for such purpose.

Retire '-e' flag, since there can not be expired states in the meaning
that this flag historically had.

Also add "verbose" mode for listing of dynamic states, it can be enabled
with '-v' flag and adds additional information to states list. This can
be useful for debugging.

Obtained from: Yandex LLC
MFC after: 2 months
Sponsored by: Yandex LLC


# cefe3d67 04-Dec-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Reimplement how net.inet.ip.fw.dyn_keep_states works.

Turning on of this feature allows to keep dynamic states when parent
rule is deleted. But it works only when the default rule is
"allow from any to any".

Now when rule with dynamic opcode is going to be deleted, and
net.inet.ip.fw.dyn_keep_states is enabled, existing states will reference
named objects corresponding to this rule, and also reference the rule.
And when ipfw_dyn_lookup_state() will find state for deleted parent rule,
it will return the pointer to the deleted rule, that is still valid.
This implementation doesn't support O_LIMIT_PARENT rules.

The refcnt field was added to struct ip_fw to keep reference, also
next pointer added to be able iterate rules and not damage the content
when deleted rules are chained.

Named objects are referenced only when states are going to be deleted to
be able reuse kidx of named objects when new parent rules will be
installed.

ipfw_dyn_get_count() function was modified and now it also looks into
dynamic states and constructs maps of existing named objects. This is
needed to correctly export orphaned states into userland.

ipfw_free_rule() was changed to be global, since now dynamic state can
free rule, when it is expired and references counters becomes 1.

External actions subsystem also modified, since external actions can be
deregisterd and instances can be destroyed. In these cases deleted rules,
that are referenced by orphaned states, must be modified to prevent access
to freed memory. ipfw_dyn_reset_eaction(), ipfw_reset_eaction_instance()
functions added for these purposes.

Obtained from: Yandex LLC
MFC after: 2 months
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D17532


# 0df76496 04-Dec-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Add assertion to check that named object has correct type.

Obtained from: Yandex LLC
MFC after: 1 week


# 2636ba4d 27-Nov-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Do not limit the mbuf queue length for keepalive packets.

It was unlimited before overhaul, and one user reported that this limit
can be reached easily.

PR: 233562
MFC after: 1 week


# ab108c4b 21-Oct-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Do not decrement RST life time if keep_alive is not turned on.

This allows use differen values configured by user for sysctl variable
net.inet.ip.fw.dyn_rst_lifetime.

Obtained from: Yandex LLC
MFC after: 3 weeks
Sponsored by: Yandex LLC


# 5f901c92 24-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by: bz
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16147


# 2bf95012 05-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

Create a new macro for static DPCPU data.

On arm64 (and possible other architectures) we are unable to use static
DPCPU data in kernel modules. This is because the compiler will generate
PC-relative accesses, however the runtime-linker expects to be able to
relocate these.

In preparation to fix this create two macros depending on if the data is
global or static.

Reviewed by: bz, emaste, markj
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D16140


# 67ad3c0b 22-May-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Restore the ability to keep states after parent rule deletion.

This feature is disabled by default and was removed when dynamic states
implementation changed to be lockless. Now it is reimplemented with small
differences - when dyn_keep_states sysctl variable is enabled,
dyn_match_ipv[46]_state() function doesn't match child states of deleted
rule. And thus they are keept alive until expired. ipfw_dyn_lookup_state()
function does check that state was not orphaned, and if so, it returns
pointer to default_rule and its position in the rules map. The main visible
difference is that orphaned states still have the same rule number that
they have before parent rule deleted, because now a state has many fields
related to rule and changing them all atomically to point to default_rule
seems hard enough.

Reported by: <lantw44 at gmail.com>
MFC after: 2 days


# 4bb8a5b0 21-May-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Remove check for matching the rulenum, ruleid and rule pointer from
dyn_lookup_ipv[46]_state_locked(). These checks are remnants of not
ready to be committed code, and they are there by accident.
Due to the race these checks can lead to creating of duplicate states
when concurrent threads in the same time will try to add state for two
packets of the same flow, but in reverse directions and matched by
different parent rules.

Reported by: lev
MFC after: 3 days


# d7c5a620 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

ifnet: Replace if_addr_lock rwlock with epoch + mutex

Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32
4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32
4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32
4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32
4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32
4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32
4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32

After the patch

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51
5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51
5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51
5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51
5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52
5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15366


# 99493f5a 07-Feb-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Remove duplicate #include <netinet/ip_var.h>.


# b99a6823 07-Feb-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Rework ipfw dynamic states implementation to be lockless on fast path.

o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and
for dynamic states implementation information;
o added DYN_LOOKUP_NEEDED() macro that can be used to determine the
need of new lookup of dynamic states;
o ipfw_dyn_rule now becomes obsolete. Currently it used to pass
information from kernel to userland only.
o IPv4 and IPv6 states now described by different structures
dyn_ipv4_state and dyn_ipv6_state;
o IPv6 scope zones support is added;
o ipfw(4) now depends from Concurrency Kit;
o states are linked with "entry" field using CK_SLIST. This allows
lockless lookup and protected by mutex modifications.
o the "expired" SLIST field is used for states expiring.
o struct dyn_data is used to keep generic information for both IPv4
and IPv6;
o struct dyn_parent is used to keep O_LIMIT_PARENT information;
o IPv4 and IPv6 states are stored in different hash tables;
o O_LIMIT_PARENT states now are kept separately from O_LIMIT and
O_KEEP_STATE states;
o per-cpu dyn_hp pointers are used to implement hazard pointers and they
prevent freeing states that are locklessly used by lookup threads;
o mutexes to protect modification of lists in hash tables now kept in
separate arrays. 65535 limit to maximum number of hash buckets now
removed.
o Separate lookup and install functions added for IPv4 and IPv6 states
and for parent states.
o By default now is used Jenkinks hash function.

Obtained from: Yandex LLC
MFC after: 42 days
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D12685


# d3834420 18-Jan-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Add UDPLite support to ipfw(4).

Now it is possible to use UDPLite's port numbers in rules,
create dynamic states for UDPLite packets and see "UDPLite" for matched
packets in log.

Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC


# fe267a55 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.


# 1719df1b 23-Nov-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Modify ipfw's dynamic states KPI.

Hide the locking logic used in the dynamic states implementation from
generic code. Rename ipfw_install_state() and ipfw_lookup_dyn_rule()
function to have similar names: ipfw_dyn_install_state() and
ipfw_dyn_lookup_state(). Move dynamic rule counters updating to the
ipfw_dyn_lookup_state() function. Now this function return NULL when
there is no state and pointer to the parent rule when state is found.
Thus now there is no need to return pointer to dynamic rule, and no need
to hold bucket lock for this state. Remove ipfw_dyn_unlock() function.

Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D11657


# 9d155400 23-Nov-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Check that address family of state matches address family of packet.

If it is not matched avoid comparing other state fields.

Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC


# 30df59d5 22-Nov-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Move ipfw_send_pkt() from ip_fw_dynamic.c into ip_fw2.c.
It is not specific for dynamic states function and called also from
generic code.

Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC


# 369bc48d 20-Sep-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Do not acquire IPFW_WLOCK when a named object is created and destroyed.

Acquiring of IPFW_WLOCK is requried for cases when we are going to
change some data that can be accessed during processing of packets flow.
When we create new named object, there are not yet any rules, that
references it, thus holding IPFW_UH_WLOCK is enough to safely update
needed structures. When we destroy an object, we do this only when its
reference counter becomes zero. And it is safe to not acquire IPFW_WLOCK,
because noone references it. The another case is when we failed to finish
some action and thus we are doing rollback and destroying an object, in
this case it is still not referenced by rules and no need to acquire
IPFW_WLOCK.

This also fixes panic with INVARIANTS due to recursive IPFW_WLOCK acquiring.

MFC after: 1 week
Sponsored by: Yandex LLC


# 1ca7c3b8 14-Apr-2017 Andrey V. Elsukov <ae@FreeBSD.org>

The rule field in the ipfw_dyn_rule structure is used as storage
to pass rule number and rule set to userland. In r272840 the kernel
internal rule representation was changed and the rulenum field of
struct ip_fw_rule got the type uint32_t, but userlevel representation
still have the type uint16_t. To not overflow the size of pointer
on the systems with 32-bit pointer size use separate variable to
copy rulenum and set.

Reported by: PVS-Studio
MFC after: 1 week


# f91eb6ad 13-Apr-2017 Maxim Konovalov <maxim@FreeBSD.org>

o Redundant assignments removed.

Found by: PVS-Stdio, V519
Reviewed by: ae


# 88d950a6 03-Apr-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Remove "IPFW static rules" rmlock.

Make PFIL's lock global and use it for this purpose.
This reduces the number of locks needed to acquire for each packet.

Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC
No objection from: #network
Differential Revision: https://reviews.freebsd.org/D10154


# 02784f10 06-Dec-2016 Andrey V. Elsukov <ae@FreeBSD.org>

Convert result of hash_packet6() into host byte order.

For IPv4 similar function uses addresses and ports in host byte order,
but for IPv6 it used network byte order. This led to very bad hash
distribution for IPv6 flows. Now the result looks similar to IPv4.

Reported by: olivier
MFC after: 1 week
Sponsored by: Yandex LLC


# ed22e564 18-Jul-2016 Andrey V. Elsukov <ae@FreeBSD.org>

Add named dynamic states support to ipfw(4).

The keep-state, limit and check-state now will have additional argument
flowname. This flowname will be assigned to dynamic rule by keep-state
or limit opcode. And then can be matched by check-state opcode or
O_PROBE_STATE internal opcode. To reduce possible breakage and to maximize
compatibility with old rulesets default flowname introduced.
It will be assigned to the rules when user has omitted state name in
keep-state and check-state opcodes. Also if name is ambiguous (can be
evaluated as rule opcode) it will be replaced to default.

Reviewed by: julian
Obtained from: Yandex LLC
MFC after: 1 month
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D6674


# d16f495c 20-May-2016 Andrey V. Elsukov <ae@FreeBSD.org>

Fix the regression introduced in r300143.
When we are creating new dynamic state use MATCH_FORWARD direction to
correctly initialize protocol's state.


# 96e84c57 17-May-2016 Andrey V. Elsukov <ae@FreeBSD.org>

Move protocol state handling code from lookup_dyn_rule_locked() function
into dyn_update_proto_state(). This allows eliminate the second state
lookup in the ipfw_install_state().
Also remove MATCH_* macros, they are defined in ip_fw_private.h as enum.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


# a4641f4e 03-May-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/net*: minor spelling fixes.

No functional change.


# 657592fd 03-Mar-2016 Andrey V. Elsukov <ae@FreeBSD.org>

Use correct size for malloc.

Obtained from: Yandex LLC
MFC after: 1 week


# 71433037 17-Nov-2015 Bryan Drewery <bdrewery@FreeBSD.org>

Fix dynamic IPv6 rules showing junk for non-specified address masks.

For example:
00002 0 0 (19s) PARENT 1 tcp 10.10.0.5 0 <-> 0.0.0.0 0
00002 4 412 (1s) LIMIT tcp 10.10.0.5 25848 <-> 10.10.0.7 22
00002 10 777 (1s) LIMIT tcp 2001:894:5a24:653::503:1 52023 <-> 2001:894:5a24:653:ca0a:a9ff:fe04:3978 22
00002 0 0 (17s) PARENT 1 tcp 2001:894:5a24:653::503:1 0 <-> 80f3:70d:23fe:ffff:1005:: 0

Fix this by zeroing the unused address, as is done for IPv4:
00002 0 0 (18s) PARENT 1 tcp 10.10.0.5 0 <-> 0.0.0.0 0
00002 36 14952 (1s) LIMIT tcp 10.10.0.5 25848 <-> 10.10.0.7 22
00002 0 0 (0s) PARENT 1 tcp 2001:894:5a24:653::503:1 0 <-> :: 0
00002 4 345 (274s) LIMIT tcp 2001:894:5a24:653::503:1 34131 <-> 2001:470:1f11:262:ca0a:a9ff:fe04:3978 22

MFC after: 2 weeks


# fd90e2ed 22-May-2015 Jung-uk Kim <jkim@FreeBSD.org>

CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks


# 6df8a710 07-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.

Sponsored by: Nginx, Inc.


# 257480b8 04-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert netinet6/ to use new routing API.

* Remove &ifpp from ip6_output() in favor of ri->ri_nh_info
* Provide different wrappers to in6_selectsrc:
Currently it is used by 2 differenct type of customers:
- socket-based one, which all are unsure about provided
address scope and
- in-kernel ones (ND code mostly), which don't have
any sockets, options, crededentials, etc.
So, we provide two different wrappers to in6_selectsrc()
returning select source.
* Make different versions of selectroute():
Currenly selectroute() is used in two scenarios:
- SAS, via in6_selecsrc() -> in6_selectif() -> selectroute()
- output, via in6_output -> wrapper -> selectroute()
Provide different versions for each customer:
- fib6_lookup_nh_basic()-based in6_selectif() which is
capable of returning interface only, without MTU/NHOP/L2
calculations
- full-blown fib6_selectroute() with cached route/multipath/
MTU/L2
* Stop using routing table for link-local address lookups
* Add in6_ifawithifp_lla() to make for-us check faster for link-local
* Add in6_splitscope / in6_setllascope for faster embed/deembed scopes


# 552eb491 24-Oct-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Bump default dynamic limit to 16k entries.
Print better log message when limit is hit.

PR: 193300
Submitted by: me at nileshgr.com


# ccba94b8 04-Oct-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Switch ipfw to use rmlock for runtime locking.


# 0cba2b28 31-Aug-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Add support for multi-field values inside ipfw tables.
This is the last major change in given branch.

Kernel changes:
* Use 64-bytes structures to hold multi-value variables.
* Use shared array to hold values from all tables (assume
each table algo is capable of holding 32-byte variables).
* Add some placeholders to support per-table value arrays in future.
* Use simple eventhandler-style API to ease the process of adding new
table items. Currently table addition may required multiple UH drops/
acquires which is quite tricky due to atomic table modificatio/swap
support, shared array resize, etc. Deal with it by calling special
notifier capable of rolling back state before actually performing
swap/resize operations. Original operation then restarts itself after
acquiring UH lock.
* Bump all objhash users default values to at least 64
* Fix custom hashing inside objhash.

Userland changes:
* Add support for dumping shared value array via "vlist" internal cmd.
* Some small print/fill_flags dixes to support u32 values.
* valtype is now bitmask of
<skipto|pipe|fib|nat|dscp|tag|divert|netgraph|limit|ipv4|ipv6>.
New values can hold distinct values for each of this types.
* Provide special "legacy" type which assumes all values are the same.
* More helpers/docs following..

Some examples:

3:41 [1] zfscurr0# ipfw table mimimi create valtype skipto,limit,ipv4,ipv6
3:41 [1] zfscurr0# ipfw table mimimi info
+++ table(mimimi), set(0) +++
kindex: 2, type: addr
references: 0, valtype: skipto,limit,ipv4,ipv6
algorithm: addr:radix
items: 0, size: 296
3:42 [1] zfscurr0# ipfw table mimimi add 10.0.0.5 3000,10,10.0.0.1,2a02:978:2::1
added: 10.0.0.5/32 3000,10,10.0.0.1,2a02:978:2::1
3:42 [1] zfscurr0# ipfw table mimimi list
+++ table(mimimi), set(0) +++
10.0.0.5/32 3000,0,10.0.0.1,2a02:978:2::1


# 1940fa77 12-Aug-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Change tablearg value to be 0 (try #2).
Most of the tablearg-supported opcodes does not accept 0 as valid value:
O_TAG, O_TAGGED, O_PIPE, O_QUEUE, O_DIVERT, O_TEE, O_SKIPTO, O_CALLRET,
O_NETGRAPH, O_NGTEE, O_NAT treats 0 as invalid input.

The rest are O_SETDSCP and O_SETFIB.
'Fix' them by adding high-order bit (0x8000) set for non-tablearg values.
Do translation in kernel for old clients (import_rule0 / export_rule0),
teach current ipfw(8) binary to add/remove given bit.

This change does not affect handling SETDSCP values, but limit
O_SETFIB values to 32767 instead of 65k. Since currently we have either
old (16) or new (2^32) max fibs, this should not be a big deal:
we're definitely OK for former and have to add another opcode to deal
with latter, regardless of tablearg value.


# 8bd19212 08-Aug-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Partially revert previous commit:
"0" value is perfectly valid for O_SETFIB and O_SETDSCP,
so tablearg remains to be 655535 for now.


# 2c452b20 08-Aug-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Switch tablearg value from 65535 to 0.
* Use u16 table kidx instead of integer on for iface opcode.
* Provide compability layer for old clients.


# a73d728d 07-Aug-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Kernel changes:
* Implement proper checks for switching between global and set-aware tables
* Split IP_FW_DEL mess into the following opcodes:
* IP_FW_XDEL (del rules matching pattern)
* IP_FW_XMOVE (move rules matching pattern to another set)
* IP_FW_SET_SWAP (swap between 2 sets)
* IP_FW_SET_MOVE (move one set to another one)
* IP_FW_SET_ENABLE (enable/disable sets)
* Add IP_FW_XZERO / IP_FW_XRESETLOG to finish IP_FW3 migration.
* Use unified ipfw_range_tlv as range description for all of the above.
* Check dynamic states IFF there was non-zero number of deleted dyn rules,
* Del relevant dynamic states with singe traversal instead of per-rule one.

Userland changes:
* Switch ipfw(8) to use new opcodes.


# 6447bae6 06-Jul-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Prepare to pass other dynamic states via ipfw_dump_config()

Kernel changes:
* Change dump format for dynamic states:
each state is now stored inside ipfw_obj_dyntlv
last dynamic state is indicated by IPFW_DF_LAST flag
* Do not perform sooptcopyout() for !SOPT_GET requests.

Userland changes:
* Introduce foreach_state() function handler to ease work
with different states passed by ipfw_dump_config().


# 563b5ab1 28-Jun-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Suppord showing named tables in ipfw(8) rule listing.

Kernel changes:
* change base TLV header to be u64 (so size can be u32).
* Introduce ipfw_obj_ctlv generc container TLV.
* Add IP_FW_XGET opcode which is now used for atomic configuration
retrieval. One can specify needed configuration pieces to retrieve
via flags field. Currently supported are
IPFW_CFG_GET_STATIC (static rules) and
IPFW_CFG_GET_STATES (dynamic states).
Other configuration pieces (tables, pipes, etc..) support is planned.

Userland changes:
* Switch ipfw(8) to use new IP_FW_XGET for rule listing.
* Split rule listing code get and show pieces.
* Make several steps forward towards libipfw:
permit printing states and rules(paritally) to supplied buffer.
do not die on malloc/kernel failure inside given printing functions.
stop assuming cmdline_opts is global symbol.


# a3043eee 14-Feb-2014 Dimitry Andric <dim@FreeBSD.org>

Under sys/netpfil/ipfw, surround two IPv6-specific static functions with
#ifdef INET6, since they are unused when INET6 is disabled.

MFC after: 3 days


# fb2b51fa 18-Dec-2013 Alexander V. Chernikov <melifaro@FreeBSD.org>

Add net.inet.ip.fw.dyn_keep_states sysctl which
re-links dynamic states to default rule instead of
flushing on rule deletion.
This can be useful while performing ruleset reload
(think about `atomic` reload via changing sets).
Currently it is turned off by default.

MFC after: 2 weeks
Sponsored by: Yandex LLC


# 413c8aaa 21-Nov-2013 Luigi Rizzo <luigi@FreeBSD.org>

more support for userspace compiling of this code:
emulate the uma_zone for dynamic rules.


# c3322cb9 28-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 76039bc8 26-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 39bddcde 02-Mar-2013 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix callout expiring dynamic rules.

PR: kern/175530
Submitted by: Vladimir Spiridenkov <vs@gtn.ru>
MFC after: 2 weeks


# f37de965 23-Dec-2012 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use unified IP_FW_ARG_TABLEARG() macro for most tablearg checks.
Log real value instead of IP_FW_TABLEARG (65535) in ipfw_log().

Noticed by: Vitaliy Tokarenko <rphone@ukr.net>
MFC after: 2 weeks


# eb1b1807 05-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


# c187c1fb 30-Nov-2012 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use common macros for working with rule/dynamic counters.
This is done as preparation to introduce per-cpu ipfw counters.

MFC after: 3 weeks


# 2e089d5c 30-Nov-2012 Alexander V. Chernikov <melifaro@FreeBSD.org>

Make ipfw dynamic states operations SMP-ready.

* Global IPFW_DYN_LOCK() is changed to per-bucket mutex.
* State expiration is done in ipfw_tick every second.
* No expiration is done on forwarding path.
* hash table resize is done automatically and does not flush all states.
* Dynamic UMA zone is now allocated per each VNET
* State limiting is now done via UMA(9) api.

Discussed with: ipfw
MFC after: 3 weeks
Sponsored by: Yandex LLC


# 73332e7c 09-Nov-2012 Alexander V. Chernikov <melifaro@FreeBSD.org>

Simplify sending keepalives.
Prepare ipfw_tick() to be used by other consumers.

Reviewed by: ae(basically)
MFC after: 2 weeks


# a730ff05 05-Nov-2012 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use unified print_dyn_rule_flags() function for debugging messages
instead of hand-made printfs in every place.

MFC after: 1 week


# 8f134647 22-Oct-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>


# 3b3a8eb9 14-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c -> sys/netpfil/pf/
sys/contrib/pf/net/*.h -> sys/net/
contrib/pf/pfctl/*.c -> sbin/pfctl
contrib/pf/pfctl/*.h -> sbin/pfctl
contrib/pf/pfctl/pfctl.8 -> sbin/pfctl
contrib/pf/pfctl/*.4 -> share/man/man4
contrib/pf/pfctl/*.5 -> share/man/man5

sys/netinet/ipfw -> sys/netpfil/ipfw

The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with: bz, luigi