History log of /freebsd-current/sys/net/pfvar.h
Revision Date Author Comments
# 9dbbe68b 30-May-2024 Kristof Provost <kp@FreeBSD.org>

pf: convert DIOCCLRSTATUS to netlink

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 6ee3e376 24-May-2024 Kristof Provost <kp@FreeBSD.org>

pf: fix incorrect anchor_call to userspace

777a4702c changed how we copy out the anchor_call string, and
incorrectly limited it to 8 (4 on 32-bit systems) bytes. Fix that so we
get the full anchor path, rather than just the first few characters.

PR: 279225
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 706d465d 26-Feb-2024 Kristof Provost <kp@FreeBSD.org>

pf: convert kill/clear state to use netlink

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D44090


# 777a4702 12-Jan-2024 Kristof Provost <kp@FreeBSD.org>

pf: implement addrule via netlink

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 54c62e3e 17-Jan-2024 Kristof Provost <kp@FreeBSD.org>

pf: work around icmp6 packet-too-big not being sent when binat-ing

If we're applying NPTv6 we pass a packet with a modified source and/or
destination address to the network stack.

If that packet then turns out to be larger than the MTU of the sending
interface the stack will attempt to generate an icmp6 packet-too-big
error, but may fail to look up the appropriate source address for that
error message. Even if it does, pf would still have to undo the binat
operation inside the icmp6 packet so the sending host can make sense of
the error.

We can avoid both problems entirely by having pf also perform the MTU
check (taking the potential refragmentation into account), and
generating the icmp6 error directly in pf.

See also: https://redmine.pfsense.org/issues/14290
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43499


# 04932601 07-Dec-2023 Kristof Provost <kp@FreeBSD.org>

pf: store state creation/expiration timestamps with milisecond precision

The primary beneficiary is pflow(4), which expects milisecond precision
in timestamps.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43112


# baf9b6d0 01-Dec-2023 Kristof Provost <kp@FreeBSD.org>

pf: allow pflow to be activated per rule

Only generate ipfix/netflow reports (through pflow) for the rules where
this is enabled. Reports can also be enabled globally through 'set
state-default pflow'.

Obtained from: OpenBSD
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43108


# f92d9b1a 28-Nov-2023 Kristof Provost <kp@FreeBSD.org>

pflow: import from OpenBSD

pflow is a pseudo device to export flow accounting data over UDP.
It's compatible with netflow version 5 and IPFIX (10).

The data is extracted from the pf state table. States are exported once
they are removed.

Reviewed by: melifaro
Obtained from: OpenBSD
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43106


# 948e8413 02-Jan-2024 Kristof Provost <kp@FreeBSD.org>

pflog: pass the action to pflog directly

If a packet is malformed, it is dropped by pf(4). The rule referenced
in pflog(4) is the default rule. As the default rule is a pass
rule, tcpdump printed "pass" although the packet was actually
dropped. Use the actual action, rather than the rule's action, or an
attempt at guessing the correct action.

Inspired by OpenBSD's 'pflog(4) logs packet dropped by default rule with block.' commit.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 44f323ec 24-Nov-2023 Kristof Provost <kp@FreeBSD.org>

pf: implement DIOCGETRULES via netlink

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 7093414c 17-Nov-2023 Kristof Provost <kp@FreeBSD.org>

pf: sctp heartbeats confirm a connection

When we create a new state for multihomed sctp connections (i.e.
based on INIT/INIT_ACK or ASCONF parameters) the new connection will
never see a COOKIE/COOKIE_ACK exchange. We should consider HEARTBEAT_ACK
to be a confirmation that the connection is established.

This ensures that such connections do not time out earlier than
expected.

MFC after: 1 week
Sponsored by: Orange Business Services


# 586fae3c 30-Oct-2023 Kristof Provost <kp@FreeBSD.org>

Revert "pf: remove COMPAT_FREEBSD14 #ifdef from pfvar.h"

This reverts commit 9eff6390718d0fa67dffc6cd830b0bc6b815e8c4.

The libpfctl port has been fixed (to avoid using DIOCGETSTATESV2), so we
can now safely revert this.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 4f337550 19-Oct-2023 Kristof Provost <kp@FreeBSD.org>

pf: allow states to be killed by their pre-NAT address

If a connection is NAT-ed we could previously only terminate it by its
ID or the post-NAT IP address. Allow users to specify they want look for
the state by its pre-NAT address. Usage: `pfctl -k nat -k <address>`.

See also: https://redmine.pfsense.org/issues/11556
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42312


# ffbf2595 14-Oct-2023 Kristof Provost <kp@FreeBSD.org>

pf: convert rule addition to netlink

The nvlist-based version will be removed in FreeBSD 16.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42279


# 9eff6390 18-Oct-2023 Kristof Provost <kp@FreeBSD.org>

pf: remove COMPAT_FREEBSD14 #ifdef from pfvar.h

When userspace includes pfvar.h it doesn't get the kernel's COMPAT_*
defines, so we end up not having required symbols in userspace. This
caused the libpfctl port to fail to build.

libpfctl will be updated to use the new netlink-based state export code
soon, which will also fix thix build issue.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 81647eb6 10-Oct-2023 Kristof Provost <kp@FreeBSD.org>

pf: implement start/stop calls via netlink

Implement equivalents to DIOCSTART and DIOCSTOP in netlink. Provide a
libpfctl implementation and add a basic test case, mostly to verify that
we still return the same errors as before the conversion

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42145


# ebfd3b22 06-Oct-2023 Kristof Provost <kp@FreeBSD.org>

pf: move DIOCGETSTATES(V2) to COMPAT_FREEBSD14

We now have an improved version (via netlink). The old-style ioctl will
be removed in FreeBSD 16.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42101


# 4d3af82f 05-Sep-2023 Kristof Provost <kp@FreeBSD.org>

pf: mark removed connections within a multihome association as shutting down

Parse IP removal in ASCONF chunks, find the affected state(s) and mark
them as shutting down. This will cause them to time out according to
PFTM_TCP_CLOSING timeouts, rather than waiting for the established
session timeout.

MFC after: 3 weeks
Sponsored by: Orange Business Services


# 51a78dd2 01-Sep-2023 Kristof Provost <kp@FreeBSD.org>

pf: improve SCTP state validation

Only create new states for INIT chunks, or when we're creating a
secondary state for a multihomed association.

Store and verify verification tag.

MFC after: 3 weeks
Sponsored by: Orange Business Services


# 10aa9ddb 02-Aug-2023 Kristof Provost <kp@FreeBSD.org>

pf: support SCTP multihoming

SCTP may announce additional IP addresses it'll use in the INIT/INIT_ACK
chunks, or in ASCONF chunks at any time during the connection. Parse these
parameters, evaluate the ruleset for the new connection and if allowed
create the corresponding states.

MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D41637


# 8d49fd73 29-Aug-2023 Kristof Provost <kp@FreeBSD.org>

pf: remove DIOCGETRULE and DIOCGETSTATUS

These calls have nvlist variants that completely supersede them.
Remove the old code.

Reviewed by: mjg
MFC after: never
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D41651


# 2e8edbc2 28-Aug-2023 Kristof Provost <kp@FreeBSD.org>

pf: Remove DIOCCLRSTATES and DIOCKILLSTATES

These now have nvlist based alternatives, so remove them.

Reviewed by: mjg, Pau Amma <pauamma@gundo.com> (man page)
MFC after: never
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30056


# d10de21f 24-Aug-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: Access r->rpool.cur->kif under mutex protection

pf_route() sends traffic to a specified next hop over a specific
interface. The next hop is obtained in pf_map_addr() but the interface
is obtained directly via r->rpool.cur->kif` outside of the lock held in
pf_map_addr() in multiple places around pf. The chosen interface is not
stored in source node.

Move the interface selection into pf_map_addr(), have the function
return it together with the chosen IP address and ensure its stored
in struct pf_ksrc_node, store it in the source node and use the stored
value when needed.

Sponsored by: InnoGames GmbH
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D41570


# 2ff63af9 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .h pattern

Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/


# d1bc1e9e 31-May-2023 Kristof Provost <kp@FreeBSD.org>

pf: support 'return' for SCTP

Send an SCTP Abort message if we're refusing a connection, just like we
send a RST for TCP.

MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D40864


# 010ee43f 27-Apr-2023 Kristof Provost <kp@FreeBSD.org>

pf: initial SCTP support

Basic state tracking for SCTP. This means we scan through the packet to
identify the different chunks (so we can identify state changes).

MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D40862


# 6b4ed16d 12-Jul-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: Simplify rule actions logic

Actions applied to a processed packet come in case of stateless
firewalling from a rule or in case of statefull firewalling from a
state. The state obtains the actions from a rule when it is created by a
rule or by pfsync. The logic for deciding if actions come from a rule or
a state is spread across many places in pf.

There already is struct pf_rule_actions in struct pf_pdesc and thus it
can be used as a central place for storing actions and their parameters.
OpenBSD does something similar: they also store the actions in struct
pf_pdesc and have no variables in pf_test() but they use separate
variables instead of a structure. By using struct pf_rule_actions we can
simplify the code even further. Applying of actions is done *only* in
pf_rule_to_actions() no matter if for the legacy scrub rules or for the
normal match / pass rules. The logic of choosing if rule or state
actions are used is applied only once in pf_test() by copying the whole
struct.

Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D41009


# f2064dd1 12-Jul-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: Fix duplicate storage of direction

The variable storing the direction of a processed packet is passed
around to many functions. Most of those functions already have a pointer
to struct pf_pdesc which also contains the direction. By using the one
in struct pf_pdesc we can reduce the amount of arguments passed around.

Reviewed by: kp
Sponsored by: InnGames GmbH
Differential Revision: https://reviews.freebsd.org/D41008


# 7dc3be36 19-Jun-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: Fix usage of pf tags with syncookies

The value stored in pf_mtag->tag comes from "tag" and "match tag"
keywords in pf.conf and must not be abused for storing other
information. A ruleset with enough tags could set or remove the bits
responsible for PF_TAG_SYNCOOKIE_RECREATED.

Move this syncookie status to pf_mtag->flags. Rename this and other
related constants in a way that will prevent such mistakes in the
future. Move PF_REASSEMBLED constant to mbuf.h and rename accordingly
because it's not a flag stored in pf_mtag, but an identifier of a
different m_tag. Change the value of the constant to avoid conflicts
with other m_tags using MTAG_ABI_COMPAT.

Rename the variables in pf_build_tcp() and pf_send_tcp() in to reduce
confusion.

Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D40587


# ba94bf28 15-Jun-2023 Kristof Provost <kp@FreeBSD.org>

pf: extend use of skip steps for Ethernet rules

Use the already populated PFE_SKIP_DST_ADDR and extend the skip
infrastructure to also skip on IP source/destination addresses.

This should make evaluating the rules slightly faster.

Reported by: R. Christian McDonald <rcm@rcm.sh>
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40567


# 9925aee0 30-May-2023 Kristof Provost <kp@FreeBSD.org>

pf: carry over rule actions from route-to rules

If we route-to (or dup-to/reply-to) we re-run pf_test(), which will also
create states for the connection.
This means that we may end up matching a different (i.e. not the state
that was created by the route-to rule) state, without the attributes
(such as dummynet pipes/queues) set by the route-to rule.

Address this by inheriting the pf_rule_actions from the route-to rule
while evaluating the connection again in pf_test(). That is, we set
default pf_rule_actions based on the route-to rule for the new
evaluation. The new rule may still overrule these, but if it does not
have such actions the route-to actions are applied.

Do the same for IPv6 rules in pf_test6()/pf_route6().

See also: https://redmine.pfsense.org/issues/14039
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40340


# c45d6b0e 29-May-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pfctl: Add missing state parameters in DIOCGETSTATESV2

Reviewed by: kp
Sponsored by: InnoGames GmbH
Different Revision: https://reviews.freebsd.org/D40259


# 4bf98559 29-May-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: make contents of struct pfsync_state configurable

Make struct pfsync_state contents configurable by sending out new
versions of the structure in separate subheader actions. Both old and
new version of struct pfsync_state can be understood, so replication of
states from a system running an older kernel is possible. The version
being sent out is configured using ifconfig pfsync0 … version XXXX. The
version is an user-friendly string - 1301 stands for FreeBSD 13.1 (I
have checked synchronization against a host running 13.1), 1400 stands
for 14.0.

A host running an older kernel will just ignore the messages and count
them as "packets discarded for bad action".

Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39392


# c4a32455 19-May-2023 Kristof Provost <kp@FreeBSD.org>

pf: remove the use of caddr_t

Replace caddr_t with void *, or more accurate types.

Suggested by: glebius
Reviewed by: zlei
Differential Revision: https://reviews.freebsd.org/D40186


# 8c23afdb 17-May-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: Standardize rtableid

Prepare for rtableid being included in struct pfsync_state where it will
be int32_t. Make variables which will be set to and from it the same
width.

Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D40013


# 8216f1a9 05-May-2023 Kristof Provost <kp@FreeBSD.org>

pf: fix a few more prototypes

Fix function prototypes to use the same type for sa_family_t as the
definition.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 16303d2b 03-May-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: improve source node error handling

Functions manipulating source nodes can fail due to various reasons like
memory allocation errors, hitting configured limits or lack of
redirection targets. Ensure those errors are properly caught and
propagated in the code. Increase the error counters not only when
parsing the main ruleset but the NAT ruleset too.

Cherry-picked from development of D39880

Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39940


# 7b676698 03-May-2023 Kristof Provost <kp@FreeBSD.org>

pf: simplify structs with anonymous unions

Rather than playing preprocessor hacks use actual anonymous unions.
No functional change.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# db0a2bfd 01-May-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: reduce number of hashing operations when handling source nodes

Reduce number of hashing operations when handling source nodes by always
having a pointer to the hash row mutex in the source node. Provide
macros for handling and asserting the mutex. Calculate the hash only
once in pf_find_src_node() and then use this hash in subsequent
operations.

Cherry-picked from development of D39880

Reviewed by: kp, mjg
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39888


# ef661d4a 24-Apr-2023 Christian McDonald <cmcdonald@netgate.com>

pf: introduce ridentifier and labels to ether rules

Make Ethernet rules more similar to the usual layer 3 rules by also
allowing ridentifier and labels to be set on them.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 2e6cdfe2 18-Apr-2023 Kristof Provost <kp@FreeBSD.org>

pf: change pf_rules_lock and pf_ioctl_lock to per-vnet locks

Both pf_rules_lock and pf_ioctl_lock only ever affect one vnet, so
there's no point in having these locks affect other vnets.
(In fact, the only lock in pf that can affect multiple vnets is
pf_end_lock.)

That's especially important for the rules lock, because taking the write
lock suspends all network traffic until it's released. This will reduce
the impact a vnet running pf can have on other vnets, and improve
concurrency on machines running multiple pf-enabled vnets.

Reviewed by: zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D39658


# af94d8cc 18-Apr-2023 Kristof Provost <kp@FreeBSD.org>

pf: fix incorrect lock define

PF_TABLE_STATS_ASSERT() should be checking pf_table_stats_lock not
pf_rules_lock.

Fortunately the define is not yet used anywhere so this was harmless.
Fix it anyway, in case it does get used.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 39282ef3 13-Apr-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: backport OpenBSD syntax of "scrub" option for "match" and "pass" rules

Introduce the OpenBSD syntax of "scrub" option for "match" and "pass"
rules and the "set reassemble" flag. The patch is backward-compatible,
pf.conf can be still written in FreeBSD-style.

Obtained from: OpenBSD
MFC after: never
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D38025


# b52b61c0 12-Mar-2023 Kristof Provost <kp@FreeBSD.org>

pf: distinguish forwarding and output cases for pf_refragment6()

Re-introduce PFIL_FWD, because pf's pf_refragment6() needs to know if
we're ip6_forward()-ing or ip6_output()-ing.

ip6_forward() relies on m->m_pkthdr.rcvif, at least for link-local
traffic (for in6_get_unicast_scopeid()). rcvif is not set for locally
generated traffic (e.g. from icmp6_reflect()), so we need to call the
correct output function.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revisi: https://reviews.freebsd.org/D39061


# 9c041b45 31-Dec-2022 Kristof Provost <kp@FreeBSD.org>

pf: fix syncookies in conjunction with tcp fast port reuse

Basic scenario: we have a closed connection (In TCPS_FIN_WAIT_2), and
get a new connection (i.e. SYN) re-using the tuple.

Without syncookies we look at the SYN, and completely unlink the old,
closed state on the SYN.
With syncookies we send a generated SYN|ACK back, and drop the SYN,
never looking at the state table.

So when the ACK (i.e. the third step in the three way handshake for
connection setup) turns up, we’ve not actually removed the old state, so
we find it, and don’t do the syncookie dance, or allow the new
connection to get set up.

Explicitly check for this in pf_test_state_tcp(). If we find a state in
TCPS_FIN_WAIT_2 and the syncookie is valid we delete the existing state
so we can set up the new state.
Note that when we verify the syncookie in pf_test_state_tcp() we don't
decrement the number of half-open connections to avoid an incorrect
double decrement.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37919


# 8a8af942 22-Sep-2022 Kristof Provost <kp@FreeBSD.org>

pf: bridge-to

Allow pf (l2) to be used to redirect ethernet packets to a different
interface.

The intended use case is to send 802.1x challenges out to a side
interface, to enable AT&T links to function with pfSense as a gateway,
rather than the AT&T provided hardware.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37193


# 133935d2 07-Oct-2022 Kristof Provost <kp@FreeBSD.org>

pf: atomically increment state ids

Rather than using a per-cpu state counter, and adding in the CPU id we
can atomically increment the number.
This has the advantage of removing the assumption that the CPU ID fits
in 8 bits.

Event: Aberdeen Hackathon 2022
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D36915


# 1d090028 29-Sep-2022 Kristof Provost <kp@FreeBSD.org>

pf: use time_to for timestamps

Use time_t rather than uint32_t to represent the timestamps. That means
we have 64 bits rather than 32 on all platforms except i386, avoiding
the Y2K38 issues on most platforms.

Reviewed by: Zhenlei Huang
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D36837


# 485be979 22-Aug-2022 Luiz Amaral <email@luiz.eng.br>

pfsync: replace struct pfsync_pkt with int flags

Get rid of struct pfsync_pkt. It was used to store data on the stack to
pass to all the submessage handlers, but only the flags part of it was
ever used. Just pass the flags directly instead.

Reviewed by: kp
Obtained from: OpenBSD
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D36294


# 1f61367f 31-May-2022 Kristof Provost <kp@FreeBSD.org>

pf: support matching on tags for Ethernet rules

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D35362


# 0abcc1d2 22-Apr-2022 Reid Linnemann <rlinnemann@netgate.com>

pf: Add per-rule timestamps for rule and eth_rule

Similar to ipfw rule timestamps, these timestamps internally are
uint32_t snaps of the system time in seconds. The timestamp is CPU local
and updated each time a rule or a state associated with a rule or state
is matched.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34970


# 812839e5 12-Apr-2022 Kristof Provost <kp@FreeBSD.org>

pf: allow the use of tables in ethernet rules

Allow tables to be used for the l3 source/destination matching.
This requires taking the PF_RULES read lock.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34917


# 9bb06778 29-Mar-2022 Kristof Provost <kp@FreeBSD.org>

pf: support listing ethernet anchors

Sponsored by: Rubicon Communications, LLC ("Netgate")


# bd7762c8 28-Feb-2022 Mateusz Guzik <mjg@FreeBSD.org>

pf: add a rule rb tree

with md5 sum used as key.

This gets rid of the quadratic rule traversal when "keep_counters" is
set.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 1a3e98a5 25-Feb-2022 Mateusz Guzik <mjg@FreeBSD.org>

pf: pre-compute rule hash

Makes it cheaper to compare rules when "keep_counters" is set.
This also sets up keeping them in a RB tree.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 93f8c38c 25-Feb-2022 Mateusz Guzik <mjg@FreeBSD.org>

pf: add pf_config_lock

For now only protects rule creation/destruction, but will allow
gradually reducing the scope of rules lock when changing the
rules.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# ef88adc5 27-Mar-2022 Gordon Bergling <gbe@FreeBSD.org>

pf(4): Fix a typo in a source code comment

- s/seaching/searching/

MFC after: 3 days


# 8a42005d 08-Mar-2022 Kristof Provost <kp@FreeBSD.org>

pf: support basic L3 filtering in the Ethernet rules

Allow filtering based on the source or destination IP/IPv6 address in
the Ethernet layer rules.

Reviewed by: pauamma_gundo.com (man), debdrup (man)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34482


# f11b6505 28-Feb-2022 Mateusz Guzik <mjg@FreeBSD.org>

pf: add PF_UNLNKDRULES_ASSERT

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# b590f17a 20-Jan-2022 Kristof Provost <kp@FreeBSD.org>

pf: support masking mac addresses

When filtering Ethernet packets allow rules to specify a mac address
with a mask. This indicates which bits of the specified address are
significant. This allows users to do things like filter based on device
manufacturer.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# c5131afe 01-Oct-2021 Kristof Provost <kp@FreeBSD.org>

pf: add anchor support for ether rules

Support anchors in ether rules.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32482


# fb330f39 27-Sep-2021 Kristof Provost <kp@FreeBSD.org>

pf: support dummynet on L2 rules

Allow packets to be tagged with dummynet information. Note that we do
not apply dummynet shaping on the L2 traffic, but instead mark it for
dummynet processing in the L3 code. This is the same approach as we take
for ALTQ.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32222


# 20c4899a 10-Feb-2021 Kristof Provost <kp@FreeBSD.org>

pf: Do not hold PF_RULES_RLOCK while processing Ethernet rules

Avoid the overhead of acquiring a (read) RULES lock when processing the
Ethernet rules.
We can get away with that because when rules are modified they're staged
in V_pf_keth_inactive. We take care to ensure the swap to V_pf_keth is
atomic, so that pf_test_eth_rule() always sees either the old rules, or
the new ruleset.

We need to take care not to delete the old ruleset until we're sure no
pf_test_eth_rule() is still running with those. We accomplish that by
using NET_EPOCH_CALL() to actually free the old rules.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31739


# e732e742 03-Feb-2021 Kristof Provost <kp@FreeBSD.org>

pf: Initial Ethernet level filtering code

This is the kernel side of stateless Ethernel level filtering for pf.

The primary use case for this is to enable captive portal functionality
to allow/deny access by MAC address, rather than per IP address.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31737


# 773e3a71 31-Jan-2022 Mark Johnston <markj@FreeBSD.org>

pf: Initialize pf_kpool mutexes earlier

There are some error paths in ioctl handlers that will call
pf_krule_free() before the rule's rpool.mtx field is initialized,
causing a panic with INVARIANTS enabled.

Fix the problem by introducing pf_krule_alloc() and initializing the
mutex there. This does mean that the rule->krule and pool->kpool
conversion functions need to stop zeroing the input structure, but I
don't see a nicer way to handle this except perhaps by guarding the
mtx_destroy() with a mtx_initialized() check.

Constify some related functions while here and add a regression test
based on a syzkaller reproducer.

Reported by: syzbot+77cd12872691d219c158@syzkaller.appspotmail.com
Reviewed by: kp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34115


# 5f5e32f1 10-Jan-2022 Kristof Provost <kp@FreeBSD.org>

pf: protect the rpool from races

The roundrobin pool stores its state in the rule, which could
potentially lead to invalid addresses being returned.

For example, thread A just executed PF_AINC(&rpool->counter) and
immediately afterwards thread B executes PF_ACPY(naddr, &rpool->counter)
(i.e. after the pf_match_addr() check of rpool->counter).

Lock the rpool with its own mutex to prevent these races. The
performance impact of this is expected to be low, as each rule has its
own lock, and the lock is also only relevant when state is being created
(so only for the initial packets of a connection, not for all traffic).

See also: https://redmine.pfsense.org/issues/12660
Reviewed by: glebius
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33874


# c658610b 15-Dec-2021 Kristof Provost <kp@FreeBSD.org>

pf: make pfvar.h self-contained

Ensure that the pfvar.h header can be included without including any
other headers.

Reviewed by: imp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33499


# 8e492101 15-Nov-2021 Kristof Provost <kp@FreeBSD.org>

pf: add COMPAT_FREEBSD13 for DIOCKEEPCOUNTERS

DIOCKEEPCOUNTERS used to overlap with DIOCGIFSPEEDV0, which has been
fixed in 14, but remains in stable/12 and stable/13.
Support the old, overlapping, call under COMPAT_FREEBSD13.

Reviewed by: jhb
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33001


# 047c4e36 13-Nov-2021 Kristof Provost <kp@FreeBSD.org>

pf: renumber DIOCKEEPCOUNTERS

We accidentally had two ioctls use the same base number
(DIOCKEEPCOUNTERS and DIOCGIFSPEEDV{0,1}). We get away with that on most
platforms because the size of the argument structures is different.
This does break CHERI, and is generally a bad idea anyway.
Renumber to avoid this collision.

Reported by: jhb


# 76c5eecc 29-Oct-2021 Kristof Provost <kp@FreeBSD.org>

pf: Introduce ridentifier

Allow users to set a number on rules which will be exposed as part of
the pflog header.
The intent behind this is to allow users to correlate rules across
updates (remember that pf rules continue to exist and match existing
states, even if they're removed from the active ruleset) and pflog.

Obtained from: pfSense
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32750


# 8f3d786c 01-Nov-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: remove the flags argument from pf_unlink_state

All consumers call it with PF_ENTER_LOCKED.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 76c2e71c 04-Oct-2021 Kristof Provost <kp@FreeBSD.org>

pf: remove unused field from pf_kanchor

The 'match' field is only used in the userspace version of the struct
(pf_anchor).

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 5062afff 13-Aug-2021 Kristof Provost <kp@FreeBSD.org>

pfctl: userspace adaptive syncookies configration

Hook up the userspace bits to configure syncookies in adaptive mode.

MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D32136


# bf863718 24-Jul-2021 Kristof Provost <kp@FreeBSD.org>

pf: implement adaptive mode

Use atomic counters to ensure that we correctly track the number of half
open states and syncookie responses in-flight.
This determines if we activate or deactivate syncookies in adaptive
mode.

MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D32134


# 63b3c1c7 15-May-2021 Kristof Provost <kp@FreeBSD.org>

pf: support dummynet

Allow pf to use dummynet pipes and queues.

We re-use the currently unused IPFW_IS_DUMMYNET flag to allow dummynet
to tell us that a packet is being re-injected after being delayed. This
is needed to avoid endlessly looping the packet between pf and dummynet.

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31904


# b64f7ce9 07-Sep-2021 Kristof Provost <kp@FreeBSD.org>

pf: qid and pqid can be uint16_t

tag2name() returns a uint16_t, so we don't need to use uint32_t for the
qid (or pqid). This reduces the size of struct pf_kstate slightly. That
in turn buys us space to add extra fields for dummynet later.

Happily these fields are not exposed to user space (there are user space
versions of them, but they can just stay uint32_t), so there's no ABI
breakage in modifying this.

MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31873


# bb25e36e 07-Sep-2021 Kristof Provost <kp@FreeBSD.org>

pf: remove unused function prototype

MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 4cab80a8 29-Aug-2021 Kristof Provost <kp@FreeBSD.org>

pf: Add counters for syncookies

Count when we send a syncookie, receive a valid syncookie or detect a
synflood.

Reviewed by: kbowling
MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31713


# 2b10cf85 16-Aug-2021 Kristof Provost <kp@FreeBSD.org>

pf: Introduce nvlist variant of DIOCGETSTATUS

Make it possible to extend the GETSTATUS call (e.g. when we want to add
new counters, such as for syncookie support) by introducing an
nvlist-based alternative.

MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31694


# b69019c1 06-Jul-2021 Kristof Provost <kp@FreeBSD.org>

pf: remove DIOCGETSTATESNV

While nvlists are very useful in maximising flexibility for future
extensions their performance is simply unacceptably bad for the
getstates feature, where we can easily want to export a million states
or more.

The DIOCGETSTATESNV call has been MFCd, but has not hit a release on any
branch, so we can still remove it everywhere.

Reviewed by: mjg
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31099


# 02cf67cc 22-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: switch rule counters to pf_counter_u64

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# d40d4b3e 22-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: switch kif counters to pf_counter_u64

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# fc4c42ce 23-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: switch pf_status.fcounters to pf_counter_u64

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# defdcdd5 22-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: add hybrid 32- an 64- bit counters

Numerous counters got migrated from straight uint64_t to the counter(9)
API. Unfortunately the implementation comes with a significiant
performance hit on some platforms and cannot be easily fixed.

Work around the problem by implementing a pf-specific variant.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# d9cc6ea2 23-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: hide struct pf_kstatus behind ifdef _KERNEL

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 32271c4d 20-Jul-2021 Kristof Provost <kp@FreeBSD.org>

pf: clean up syncookie callout on vnet shutdown

Ensure that we cancel any outstanding callouts for syncookies when we
terminate the vnet.

MFC after: 1 week
Sponsored by: Modirum MDPay


# 907257d6 19-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: embed a pointer to the lock in struct pf_kstate

This shaves calculation which in particular helps on arm.

Note using the & hack instead would still be more work.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 231e83d3 26-May-2021 Kristof Provost <kp@FreeBSD.org>

pf: syncookie ioctl interface

Kernel side implementation to allow switching between on and off modes,
and allow this configuration to be retrieved.

MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31139


# 8e1864ed 20-May-2021 Kristof Provost <kp@FreeBSD.org>

pf: syncookie support

Import OpenBSD's syncookie support for pf. This feature help pf resist
TCP SYN floods by only creating states once the remote host completes
the TCP handshake rather than when the initial SYN packet is received.

This is accomplished by using the initial sequence numbers to encode a
cookie (hence the name) in the SYN+ACK response and verifying this on
receipt of the client ACK.

Reviewed by: kbowling
Obtained from: OpenBSD
MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31138


# 9009d36a 19-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: shrink struct pf_kstate

Makes room for a pointer.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# f9aa757d 19-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: add a comment to pf_kstate concerning compat with pf_state_cmp

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# ef950daa 02-Mar-2021 Kristof Provost <kp@FreeBSD.org>

pf: match keyword support

Support the 'match' keyword.
Note that support is limited to adding queuing information, so without
ALTQ support in the kernel setting match rules is pointless.

For the avoidance of doubt: this is NOT full support for the match
keyword as found in OpenBSD's pf. That could potentially be built on top
of this, but this commit is NOT that.

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31115


# c6bf20a2 05-Jul-2021 Kristof Provost <kp@FreeBSD.org>

pf: add DIOCGETSTATESV2

Add a new version of the DIOCGETSTATES call, which extends the struct to
include the original interface information.

MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31097


# 19d6e29b 08-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: add pf_find_state_all_exists

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 211cddf9 06-Jul-2021 Kristof Provost <kp@FreeBSD.org>

pf: rename pf_state to pf_kstate

Indicate that this is a kernel-only structure, and make it easier to
distinguish from others used to communicate with userspace.

Reviewed by: mjg
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31096


# f649cff5 05-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: padalign global locks found in pf.c

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# dc1ab04e 02-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: allow table stats clearing and reading with ruleset rlock

Instead serialize against these operations with a dedicated lock.

Prior to the change, When pushing 17 mln pps of traffic, calling
DIOCRGETTSTATS in a loop would restrict throughput to about 7 mln. With
the change there is no slowdown.

Reviewed by: kp (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")


# f92c21a2 02-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: depessimize table handling

Creating tables and zeroing their counters induces excessive IPIs (14
per table), which in turns kills single- and multi-threaded performance.

Work around the problem by extending per-CPU counters with a general
counter populated on "zeroing" requests -- it stores the currently found
sum. Then requests to report the current value are the sum of per-CPU
counters subtracted by the saved value.

Sample timings when loading a config with 100k tables on a 104-way box:

stock:

pfctl -f tables100000.conf 0.39s user 69.37s system 99% cpu 1:09.76 total
pfctl -f tables100000.conf 0.40s user 68.14s system 99% cpu 1:08.54 total

patched:

pfctl -f tables100000.conf 0.35s user 6.41s system 99% cpu 6.771 total
pfctl -f tables100000.conf 0.48s user 6.47s system 99% cpu 6.949 total

Reviewed by: kp (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 55cc305d 28-Jun-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: revert: Use counter(9) for pf_state byte/packet tracking

stats are not shared and consequently per-CPU counters only waste
memory.

No slowdown was measured when passing over 20M pps.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 803dfe3d 28-Jun-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: deduplicate V_pf_state_z handling with pfsync

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# e6dd0e2e 28-Jun-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: assert that sizeof(struct pf_state) <= 312

To prevent accidentally going over a threshold which makes UMA fit only
12 objects per page instead of 13.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# d09388d0 28-Jun-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: add pf_release_staten and use it in pf_unlink_state

Saves one atomic op.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# d38630f6 04-Jun-2021 Kristof Provost <kp@FreeBSD.org>

pf: store L4 headers in pf_pdesc

Rather than pointers to the headers store full copies. This brings us
slightly closer to what OpenBSD does, and also makes more sense than
storing pointers to stack variable copies of the headers.

Reviewed by: donner, scottl
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30719


# ec7b47fc 31-May-2021 Kristof Provost <kp@FreeBSD.org>

pf: Move provider declaration to pf.h

This simplifies life a bit, by not requiring us to repease the
declaration for every file where we want static probe points.

It also makes the gcc6 build happy.


# d0fdf2b2 12-May-2021 Kristof Provost <kp@FreeBSD.org>

pf: Track the original kif for floating states

Track (and display) the interface that created a state, even if it's a
floating state (and thus uses virtual interface 'all').

MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30245


# 0592a4c8 05-May-2021 Kristof Provost <kp@FreeBSD.org>

pf: Add DIOCGETSTATESNV

Add DIOCGETSTATESNV, an nvlist-based alternative to DIOCGETSTATES.

MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30243


# 1732afaa 05-May-2021 Kristof Provost <kp@FreeBSD.org>

pf: Add DIOCGETSTATENV

Add DIOCGETSTATENV, an nvlist-based alternative to DIOCGETSTATE.

MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30242


# 93abcf17 03-May-2021 Kristof Provost <kp@FreeBSD.org>

pf: Support killing 'matching' states

Optionally also kill states that match (i.e. are the NATed state or
opposite direction state entry for) the state we're killing.

See also https://redmine.pfsense.org/issues/8555

Submitted by: Steven Brown
Reviewed by: bcr (man page)
Obtained from: https://github.com/pfsense/FreeBSD-src/pull/11/
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30092


# abbcba9c 30-Apr-2021 Kristof Provost <kp@FreeBSD.org>

pf: Allow states to by killed per 'gateway'

This allows us to kill states created from a rule with route-to/reply-to
set. This is particularly useful in multi-wan setups, where one of the
WAN links goes down.

Submitted by: Steven Brown
Obtained from: https://github.com/pfsense/FreeBSD-src/pull/11/
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30058


# e989530a 29-Apr-2021 Kristof Provost <kp@FreeBSD.org>

pf: Introduce DIOCKILLSTATESNV

Introduce an nvlist based alternative to DIOCKILLSTATES.

MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30054


# 7606a45d 29-Apr-2021 Kristof Provost <kp@FreeBSD.org>

pf: Introduce DIOCCLRSTATESNV

Introduce an nvlist variant of DIOCCLRSTATES.

MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30052


# 6fcc8e04 20-Apr-2021 Kristof Provost <kp@FreeBSD.org>

pf: Allow multiple labels to be set on a rule

Allow up to 5 labels to be set on each rule.
This offers more flexibility in using labels. For example, it replaces
the customer 'schedule' keyword used by pfSense to terminate states
according to a schedule.

Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29936


# 42ec75f8 15-Apr-2021 Kristof Provost <kp@FreeBSD.org>

pf: Optionally attempt to preserve rule counter values across ruleset updates

Usually rule counters are reset to zero on every update of the ruleset.
With keepcounters set pf will attempt to find matching rules between old
and new rulesets and preserve the rule counters.

MFC after: 4 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29780


# 4f1f67e8 15-Apr-2021 Kristof Provost <kp@FreeBSD.org>

pf: PFRULE_REFS should not be user-visible

Split the PFRULE_REFS flag from the rule_flag field. PFRULE_REFS is a
kernel-internal flag and should not be exposed to or read from
userspace.

MFC after: 4 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29778


# 2aa21096 13-Apr-2021 Kurosawa Takahiro <takahiro.kurosawa@gmail.com>

pf: Implement the NAT source port selection of MAP-E Customer Edge

MAP-E (RFC 7597) requires special care for selecting source ports
in NAT operation on the Customer Edge because a part of bits of the port
numbers are used by the Border Relay to distinguish another side of the
IPv4-over-IPv6 tunnel.

PR: 254577
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D29468


# d710367d 25-Mar-2021 Kristof Provost <kp@FreeBSD.org>

pf: Implement nvlist variant of DIOCGETRULE

MFC after: 4 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29559


# 5c62eded 11-Mar-2021 Kristof Provost <kp@FreeBSD.org>

pf: Introduce nvlist variant of DIOCADDRULE

This will make future extensions of the API much easier.
The intent is to remove support for DIOCADDRULE in FreeBSD 14.

Reviewed by: markj (previous version), glebius (previous version)
MFC after: 4 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29557


# 4967f672 08-Apr-2021 Kristof Provost <kp@FreeBSD.org>

pf: Remove unused variable rt_listid from struct pf_krule

Reviewed by: donner
MFC after: 4 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29639


# cecfaf9b 10-Mar-2021 Kristof Provost <kp@FreeBSD.org>

pf: Fully remove interrupt events on vnet cleanup

swi_remove() removes the software interrupt handler but does not remove
the associated interrupt event.
This is visible when creating and remove a vnet jail in `procstat -t
12`.

We can remove it manually with intr_event_destroy().

PR: 254171
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29211


# 5e9dae8e 10-Mar-2021 Kristof Provost <kp@FreeBSD.org>

pf: Factor out pf_krule_free()

Reviewed by: melifaro@
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29194


# 29698ed9 03-Mar-2021 Kristof Provost <kp@FreeBSD.org>

pf: Mark struct pf_pdesc as kernel only

This structure is only used by the kernel module internally. It's not
shared with user space, so hide it behind #ifdef _KERNEL.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# c86fa3b8 10-Jan-2021 Ryan Libby <rlibby@FreeBSD.org>

pf: quiet -Wredundant-decls for pf_get_ruleset_number

In e86bddea9fe62d5093a1942cf21950b3c5ca62e5 sys/netpfil/pf/pf.h grew a
declaration of pf_get_ruleset_number. Now delete the old declaration
from sys/net/pfvar.h.

Reviewed by: kp
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D28081


# 5a3b9507 13-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Convert pfi_kkif to use counter_u64

Improve caching behaviour by using counter_u64 rather than variables
shared between cores.

The result of converting all counters to counter(9) (i.e. this full
patch series) is a significant improvement in throughput. As tested by
olivier@, on Intel Xeon E5-2697Av4 (16Cores, 32 threads) hardware with
Mellanox ConnectX-4 MCX416A-CCAT (100GBase-SR4) nics we see:

x FreeBSD 20201223: inet packets-per-second
+ FreeBSD 20201223 with pf patches: inet packets-per-second
+--------------------------------------------------------------------------+
| + |
| xx + |
|xxx +++|
||A| |
| |A||
+--------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 5 9216962 9526356 9343902 9371057.6 116720.36
+ 5 19427190 19698400 19502922 19546509 109084.92
Difference at 95.0% confidence
1.01755e+07 +/- 164756
108.584% +/- 2.9359%
(Student's t, pooled s = 112967)

Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27763


# 26c841e2 13-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Allocate and free pfi_kkif in separate functions

Factor out allocating and freeing pfi_kkif structures. This will be
useful when we change the counters to be counter_u64, so we don't have
to deal with that complexity in the multiple locations where we allocate
pfi_kkif structures.

No functional change.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27762


# 320c1116 12-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Split pfi_kif into a user and kernel space structure

No functional change.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27761


# c3adacda 05-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Change pf_krule counters to use counter_u64

This improves the cache behaviour of pf and results in improved
throughput.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27760


# c7bdafe2 05-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Remove unused fields from pf_krule

The u_* counters are used only to communicate with userspace, as
userspace cannot use counter_u64. As pf_krule is not passed to userspace
these fields are now obsolete.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27759


# e86bddea 05-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Split pf_rule into kernel and user space versions

No functional change intended.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27758


# dc865dae 03-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Migrate pf_rule and related structs to pf.h

As part of the split between user and kernel mode structures we're
moving all user space usable definitions into pf.h.

No functional change intended.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27757


# fbbf270e 13-Nov-2020 Kristof Provost <kp@FreeBSD.org>

pf: Use counter_u64 in pf_src_node

Reviewd by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27756


# 17ad7334 23-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Split pf_src_node into a kernel and userspace struct

Introduce a kernel version of struct pf_src_node (pf_ksrc_node).

This will allow us to improve the in-kernel data structure without
breaking userspace compatibility.

Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27707


# 1c00efe9 23-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Use counter(9) for pf_state byte/packet tracking

This improves cache behaviour by not writing to the same variable from
multiple cores simultaneously.

pf_state is only used in the kernel, so can be safely modified.

Reviewed by: Lutz Donnerhacke, philip
MFC after: 1 week
Sponsed by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27661


# c3f69af0 20-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Fix unaligned checksum updates

The algorithm we use to update checksums only works correctly if the
updated data is aligned on 16-bit boundaries (relative to the start of
the packet).

Import the OpenBSD fix for this issue.

PR: 240416
Obtained from: OpenBSD
MFC after: 1 week
Reviewed by: tuexen (previous version)
Differential Revision: https://reviews.freebsd.org/D27696


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# c1be8399 15-May-2020 Mark Johnston <markj@FreeBSD.org>

pf: Add a new zone for per-table entry counters.

Right now we optionally allocate 8 counters per table entry, so in
addition to memory consumed by counters, we require 8 pointers worth of
space in each entry even when counters are not allocated (the default).

Instead, define a UMA zone that returns contiguous per-CPU counter
arrays for use in table entries. On amd64 this reduces sizeof(struct
pfr_kentry) from 216 to 160. The smaller size also results in better
slab efficiency, so memory usage for large tables is reduced by about
28%.

Reviewed by: kp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24843


# 59048686 15-Mar-2019 Kristof Provost <kp@FreeBSD.org>

pf :Use counter(9) in pf tables.

The counters of pf tables are updated outside the rule lock. That means state
updates might overwrite each other. Furthermore allocation and
freeing of counters happens outside the lock as well.

Use counter(9) for the counters, and always allocate the counter table
element, so that the race condition cannot happen any more.

PR: 230619
Submitted by: Kajetan Staszkiewicz <vegeta@tuxpowered.net>
Reviewed by: glebius
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19558


# 8f2ac656 10-Feb-2019 Patrick Kelsey <pkelsey@FreeBSD.org>

Reduce the time it takes the kernel to install a new PF config containing a large number of queues

In general, the time savings come from separating the active and
inactive queues lists into separate interface and non-interface queue
lists, and changing the rule and queue tag management from list-based
to hash-bashed.

In HFSC, a linear scan of the class table during each queue destroy
was also eliminated.

There are now two new tunables to control the hash size used for each
tag set (default for each is 128):

net.pf.queue_tag_hashsize
net.pf.rule_tag_hashsize

Reviewed by: kp
MFC after: 1 week
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D19131


# fbbf436d 02-Nov-2018 Kristof Provost <kp@FreeBSD.org>

pfsync: Handle syncdev going away

If the syncdev is removed we no longer need to clean up the multicast
entry we've got set up for that device.

Pass the ifnet detach event through pf to pfsync, and remove our
multicast handle, and mark us as no longer having a syncdev.

Note that this callback is always installed, even if the pfsync
interface is disabled (and thus it's not a per-vnet callback pointer).

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D17502


# 5f6cf24e 02-Nov-2018 Kristof Provost <kp@FreeBSD.org>

pfsync: Make pfsync callbacks per-vnet

The callbacks are installed and removed depending on the state of the
pfsync device, which is per-vnet. The callbacks must also be per-vnet.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D17499


# 790194cd 02-Nov-2018 Kristof Provost <kp@FreeBSD.org>

pf: Limit the fragment entry queue length to 64 per bucket.

So we have a global limit of 1024 fragments, but it is fine grained to
the region of the packet. Smaller packets may have less fragments.
This costs another 16 bytes of memory per reassembly and devides the
worst case for searching by 8.

Obtained from: OpenBSD
Differential Revision: https://reviews.freebsd.org/D17734


# fd2ea405 02-Nov-2018 Kristof Provost <kp@FreeBSD.org>

pf: Split the fragment reassembly queue into smaller parts

Remember 16 entry points based on the fragment offset. Instead of
a worst case of 8196 list traversals we now check a maximum of 512
list entries or 16 array elements.

Obtained from: OpenBSD
Differential Revision: https://reviews.freebsd.org/D17733


# 249cc75f 22-Aug-2018 Patrick Kelsey <pkelsey@FreeBSD.org>

Extended pf(4) ioctl interface and pfctl(8) to allow bandwidths of
2^32 bps or greater to be used. Prior to this, bandwidth parameters
would simply wrap at the 2^32 boundary. The computations in the HFSC
scheduler and token bucket regulator have been modified to operate
correctly up to at least 100 Gbps. No other algorithms have been
examined or modified for correct operation above 2^32 bps (some may
have existing computation resolution or overflow issues at rates below
that threshold). pfctl(8) will now limit non-HFSC bandwidth
parameters to 2^32 - 1 before passing them to the kernel.

The extensions to the pf(4) ioctl interface have been made in a
backwards-compatible way by versioning affected data structures,
supporting all versions in the kernel, and implementing macros that
will cause existing code that consumes that interface to use version 0
without source modifications. If version 0 consumers of the interface
are used against a new kernel that has had bandwidth parameters of
2^32 or greater configured by updated tools, such bandwidth parameters
will be reported as 2^32 - 1 bps by those old consumers.

All in-tree consumers of the pf(4) interface have been updated. To
update out-of-tree consumers to the latest version of the interface,
define PFIOC_USE_LATEST ahead of any includes and use the code of
pfctl(8) as a guide for the ioctls of interest.

PR: 211730
Reviewed by: jmallett, kp, loos
MFC after: 2 weeks
Relnotes: yes
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D16782


# 91e0f2d2 05-Aug-2018 Kristof Provost <kp@FreeBSD.org>

pf: Increase default hash table size

Now that we (by default) limit the number of states to 100.000 it makse sense
to also adjust the default size of the hash table.

Based on the benchmarking results in
https://github.com/ocochard/netbenches/blob/master/Atom_C2758_8Cores-Chelsio_T540-CR/pf-states_hashsize/results/fbsd12-head.r332390/README.md
128K entries offers a good compromise between performance and memory use.

Users may still overrule this setting with the net.pf.states_hashsize and
net.pf.source_nodes_hashsize loader(8) tunables.


# b895839d 12-Jul-2018 Kristof Provost <kp@FreeBSD.org>

pf: Fix typo in r336221

Reported by: olivier@


# f4cdb8ae 12-Jul-2018 Kristof Provost <kp@FreeBSD.org>

pf: Increate default state table size

The typical system now has a lot more memory than when pf was new, and is also
expected to handle more connections. Increase the default size of the state
table.
Note that users can overrule this using 'set limit states' in pf.conf.

From OpenBSD:
The year is 2018.
Mercury, Bowie, Cash, Motorola and DEC all left us.
Just pf still has a default state table limit of 10000.
Had! Now it's a tiny little bit more, 100k.
lead guitar: me
ok chorus: phessler theo claudio benno
background school girl laughing: bob

Obtained from: OpenBSD


# cc535c95 03-Jul-2018 Will Andrews <will@FreeBSD.org>

Revert r335833.

Several third-parties use at least some of these ioctls. While it would be
better for regression testing if they were used in base (or at least in the
test suite), it's currently not worth the trouble to push through removal.

Submitted by: antoine, markj


# c1887e9f 30-Jun-2018 Will Andrews <will@FreeBSD.org>

pf: remove unused ioctls.

Several ioctls are unused in pf, in the sense that no base utility
references them. Additionally, a cursory review of pf-based ports
indicates they're not used elsewhere either. Some of them have been
unused since the original import. As far as I can tell, they're also
unused in OpenBSD. Finally, removing this code removes the need for
future pf work to take them into account.

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D16076


# d25c25dc 29-May-2018 Kristof Provost <kp@FreeBSD.org>

pf: Add missing include statement

rmlocks require <sys/lock.h> as well as <sys/rmlock.h>.
Unbreak mips build.


# 455969d3 30-May-2018 Kristof Provost <kp@FreeBSD.org>

pf: Replace rwlock on PF_RULES_LOCK with rmlock

Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock.
This change improves packet processing rate in high pps environments.
Benchmarking by olivier@ shows a 65% improvement in pps.

While here, also eliminate all appearances of "sys/rwlock.h" includes since it
is not used anymore.

Submitted by: farrokhi@
Differential Revision: https://reviews.freebsd.org/D15502


# adfe2f6a 06-Apr-2018 Kristof Provost <kp@FreeBSD.org>

pf: Improve ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS

These ioctls can process a number of items at a time, which puts us at
risk of overflow in mallocarray() and of impossibly large allocations
even if we don't overflow.

Limit the allocation to required size (or the user allocation, if that's
smaller). That does mean we need to do the allocation with the rules
lock held (so the number doesn't change while we're doing this), so it
can't M_WAITOK.

MFC after: 1 week


# effaab88 23-Mar-2018 Kristof Provost <kp@FreeBSD.org>

netpfil: Introduce PFIL_FWD flag

Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.

Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.

Reviewed by: ae, kevans
Differential Revision: https://reviews.freebsd.org/D13715


# bf56a3fe 25-Feb-2018 Kristof Provost <kp@FreeBSD.org>

pf: Cope with overly large net.pf.states_hashsize

If the user configures a states_hashsize or source_nodes_hashsize value we may
not have enough memory to allocate this. This used to lock up pf, because these
allocations used M_WAITOK.

Cope with this by attempting the allocation with M_NOWAIT and falling back to
the default sizes (with M_WAITOK) if these fail.

PR: 209475
Submitted by: Fehmi Noyan Isi <fnoyanisi AT yahoo.com>
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D14367


# 5d0020d6 31-Dec-2017 Kristof Provost <kp@FreeBSD.org>

pf: Clean all fragments on shutdown

When pf is unloaded, or a vnet jail using pf is stopped we need to
ensure we clean up all fragments, not just the expired ones.


# fe267a55 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.


# 2f8fb3a8 22-Mar-2017 Kristof Provost <kp@FreeBSD.org>

pf: Fix possible shutdown race

Prevent possible races in the pf_unload() / pf_purge_thread() shutdown
code. Lock the pf_purge_thread() with the new pf_end_lock to prevent
these races.

Use a shared/exclusive lock, as we need to also acquire another sx lock
(VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload()
to sleep,

Pointed out by: eri, glebius, jhb
Differential Revision: https://reviews.freebsd.org/D10026


# a0429b54 23-Jun-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

Update pf(4) and pflog(4) to survive basic VNET testing, which includes
proper virtualisation, teardown, avoiding use-after-free, race conditions,
no longer creating a thread per VNET (which could easily be a couple of
thousand threads), gracefully ignoring global events (e.g., eventhandlers)
on teardown, clearing various globally cached pointers and checking
them before use.

Reviewed by: kp
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6924


# 3e248e0f 17-Jun-2016 Kristof Provost <kp@FreeBSD.org>

pf: Filter on and set vlan PCP values

Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This
introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to
filter on it.

Reviewed by: allanjude, araujo
Approved by: re (gjb)
Obtained from: OpenBSD (mostly)
Differential Revision: https://reviews.freebsd.org/D6786


# 8ec07310 01-Feb-2016 Gleb Smirnoff <glebius@FreeBSD.org>

These files were getting sys/malloc.h and vm/uma.h with header pollution
via sys/mbuf.h


# 26022843 25-Oct-2015 Kristof Provost <kp@FreeBSD.org>

pf: Fix compliation warning with gcc

While fixing the PF_ANEQ() macro I messed up the parentheses, leading to
compliation warnings with gcc.

Spotted by: ian
Pointy Hat: kp


# 7d762423 25-Oct-2015 Kristof Provost <kp@FreeBSD.org>

PF_ANEQ() macro will in most situations returns TRUE comparing two identical
IPv4 packets (when it should return FALSE). It happens because PF_ANEQ() doesn't
stop if first 32 bits of IPv4 packets are equal and starts to check next 3*32
bits (like for IPv6 packet). Those bits containt some garbage and in result
PF_ANEQ() wrongly returns TRUE.

Fix: Check if packet is of AF_INET type and if it is then compare only first 32
bits of data.

PR: 204005
Submitted by: Miłosz Kaniewski


# c110fc49 14-Oct-2015 Kristof Provost <kp@FreeBSD.org>

pf: Fix TSO issues

In certain configurations (mostly but not exclusively as a VM on Xen) pf
produced packets with an invalid TCP checksum.

The problem was that pf could only handle packets with a full checksum. The
FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
addresses, length and protocol).
Certain network interfaces expect to see the pseudo-header checksum, so they
end up producing packets with invalid checksums.

To fix this stop calculating the full checksum and teach pf to only update TCP
checksums if TSO is disabled or the change affects the pseudo-header checksum.

PR: 154428, 193579, 198868
Reviewed by: sbruno
MFC after: 1 week
Relnotes: yes
Sponsored by: RootBSD
Differential Revision: https://reviews.freebsd.org/D3779


# 64b3b4d6 27-Aug-2015 Kristof Provost <kp@FreeBSD.org>

pf: Remove support for 'scrub fragment crop|drop-ovl'

The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse
users into making poor choices.
It's also a fairly large amount of complex code, so just remove the support
altogether.

Users who have 'scrub fragment crop|drop-ovl' in their pf configuration will be
implicitly converted to 'scrub fragment reassemble'.

Reviewed by: gnn, eri
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D3466


# 6332e4cc 15-Apr-2015 George V. Neville-Neil <gnn@FreeBSD.org>

Minor change to the macros to make sure that if an AF is passed that is neither AF_INET6 nor AF_INET that we don't touch random bits of memory.

Differential Revision: https://reviews.freebsd.org/D2291


# 3e8c6d74 16-Mar-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Always lock the hash row of a source node when updating its 'states' counter.

PR: 182401
Sponsored by: Nginx, Inc.


# 0324938a 16-Feb-2015 Gleb Smirnoff <glebius@FreeBSD.org>

- Improve INET/INET6 scope.
- style(9) declarations.
- Make couple of local functions static.


# 8dc98c2a 16-Feb-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Toss declarations to fix regular build and NO_INET6 build.


# f0b0fe5b 16-Feb-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Commit a miss from r278843.

Pointy hat to: glebius


# 936bdf36 16-Feb-2015 Brad Davis <brd@FreeBSD.org>

Fix build.

Approved by: gibbs


# 60048052 15-Feb-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Missed from r278831.


# efc6c51f 21-Jan-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Back out r276841, r276756, r276747, r276746. The change in r276747 is very
very questionable, since it makes vimages more dependent on each other. But
the reason for the backout is that it screwed up shutting down the pf purge
threads, and now kernel immedially panics on pf module unload. Although module
unloading isn't an advertised feature of pf, it is very important for
development process.

I'd like to not backout r276746, since in general it is good. But since it
has introduced numerous build breakages, that later were addressed in
r276841, r276756, r276747, I need to back it out as well. Better replay it
in clean fashion from scratch.


# 8d665c6b 06-Jan-2015 Craig Rodrigues <rodrigc@FreeBSD.org>

Reapply previous patch to fix build.

PR: 194515


# c75820c7 06-Jan-2015 Craig Rodrigues <rodrigc@FreeBSD.org>

Merge: r258322 from projects/pf branch

Split functions that initialize various pf parts into their
vimage parts and global parts.
Since global parts appeared to be only mutex initializations, just
abandon them and use MTX_SYSINIT() instead.
Kill my incorrect VNET_FOREACH() iterator and instead use correct
approach with VNET_SYSINIT().

PR: 194515
Differential Revision: D1309
Submitted by: glebius, Nikos Vassiliadis <nvass@gmx.com>
Reviewed by: trociny, zec, gnn


# a9572d8f 14-Aug-2014 Gleb Smirnoff <glebius@FreeBSD.org>

- Count global pf(4) statistics in counter(9).
- Do not count global number of states and of src_nodes,
use uma_zone_get_cur() to obtain values.
- Struct pf_status becomes merely an ioctl API structure,
and moves to netpfil/pf/pf.h with its constants.
- V_pf_status is now of type struct pf_kstatus.

Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH


# 7e92ce73 29-Mar-2014 Martin Matuska <mm@FreeBSD.org>

De-virtualize UMA zone pf_mtag_z and move to global initialization part.

The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

Reviewed by: Nikos Vassiliadis, trociny@


# fb3541ad 04-Mar-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Instead of playing games with casts simply add 3 more members to the
structure pf_rule, that are used when the structure is passed via
ioctl().

PR: 187074


# 48278b88 14-Feb-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Once pf became not covered by a single mutex, many counters in it became
race prone. Some just gather statistics, but some are later used in
different calculations.

A real problem was the race provoked underflow of the states_cur counter
on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this
value is used in pf_state_expires() and any state created by this rule
is immediately expired.

Thus, make fields states_cur, states_tot and src_nodes of struct
pf_rule be counter(9)s.

Thanks to Dennis for providing me shell access to problematic box and
his help with reproducing, debugging and investigating the problem.

Thanks to: Dennis Yusupoff <dyr smartspb.net>
Also reported by: dumbbell, pgj, Rambler
Sponsored by: Nginx, Inc.


# 07d9bc07 08-Feb-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Revert accidentially leaked changes in r261627.


# 603819bc 08-Feb-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove never set flag FL_OVERWRITE. The only place where
it was checked led to lock/critnest leak.


# d77c1b32 22-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

To support upcoming changes change internal API for source node handling:
- Removed pf_remove_src_node().
- Introduce pf_unlink_src_node() and pf_unlink_src_node_locked().
These function do not proceed with freeing of a node, just disconnect
it from storage.
- New function pf_free_src_nodes() works on a list of previously
disconnected nodes and frees them.
- Utilize new API in pf_purge_expired_src_nodes().

In collaboration with: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>

Sponsored by: InnoGames GmbH
Sponsored by: Nginx, Inc.


# 3260ae00 22-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Add missing 'extern'.


# f053058c 18-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

- Split functions that initialize various pf parts into their vimage
parts and global parts.
- Since global parts appeared to be only mutex initializations, just
abandon them and use MTX_SYSINIT() instead.
- Kill my incorrect VNET_FOREACH() iterator and instead use correct
approach with VNET_SYSINIT().

Submitted by: Nikos Vassiliadis <nvass gmx.com>
Reviewed by: trociny


# 75bf2db3 27-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Move new pf includes to the pf directory. The pfvar.h remain
in net, to avoid compatibility breakage for no sake.

The future plan is to split most of non-kernel parts of
pfvar.h into pf.h, and then make pfvar.h a kernel only
include breaking compatibility.

Discussed with: bz


# 9dae57e1 26-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Start splitting pfvar.h into internal and external parts.

- Provide pf_altq.h that has only stuff needed for ALTQ.
- Start pf.h, that would have all constant values and
eventually non-kernel structures.
- Build ALTQ w/o pfvar.h, include if_var.h, that before
came via pollution.
- Build tcpdump w/o pfvar.h.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 6828cc99 19-Jun-2013 Gleb Smirnoff <glebius@FreeBSD.org>

De-vnet hash sizes and hash masks.

Submitted by: Nikos Vassiliadis <nvass gmx.com>
Reviewed by: trociny


# 22c91478 20-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Utilize Jenkins hash with random seed for source nodes storage.


# 7b115484 19-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Add missing break.

Pointy hat to: glebius


# 9ed8bbbd 17-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Fix build, pass the pointy hat please.


# 1d6139c0 18-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Make ruleset anchors in pf(4) reentrant. We've got two problems here:

1) Ruleset parser uses a global variable for anchor stack.
2) When processing a wildcard anchor, matching anchors are marked.

To fix the first one:

o Allocate anchor processing stack on stack. To make this allocation
as small as possible, following measures taken:
- Maximum stack size reduced from 64 to 32.
- The struct pf_anchor_stackframe trimmed by one pointer - parent.
We can always obtain the parent via the rule pointer.
- When pf_test_rule() calls pf_get_translation(), the former lends
its stack to the latter, to avoid recursive allocation 32 entries.

The second one appeared more tricky. The code, that marks anchors was
added in OpenBSD rev. 1.516 of pf.c. According to commit log, the idea
is to enable the "quick" keyword on an anchor rule. The feature isn't
documented anywhere. The most obscure part of the 1.516 was that code
examines the "match" mark on a just processed child, which couldn't be
put here by current frame. Since this wasn't documented even in the
commit message and functionality of this is not clear to me, I decided
to drop this examination for now. The rest of 1.516 is redone in a
thread safe manner - the mark isn't put on the anchor itself, but on
current stack frame. To avoid growing stack frame, we utilize LSB
from the rule pointer, relying on kernel malloc(9) returning pointer
aligned addresses.

Discussed with: dhartmei


# 9e8c4acc 18-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

- Add $FreeBSD$ to allow modifications to this file.
- Move $OpenBSD$ to a more standard place.


# 3b3a8eb9 14-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c -> sys/netpfil/pf/
sys/contrib/pf/net/*.h -> sys/net/
contrib/pf/pfctl/*.c -> sbin/pfctl
contrib/pf/pfctl/*.h -> sbin/pfctl
contrib/pf/pfctl/pfctl.8 -> sbin/pfctl
contrib/pf/pfctl/*.4 -> share/man/man4
contrib/pf/pfctl/*.5 -> share/man/man5

sys/netinet/ipfw -> sys/netpfil/ipfw

The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with: bz, luigi