History log of /freebsd-current/sys/net/if_bridge.c
Revision Date Author Comments
# 73585176 25-Apr-2024 Zhenlei Huang <zlei@FreeBSD.org>

if_bridge: Minor style fixes

And more comments on the #ifdef INET blocks to improve readability.

While here, revert the order of two prototypes to produce minimal diff
compared to stable branches.

MFC with: 65767e6126a7


# 65767e61 23-Apr-2024 Lexi Winter <lexi@le-Fay.ORG>

sys/net/if_bridge: support non-INET kernels

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1159


# ef84dd8f 21-Apr-2024 Lexi Winter <lexi@le-Fay.ORG>

if_bridge: clean up INET/INET6 handling

The if_bridge contains several instances of:

if (AF_INET code ...
#ifdef INET6
AF_INET6 code ...
#endif
) {
...

Clean this up by adding a couple of macros at the top of the file that
are conditionally defined based on whether INET and/or INET6 are enabled,
which makes the code more readable and easier to maintain.

No functional change intended.

Reviewed by: zlei, markj
MFC after: 1 week
Pull Request: https://github.com/freebsd/freebsd-src/pull/1191


# 319a5d08 31-Mar-2024 Eugene Grosbein <eugen@FreeBSD.org>

if_bridge: use IF_MINMTU

Replace incorrect constant 576 with IF_MINMTU to check for minumum MTU.
This unbreaks bridging tap interfaces with small mtu.

MFC after: 1 week


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# fd7edfcd 01-Jun-2023 Ben Wilber <ben@desync.com>

bridge: fix lookup for untagged packets in bridge_transmit()

b0e38a1373 improved if_bridge's ability to cope with different VLANs,
but it failed to update bridge_transmit() to cope with the new rule that
untagged packets are treated as having VLAN ID 0 (rather than 1, as used
to be the case).

Fix that oversight.

PR: 270559
Reviewed by: kp


# f3546eac 18-May-2023 Kristof Provost <kp@FreeBSD.org>

if_bridge: fix potential panic

When a new bridge_rtnode is added it is added with a NULL brt_dst. The
brt_dst is set after the entry is added. This means there's a small
window where another core could also attempt to add this node, leading
to the code attempting to log that the MAC addresses moved to a new
interface.
Aside from that being a spurious log entry it also panics, because
obif is NULL (and we attempt to dereference it).

Avoid this by settings brt_dst before we insert the bridge_rtnode.
Assert that obif is non-NULL, as an extra precaution.

Reported by: olivier@
Reviewed by: zlei@
Differential Revision: https://reviews.freebsd.org/D40147


# b0e38a13 07-Apr-2023 Kristof Provost <kp@FreeBSD.org>

bridge: distinguish no vlan and vlan 1

The bridge treated no vlan tag as being equivalent to vlan ID 1, which
causes confusion if the bridge sees both untagged and vlan 1 tagged
traffic.

Use DOT1Q_VID_NULL when there's no tag, and fix up the lookup code by
using 'DOT1Q_VID_RSVD_IMPL' to mean 'any vlan', rather than vlan 0. Note
that we have to account for userspace expecting to use 0 as meaning 'any
vlan'.

PR: 270559
Suggested by: Zhenlei Huang <zlei@FreeBSD.org>
Reviewed by: philip, zlei
Differential Revision: https://reviews.freebsd.org/D39478


# 9af6f426 14-Apr-2023 Zhenlei Huang <zlei@FreeBSD.org>

bridge: Use the %D identifier to format MAC address

It is shorter and more readable.

No functional change intended.

Reviewed by: kp
Fixes: 2d3614fb132b bridge: Log MAC address port flapping
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39542


# d862b165 10-Apr-2023 Mark Johnston <markj@FreeBSD.org>

bridge: Add support for emulated netmap mode

if_bridge receives packets via a special interface, if_bridge_input,
rather than by if_input. Thus, netmap's usual hooking of ifnet routines
does not work as expected. Instead, modify bridge_input() to pass
packets directly to netmap when it is enabled. This applies to both
locally delivered packets and forwarded packets.

When a netmap application transmits a packet by writing it to the host
TX ring, the mbuf chain is passed to if_input, which ordinarily points
to ether_input(). However, when transmitting via if_bridge,
bridge_input() needs to see the packet again in order to decide whether
to deliver or forward. Thus, introduce a new protocol flag,
M_BRIDGE_INJECT, which 1) causes the packet to be passed to
bridge_input() again after Ethernet processing, and 2) avoids passing
the packet back to netmap. The source MAC address of the packet is used
to determine the original "receiving" interface.

Reviewed by: vmaffione
MFC after: 2 months
Sponsored by: Zenarmor
Sponsored by: OPNsense
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D38066


# 2d3614fb 07-Apr-2023 Zhenlei Huang <zlei@FreeBSD.org>

bridge: Log MAC address port flapping

MAC flapping occurs when a bridge receives packets with the same source MAC
address on different member interfaces. The common reasons are:
- user roams from one bridge port to another
- user has wrong network setup, bridge loops e.g.
- someone set duplicated ethernet address on his/her nic
- some bad guy / virus / trojan send spoofed packets

if_bridge currently updates the bridge routing entry silently hence it is hard
to diagnose.

Emit logs when MAC address port flapping occurs to make it easier to diagnose.

Reviewed by: kp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D39375


# 82bbdde4 05-Apr-2023 Mark Johnston <markj@FreeBSD.org>

bridge: Try to make the GRAB_OUR_PACKETS macro a bit more readable

- Let the compiler use constant folding to eliminate conditionals.
- Fix some inconsistent whitespace.

No functional change intended.

Reviewed by: zlei
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D38410


# 66bdbcd5 03-Mar-2023 Alexander V. Chernikov <melifaro@FreeBSD.org>

net: unify mtu update code

Subscribers: imp, ae, glebius

Differential Revision: https://reviews.freebsd.org/D38893


# a2256150 14-Feb-2023 Gleb Smirnoff <glebius@FreeBSD.org>

net: use pfil_mbuf_{in,out} where we always have an mbuf

This finalizes what has been started in 0b70e3e78b0.

Reviewed by: kp, mjg
Differential revision: https://reviews.freebsd.org/D37976


# 3bc099eb 07-Feb-2023 Mark Johnston <markj@FreeBSD.org>

bridge: Make the ioctl table local to if_bridge.c

No functional change intended.

MFC after: 1 week
Sponsored by: Klara, Inc.


# 2c2b37ad 13-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

ifnet/API: Move struct ifnet definition to a <net/if_private.h>

Hide the ifnet structure definition, no user serviceable parts inside,
it's a netstack implementation detail. Include it temporarily in
<net/if_var.h> until all drivers are updated to use the accessors
exclusively.

Reviewed by: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38046


# 51088797 11-Dec-2022 Mark Johnston <markj@FreeBSD.org>

bridge: Fix a potential memory leak in bridge_enqueue()

A comment at the beginning of the function notes that we may be
transmitting multiple fragments as distinct packets. So, the function
loops over all fragments, transmitting each mbuf chain. If if_transmit
fails, we need to free all of the fragments, but m_freem() only frees an
mbuf chain - it doesn't follow m_nextpkt.

Change the error handler to free each untransmitted packet fragment, and
count each fragment as a separate error since we increment OPACKETS once
per fragment when transmission is successful.

Reviewed by: zlei, kp
MFC after: 1 week
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D37635


# 22893e58 13-Oct-2022 Kristof Provost <kp@FreeBSD.org>

bridge: default to not filtering L3

Change the default for net.link.bridge.pfil_member and
net.link.bridge.pfil_bridge to zero.

That is, default to not calling layer 3 firewalls on the bridge or its
member interfaces.

With either of these enabled the bridge will, during L2 processing,
remove the Ethernet header from packets, feed them to L3 firewalls,
re-add the Ethernet header and send them out.

Not only does this interact very poorly with firewalls which defer
packets, or reassemble and refragment IPv6, it also causes considerable
confusion for users, because the firewall gets called in unexpected
ways.

For example, a bridge which contains a bhyve tap and the host's LAN
interface. We'd expect traffic between the LAN and bhyve VM to pass, no
matter what (layer 3) firewall rules are set on the host. That's not the
case as long as pfil_bridge or pfil_member are set.

Reviewed by: Zhenlei Huang
MFC: never
Differential Revision: https://reviews.freebsd.org/D37009


# 91ebcbe0 21-Sep-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

if_clone: migrate some consumers to the new KPI.

Convert most of the cloner customers who require custom params
to the new if_clone KPI.

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D36636
MFC after: 2 weeks


# 150486f6 29-Jul-2022 Zhenlei Huang <zlei.huang@gmail.com>

Introduce and use the NET_EPOCH_DRAIN_CALLBACKS() macro

Reviewed by: melifao, kp
Differential Revision: https://reviews.freebsd.org/D35968


# 1865ebfb 25-Jun-2022 Kristof Provost <kp@FreeBSD.org>

if_bridge: change MTU for new members

Rather than reject new bridge members because they have the wrong MTU
change it to match the bridge. If that fails, reject the new interface.

PR: 264883
Different Revision: https://reviews.freebsd.org/D35597


# f7faa4ad 04-Jun-2022 Gordon Bergling <gbe@FreeBSD.org>

if_bridge(4): Fix a typo in a source code comment

- s/accross/across/

MFC after: 3 days


# 36637dd1 19-Feb-2022 Kristof Provost <kp@FreeBSD.org>

bridge: Don't share broadcast packets

if_bridge duplicates broadcast packets with m_copypacket(), which
creates shared packets. In certain circumstances these packets can be
processed by udp_usrreq.c:udp_input() first, which modifies the mbuf as
part of the checksum verification. That may lead to incorrect packets
being transmitted.

Use m_dup() to create independent mbufs instead.

Reported by: Richard Russo <toast@ruka.org>
Reviewed by: donner, afedorov
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D34319


# eb680a63 25-Aug-2021 Luiz Otavio O Souza <loos@FreeBSD.org>

if_bridge: add ALTQ support

Similar to the recent addition of ALTQ support to if_vlan.

Reviewed by: donner
Obtained from: pfsense
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31675


# 33306493 23-Jul-2021 Kristof Provost <kp@FreeBSD.org>

if_bridge: allow MTU changes

if_bridge used to only allow MTU changes if the new MTU matched that of
all member interfaces. This doesn't really make much sense, in that we
really shouldn't be allowed to change the MTU of bridge member in the
first place.

Instead we now change the MTU of all member interfaces. If one fails we
revert all interfaces back to the original MTU.

We do not address the issue where bridge member interface MTUs can be
changed here.

Reviewed by: donner
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31288


# 38c09513 21-Feb-2021 Kristof Provost <kp@FreeBSD.org>

bridge: Remove members when assigned to a new vnet

When the bridge is moved to a different vnet we must remove all of its
member interfaces (and span interfaces), because we don't know if those
will be moved along with it. We don't want to hold references to
interfaces not in our vnet.

Reviewed by: donner@
MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D28859


# 89fa9c34 21-Feb-2021 Kristof Provost <kp@FreeBSD.org>

bridge/stp: Ensure we enter NET_EPOCH whenever we can send traffic

Reviewed by: donner@
MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D28858


# 4af1bd81 06-Oct-2020 Kristof Provost <kp@FreeBSD.org>

bridge: call member interface ioctl() without NET_EPOCH

We're not allowed to hold NET_EPOCH while sleeping, so when we call ioctl()
handlers for member interfaces we cannot be in NET_EPOCH. We still need some
protection of our CK_LISTs, so hold BRIDGE_LOCK instead.

That requires changing BRIDGE_LOCK into a sleepable lock, and separating the
BRIDGE_RT_LOCK, to protect bridge_rtnode lists. That lock is taken in the data
path (while in NET_EPOCH), so it cannot be a sleepable lock.

While here document the locking strategy.

MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D26418


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# 93ed6ade 17-Jul-2020 Kristof Provost <kp@FreeBSD.org>

bridge: Don't sleep during epoch

While it doesn't trigger INVARIANTS or WITNESS on head it does in stable/12.
There's also no reason for it, as we can easily report the out of memory error
to the caller (i.e. userspace). All of these can already fail.

PR: 248046
MFC after: 3 days


# fffd27e5 26-Apr-2020 Kristof Provost <kp@FreeBSD.org>

bridge: epoch-ification

Run the bridge datapath under epoch, rather than under the
BRIDGE_LOCK().

We still take the BRIDGE_LOCK() whenever we insert or delete items in
the relevant lists, but we use epoch callbacks to free items so that
it's safe to iterate the lists without the BRIDGE_LOCK.

Tests on mercat5/6 shows this increases bridge throughput significantly,
from 3.7Mpps to 18.6Mpps.

Reviewed by: emaste, philip, melifaro
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24250


# fac24ad7 18-Apr-2020 Kristof Provost <kp@FreeBSD.org>

bridge: Simplify mac address generation

Unconditionally use ether_gen_addr() to generate bridge mac addresses. This
function is now less likely to generate duplicate mac addresses across jails.
The old hand rolled hostid based code adds no value.

Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D24432


# ae4b6259 17-Apr-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Unbreak build by reverting if_bridge part of r360047.

Pointy hat to: melifaro


# 67452942 17-Apr-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Finish r191148: replace rtentry with route in if_bridge if_output() callback.

Generic if_output() callback signature was modified to use struct route
instead of struct rtentry in r191148, back in 2009.

Quoting commit message:

Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Fix bridge_output() to match this signature and update the remaining
comment in if_var.h.

Reviewed by: kp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24394


# dd00a42a 05-Apr-2020 Kristof Provost <kp@FreeBSD.org>

bridge: Change lists to CK_LIST as a peparation for epochification

Prepare the ground for a rework of the bridge locking approach. We will
use an epoch-based approach in the datapath and making it safe to
iterate over the interface, span and rtnode lists without holding the
BRIDGE_LOCK. Replace the relevant lists by their ConcurrencyKit
equivalents.

No functional change in this commit.

Reviewed by: emaste, ae, philip (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24249


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 33b1fe11 26-Feb-2020 Kristof Provost <kp@FreeBSD.org>

bridge: Move locking defines into if_bridge.c

The locking defines for if_bridge used to live in if_bridgevar.h, but
they're only ever used by the bridge implementation itself (in
if_bridge.c). Moving them into the .c file.

Reported by: philip, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23808


# 84becee1 22-Jan-2020 Alexander Motin <mav@FreeBSD.org>

Update route MTUs for bridge, lagg and vlan interfaces.

Those interfaces may implicitly change their MTU on addition of parent
interface in addition to normal SIOCSIFMTU ioctl path, where the route
MTUs are updated normally.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# 8d5c56da 01-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

In r343631 error code for a packet blocked by a firewall was
changed from EACCES to EPERM. This change was not intentional,
so fix that. Return EACCESS if a firewall forbids sending.

Noticed by: ae


# d8b98543 28-May-2019 Kyle Evans <kevans@FreeBSD.org>

if_bridge(4): Complete bpf auditing of local traffic over the bridge

There were two remaining "gaps" in auditing local bridge traffic with
bpf(4):

Locally originated outbound traffic from a member interface is invisible to
the bridge's bpf(4) interface. Inbound traffic locally destined to a member
interface is invisible to the member's bpf(4) interface -- this traffic has
no chance after bridge_input to otherwise pass it over, and it wasn't
originally received on this interface.

I call these "gaps" because they don't affect conventional bridge setups.
Alas, being able to establish an audit trail of all locally destined traffic
for setups that can function like this is useful in some scenarios.

Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D19757


# 3c3aa8c1 17-Apr-2019 Kyle Evans <kevans@FreeBSD.org>

net: adjust randomized address bits

Give devices that need a MAC a 16-bit allocation out of the FreeBSD
Foundation OUI range. Change the name ether_fakeaddr to ether_gen_addr now
that we're dealing real MAC addresses with a real OUI rather than random
locally-administered addresses.

Reviewed by: bz, rgrimes
Differential Revision: https://reviews.freebsd.org/D19587


# 93c9d319 27-Mar-2019 Kyle Evans <kevans@FreeBSD.org>

if_bridge(4): ensure all traffic passing over the bridge is accounted for

Consider a bridge0 with em0 and em1 members. Traffic rx'd by em0 and
transmitted by bridge0 through em1 gets accounted for in IPACKETS/IBYTES
and bridge0 bpf -- assuming it's not unicast traffic destined for em1.
Unicast traffic destined for em1 traffic is not accounted for by any
mechanism, and isn't pushed through bridge0's bpf machinery as any other
packets that pass over the bridge do.

Fix this and simplify GRAB_OUR_PACKETS by bailing out early if it was rx'd
by the interface that it was addressed for. Everything else there is
relevant for any traffic that came in from one member that's being directed
at another member of the bridge.

Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D19614


# 4920f9a3 15-Mar-2019 Kyle Evans <kevans@FreeBSD.org>

if_bridge(4): Drop pointless rtflush

At this point, all routes should've already been dropped by removing all
members from the bridge. This condition is in-fact KASSERT'd in the line
immediately above where this nop flush was added.


# 6e6b93fe 15-Mar-2019 Kyle Evans <kevans@FreeBSD.org>

Revert r345192: Too many trees in play for bridge(4) bits

An accidental appendage was committed that has not undergone review yet.


# 4b4b284d 15-Mar-2019 Kyle Evans <kevans@FreeBSD.org>

if_bridge(4): Drop pointless rtflush

At this point, all routes should've already been dropped by removing all
members from the bridge. This condition is in-fact KASSERT'd in the line
immediately above where this nop flush was added.


# 43d3127c 15-Mar-2019 Kristof Provost <kp@FreeBSD.org>

bridge: Fix STP-related panic

After r345180 we need to have the appropriate vnet context set to delete an
rtnode in bridge_rtnode_destroy().
That's usually the case, but not when it's called by the STP code (through
bstp_notify_rtage()).

We have to set the vnet context in bridge_rtable_expire() just as we do in the
other STP callback bridge_state_change().

Reviewed by: kevans


# a87407ff 15-Mar-2019 Kyle Evans <kevans@FreeBSD.org>

if_bridge(4): Fix module teardown

bridge_rtnode_zone still has outstanding allocations at the time of
destruction in the current model because all of the interface teardown
happens in a VNET_SYSUNINIT, -after- the MOD_UNLOAD has already been
processed. The SYSUNINIT triggers destruction of the interfaces, which then
attempts to free the memory from the zone that's already been destroyed, and
we hit a panic.

Solve this by virtualizing the uma_zone we allocate the rtnodes from to fix
the ordering. bridge_rtable_fini should also take care to flush any
remaining routes that weren't taken care of when dynamic routes were flushed
in bridge_stop.

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D19578


# 6b7e0c1c 14-Mar-2019 Kyle Evans <kevans@FreeBSD.org>

ether: centralize fake hwaddr generation

We currently have two places with identical fake hwaddr generation --
if_vxlan and if_bridge. Lift it into if_ethersubr for reuse in other
interfaces that may also need a fake addr.

Reviewed by: bryanv, kp, philip
Differential Revision: https://reviews.freebsd.org/D19573


# c3c93809 04-Mar-2019 Alexander Motin <mav@FreeBSD.org>

bridge: Fix spurious warnings about capabilities

Mask off the bits we don't care about when checking that capabilities
of the member interfaces have been disabled as intended.

Submitted by: Ryan Moeller <ryan@ixsystems.com>
Reviewed by: kristof, mav
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D18924


# b252313f 31-Jan-2019 Gleb Smirnoff <glebius@FreeBSD.org>

New pfil(9) KPI together with newborn pfil API and control utility.

The KPI have been reviewed and cleansed of features that were planned
back 20 years ago and never implemented. The pfil(9) internals have
been made opaque to protocols with only returned types and function
declarations exposed. The KPI is made more strict, but at the same time
more extensible, as kernel uses same command structures that userland
ioctl uses.

In nutshell [KA]PI is about declaring filtering points, declaring
filters and linking and unlinking them together.

New [KA]PI makes it possible to reconfigure pfil(9) configuration:
change order of hooks, rehook filter from one filtering point to a
different one, disconnect a hook on output leaving it on input only,
prepend/append a filter to existing list of filters.

Now it possible for a single packet filter to provide multiple rulesets
that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of existing packet filters yet support that,
however limited usage is already possible, e.g. default ruleset can
be moved to single interface, as soon as interface would pride their
filtering points.

Another future feature is possiblity to create pfil heads, that provide
not an mbuf pointer but just a memory pointer with length. That would
allow filtering at very early stages of a packet lifecycle, e.g. when
packet has just been received by a NIC and no mbuf was yet allocated.

Differential Revision: https://reviews.freebsd.org/D18951


# 5f901c92 24-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by: bz
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16147


# 5c30b378 10-May-2018 Matt Macy <mmacy@FreeBSD.org>

Allow different bridge types to coexist

if_bridge has a lot of limitations that make it scale poorly to higher data
rates. In my projects/VPC branch I leverage the bridge interface between
layers for my high speed soft switch as well as for purposes of stacking
in general.

Reviewed by: sbruno@
Approved by: sbruno@
Differential Revision: https://reviews.freebsd.org/D15344


# 0437c8e3 11-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Remove support for FDDI networks.

Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewere (supporting non-existant media type is harmless).

Reviewed by: kib, jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15017


# effaab88 23-Mar-2018 Kristof Provost <kp@FreeBSD.org>

netpfil: Introduce PFIL_FWD flag

Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.

Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.

Reviewed by: ae, kevans
Differential Revision: https://reviews.freebsd.org/D13715


# fe267a55 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.


# ed9de14d2 21-Sep-2017 Kristof Provost <kp@FreeBSD.org>

bridge: Set module version

This ensures that the loader will not load the module if it's also built in to
the kernel.

PR: 220860
Submitted by: Eugene Grosbein <eugen@freebsd.org>
Reported by: Marie Helene Kvello-Aune <marieheleneka@gmail.com>


# ebe42881 29-Apr-2017 Alexander Motin <mav@FreeBSD.org>

Make if_bridge complain if it can't disable some capabilities.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# ab5cda71 25-Jan-2017 Kristof Provost <kp@FreeBSD.org>

bridge: Release the bridge lock when calling bridge_set_ifcap()

This calls ioctl() handlers for the different interfaces in the bridge.
These handlers expect to get called in an ioctl context where it's safe
for them to sleep. We may not sleep with the bridge lock held.

However, we still need to protect the interface list, to ensure it
doesn't get changed while we iterate over it.
Use BRIDGE_XLOCK(), which prevents bridge members from being removed.
Adding bridge members is safe, because it uses LIST_INSERT_HEAD().

This caused panics when adding xen interfaces to a bridge.

PR: 216304
Reviewed by: ae
MFC after: 1 week
Sponsored by: RootBSD
Differential Revision: https://reviews.freebsd.org/D9290


# 921e5f56 26-Oct-2016 Bryan Drewery <bdrewery@FreeBSD.org>

Remove excess CTLFLAG_VNET

Sponsored by: Dell EMC Isilon


# f18598a4 24-Sep-2016 Kristof Provost <kp@FreeBSD.org>

bridge: Fix fragment handling and memory leak

Fragmented UDP and ICMP packets were corrupted if a firewall with reassembling
feature (like pf'scrub) is enabled on the bridge. This patch fixes corrupted
packet problem and the panic (triggered easly with low RAM) as explain in PR
185633.

bridge_pfil and bridge_fragment relationship:

bridge_pfil() receive (IN direction) packets and sent it to the firewall The
firewall can be configured for reassembling fragmented packet (like pf'scrubing)
in one mbuf chain when bridge_pfil() need to send this reassembled packet to the
outgoing interface, it needs to re-fragment it by using bridge_fragment()
bridge_fragment() had to split this mbuf (using ip_fragment) first then
had to M_PREPEND each packet in the mbuf chain for adding Ethernet
header.

But M_PREPEND can sometime create a new mbuf on the begining of the mbuf chain,
then the "main" pointer of this mbuf chain should be updated and this case is
tottaly forgotten. The original bridge_fragment code (Revision 158140,
2006 April 29) came from OpenBSD, and the call to bridge_enqueue was
embedded. But on FreeBSD, bridge_enqueue() is done after bridge_fragment(),
then the original OpenBSD code can't work as-it of FreeBSD.

PR: 185633
Submitted by: Olivier Cochard-Labbé
Differential Revision: https://reviews.freebsd.org/D7780


# 84e63372 18-Jul-2016 Alexander Motin <mav@FreeBSD.org>

Negotiate/disable TXCSUM_IPV6 same as TXCSUM.


# 89856f7e 21-Jun-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747


# a4641f4e 03-May-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/net*: minor spelling fixes.

No functional change.


# 155d72c4 15-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/net* : for pointers replace 0 with NULL.

Mostly cosmetical, no functional change.

Found with devel/coccinelle.


# 581e6970 13-Jun-2015 Kristof Provost <kp@FreeBSD.org>

Fix panic when adding vtnet interfaces to a bridge

vtnet interfaces are always in promiscuous mode (at least if the
VIRTIO_NET_F_CTRL_RX feature is not negotiated with the host). if_promisc() on
a vtnet interface returned ENOTSUP although it has IFF_PROMISC set. This
confused the bridge code. Instead we now accept all enable/disable promiscuous
commands (and always keep IFF_PROMISC set).

There are also two issues with the if_bridge error handling.

If if_promisc() fails it uses bridge_delete_member() to clean up. This tries to
disable promiscuous mode on the interface. That runs into an assert, because
promiscuous mode was never set in the first place. (That's the panic reported in
PR 200210.)
We can only unset promiscuous mode if the interface actually is promiscuous.
This goes against the reference counting done by if_promisc(), but only the
first/last if_promic() calls can actually fail, so this is safe.

A second issue is a double free of bif. It's already freed by
bridge_delete_member().

PR: 200210
Differential Revision: https://reviews.freebsd.org/D2804
Reviewed by: philip (mentor)


# 1c27e6c3 11-May-2015 Hiroki Sato <hrs@FreeBSD.org>

Fix a panic when VIMAGE is enabled.

Spotted by: Nikos Vassiliadis


# 25792b11 14-Feb-2015 Hiroki Sato <hrs@FreeBSD.org>

Fix a panic when tearing down a vnet on a VIMAGE-enabled kernel.
There was a race that bridge_ifdetach() could be called via
ifnet_departure event handler after vnet_bridge_uninit().

PR: 195859
Reported by: Danilo Egea Gondolfo


# 833e8dc5 07-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove struct arpcom. It is unused by most interface types, that allocate
it, except Ethernet, where it carried ng_ether(4) pointer.
For now carry the pointer in if_l2com directly.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 6df8a710 07-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.

Sponsored by: Nginx, Inc.


# c5127526 05-Oct-2014 Hiroki Sato <hrs@FreeBSD.org>

Virtualize if_bridge(4) cloner.


# 3751dddb 19-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanically convert to if_inc_counter().


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# b245f96c 12-Mar-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data fixed machine independent size. The
notion of data (packet counters, etc) are by no means MD. And it is a
bug that on amd64 we've got a 64-bit counters, while on i386 32-bit,
which at modern speeds overflow within a second.

This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bit for the ifi_datalen field. This field was provided to
make future changes to if_data less ABI breaking. Unfortunately the
8 bit size of it had effectively limited sizeof if_data to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with: emax
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# c3322cb9 28-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 9fcd8e9e 28-Jul-2013 Hiroki Sato <hrs@FreeBSD.org>

- Relax the restriction on the member interfaces with LLAs. Two or more
LLAs on the member interfaces are actually harmless when the parent
interface does not have a LLA.

- Add net.link.bridge.allow_llz_overlap. This is a knob to allow LLAs on
a bridge and the member interfaces at the same time. The default is 0.

Pointed out by: ume
MFC after: 3 days


# 6facd7a6 03-Jul-2013 Hiroki Sato <hrs@FreeBSD.org>

Fix a compiler warning.

MFC after: 1 week


# af805644 02-Jul-2013 Hiroki Sato <hrs@FreeBSD.org>

- Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE
is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV
regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal.
To configure an autoconfigured link-local address (RFC 4862), the
following rc.conf(5) configuration can be used:

ifconfig_bridge0_ipv6="inet6 auto_linklocal"

- if_bridge(4) now removes IPv6 addresses on a member interface to be
added when the parent interface or one of the existing member
interfaces has an IPv6 address. if_bridge(4) merges each link-local
scope zone which the member interfaces form respectively, so it causes
address scope violation. Removal of the IPv6 addresses prevents it.

- if_lagg(4) now removes IPv6 addresses on a member interfaces
unconditionally.

- Set reasonable flags to non-IPv6-capable interfaces. [*]

Submitted by: rpaulo [*]
MFC after: 1 week


# 9cb8d207 09-Apr-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats.

MFC after: 1 week


# 83a3ff21 28-Mar-2013 Mark Johnston <markj@FreeBSD.org>

Ignore interface renames instead of removing the interface from the bridge
group.

Reviewed by: rstone
Approved by: rstone (co-mentor)
Sponsored by: Sandvine Incorporated
MFC after: 1 week


# 129004c5 10-Mar-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Reinitialize eh after pfil(9) processing.

PR: 176764
Submitted by: adri


# c7dada99 17-Dec-2012 Kevin Lo <kevlo@FreeBSD.org>

Fix typo in comment.

Reviewed by: thompsa


# eb1b1807 05-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


# 5ad95203 29-Nov-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Use more appropriate loop (do { } while()) when generating ethernet address
for bridge interface.
- If we found a collision we can break the loop - only one collision is
possible and one is exactly enough to need to renegerate.

Obtained from: WHEEL Systems
MFC after: 1 week


# 078468ed 26-Oct-2012 Gleb Smirnoff <glebius@FreeBSD.org>

o Remove last argument to ip_fragment(), and obtain all needed information
on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
hardware. Some driver may not announce CSUM_IP in theur if_hwassist,
although try to do checksums if CSUM_IP set on mbuf. Example is em(4).
o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
After this change CSUM_DELAY_IP vanishes from the stack.

Submitted by: Sebastian Kuzminsky <seb lineratesystems.com>


# da1fc67f 24-Oct-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Fix fallout from r240071. If destination interface lookup fails,
we should broadcast a packet, not try to deliver it to NULL.

Reported by: rpaulo


# 42a58907 16-Oct-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Make the "struct if_clone" opaque to users of the cloning API. Users
now use function calls:

if_clone_simple()
if_clone_advanced()

to initialize a cloner, instead of macros that initialize if_clone
structure.

Discussed with: brooks, bz, 1 year ago


# 9823d527 10-Oct-2012 Kevin Lo <kevlo@FreeBSD.org>

Revert previous commit...

Pointyhat to: kevlo (myself)


# a10cee30 09-Oct-2012 Kevin Lo <kevlo@FreeBSD.org>

Prefer NULL over 0 for pointers


# 21d172a3 06-Oct-2012 Gleb Smirnoff <glebius@FreeBSD.org>

A step in resolving mess with byte ordering for AF_INET. After this change:

- All packets in NETISR_IP queue are in net byte order.
- ip_input() is entered in net byte order and converts packet
to host byte order right _after_ processing pfil(9) hooks.
- ip_output() is entered in host byte order and converts packet
to net byte order right _before_ processing pfil(9) hooks.
- ip_fragment() accepts and emits packet in net byte order.
- ip_forward(), ip_mloopback() use host byte order (untouched actually).
- ip_fastforward() no longer modifies packet at all (except ip_ttl).
- Swapping of byte order there and back removed from the following modules:
pf(4), ipfw(4), enc(4), if_bridge(4).
- Swapping of byte order added to ipfilter(4), based on __FreeBSD_version
- __FreeBSD_version bumped.
- pfil(9) manual page updated.

Reviewed by: ray, luigi, eri, melifaro
Tested by: glebius (LE), ray (BE)


# 3e92ee8a 04-Oct-2012 Andrew Thompson <thompsa@FreeBSD.org>

Remove the M_NOWAIT from bridge_rtable_init as it isn't needed. The function
return value is not even checked and could lead to a panic on a null sc_rthash.

MFC after: 2 weeks


# 80cd7c75 26-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

- In the bridge_enqueue() do success/error accounting for
each fragment, not only once.
- In the GRAB_OUR_PACKETS() macro do increase if_ibytes.


# 7d4317bd 04-Sep-2012 Alexander V. Chernikov <melifaro@FreeBSD.org>

Introduce new link-layer PFIL hook V_link_pfil_hook.
Merge ether_ipfw_chk() and part of bridge_pfil() into
unified ipfw_check_frame() function called by PFIL.
This change was suggested by rwatson? @ DevSummit.

Remove ipfw headers from ether/bridge code since they are unneeded now.

Note this thange introduce some (temporary) performance penalty since
PFIL read lock has to be acquired for every link-level packet.

MFC after: 3 weeks


# 3582a9f6 03-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Change bridge(4) to use if_transmit for forwarding packets to underlying
interfaces instead of queueing.

Tested by: ray


# bf7a35de 10-Jul-2012 Ed Maste <emaste@FreeBSD.org>

Simplify error case

Submitted by: thompsa@


# 683fa2b5 10-Jul-2012 Ed Maste <emaste@FreeBSD.org>

Plug potential mbuf leak when bridging fragments

If an error occurs when transmitting one mbuf in a chain of fragments,
free the subsequent fragments instead of leaking them.

Sponsored by: ADARA Networks


# 21151865 09-Jul-2012 Ed Maste <emaste@FreeBSD.org>

Restore error handling lost in r191603

This was missed in the change from IFQ_ENQUEUE to if_transmit.

Sponsored by: ADARA Networks


# 08e34823 11-Jun-2012 Andrew Thompson <thompsa@FreeBSD.org>

Fix a panic I introduced in r234487, the bridge softc pointer is set to null
early in the detach so rearrange things not to explode.

Reported by: David Roffiaen, Gustau Perez Querol
Tested by: David Roffiaen
MFC after: 3 days


# bdf942c3 03-May-2012 Alexander V. Chernikov <melifaro@FreeBSD.org>

Revert r234834 per luigi@ request.

Cleaner solution (e.g. adding another header) should be done here.

Original log:
Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h.
Remove ipfw/ip_fw_private.h header from non-ipfw code.

Requested by: luigi
Approved by: kib(mentor)


# 7bd5e9b1 30-Apr-2012 Alexander V. Chernikov <melifaro@FreeBSD.org>

Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h.
Remove ipfw/ip_fw_private.h header from non-ipfw code.

Approved by: ae(mentor)
MFC after: 2 weeks


# 7702d401 20-Apr-2012 Andrew Thompson <thompsa@FreeBSD.org>

Add linkstate to bridge(4), set the link to up when at least one underlying
interface is up, otherwise the link is down.

This, among other things, allows carp to work on a bridge.

Prodded by: glebius
Tested by: Alexander Lunev


# 70b23a45 29-Feb-2012 Andrew Thompson <thompsa@FreeBSD.org>

Use a more appropriate default for the maximum number of addresses in the
bridge forwarding table.

PR: docs/164564
Discussed with: brueffer


# 4661f862 22-Feb-2012 Andrew Thompson <thompsa@FreeBSD.org>

bstp_input() always consumes the packet so remove the mbuf handling dance
around it.

Obtained from: OpenBSD (r1.37)


# 50c8ec53 07-Feb-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Allow to set if_bridge(4) sysctls from /boot/loader.conf.

MFC after: 3 days


# 4b22573a 11-Nov-2011 Brooks Davis <brooks@FreeBSD.org>

In r191367 the need for if_free_type() was removed and a new member
if_alloctype was used to store the origional interface type. Take
advantage of this change by removing all existing uses of if_free_type()
in favor of if_free().

MFC after: 1 Month


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 0fe082e7 04-Sep-2011 Andrew Thompson <thompsa@FreeBSD.org>

On the first loop for generating a bridge MAC address use the local
hostid, this gives a good chance of keeping the same address over
reboots. This is intended to help IPV6 and similar which generate
their addresses from the mac.

PR: kern/160300
Submitted by: mdodd
Approved by: re (kib)


# 3d07127c 27-Aug-2011 Bjoern A. Zeeb <bz@FreeBSD.org>

When adding IPv6 fwd support to ipfw in r225044 these two files were
not committed. Initialize next_hop6 to align with the IPv4 code.

PR: bin/117214
MFC after: 3 weeks
X-MFC with: r225044
Approved by: re (kib)


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 9963e8a5 11-Aug-2010 Will Andrews <will@FreeBSD.org>

Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by: bz
Approved by: ken (mentor)


# 54bfbd51 10-Aug-2010 Will Andrews <will@FreeBSD.org>

Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default. Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by: bz, simon
Approved by: ken (mentor)
MFC after: 2 weeks


# 7c61d493 24-May-2010 Andrew Thompson <thompsa@FreeBSD.org>

MFC r202588

Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR: kern/142927
Submitted by: Nikolay Denev

MFC r202611

Do not hold the lock over if_setlladdr() as it calls into the interface driver
init routine.


# 8018e843 23-Mar-2010 Luigi Rizzo <luigi@FreeBSD.org>

MFC of a large number of ipfw and dummynet fixes and enhancements
done in CURRENT over the last 4 months.
HEAD and RELENG_8 are almost in sync now for ipfw, dummynet
the pfil hooks and related components.

Among the most noticeable changes:
- r200855 more efficient lookup of skipto rules, and remove O(N)
blocks from critical sections in the kernel;
- r204591 large restructuring of the dummynet module, with support
for multiple scheduling algorithms (4 available so far)
See the original commit logs for details.

Changes in the kernel/userland ABI should be harmless because the
kernel is able to understand previous requests from RELENG_8 and
RELENG_7. For this reason, this changeset would be applicable
to RELENG_7 as well, but i am not sure if it is worthwhile.


# 7fe69750 22-Mar-2010 Hiroki Sato <hrs@FreeBSD.org>

MFC r203272:

- Fix a bug when adding an interface with an invalid MTU sets the
bridge's MTU if it is the firstly-added one while the addition
itself fails.

- Allow SIOCSIFMTU only when all members have the same MTU.

- Remove IFT_GIF check when defining the brige MTU by the
firstly-added interface's one. The MTU of the gif interface
has to be the same as the bridge's one.


# cc4d3c30 02-Mar-2010 Luigi Rizzo <luigi@FreeBSD.org>

Bring in the most recent version of ipfw and dummynet, developed
and tested over the past two months in the ipfw3-head branch. This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.

The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.

In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.

Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.

Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.

Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.

CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.


# 2ae7ec29 07-Feb-2010 Julian Elischer <julian@FreeBSD.org>

MFC of 197952 and 198075

Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.
and
Unbreak the VIMAGE build with IPSEC, broken with r197952 by
virtualizing the pfil hooks.
For consistency add the V_ to virtualize the pfil hooks in here as well.


# c2a5f1a5 31-Jan-2010 Hiroki Sato <hrs@FreeBSD.org>

- Check if_type of "addm <interface>" before setting the
interface's MTU to the if_bridge(4) interface. This fixes a
bug that MTU value of "addm <interface>" is used even when it
is invalid for the if_bridge(4) member:

# ifconfig bridge0 create
# ifconfig bridge0
bridge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
...
# ifconfig bridge0 addm lo0
ifconfig: BRDGADD lo0: Invalid argument
# ifconfig bridge0
bridge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 16384
...

- Do not ignore MTU value of an interface even when if_type == IFT_GIF.
This fixes MTU mismatch when an if_bridge(4) interface has a
gif(4) interface and no other interface as the member, and it
is directly used for L2 communication with EtherIP tunneling
enabled.

- Implement SIOCSIFMTU ioctl. Changing the MTU is allowed only
when all members have the same MTU value.


# ea4ca115 18-Jan-2010 Andrew Thompson <thompsa@FreeBSD.org>

Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR: kern/142927
Submitted by: Nikolay Denev


# 7173b6e5 04-Jan-2010 Luigi Rizzo <luigi@FreeBSD.org>

Various cleanup done in ipfw3-head branch including:
- use a uniform mtag format for all packets that exit and re-enter
the firewall in the middle of a rulechain. On reentry, all tags
containing reinject info are renamed to MTAG_IPFW_RULE so the
processing is simpler.

- make ipfw and dummynet use ip_len and ip_off in network format
everywhere. Conversion is done only once instead of tracking
the format in every place.

- use a macro FREE_PKT to dispose of mbufs. This eases portability.

On passing i also removed a few typos, staticise or localise variables,
remove useless declarations and other minor things.

Overall the code shrinks a bit and is hopefully more readable.

I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr.
For ng_ipfw i am actually waiting for feedback from glebius@ because
we might have some small changes to make.
For if_bridge and if_ethersubr feedback would be welcome
(there are still some redundant parts in these two modules that
I would like to remove, but first i need to check functionality).


# 830c6e2b 28-Dec-2009 Luigi Rizzo <luigi@FreeBSD.org>

bring in several cleanups tested in ipfw3-head branch, namely:

r201011
- move most of ng_ipfw.h into ip_fw_private.h, as this code is
ipfw-specific. This removes a dependency on ng_ipfw.h from some files.

- move many equivalent definitions of direction (IN, OUT) for
reinjected packets into ip_fw_private.h

- document the structure of the packet tags used for dummynet
and netgraph;

r201049
- merge some common code to attach/detach hooks into
a single function.

r201055
- remove some duplicated code in ip_fw_pfil. The input
and output processing uses almost exactly the same code so
there is no need to use two separate hooks.
ip_fw_pfil.o goes from 2096 to 1382 bytes of .text

r201057 (see the svn log for full details)
- macros to make the conversion of ip_len and ip_off
between host and network format more explicit

r201113 (the remaining parts)
- readability fixes -- put braces around some large for() blocks,
localize variables so the compiler does not think they are uninitialized,
do not insist on precise allocation size if we have more than we need.

r201119
- when doing a lookup, keys must be in big endian format because
this is what the radix code expects (this fixes a bug in the
recently-introduced 'lookup' option)

No ABI changes in this commit.

MFC after: 1 week


# de240d10 22-Dec-2009 Luigi Rizzo <luigi@FreeBSD.org>

merge code from ipfw3-head to reduce contention on the ipfw lock
and remove all O(N) sequences from kernel critical sections in ipfw.

In detail:

1. introduce a IPFW_UH_LOCK to arbitrate requests from
the upper half of the kernel. Some things, such as 'ipfw show',
can be done holding this lock in read mode, whereas insert and
delete require IPFW_UH_WLOCK.

2. introduce a mapping structure to keep rules together. This replaces
the 'next' chain currently used in ipfw rules. At the moment
the map is a simple array (sorted by rule number and then rule_id),
so we can find a rule quickly instead of having to scan the list.
This reduces many expensive lookups from O(N) to O(log N).

3. when an expensive operation (such as insert or delete) is done
by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
without blocking the bottom half of the kernel, then acquire
IPFW_WLOCK and quickly update pointers to the map and related info.
After dropping IPFW_LOCK we can then continue the cleanup protected
by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
is only blocked for O(1).

4. do not pass pointers to rules through dummynet, netgraph, divert etc,
but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
We validate the slot index (in the array of #2) with chain_id,
and if successful do a O(1) dereference; otherwise, we can find
the rule in O(log N) through <rulenum, rule_id>

All the above does not change the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t

Operation costs now are as follows:

Function Old Now Planned
-------------------------------------------------------------------
+ skipto X, non cached O(N) O(log N)
+ skipto X, cached O(1) O(1)
XXX dynamic rule lookup O(1) O(log N) O(1)
+ skipto tablearg O(N) O(1)
+ reinject, non cached O(N) O(log N)
+ reinject, cached O(1) O(1)
+ kernel blocked during setsockopt() O(N) O(1)
-------------------------------------------------------------------

The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI

Supported by: Valeria Paoli
MFC after: 1 month


# 70228fb3 15-Dec-2009 Luigi Rizzo <luigi@FreeBSD.org>

Start splitting ip_fw2.c and ip_fw.h into smaller components.
At this time we pull out from ip_fw2.c the logging functions, and
support for dynamic rules, and move kernel-only stuff into
netinet/ipfw/ip_fw_private.h

No ABI change involved in this commit, unless I made some mistake.
ip_fw.h has changed, though not in the userland-visible part.

Files touched by this commit:

conf/files
now references the two new source files

netinet/ip_fw.h
remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h.

netinet/ipfw/ip_fw_private.h
new file with kernel-specific ipfw definitions

netinet/ipfw/ip_fw_log.c
ipfw_log and related functions

netinet/ipfw/ip_fw_dynamic.c
code related to dynamic rules

netinet/ipfw/ip_fw2.c
removed the pieces that goes in the new files

netinet/ipfw/ip_fw_nat.c
minor rearrangement to remove LOOKUP_NAT from the
main headers. This require a new function pointer.

A bunch of other kernel files that included netinet/ip_fw.h now
require netinet/ipfw/ip_fw_private.h as well.
Not 100% sure i caught all of them.

MFC after: 1 month


# 0b4b0b0f 10-Oct-2009 Julian Elischer <julian@FreeBSD.org>

Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.

Sitting aroung in tree waiting to commit for: 2 months
MFC after: 2 months


# 6a89c3ed 08-Sep-2009 Jack F Vogel <jfv@FreeBSD.org>

Make LRO turned off uncategorically for devices
attached to the bridge, rather than just in the case
when some device cannot do TSO. Customer tests have
shown that even when all devices can do TSO that LRO
will cause problems when bridging.

Approved by: re


# 3de029ef 24-Aug-2009 Jack F Vogel <jfv@FreeBSD.org>

When bridging LRO is causing a problem, the believe
that it would work as long as all interfaces have TSO
seems to be false, until the matter gets sorted out
just disable LRO completely.


# 315e3e38 02-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Many network stack subsystems use a single global data structure to hold
all pertinent statatistics for the subsystem. These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.

Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored. This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.

The following modules are affected by this change:

if_bridge
if_cxgb
if_gif
ip_mroute
ipdivert
pf

In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack. However, that change is too agressive for
this point in the release cycle.

Reviewed by: bz
Approved by: re (kib)


# 530c0060 01-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# eddfbb76 14-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 259d2d54 11-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

carp(4) allows people to share a set of IP addresses and can only
use IPv4/v6 for inter-node communication (according to my reading).

Properly wrap the carp callouts in INET || INET6 and refelect this
in sys/conf/files as well. While in theory this should be ok,
it might be a bit optimistic to think that carp could build with
inet6 only[1].

Discussed with: mlaier [1]


# dda10d62 09-Jun-2009 Oleg Bulyzhin <oleg@FreeBSD.org>

Close long existed race with net.inet.ip.fw.one_pass = 0:
If packet leaves ipfw to other kernel subsystem (dummynet, netgraph, etc)
it carries pointer to matching ipfw rule. If this packet then reinjected back
to ipfw, ruleset processing starts from that rule. If rule was deleted
meanwhile, due to existed race condition panic was possible (as well as
other odd effects like parsing rules in 'reap list').

P.S. this commit changes ABI so userland ipfw related binaries should be
recompiled.

MFC after: 1 month
Tested by: Mikolaj Golub


# 115a40c7 05-Jun-2009 Luigi Rizzo <luigi@FreeBSD.org>

More cleanup in preparation of ipfw relocation (no actual code change):

+ move ipfw and dummynet hooks declarations to raw_ip.c (definitions
in ip_var.h) same as for most other global variables.
This removes some dependencies from ip_input.c;

+ remove the IPFW_LOADED macro, just test ip_fw_chk_ptr directly;

+ remove the DUMMYNET_LOADED macro, just test ip_dn_io_ptr directly;

+ move ip_dn_ruledel_ptr to ip_fw2.c which is the only file using it;

To be merged together with rev 193497

MFC after: 5 days


# 3f11aba7 01-May-2009 Andrew Thompson <thompsa@FreeBSD.org>

Reorder the bridge add and delete routines to avoid calling ifpromisc() with
the bridge lock held.


# 5d322040 27-Apr-2009 Sam Leffler <sam@FreeBSD.org>

use if_transmit intead of direct frobbing of the if_snd q; this is no
longer allowed

Identified by: rwatson
Reviewed by: kmacy


# 86425c62 11-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Update stats in struct ipstat using four new macros, IPSTAT_ADD(),
IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly
manipulating the fields across the kernel. This will make it easier
to change the implementation of these statistics, such as using
per-CPU versions of the data structures.

MFC after: 3 days


# e5adda3d 15-Mar-2009 Robert Watson <rwatson@FreeBSD.org>

Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free. This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile. They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

if_ar
if_axe
if_aue
if_cdce
if_cue
if_kue
if_ray
if_rue
if_rum
if_sr
if_udav
if_ural
if_zyd

Drivers that were already disabled because of tty changes:

if_ppp
if_sl

Discussed on: arch@


# 66c84010 13-Feb-2009 Andrew Thompson <thompsa@FreeBSD.org>

bridge_delete_member is called via the event handler from if_detach
after the LLADDR is reclaimed which causes a null pointer deref with
inherit_mac enabled. Record the ifnet pointer of the interface and then compare
that to find when to re-assign the bridge address.

Submitted by: sam


# 385195c0 10-Dec-2008 Marko Zec <zec@FreeBSD.org>

Conditionally compile out V_ globals while instantiating the appropriate
container structures, depending on VIMAGE_GLOBALS compile time option.

Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instatiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out. Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively

#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif

Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.

Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs. This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.

Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.

De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps. PF virtualization will be done
separately, most probably after next PF import.

Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw. Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 4b79449e 02-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 8b615593 02-Oct-2008 Marko Zec <zec@FreeBSD.org>

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 6945d73b 07-Sep-2008 Andrew Thompson <thompsa@FreeBSD.org>

Put the bridge mac inheritance behind a sysctl with the default off as this
still needs all the edge cases fixed.

Submitted by: Eygene Ryabinkin


# 603724d3 17-Aug-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# ef57ba98 16-Aug-2008 Andrew Thompson <thompsa@FreeBSD.org>

LRO combined packets can actually be bridged as long as all the interfaces also
support TSO, this can always be disabled manually if undesirable.

Pointed out by: gallatin


# ec29c623 03-Jul-2008 Andrew Thompson <thompsa@FreeBSD.org>

Be smarter about disabling interface capabilities. TOE/TSO/TXCSUM will only be
disabled if one (or more) of the member interfaces does not support it. Always
turn off LRO since we can not bridge a combined frame.

Tested by: Stefan Lambrev


# fe878019 01-Jul-2008 Philip Paeps <philip@FreeBSD.org>

Set bridge MAC addresses to the MAC address of their first interface unless
locally configured. This is more in line with the behaviour of other popular
bridging implementations and makes bridges more predictable after reboots for
example.

Reviewed by: thompsa
MFC after: 1 week


# fdf229b1 18-Jan-2008 Andrew Thompson <thompsa@FreeBSD.org>

Remove a chunk of duplicated code, test the destination address against the
bridge the same way we check member interfaces.


# 905925d3 17-Jan-2008 Andrew Thompson <thompsa@FreeBSD.org>

IEEE 802.1D-2004 states, frames containing any of the group MAC Addresses
specified in Table 7-10 in their destination address field shall not be relayed
by the Bridge. Add a check in bridge_forward() to adhere to this.

PR: kern/119744


# eaf56834 17-Jan-2008 Andrew Thompson <thompsa@FreeBSD.org>

Sync from OpenBSD r1.118, nuke clause 3 & 4.


# 8411d52a 18-Dec-2007 Andrew Thompson <thompsa@FreeBSD.org>

Simplify the error handling and use the dereferenced sc->sc_ifp pointer.


# 155f68d1 18-Dec-2007 Andrew Thompson <thompsa@FreeBSD.org>

When the bridge has an address and a packet comes in for it then drop it if the
link has been marked discarding by Spanning Tree. This would cause the bridge
to see duplicate packets to itself even if STP has correctly calculated the
topology and blocked redundant links.

Reported by: trasz
Tested by: trasz
MFC after: 3 days


# 897c0f57 06-Nov-2007 Oleg Bulyzhin <oleg@FreeBSD.org>

1) dummynet_io() declaration has changed.
2) Alter packet flow inside dummynet: allow certain packets to bypass
dummynet scheduler. Benefits are:

- lower latency: if packet flow does not exceed pipe bandwidth, packets
will not be (up to tick) delayed (due to dummynet's scheduler granularity).
- lower overhead: if packet avoids dummynet scheduler it shouldn't reenter ip
stack later. Such packets can be fastforwarded.
- recursion (which can lead to kernel stack exhaution) eliminated. This fix
long existed panic, which can be triggered this way:
kldload dummynet
sysctl net.inet.ip.fw.one_pass=0
ipfw pipe 1 config bw 0
for i in `jot 30`; do ipfw add 1 pipe 1 icmp from any to any; done
ping -c 1 localhost

3) Three new sysctl nodes are added:
net.inet.ip.dummynet.io_pkt - packets passed to dummynet
net.inet.ip.dummynet.io_pkt_fast - packets avoided dummynet scheduler
net.inet.ip.dummynet.io_pkt_drop - packets dropped by dummynet

P.S. Above comments are true only for layer 3 packets. Layer 2 packet flow
is not changed yet.

MFC after: 3 month


# 5f33ec7b 04-Nov-2007 Andrew Thompson <thompsa@FreeBSD.org>

Add an option to limit the number of source MACs that can be behind a bridge
interface. Once the limit is reached packets with unknown source addresses are
dropped until an existing host cache entry expires or is removed. Useful to
use with the STICKY cache option.

Sponsored by: miniSuperHappyDevHouse NZ


# 3565f9bc 19-Oct-2007 Andrew Thompson <thompsa@FreeBSD.org>

Use ETHER_BPF_MTAP so that the vlan tags are visible to bpf(4) when bridging a
vlan trunk.

Discussed with: csjp
MFC after: 3 days


# 60e87ca8 18-Oct-2007 Andrew Thompson <thompsa@FreeBSD.org>

The bridging output function puts the mbuf directly on the interfaces send
queue so the output network card must support the same tagging mechanism as
how the frame was input (prepended Ethernet header tag or stripped HW mflag).

Now the vlan Ethernet header is _always_ stripped in ether_input and the mbuf
flagged, only only network cards with VLAN_HWTAGGING enabled would properly
re-tag any outgoing vlan frames.

If the outgoing interface does not support hardware tagging then readd the vlan
header to the front of the frame. Move the common vlan encapsulation in to
ether_vlanencap().

Reported by: Erik Osterholm, Jon Otterholm
MFC after: 1 week


# 31e4cb54 16-Sep-2007 Andrew Thompson <thompsa@FreeBSD.org>

Allow additional packet filtering on the physical interface for locally
destined packets, disabled by default.

PR: kern/116051
Submitted by: Eygene Ryabinkin
Approved by: re (bmah)
MFC after: 2 weeks


# 85ce7297 31-Jul-2007 Andrew Thompson <thompsa@FreeBSD.org>

Add a bridge interface flag called PRIVATE where any private port can not
communicate with another private port.

All unicast/broadcast/multicast layer2 traffic is blocked so it works much the
same way as using firewall rules but scales better and is generally easier as
firewall packages usually do not allow ARP blocking.

An example usage would be having a number of customers on separate vlans
bridged with a server network. All the vlans are marked private, they can all
communicate with the server network unhindered, but can not exchange any
traffic whatsoever with each other.

Approved by: re (rwatson)


# 82056f42 26-Jul-2007 Andrew Thompson <thompsa@FreeBSD.org>

Avoid holding the softc lock when using copyout().

Reported by: dfr
Approved by: re (rwatson)


# 22dcc3c1 13-Jun-2007 Andrew Thompson <thompsa@FreeBSD.org>

Add the vlan tag to the bridge route table. This allows a vlan trunk to be
bridged, previously legitimate traffic was not passed as the bridge could not
tell that it was on a different Ethernet segment.

All non-tagged traffic is treated as vlan1 as per IEEE 802.1Q-2003


# 5adfb0cc 30-May-2007 Andrew Thompson <thompsa@FreeBSD.org>

Remove a KASSERT intended to help the developer, the condition is no longer
valid since the span code was added.

PR: kern/113170
MFC after: 1 week


# 6c655efc 19-Mar-2007 Andrew Thompson <thompsa@FreeBSD.org>

etherbroadcastaddr is now unused.


# 82912c1f 19-Mar-2007 Andrew Thompson <thompsa@FreeBSD.org>

M_BCAST & M_MCAST are now set by ether_input before passing to the bridge.


# 3d0a65c8 18-Mar-2007 Roman Kurakin <rik@FreeBSD.org>

Give a chance for packet to appear with a correct input interfaces
in case of multiple interfaces with the same MAC in the same bridge.
This commit do not solve the entire problem. Only case where packet
arrived from such interface.

PR: kern/109815
MFC after: 7 days
Submitted by: Eygene Ryabinkin and rik@
Discussed with: bms@, thompsa@, yar@


# 8bc736d0 14-Mar-2007 Andrew Thompson <thompsa@FreeBSD.org>

Properly move the setting of bstp_linkstate_p to the bridgestp module.


# e5bda9fb 09-Mar-2007 Andrew Thompson <thompsa@FreeBSD.org>

Change the passing of callbacks to a struct in case this needs to be extended in the future.


# 9c68675b 23-Feb-2007 Andrew Thompson <thompsa@FreeBSD.org>

Move the lock init until after if_alloc in case the allocation fails and we
free the softc and return.

MFC after: 3 days


# 78709605 11-Dec-2006 Andrew Thompson <thompsa@FreeBSD.org>

These days P2P means peer-2-peer (also well known from serveral filesharing
protocols) while PointToPoint has been PtP links. Change the variables
accordingly while the code is still fresh and undocumented.

Requested by: bz


# daacddca 04-Dec-2006 Shteryana Shopova <syrinx@FreeBSD.org>

Add two new flags to if_bridge(4) indicating whether the edge flag
of the bridge port and path cost have been administratively set or
calculated automatically by RSTP.

Make sure to transition from non-edge to edge when the port goes down
and the edge flag was manually set before.
This is needed to comply with the condition
((!portEnabled && AdminEdge) || ....)
in the Bridge Detection State Machine (IEE802.1D-2004, p. 171).

Reviewed by: thompsa
Approved by: bz (mentor)


# b8f45801 03-Dec-2006 Shteryana Shopova <syrinx@FreeBSD.org>

Fix SIOCGDRVSPEC/BRDGGIFSSTP ioctl: make it copyin() the user
provided buffer length before trying to use it.

Reviewed by: thompsa
Approved by: bz (mentor)
MFC after: 3 days


# 6c32e05c 26-Nov-2006 Andrew Thompson <thompsa@FreeBSD.org>

Sync with the OpenBSD port of RSTP
- use flags rather than sperate ioctls for edge, p2p
- implement p2p and autop2p flags
- define large pathcost constant as ULL
- show bridgeid and rootid in ifconfig

Obtained from: Reyk Floeter <reyk@openbsd.org>


# 071fff62 26-Nov-2006 Andrew Thompson <thompsa@FreeBSD.org>

use two stage creation of stp ports, this means that the stp variables can be
set before the port is marked STP and they will no longer be overwrittten


# 3df7fad0 08-Nov-2006 Andrew Thompson <thompsa@FreeBSD.org>

Add a new address cache type called sticky. On an interface marked sticky any
address learned by the bridge is made permanent, the address will not age out
and most importantly will not migrate to another interface.

This can be used to stop mac address poisoning or clients roaming in much the
same way as static entries without the hassle of preloading the table.


# acd3428b 06-Nov-2006 Robert Watson <rwatson@FreeBSD.org>

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 67be76c0 05-Nov-2006 Christian S.J. Peron <csjp@FreeBSD.org>

Fix possible leak when bridge is in monitor mode. Use m_freem() which will
free the entire chain, instead of using m_free() which will free just the
mbuf that was passed.

Discussed with: thompsa
MFC after: 3 days


# 59ee2183 04-Nov-2006 Andrew Thompson <thompsa@FreeBSD.org>

When the packet is for the bridge then note which interface to send the reply
to, previously it was always broadcast to all interfaces (a bug). This is
useful when the bridge is the default gateway and vlans are used to isolate
each client, the reply is now kept private to the vlan which the client
resides.

Reported by: Jon Otterholm
Tested by: Jon Otterholm
MFC after: 3 days


# 3fab7669 01-Nov-2006 Andrew Thompson <thompsa@FreeBSD.org>

Bring in support for the Rapid Spanning Tree Protocol (802.1w).

RSTP provides faster spanning tree convergence, the protocol will exchange
information with neighboring switches to quickly transition to forwarding
without creating loops. The code will default to RSTP mode but will downgrade
any port connected to a legacy STP network so is fully backward compatible.

Reviewed by: syrinx
Tested by: syrinx


# 8408ecd6 08-Oct-2006 Andrew Thompson <thompsa@FreeBSD.org>

Use LIST_FOREACH_SAFE instead of a hand rolled version.


# 0a6f8a50 22-Sep-2006 Andrew Thompson <thompsa@FreeBSD.org>

Revert r1.80 as the ethernet header was inadvertently stripped from ARP
packets. Reimplement this correctly and use a sysctl that defaults to off so
the user doesnt get any suprises if ipfw blocks the ARP packet.

MFC after: 3 days


# 781dd9ae 17-Sep-2006 Andrew Thompson <thompsa@FreeBSD.org>

Rearrange things so that ARP packets can be filtered or rate limited with IPFW.

Requested by: Jon Otterholm
Tested by: Jon Otterholm


# 4ec528c7 25-Aug-2006 Andrew Thompson <thompsa@FreeBSD.org>

The bridge cant hear its own transmissions so set IFF_SIMPLEX.

PR: kern/102361
Tested by: Radim Kolar <hsn@netmag.cz>
MFC after: 3 days


# 705e3bd6 17-Aug-2006 Andrew Thompson <thompsa@FreeBSD.org>

Remove unneeded asserts from bridge_ioctl_* since these are just
extensions of bridge_ioctl() which has the correct locking.


# ff2cdcff 17-Aug-2006 Andrew Thompson <thompsa@FreeBSD.org>

Remove two lock asserts that are unneeded due to subsequent unlocks.


# b34b8d67 17-Aug-2006 Andrew Thompson <thompsa@FreeBSD.org>

Call bridge_span before dropping the lock.

MFC after: 5 days


# 73d480ae 01-Aug-2006 Andrew Thompson <thompsa@FreeBSD.org>

- Use the new bridgestp callback to once again flush our bridge routes when an
interface is disabled.
- Log port changes to syslog, defaulting to off


# fc5b6202 01-Aug-2006 Andrew Thompson <thompsa@FreeBSD.org>

Tell bridgestp that we are about to free the memory so it can cleanup.


# 51383c37 31-Jul-2006 Andrew Thompson <thompsa@FreeBSD.org>

Add some statistics that are needed to support RFC4188 as part of the SoC2006
work on a bridge monitoring module for BSNMP.

Submitted by: shteryana (SoC 2006)


# 9674cf0e 27-Jul-2006 Andrew Thompson <thompsa@FreeBSD.org>

Remove the dependency of bridgestp.h on if_bridgevar.h by moving a couple of
private structures to if_bridge.c.


# a4eb85b6 26-Jul-2006 Andrew Thompson <thompsa@FreeBSD.org>

bridgestp is now a seperate module.


# 7d4a207c 26-Jul-2006 Andrew Thompson <thompsa@FreeBSD.org>

Remove stp variables that are already initialised in bstp_attach().


# 96e47153 26-Jul-2006 Andrew Thompson <thompsa@FreeBSD.org>

/tmp/cvsuusTrc


# e61a82f3 26-Jul-2006 Andrew Thompson <thompsa@FreeBSD.org>

Remove variables that are overridden by ether_ifattach(). This clears up any
confusion especially as *if_output was pointed to a different function.


# 6b7330e2 09-Jul-2006 Sam Leffler <sam@FreeBSD.org>

Revise network interface cloning to take an optional opaque
parameter that can specify configuration parameters:
o rev cloner api's to add optional parameter block
o add SIOCCREATE2 that accepts parameter data
o rev vlan support to use new api (maintain old code)

Reviewed by: arch@


# 690d7938 20-Jun-2006 Andrew Thompson <thompsa@FreeBSD.org>

Allow gif interfaces to be added as span ports, the user may want to send a
copy of all packets to the other side of the world.


# 615fccc5 18-Jun-2006 Andrew Thompson <thompsa@FreeBSD.org>

Fix spelling mistake in comment.


# 80829fcc 12-Jun-2006 Andrew Thompson <thompsa@FreeBSD.org>

Use bit operations to get a locally administered address rather than using a
hardcoded OUI code.


# b3a1f937 08-Jun-2006 Andrew Thompson <thompsa@FreeBSD.org>

Allow bridge and carp to play nicely together by returning the packet if its
destined for a carp interface.

Obtained from: OpenBSD
MFC after: 2 weeks


# dc1b1b7b 16-May-2006 Andrew Thompson <thompsa@FreeBSD.org>

Fix style(9) nits, whitespace and parentheses.


# 2557a639 15-May-2006 Daniel Hartmeier <dhartmei@FreeBSD.org>

Recalculate IP checksum after running pfil hooks.

Reviewed by: thompsa
Tested by: Adam McDougall <mcdouga9@egr.msu.edu>


# 7f87a57c 28-Apr-2006 Andrew Thompson <thompsa@FreeBSD.org>

Add support for fragmenting ipv4 packets.

The packet filter may reassemble the ip fragments and return a packet that is
larger than the MTU of the sending interface. There is no check for DF or icmp
replies as we can only get a large packet to fragment by reassembling a
previous fragment, and this only happens after a call to pfil(9).

Obtained from: OpenBSD (mostly)
Glanced at by: mlaier
MFC after: 1 month


# 64cb8505 26-Mar-2006 Andrew Thompson <thompsa@FreeBSD.org>

Assert that the mbuf is not shared to ensure problems like the last commit are
not reintroduced.


# 5cb7f13a 23-Mar-2006 Roman Kurakin <rik@FreeBSD.org>

m_dup () packet not m_copypacket () since we will modify it. For more
details see PR kern/94448.

PR: kern/94448

Original patch: Eygene A. Ryabinkin <rea-fbsd at rea dot mbslab dot kiae dot ru>Final patch: thompsa@
Tested by: thompsa@, Eygene A. Ryabinkin

MFC after: 7 days


# 158a726c 03-Mar-2006 Andrew Thompson <thompsa@FreeBSD.org>

Since we are using random ethernet addresses for the bridge, it is possible
that we might have address collisions, so make sure that this hardware address
isn't already in use on another bridge.

Submitted by: csjp
MFC after: 1 month


# 6f75ef18 02-Mar-2006 Christian S.J. Peron <csjp@FreeBSD.org>

Slightly re-worked bpf(4) code associated with bridging: if we have a
destination interface as a member of our bridge or this is a unicast packet,
push it through the bpf(4) machinery.

For broadcast or multicast packets, don't bother with the bpf(4) because it will
be re-injected into ether_input. We do this before we pass the packets through
the pfil(9) framework, as it is possible that pfil(9) will drop the packet or
possibly modify it, making it very difficult to debug firewall issues on the
bridge.

Further, implemented IFF_MONITOR for bridge interfaces. This does much the same
thing that it does for regular network interfaces: it pushes the packet to any
bpf(4) peers and then returns. This bypasses all of the bridge machinery,
saving mutex acquisitions, list traversals, and other operations performed by
the bridging code.

This change to the bridging code is useful in situations where individuals use a
bridge to multiplex RX/TX signals from two interfaces, as is required by some
network taps for de-multiplexing links and transmitting the RX/TX signals
out through two separate interfaces. This behaviour is quite common for network
taps monitoring links, especially for certain manufacturers.

Reviewed by: thompsa
MFC after: 1 month
Sponsored by: Seccuris Labs


# 3ecf1851 03-Feb-2006 Oleg Bulyzhin <oleg@FreeBSD.org>

Properly initialize args structure before passing it to ipfw_chk(): having
uninitialized args.inp is unhealthy for uid/gid/jail ipfw rules.

PR: kern/92589
Approved by: glebius (mentor)
MFC after: 1 week


# f5cdbcf1 02-Feb-2006 Christian S.J. Peron <csjp@FreeBSD.org>

Use PFIL_HOOKED macros in if_bridge and pass the right argument to
rw_assert. This un-breaks the build.

Submitted by: Kostik Belousov
Pointy hat to: csjp


# 6637e0f3 31-Jan-2006 Andrew Thompson <thompsa@FreeBSD.org>

Fix two bugs with the bridge

- code expects memcmp() to return a signed value, our memcmp() returns 0 if
args are equal and > 0 if not.

- It's possible to hijack interface for static entry. If bridge recieves
packet from interface marked as learning it will replace the bridge_rtnode
entry for the source address even if such entry marked as static.

Submitted by: Gleb Kurtsov <k-gleb yandex.ru>
MFC after: 3 days


# 02d4ab93 25-Jan-2006 Colin Percival <cperciva@FreeBSD.org>

Make sure buffers in if_bridge are fully initialized before copying
them to userland.

Security: FreeBSD-SA-06:06.kmem


# 7c2fb83a 13-Jan-2006 Andrew Thompson <thompsa@FreeBSD.org>

Add code that clears certain capabilities from the member interface, these are
restored when its removed from the bridge.

At the moment we only clear IFCAP_TXCSUM. Since a locally generated packet on
the bridge may be sent out any one or more interfaces it cant be assumed that
every card does hardware csums. Most bridges don't generate a lot of traffic
themselves so turning off offloading won't hurt, bridged packets are
unaffected.

Tested by: Bruce Walker (bmw borderware.com)
MFC after: 5 days


# f0feaf4f 02-Jan-2006 Andrew Thompson <thompsa@FreeBSD.org>

Fix a brain-o in the last commit, the conditional was always false.


# 94e45ae5 02-Jan-2006 Andrew Thompson <thompsa@FreeBSD.org>

Reorganise bridge_rtupdate slightly to reduce duplication.


# ef9ac7c4 02-Jan-2006 Andrew Thompson <thompsa@FreeBSD.org>

Reset the route expiry time on each update rather than always letting them get
GC'd and recreated.


# bc9f74c7 02-Jan-2006 Andrew Thompson <thompsa@FreeBSD.org>

It is better to use time_uptime here since it is monotonic.

Pointed out by: glebius


# ec311647 02-Jan-2006 Andrew Thompson <thompsa@FreeBSD.org>

Minor whitespace cleanup.


# f595d627 02-Jan-2006 Andrew Thompson <thompsa@FreeBSD.org>

Read time_second directly rather than calling getmicrotime().

Obtained from: DragonflyBSD


# a47f91cd 29-Dec-2005 Andrew Thompson <thompsa@FreeBSD.org>

When pfil(9) is enabled the bridge only considers ETHERTYPE_ARP, ETHERTYPE_IP and
ETHERTYPE_IPV6 frames. Change this to be a sysctl knob so that is able to still
bridge non-IP packets if desired.

Also return early if all pfil_* sysctls are turned off, the user obviously does
not want to filter on the bridge.


# 73ff045c 21-Dec-2005 Andrew Thompson <thompsa@FreeBSD.org>

Add RFC 3378 EtherIP support. This change makes it possible to add gif
interfaces to bridges, which will then send and receive IP protocol 97 packets.
Packets are Ethernet frames with an EtherIP header prepended.

Obtained from: NetBSD
MFC after: 2 weeks


# 1e420062 21-Dec-2005 Andrew Thompson <thompsa@FreeBSD.org>

As of r1.21 all broadcast packets are reprocessed by ether_input as arriving on
the bridge, this caused these packets to show up twice via bpf. Do not process
them twice with BPF_TAP.

MFC after: 3 days


# 9d5e4aa8 17-Dec-2005 Andrew Thompson <thompsa@FreeBSD.org>

Use M_ZERO for the bridge_iflist to ensure there are no unexpected suprises.


# 6b743820 17-Dec-2005 Andrew Thompson <thompsa@FreeBSD.org>

Minor whitespace cleanup.


# e0a87e8a 16-Dec-2005 Andrew Thompson <thompsa@FreeBSD.org>

Change from a callback in if_ethersubr to using EVENTHANDLER in order to detach
span ports when they disappear. The span port does not have a pointer to the
softc so revert r1.31 and bring back the softc linked-list.

MFC after: 2 weeks


# 7536320f 15-Dec-2005 Andrew Thompson <thompsa@FreeBSD.org>

It is not safe to use m_copypacket() here as the returned mbuf is readonly,
change to m_dup and keep the alignment on the layer3 header.

MFC after: 1 week


# 91f6764e 13-Dec-2005 Andrew Thompson <thompsa@FreeBSD.org>

Add support for creating span ports so that one can snoop bridged traffic
from another interface/machine/network.

Obtained from: OpenBSD
MFC after: 2 weeks


# 53b5c460 29-Nov-2005 Andrew Thompson <thompsa@FreeBSD.org>

The bridge is capable of sending broadcast packets so enable IFF_BROADCAST

Requested by: des


# 16e7e7d4 13-Nov-2005 Andrew Thompson <thompsa@FreeBSD.org>

Fix a second missed case where the refcount is not decremented.

MFC after: 3 days


# bb4b5f54 13-Nov-2005 Andrew Thompson <thompsa@FreeBSD.org>

Fix a mbuf and refcnt leak in the broadcast code.

If the packet is rejected from pfil(9) then continue the loop rather than
returning, this means that we can still try to send it out the remaining
interfaces but more importantly the mbuf is freed and refcount decremented on
exit.


# 4a0d6638 11-Nov-2005 Ruslan Ermilov <ru@FreeBSD.org>

- Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.


# 4e7e0183 08-Nov-2005 Andrew Thompson <thompsa@FreeBSD.org>

Move the cloned interface list management in to if_clone. For some drivers the
softc lists and associated mutex are now unused so these have been removed.

Calling if_clone_detach() will now destroy all the cloned interfaces for the
driver and in most cases is all thats needed to unload.

Idea by: brooks
Reviewed by: brooks


# 1a266137 23-Oct-2005 Andrew Thompson <thompsa@FreeBSD.org>

If we have been called from ether_ifdetach() then do not try and clear the
promisc flag from the member interface, this is a no-op anyway since the
interface is disappearing. The driver may have already released
its resources such as miibus and this is likely to panic the kernel.

Submitted and tested by: Wojciech A. Koszek
MFC after: 2 weeks


# 4c843479 14-Oct-2005 Andrew Thompson <thompsa@FreeBSD.org>

Make four more functions static that were missed in the last commit.


# 6b32f3d3 14-Oct-2005 Andrew Thompson <thompsa@FreeBSD.org>

Change most of the bridge and stp funtions to static. This has highlighted
that the following funtions are not used, wrap in '#ifdef noused' for the
moment.

bstp_enable_change_detection
bstp_disable_change_detection
bstp_set_bridge_priority
bstp_set_port_priority
bstp_set_path_cost


# fd6238a6 13-Oct-2005 Andrew Thompson <thompsa@FreeBSD.org>

Further clean up the bridge hooks in if_ethersubr.c and ng_ether.c

- move the function pointer definitions to if_bridgevar.h
- move most of the logic to the new BRIDGE_INPUT and BRIDGE_OUTPUT macros
- remove unneeded functions from if_bridgevar.h and sort a little.


# 20a65f37 13-Oct-2005 Andrew Thompson <thompsa@FreeBSD.org>

From 101 ways to panic your kernel.

Use bridge_ifdetach() to notify the bridge that a member has been detached. The
bridge can then remove it from its interface list and not try to send out via a
dead pointer.


# 9cff52f7 13-Oct-2005 Andrew Thompson <thompsa@FreeBSD.org>

Clean up the if_bridge hooks a bit in if_ethersubr.c and ng_ether.c, move
the broadcast/multicast test to bridge_input().

Requested by: glebius


# febd0759 12-Oct-2005 Andrew Thompson <thompsa@FreeBSD.org>

Change the reference counting to count the number of cloned interfaces for each
cloner. This ensures that ifc->ifc_units is not prematurely freed in
if_clone_detach() before the clones are destroyed, resulting in memory modified
after free. This could be triggered with if_vlan.

Assert that all cloners have been destroyed when freeing the memory.

Change all simple cloners to destroy their clones with ifc_simple_destroy() on
module unload so the reference count is properly updated. This also cleans up
the interface destroy routines and allows future optimisation.

Discussed with: brooks, pjd, -current
Reviewed by: brooks


# d5edd47e 02-Oct-2005 Andrew Thompson <thompsa@FreeBSD.org>

Do not packet filter in the bridge_start() routine, locally generated packets
are already filtered by the higher layers.

Approved by: mlaier (mentor)
MFC after: 3 days


# ef64cd19 21-Sep-2005 Andrew Thompson <thompsa@FreeBSD.org>

Fix an alignment panic my preserving the 2byte padding (ETHER_ALIGN) on our
copied mbuf, which keeps the IP header 32-bit aligned. This copied mbuf is
reinjected back into ether_input and off to the IP routines.

Reported and tested by: Peter van Dijk
Approved by: mlaier (mentor)
MFC after: 3 days


# 59280079 06-Sep-2005 Andrew Thompson <thompsa@FreeBSD.org>

Add support for multicast to the bridge and allow inet6 addresses to be
assigned to the interface.

IPv6 auto-configuration is disabled. An IPv6 link-local address has a
link-local scope within one link, the spec is unclear for the bridge case and
it may cause scope violation.

An address can be assigned in the usual way;
ifconfig bridge0 inet6 xxxx:...

Tested by: bmah
Reviewed by: ume (netinet6)
Approved by: mlaier (mentor)
MFC after: 1 week


# 68e84b98 26-Aug-2005 Andrew Thompson <thompsa@FreeBSD.org>

Fix a panic in softclock() if the interface is destroyed with a bpf consumer
attached.

This is caused by bpf_detachd clearing IFF_PROMISC on the interface which does
a SIOCSIFFLAGS ioctl. The problem here is that while the interface has been
stopped, IFF_UP has not been cleared so IFF_UP != IFF_DRV_RUNNING, this causes
the ioctl function to init() the interface which resets the callouts.

The destroy then completes and frees the softc but softclock will panic on a
dead callout pointer.

Ensure ifp->if_flags matches reality by clearing IFF_UP when we destroy.

Silence from: rwatson
Approved by: mlaier (mentor)
MFC after: 3 days


# dba31bde 23-Aug-2005 Andrew Thompson <thompsa@FreeBSD.org>

The mtu check in bridge_enqueue is bogus as the maximum Ethernet frame is
actually 1514, so comparing the mbuf length which includes the Ethernet header
to the interface MTU is wrong.

The check was a little over the top so just remove it.

Approved by: mlaier (mentor)
MFC after: 3 days


# 23e76431 18-Aug-2005 Andrew Thompson <thompsa@FreeBSD.org>

Mark the callouts as MPSAFE as if_bridge has been giant-free since day 1.

Use the SMP friendly callout_init_mtx() while we are here.

Approved by: mlaier (mentor)
MFC after: 3 days


# a1c0fd4d 14-Aug-2005 Andrew Thompson <thompsa@FreeBSD.org>

Ensure that we are holding the lock when initialising the bridge interface. We
could initialise while unlocked if the bridge is not up when setting the inet
address, ether_ioctl() would call bridge_init.

Change it so bridge_init is always called unlocked and then locks before
calling bstp_initialization().

Reported by: Michal Mertl
Approved by: mlaier (mentor)
MFC after: 3 days


# 13f4c340 09-Aug-2005 Robert Watson <rwatson@FreeBSD.org>

Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by: pjd, bz
MFC after: 7 days


# 3155122e 08-Aug-2005 Andrew Thompson <thompsa@FreeBSD.org>

Use m_copypacket() which is an optimization of the common case
m_copym(m, 0, M_COPYALL, how).

This is required for strict alignment architectures where we align the IP
header in the input path but m_copym() will create an unaligned copy in
bridge_broadcast(). m_copypacket() preserves alignment of the first mbuf.

Noticed by: Petri Simolin
Approved by: mlaier (mentor)
MFC after: 3 days


# 39bb2fca 24-Jul-2005 Andrew Thompson <thompsa@FreeBSD.org>

We check that all the member interfaces have the same MTU on attach to the
bridge but the interface can still be changed afterwards.

This falls under the 'dont do that' category but log an warning when INVARIANTS
is defined.

Approved by: mlaier (mentor)
MFC after: 3 days


# 12b47243 20-Jul-2005 Andrew Thompson <thompsa@FreeBSD.org>

Clear the PROMISC flag from the vlan interface when we remove a member. We
checked for IFT_L2VLAN in bridge_ioctl_add() but not bridge_delete_member().

Approved by: mlaier (mentor)


# 489fc225 13-Jul-2005 Andrew Thompson <thompsa@FreeBSD.org>

Previously the bridge MTU was set to ETHERMTU and could not be changed. Since
we can only bridge interfaces with the same value it meant that all members had
to be set at ETHERMTU as well.

Allow the first member to be added to define the MTU for the bridge, the check
still applies to all additional members.

Print an informative message if the MTU is incorrect [1]

Requested by: Niki Denev [1]
Approved by: mlaier (mentor)
MFC after: 3 days


# ea32e732 05-Jul-2005 Andrew Thompson <thompsa@FreeBSD.org>

- Previously when broadcasting to N number of interfaces we would run pfil
hooks for each outgoing interface but also run pfil hooks _N times_ on the
bridge interface. This is changed so pfil hooks are run once for the bridge
interface (bridge0) and then only on the outgoing interfaces in the broadcast
loop.

- Simplify bridge_enqueue() by moving bridge_pfil() to the callers.

- Check (inet6_pfil_hook.ph_busy_count >= 0), it may be possible to have a
packet filter hooked for only ipv6 but we were only checking if ipv4 hooks
were busy.

- Minor optimisation for null mbuf check after bridge_pfil(), move it into the
if-block as it couldnt possibly be null outside.

Prodded by: mlaier
Approved by: re (scottl), mlaier (mentor)


# 2fcb030a 02-Jul-2005 Andrew Thompson <thompsa@FreeBSD.org>

Check the alignment of the IP header before passing the packet up to the
packet filter. This would cause a panic on architectures that require strict
alignment such as sparc64 (tier1) and ia64/ppc (tier2).

This adds two new macros that check the alignment, these are compile time
dependent on __NO_STRICT_ALIGNMENT which is set for i386 and amd64 where
alignment isn't need so the cost is avoided.

IP_HDR_ALIGNED_P()
IP6_HDR_ALIGNED_P()

Move bridge_ip_checkbasic()/bridge_ip6_checkbasic() up so that the alignment
is checked for ipfw and dummynet too.

PR: ia64/81284
Obtained from: NetBSD
Approved by: re (dwhite), mlaier (mentor)


# 49808fa4 29-Jun-2005 Andrew Thompson <thompsa@FreeBSD.org>

Sync if_bridge to NetBSD r1.31

Rename conflicting variables when handling SNAP Ethernet frames.

Obtained from: NetBSD
Approved by: mlaier (mentor)
Approved by: re (blanket)


# ca6c404c 27-Jun-2005 Andrew Thompson <thompsa@FreeBSD.org>

Fix a panic when bringing up the bridge interface. We were casting a ifnet
pointer to a softc which is no longer valid since the ifnet struct was split
out from the softc.

Approved by: mlaier (mentor)
Approved by: re (blanket)


# e7acea82 10-Jun-2005 Andrew Thompson <thompsa@FreeBSD.org>

Catch up with the struct ifnet changes and use if_alloc().

Reviewed by: brooks
Approved by: mlaier (mentor)


# fc74a9f9 10-Jun-2005 Brooks Davis <brooks@FreeBSD.org>

Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.

Reviewed by: sobomax, sam


# 2c67c57c 10-Jun-2005 Max Laier <mlaier@FreeBSD.org>

Add missing {} in last commit.


# c8b01292 09-Jun-2005 Andrew Thompson <thompsa@FreeBSD.org>

Add dummynet(4) support to if_bridge, this code is largely based on bridge.c.

This is the final piece to match bridge.c in functionality, we can now be a
drop-in replacement.

Approved by: mlaier (mentor)


# 82116c33 07-Jun-2005 Andrew Thompson <thompsa@FreeBSD.org>

Bring in IPFW layer2 filtering from bridge.c, this allows Ethernet filtering
using the layer2, mac and mac-type keywords.

This is one of the last features that bridge.c has over if_bridge and gets us
very close to a full functional replacement.

Approved by: mlaier (mentor)


# f2999b2f 05-Jun-2005 Andrew Thompson <thompsa@FreeBSD.org>

Change ipv6 packet filtering to match ipv4. It now checks pfil_member and
pfil_bridge to determine which interfaces to filter on.

Approved by: mlaier (mentor)


# 31997bf2 04-Jun-2005 Andrew Thompson <thompsa@FreeBSD.org>

Add if_bridge, which provides more advanced Ethernet bridging and 802.1d
spanning tree support.

Based on Jason Wright's bridge driver from OpenBSD, and modified by Jason R.
Thorpe in NetBSD.

Reviewed by: mlaier, bms, green
Silence from: -net
Approved by: mlaier (mentor)
Obtained from: NetBSD