History log of /freebsd-current/sys/net/if_lagg.h
Revision Date Author Comments
# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# dcd7f0bd 24-Mar-2023 Zhenlei Huang <zlei@FreeBSD.org>

lagg: Various style fixes

MFC after: 1 week


# 713ceb99 28-Jul-2022 Andrew Gallatin <gallatin@FreeBSD.org>

lagg: fix lagg ifioctl after SIOCSIFCAPNV

Lagg was broken by SIOCSIFCAPNV when all underlying devices
support SIOCSIFCAPNV. This change updates lagg to work with
SIOCSIFCAPNV and if_capabilities2.

Reviewed by: kib, hselasky
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35865


# 19ecb5e8 29-Dec-2020 Hans Petter Selasky <hselasky@FreeBSD.org>

Fix for IPoIB over lagg(4).

Need to update both link layer address and broadcast address when active link changes for IP over infiniband.
This is because the broadcast address contains the so-called P-key, which is interface dependent.

Reviewed by: kib@
Differential Revision: https://reviews.freebsd.org/D27658
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking


# a92c4bb6 22-Oct-2020 Hans Petter Selasky <hselasky@FreeBSD.org>

Add support for IP over infiniband, IPoIB, to lagg(4). Currently only
the failover protocol is supported due to limitations in the IPoIB
architecture. Refer to the lagg(4) manual page for how to configure
and use this new feature. A new network interface type,
IFT_INFINIBANDLAG, has been added, similar to the existing
IFT_IEEE8023ADLAG.

ifconfig(8) has been updated to accept a new laggtype argument when
creating lagg(4) network interfaces. This new argument is used to
distinguish between ethernet and infiniband type of lagg(4) network
interface. The laggtype argument is optional and defaults to
ethernet. The lagg(4) command line syntax is backwards compatible.

Differential Revision: https://reviews.freebsd.org/D26254
Reviewed by: melifaro@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking


# 2a73c8f5 11-Jun-2020 Ravi Pokala <rpokala@FreeBSD.org>

Decode the "LACP Fast Timeout" LAGG option flag

r286700 added the "lacp_fast_timeout" option to `ifconfig', but we forgot to
include the new option in the string used to decode the option bits. Add
"LACP_FAST_TIMO" to LAGG_OPT_BITS.

Also, s/LAGG_OPT_LACP_TIMEOUT/LAGG_OPT_LACP_FAST_TIMO/g , to be clearer that
the flag indicates "Fast Timeout" mode.
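
As an illustration of why a flag missing from the decode table goes unreported, here is a minimal sketch of table-driven option decoding. The flag values, the table, and decode_opts() are hypothetical and are not the real LAGG_OPT_* definitions or the LAGG_OPT_BITS string format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values for illustration; not the real LAGG_OPT_* bits. */
#define OPT_USE_FLOWID		0x01u
#define OPT_LACP_STRICT		0x10u
#define OPT_LACP_FAST_TIMO	0x80u

struct opt_name {
	unsigned int bit;
	const char *name;
};

/* A flag that is missing from this table is silently skipped when decoding. */
static const struct opt_name opt_names[] = {
	{ OPT_USE_FLOWID, "USE_FLOWID" },
	{ OPT_LACP_STRICT, "LACP_STRICT" },
	{ OPT_LACP_FAST_TIMO, "LACP_FAST_TIMO" },
};

/* Write a comma-separated list of the set flag names into buf; returns buf. */
static char *
decode_opts(unsigned int opts, char *buf, size_t len)
{
	size_t i, used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(opt_names) / sizeof(opt_names[0]); i++) {
		int n;

		if ((opts & opt_names[i].bit) == 0)
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
		    used > 0 ? "," : "", opt_names[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)n;
	}
	return (buf);
}
```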

Reported by: Greg Foster <gfoster at panasas dot com>
Reviewed by: jpaetzel
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D25239


# c23df8ea 09-Jan-2020 Mark Johnston <markj@FreeBSD.org>

lagg: Further cleanup of the rr_limit option.

Add an option flag so that arbitrary updates to a lagg's configuration
do not clear sc_stride. Preserve compatibility for old ifconfig
binaries. Update ifconfig to use the new flag and improve the casting
used when parsing the option parameter.

Modify the RR transmit function to avoid locklessly reading sc_stride
twice. Ensure that sc_stride is always 1 or greater.
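
A minimal userland sketch of the idea, assuming C11 atomics in place of the kernel primitives. The struct and function names are hypothetical; only sc_seq and sc_stride come from the log. The stride is read once per packet, and the setter never stores 0.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the lagg softc; not the real structure. */
struct rr_softc {
	atomic_uint sc_seq;	/* packets handed out so far */
	atomic_uint sc_stride;	/* packets sent per port before moving on */
	unsigned int sc_count;	/* number of ports */
};

/* Clamp on write so that readers never see a stride of 0. */
static void
rr_set_stride(struct rr_softc *sc, unsigned int stride)
{
	atomic_store(&sc->sc_stride, stride == 0 ? 1 : stride);
}

/* Pick the port index for the next packet. */
static unsigned int
rr_next_port(struct rr_softc *sc)
{
	unsigned int seq, stride;

	assert(sc->sc_count > 0);
	/* Single snapshot: a concurrent update cannot change it mid-computation. */
	stride = atomic_load(&sc->sc_stride);
	seq = atomic_fetch_add(&sc->sc_seq, 1);
	return ((seq / stride) % sc->sc_count);
}
```

With two ports and a stride of 2, consecutive calls yield ports 0, 0, 1, 1, 0, and so on.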

Reviewed by: hselasky
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23092


# c104c299 22-Dec-2019 Mark Johnston <markj@FreeBSD.org>

lagg: Clean up handling of the rr_limit option.

- Don't allow an unprivileged user to set the stride. [1]
- Only set the stride under the softc lock.
- Rename the internal fields to accurately reflect their use. Keep
ro_bkt to avoid changing the user API.
- Simplify the implementation. The port index is just sc_seq / stride.
- Document rr_limit in ifconfig.8.

Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com> [1]
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22857


# 35961dce 03-May-2019 Andrew Gallatin <gallatin@FreeBSD.org>

Select lacp egress ports based on NUMA domain

This change creates an array of port maps indexed by numa domain
for lacp port selection. If we have lacp interfaces in more than
one domain, then we select the egress port by indexing into the
numa port maps and picking a port on the appropriate numa domain.

This behavior is controlled by the new ifconfig use_numa flag
and net.link.lagg.use_numa sysctl/tunable (both modeled after the
existing use_flowid), which default to enabled.
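
The selection can be sketched roughly as follows. All names and limits here are hypothetical, and the fallback to the global map when a domain has no ports is an assumption of this sketch rather than something the log states.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DOMAINS	4	/* hypothetical limits */
#define MAX_PORTS	8

struct portmap {
	int count;
	int ports[MAX_PORTS];
};

struct lacp_maps {
	struct portmap all;			/* every active port */
	struct portmap by_domain[MAX_DOMAINS];	/* ports local to each domain */
};

/* Return the egress port for a flow hash, preferring ports in `domain`. */
static int
select_port(const struct lacp_maps *m, int use_numa, int domain, uint32_t hash)
{
	const struct portmap *pm = &m->all;

	assert(m->all.count > 0);
	if (use_numa && domain >= 0 && domain < MAX_DOMAINS &&
	    m->by_domain[domain].count > 0)
		pm = &m->by_domain[domain];
	return (pm->ports[hash % (uint32_t)pm->count]);
}
```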

Reviewed by: bz, hselasky, markj (and scottl, earlier version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20060


# 0f8d79d9 24-May-2018 Matt Macy <mmacy@FreeBSD.org>

CK: update consumers to use CK macros across the board

r334189 changed the fields to have names distinct from those in queue.h
in order to expose the oversights as compile time errors


# db5a36bd 22-May-2018 Mark Johnston <markj@FreeBSD.org>

Simplify lagg_input().

No functional change intended.

MFC after: 2 weeks


# 99031b8f 14-May-2018 Stephen Hurd <shurd@FreeBSD.org>

Replace rmlock with epoch in lagg

Use the new epoch based reclamation API. Now the hot paths will not
block at all, and the sx lock is used for the softc data. This fixes LORs
reported where the rwlock was obtained when the sxlock was held.

Submitted by: mmacy
Reported by: Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by: sbruno
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15355


# e3d90506 25-May-2017 Alexander Motin <mav@FreeBSD.org>

Remove some code that has been dead since day one.


# 2f86d4b0 02-May-2017 Alexander Motin <mav@FreeBSD.org>

Introduce sleepable locks into if_lagg.

Before this change if_lagg was using nonsleepable rmlocks to protect its
internal state. This patch introduces another sx lock to protect code
paths that require sleeping, while still using the old rmlock to protect hot
nonsleepable data paths.

This change makes it possible to remove the taskqueue decoupling previously used to change
interface addresses without holding the lock. Instead it uses sx lock to
protect direct if_ioctl() calls.

As another bonus, the new code synchronizes enabled capabilities of member
interfaces, and allows controlling them with ifconfig laggX, which was
impossible before. This part should fix interoperation with if_bridge,
which may need to disable some capabilities, such as TXCSUM or LRO, to allow
bridging with noncapable interfaces.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D10514


# 13157b2b 29-Jan-2017 Luiz Otavio O Souza <loos@FreeBSD.org>

Do not update the lagg link layer address when destroying a lagg clone.

This would enqueue an event to send the gratuitous arp on a dying lagg
interface without any physical ports attached to it.

Apart from that, the taskqueue_drain() on lagg_clone_destroy() runs too
late, when the ifp data structure is already freed. Fix that too.

Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)


# 729a4cff 05-Apr-2016 Ravi Pokala <rpokala@FreeBSD.org>

Revert accidental submit of WIP as part of r297609

Pointyhat to: rpokala


# 06152bf0 05-Apr-2016 Ravi Pokala <rpokala@FreeBSD.org>

Storage Controller Interface driver - typo in unimplemented macro in
scic_sds_controller_registers.h

s/contoller/controller/

PR: 207336
Submitted by: Tony Narlock <tony @ git-pull.com>


# d62edc5e 22-Jan-2016 Marcelo Araujo <araujo@FreeBSD.org>

Add an rr_limit IOCTL that lets users fine-tune the number of packets sent
per interface with the roundrobin protocol, giving better granularity and
distribution among the interfaces. Tuning the number of packets sent per
interface can increase throughput and reduce out-of-order packets as well as SACK.

Example of usage:
# ifconfig bge0 up
# ifconfig bge1 up
# ifconfig lagg0 create
# ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \
192.168.1.1 netmask 255.255.255.0
# ifconfig lagg0 rr_limit 500

Reviewed by: thompsa, glebius, adrian (old patch)
Approved by: bapt (mentor)
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D540


# d6e82913 17-Dec-2015 Steven Hartland <smh@FreeBSD.org>

Revert r292275 & r292379

glebius has concerns about these changes, so revert them until those concerns
can be discussed and addressed.

Sponsored by: Multiplay


# 52e53e2d 15-Dec-2015 Steven Hartland <smh@FreeBSD.org>

Fix lagg failover due to missing notifications

When using lagg failover mode neither Gratuitous ARP (IPv4) nor Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results in slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire an
ifnet_link_event, which is now monitored by rip and nd6.

Upon receiving these events each protocol triggers the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behaviour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR: 156226
MFC after: 1 month
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4111


# bb3d23fd 01-Nov-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix lladdr change propagation for vlans on top of it.
Fix lladdr update when setting mac address manually.
Fix lladdr_event for slave ports addition.

MFC after: 4 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D4004


# 973532fc 04-Oct-2015 Marcelo Araujo <araujo@FreeBSD.org>

Completely remove the fec aggregation protocol.
The removal began with revision r271733.

NOTE: This patch must never be merged to 10-Stable

Reviewed by: glebius
Approved by: bapt (mentor)
Relnotes: Yes
Sponsored by: EuroBSDCon Sweden.
Differential Revision: D3786


# 0e02b43a 12-Aug-2015 Hiren Panchasara <hiren@FreeBSD.org>

Make LAG LACP fast timeout tunable through IOCTL.

Differential Revision: D3300
Submitted by: LN Sundararajan <lakshmi.n at msystechnologies>
Reviewed by: wblock, smh, gnn, hiren, rpokala at panasas
MFC after: 2 weeks
Sponsored by: Panasas


# b7ba031f 11-Mar-2015 Hans Petter Selasky <hselasky@FreeBSD.org>

Factor out mbuf hashing code from LAGG driver so that other network
drivers can use it. This avoids some code duplication. Add missing
default case to all switch statements while at it. Also move the
hashing of the IPv6 flow field to layer 4 because the IPv6 flow field
is constant on a per L4 connection basis and not on a per L3 network.

Differential Revision: https://reviews.freebsd.org/D1987
Sponsored by: Mellanox Technologies
MFC after: 1 month


# c2529042 01-Dec-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after: 1 month
Sponsored by: Mellanox Technologies


# 033074c4 09-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Replace 'struct route *' if_output() argument with 'struct nhop_info *'.
Leave 'struct route' as is for legacy routing api users.
Remove most of rtalloc_ign*-derived functions.


# 6d478167 04-Oct-2014 Hiroki Sato <hrs@FreeBSD.org>

- Move L2 addr configuration for the primary port to a taskqueue. This fixes
LOR of softc rmlock in iflladdr_event handlers.

- Call if_delmulti_ifma() after LACP_UNLOCK(). This fixes another LOR.

- Fix a panic in lacp_transit_expire().

- Fix a panic in lagg_input() upon shutting down a port.


# 9732189c 02-Oct-2014 Hiroki Sato <hrs@FreeBSD.org>

Separate option handling from SIOC[SG]LAGG to SIOC[SG]LAGGOPTS for
backward compatibility with old ifconfig(8).


# 939a050a 01-Oct-2014 Hiroki Sato <hrs@FreeBSD.org>

Virtualize lagg(4) cloner. This change fixes a panic when tearing down
if_lagg(4) interfaces which were cloned in a vnet jail.

Sysctl nodes which are dynamically generated for each cloned interface
(net.link.lagg.N.*) have been removed, and use_flowid and flowid_shift
ifconfig(8) parameters have been added instead. Flags and per-interface
statistics counters are displayed in "ifconfig -v".

CR: D842


# 112f50ff 28-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Finally, convert counters in struct ifnet to counter(9).

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 7d6cc45c 27-Sep-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use underlying ports counters to get lagg statistics instead of
per-packet accounting.
This introduce user-visible changes like aggregating error counters.

Reviewed by: asomers (prev.version), glebius
CR: D781
MFC after: 2 weeks
Sponsored by: Yandex LLC


# eade13f9 26-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove macros that hide access to struct ifnet fields.


# 38738d73 25-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Make all lagg protocol methods live in lagg_protos, not in softc. All
interfaces of the same protocol use the same methods.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 6900d0d3 25-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

- Whitespace.
- Remove caddr_t.


# 16ca790e 26-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

- Provide lagg_proto_attach(), lagg_proto_detach().
- Make detach a protocol method in lagg_protos.
- Simplify code to lookup protocols.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# b5e094cf 26-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Make lagg protos an enum.


# b1bbc5b3 26-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Make lagg protocol detach methods return void.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 99cdd961 17-Sep-2014 Marcelo Araujo <araujo@FreeBSD.org>

Add laggproto broadcast, which sends frames to all ports of the lagg(4) group
and receives frames on any port of the lagg(4).

Phabric: D549
Reviewed by: glebius, thompsa
Approved by: glebius
Obtained from: OpenBSD
Sponsored by: QNAP Systems Inc.


# 2d222cb7 03-Aug-2014 Alexander Motin <mav@FreeBSD.org>

Improve locking of multicast addresses in VLAN and LAGG interfaces.

This fixes several scenarios of reproducible panics, caused by races
between multicast address changes and interface destruction.

MFC after: 2 weeks


# 1a8959da 29-Dec-2013 Scott Long <scottl@FreeBSD.org>

Multi-queue NIC drivers and multi-port lagg tend to use the same lower
bits of the flowid as each other, resulting in a poor distribution of
packets among queues in certain cases. Work around this by adding a
set of sysctls for controlling a bit-shift on the flowid when doing
multi-port aggregation in lagg and lacp. By default, lagg/lacp will
now use bits 16 and higher instead of 0 and higher.
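
A small sketch of the effect of the shift, with a hypothetical helper name. If the NIC picks its queue from the low bits of the flowid, flows landing on the same queue share those bits, so an unshifted modulo sends them all to the same lagg port; shifting first uses the higher bits instead.

```c
#include <assert.h>
#include <stdint.h>

/* Map a flowid to a port index after discarding the low `shift` bits. */
static uint32_t
port_from_flowid(uint32_t flowid, unsigned int shift, uint32_t nports)
{
	assert(nports > 0 && shift < 32);
	return ((flowid >> shift) % nports);
}
```

For example, with four ports the flowids 0x00010003 and 0x00020003 both map to port 3 with a shift of 0, but to ports 1 and 2 with a shift of 16.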

Reviewed by: max
Obtained from: Netflix
MFC after: 3 days


# 310915a4 29-Aug-2013 Adrian Chadd <adrian@FreeBSD.org>

Convert the if_lagg rwlock to an rmlock.

We've been seeing lots of cache line contention (but not lock contention!)
in our workloads between the various TX and RX threads going on.

The write lock is only grabbed when configuration changes are made - which
are infrequent.

With this patch, the contention and cycles spent waiting for updates
disappear.

Sponsored by: Netflix, Inc.


# 49de4f22 26-Jul-2013 Adrian Chadd <adrian@FreeBSD.org>

Break out the static, global LACP debug options into a per-lagg unit
sysctl tree.

* Create a net.link.lagg.X.lacp node
* Add a debug node under that for tx_test and rx_test
* Add lacp_strict_mode, defaulting to 1

tx_test and rx_test are still a bitmap of unit numbers for now.
At some point it would be nice to create child nodes of the lagg bundle
for each sub-interface, and then populate those with various knobs
and statistics.

Sponsored by: Netflix


# 31402c27 12-Jul-2013 Adrian Chadd <adrian@FreeBSD.org>

Bring over some link aggregation / LACP protocol improvements and debugging
additions.

* Add some new tracing events to aid in debugging.
* Add in a debugging mode to drop transmit and received frames, specifically
to test whether seeing or hearing heartbeats correctly cause LACP to
drop the port.
* Add in (and make default) a strict LACP mode, which requires the
heartbeat on a port to be heard before it's used. Sometimes vendor ports
will hang but the link layer stays up, resulting in hung traffic.
* Add logging the number of link status flaps, again to aid in debugging
badly behaving switch ports.
* Calculate the lagg interface port speed as the multiple of the
configured ports, rather than the largest.

Obtained from: Netflix
MFC after: 2 weeks


# 47e8d432 25-Apr-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Add const qualifier to the dst parameter of the ifnet if_output method.


# b64478a1 15-Apr-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Switch lagg(4) statistics to counter(9).

The lagg(4) is often used to bond high speed links, so a basic per-packet +=
on statistics causes cache misses and statistics loss.

The perfect solution would be to convert ifnet(9) to counter(9), but this
requires much more work, and unfortunately ABI change, so temporarily
patch lagg(4) manually.

We store counters in the softc, and once per second push their values
to legacy ifnet counters.

Sponsored by: Nginx, Inc.


# 209dddb9 22-Mar-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Remove __FreeBSD_version ifdefs.


# 86f67641 06-Mar-2012 Andrew Thompson <thompsa@FreeBSD.org>

Add the ability to set which packet layers are used for the load balance hash
calculation.


# 0bf97ae2 22-Feb-2012 Andrew Thompson <thompsa@FreeBSD.org>

Using the flowid in the mbuf assumes the network card is giving a good hash for
the traffic flow; this may not be the case, giving poor traffic distribution.
Add a sysctl which allows us to fall back to our own flow hash code.

PR: kern/164901
Submitted by: Eugene Grosbein
MFC after: 1 week


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 644da90d 06-Feb-2010 Ermal Luçi <eri@FreeBSD.org>

Propagate the vlan events to the underlying interfaces/members so they can do initialization of hw related features.

PR: kern/141646
Reviewed by: thompsa
Approved by: thompsa(co-mentor)
MFC after: 2 weeks


# 279aa3d4 16-Apr-2009 Kip Macy <kmacy@FreeBSD.org>

Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by: rwatson


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 960dab09 11-Oct-2007 Andrew Thompson <thompsa@FreeBSD.org>

Fix two panics in lagg.

1. The locking was changed to shared but roundrobin mode still updated a
pointer in the softc with the next tx interface to use. This will panic
under high load. Change this to an atomically incremented sequence number in
order to choose the tx port in round robin.

2. IFQ_HANDOFF will free the mbuf if the queue is full; the mbuf will then be
freed again by lagg_start(), causing a panic. Reorganised the error handling
and freeing to fix this.
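
The ownership rule behind the second fix can be sketched in userland as follows. The types and functions are hypothetical stand-ins, not the real mbuf or IFQ_HANDOFF code: the handoff routine consumes the packet on both success and failure, so the caller must not free it again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct pkt { int len; };	/* hypothetical stand-in for an mbuf */

struct queue {
	struct pkt *slot;	/* one-entry queue, enough for the sketch */
	unsigned int drops;
};

/* Consumes p either way: queued on success, freed here when the queue is full. */
static bool
queue_handoff(struct queue *q, struct pkt *p)
{
	if (q->slot != NULL) {
		free(p);
		q->drops++;
		return (false);
	}
	q->slot = p;
	return (true);
}

/* Caller: report the error but do not free p, handoff already did. */
static int
start_one(struct queue *q, struct pkt *p)
{
	assert(p != NULL);
	if (!queue_handoff(q, p))
		return (-1);
	return (0);
}
```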

MFC after: 3 days


# de75afe6 30-Jul-2007 Andrew Thompson <thompsa@FreeBSD.org>

- Propagate the largest set of interface capabilities supported by all lagg
ports to the lagg interface.
- Use the MTU from the first interface as the lagg MTU, all extra interfaces
must be the same.
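
Read literally, the largest capability set supported by all ports is the bitwise intersection of the per-port capability masks. A minimal sketch under that reading, with a hypothetical helper name and arbitrary bit values:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the capabilities common to every port; 0 when there are no ports. */
static uint32_t
common_caps(const uint32_t *port_caps, size_t nports)
{
	uint32_t caps = UINT32_MAX;
	size_t i;

	assert(port_caps != NULL || nports == 0);
	if (nports == 0)
		return (0);
	for (i = 0; i < nports; i++)
		caps &= port_caps[i];	/* keep only bits every port has */
	return (caps);
}
```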

This fixes using a lagg interface for a vlan or enabling jumbo frames, etc.

Approved by: re (kensmith)
MFC After: 3 days


# b3d37ca5 05-Jul-2007 Andrew Thompson <thompsa@FreeBSD.org>

Allow the LACP state to be queried from userland which at the moment is the
actor and partner peer info. Print out the active aggregator and per port data
in verbose mode from ifconfig.

Approved by: re (mux)


# ec32b37e 12-Jun-2007 Andrew Thompson <thompsa@FreeBSD.org>

non-functional cleanup
- remove dead code
- use consistent variable names
- gc unused defines
- whitespace cleanup


# 3bf517e3 15-May-2007 Andrew Thompson <thompsa@FreeBSD.org>

Change from a mutex to a read/write lock. This allows the tx port to be
selected simultaneously by multiple senders and transmit/receive is not
serialised between aggregated interfaces.


# cdc6f95f 06-May-2007 Andrew Thompson <thompsa@FreeBSD.org>

Call if_setlladdr() on the aggregation port from a taskqueue so the softc lock
is not held. The short delay between aggregating the port and setting the MAC
address is fine.


# 108fe96a 06-May-2007 Andrew Thompson <thompsa@FreeBSD.org>

Avoid touching various unsafe parts if the interface is disappearing.


# d74fd345 06-May-2007 Andrew Thompson <thompsa@FreeBSD.org>

Change from using if_delmulti() to if_delmulti_ifma() as it simplifies the code
and is safe to use if the ifp has disappeared.

Suggested by: bms


# ff6c5cf6 03-May-2007 Andrew Thompson <thompsa@FreeBSD.org>

Fix flag descriptions.


# e3163ef6 03-May-2007 Andrew Thompson <thompsa@FreeBSD.org>

- Add a disabled state for ports that can not be aggregated
- Refine check for lacp links, set to disabled if not suitable


# 18242d3b 16-Apr-2007 Andrew Thompson <thompsa@FreeBSD.org>

Rename the trunk(4) driver to lagg(4) as it is too similar to vlan trunking.

The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.

The lagg(4) driver provides link aggregation, failover and fault tolerance.

Discussed on: current@