#
fadbb6f8 |
|
06-May-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
lagg: remove use of net epoch in the ioctl paths Rely on LAGG_SLOCK() instead. The use of network epoch(9) here was added in 6573d7580b851 (later tidied by 87bf9b9cbeebc) as a large sweep that blindly substituted blocking kernel primitives with epoch(9). In these particular code paths use of epoch(9) is incorrect and doesn't provide any protection against a stale pointer. Recent fix 48698ead6ff0, which should actually have removed the epoch use, created a potential sleeping in epoch problem.
|
#
57068597 |
|
06-May-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
lagg: propagate up/down to the children Based on the old submission from asomers@. With modern state of locking in lagg(4), the patch got much simplier. Enable the test that was waiting for this change. PR: 226144 Reviewed by: asomers Differential Revision: https://reviews.freebsd.org/D44605
|
#
48698ead |
|
23-Feb-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
lagg: wrap lagg_port2req() into LAGG_SLOCK() Although a port addition is coded in a sequence where first all softc information is fulfilled and only then it is attached to the lagg, we still need a locking primitive to guarantee cache invalidation. Panic observed in the wild shows that lacp_portreq() called via lagg_port_ioctl(SIOCGLAGGPORT) immediately after port creation may see lp->lp_psc as NULL and panic. In the core file we will see valid data all around. A race via lagg_ioctl() wasn't observed but potentially is possible. Differential Revision: https://reviews.freebsd.org/D43501
|
#
685dc743 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
401f0344 |
|
17-Apr-2023 |
Zhenlei Huang <zlei@FreeBSD.org> |
lagg(4): Correctly define some sysctl variables 939a050ad96c virtualized lagg(4), but the corresponding sysctl of some virtualized global variables are not marked with CTLFLAG_VNET. A try to operate on those variables via sysctl will effectively go to the 'master' copies and the virtualized ones are not read or set accordingly. As a side effect, on updating the 'master' copy, the virtualized global variables of newly created vnets will have correct values. PR: 270705 Reviewed by: kp Fixes: 939a050ad96c Virtualize lagg(4) cloner MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D39467
|
#
5f3d0399 |
|
02-Apr-2023 |
Zhenlei Huang <zlei@FreeBSD.org> |
lagg(4): Tap traffic after protocol processing Different lagg protocols have different means and policies to process incoming traffic. For example, for failover protocol, by default received traffic is only accepted when they are received through the active port. For lacp protocol, LACP control messages are tapped off, also traffic will be dropped if they are received through the port which is not in collecting state or is not joined to the active aggregator. It confuses if user dump and see inbound traffic on lagg(4) interfaces but they are actually silently dropped and not passed into the net stack. Tap traffic after protocol processing so that user will have consistent view of the inbound traffic, meanwhile mbuf is set with correct receiving interface and bpf(4) will diagnose the right direction of inbound packets. PR: 270417 Reviewed by: melifaro (previous version) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39225
|
#
90820ef1 |
|
02-Apr-2023 |
Zhenlei Huang <zlei@FreeBSD.org> |
infiniband: Widen NET_EPOCH coverage From static code analysis, some device drivers (cxgbe, mlx4, mthca, and qlnx) do not enter net epoch before lagg_input_infiniband(). If IPoIB interface is a member of lagg(4) interface, and after returning from lagg_input_infiniband() the receiving interface of mbuf is set to lagg(4) interface, then when concurrently destroying the lagg(4) interface, there is a small window that the interface gets destroyed and becomes invalid before infiniband_input() re-enter net epoch, thus leading use-after-free. Widen NET_EPOCH coverage to prevent use-after-free. Thanks hselasky@ for testing with mlx5 devices. Reviewed by: hselasky Tested by: hselasky MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39275
|
#
5a8abd0a |
|
31-Mar-2023 |
Zhenlei Huang <zlei@FreeBSD.org> |
lacp: Use C99 bool for boolean return value This improves readability. No functional change intended. MFC after: 1 week
|
#
d4a80d21 |
|
29-Mar-2023 |
Zhenlei Huang <zlei@FreeBSD.org> |
lagg(4): Do not enter net epoch recursively This saves a little resources. No functional change intended. Reviewed by: kp Fixes: b8a6e03fac92 Widen NET_EPOCH coverage MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39267
|
#
dbe86dd5 |
|
29-Mar-2023 |
Zhenlei Huang <zlei@FreeBSD.org> |
lagg(4): Refactor out some lagg protocol input routines into a default one Those input routines are identical. Also inline two fast paths. No functional change intended. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39251
|
#
fcac5719 |
|
29-Mar-2023 |
Zhenlei Huang <zlei@FreeBSD.org> |
lagg(4): Make lagg_list and lagg_detach_cookie static They are used internally only. No functional change intended. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39250
|
#
dcd7f0bd |
|
24-Mar-2023 |
Zhenlei Huang <zlei@FreeBSD.org> |
lagg: Various style fixes MFC after: 1 week
|
#
adf62e83 |
|
08-Feb-2023 |
Justin Hibbits <jhibbits@FreeBSD.org> |
infiniband: Convert BPF handling for IfAPI Summary: All callers of infiniband_bpf_mtap() call it through the wrapper macro, which checks the if_bpf member explicitly. Since this is getting hidden, move this check into the internal function and remove the wrapper macro. Reviewed by: hselasky Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D39024
|
#
66bdbcd5 |
|
03-Mar-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
net: unify mtu update code Subscribers: imp, ae, glebius Differential Revision: https://reviews.freebsd.org/D38893
|
#
2c2b37ad |
|
13-Jan-2023 |
Justin Hibbits <jhibbits@FreeBSD.org> |
ifnet/API: Move struct ifnet definition to a <net/if_private.h> Hide the ifnet structure definition, no user serviceable parts inside, it's a netstack implementation detail. Include it temporarily in <net/if_var.h> until all drivers are updated to use the accessors exclusively. Reviewed by: glebius Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38046
|
#
110ce09c |
|
13-Jan-2023 |
Tom Jones <thj@FreeBSD.org> |
if_lagg: Allow lagg interfaces to be used with netmap Reviewed by: zlei Sponsored by: Zenarmor Sponsored by: OPNsense Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D37436
|
#
91ebcbe0 |
|
21-Sep-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
if_clone: migrate some consumers to the new KPI. Convert most of the cloner customers who require custom params to the new if_clone KPI. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D36636 MFC after: 2 weeks
|
#
713ceb99 |
|
28-Jul-2022 |
Andrew Gallatin <gallatin@FreeBSD.org> |
lagg: fix lagg ifioctl after SIOCSIFCAPNV Lagg was broken by SIOCSIFCAPNV when all underlying devices support SIOCSIFCAPNV. This change updates lagg to work with SIOCSIFCAPNV and if_capabilities2. Reviewed by: kib, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D35865
|
#
fa267a32 |
|
21-Jul-2022 |
Dimitry Andric <dim@FreeBSD.org> |
Fix unused variable warning in if_lagg.c With clang 15, the following -Werror warning is produced: sys/net/if_lagg.c:2413:6: error: variable 'active_ports' set but not used [-Werror,-Wunused-but-set-variable] int active_ports = 0; ^ The 'active_ports' variable appears to have been a debugging aid that has never been used for anything (ref https://reviews.freebsd.org/D549), so remove it. MFC after: 3 days
|
#
1967e313 |
|
24-May-2022 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
lagg(4): Add support for allocating TLS receive tags. The TLS receive tags are allocated directly from the receiving interface, because mbufs are flowing in the opposite direction and then route change checks are not useful, because they only work for outgoing traffic. Differential revision: https://reviews.freebsd.org/D32356 Sponsored by: NVIDIA Networking
|
#
3142d4f6 |
|
19-Nov-2021 |
Kristof Provost <kp@FreeBSD.org> |
lagg: fix unused-but-set-variable MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate")
|
#
acdfc096 |
|
06-Nov-2021 |
Wojciech Macek <wma@FreeBSD.org> |
lagg: update capabilites on SIOCSIFMTU Some NICs might have limited capabilities when Jumbo frames are used. For exampe some neta interfaces only support TX csum offload when the packet size is lower than a value specified in DT. Fix it by re-reading capabilities of children interfaces after MTU has been successfully changed. Found by: Jerome Tomczyk <jerome.tomczyk@stormshield.eu> Reviewed by: jhb Obtained from: Semihalf Sponsored by: Stormshield Differential revision: https://reviews.freebsd.org/D32724
|
#
c782ea8b |
|
14-Sep-2021 |
John Baldwin <jhb@FreeBSD.org> |
Add a switch structure for send tags. Move the type and function pointers for operations on existing send tags (modify, query, next, free) out of 'struct ifnet' and into a new 'struct if_snd_tag_sw'. A pointer to this structure is added to the generic part of send tags and is initialized by m_snd_tag_init() (which now accepts a switch structure as a new argument in place of the type). Previously, device driver ifnet methods switched on the type to call type-specific functions. Now, those type-specific functions are saved in the switch structure and invoked directly. In addition, this more gracefully permits multiple implementations of the same tag within a driver. In particular, NIC TLS for future Chelsio adapters will use a different implementation than the existing NIC TLS support for T6 adapters. Reviewed by: gallatin, hselasky, kib (older version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D31572
|
#
c1384241 |
|
17-Aug-2021 |
Luiz Otavio O Souza <loos@FreeBSD.org> |
lagg: don't update link layer addresses on destroy When the lagg is being destroyed it is not necessary update the lladdr of all the lagg members every time we update the primary interface. Reviewed by: scottl Obtained from: pfSense MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31586
|
#
1a714ff2 |
|
26-Jan-2021 |
Randall Stewart <rrs@FreeBSD.org> |
This pulls over all the changes that are in the netflix tree that fix the ratelimit code. There were several bugs in tcp_ratelimit itself and we needed further work to support the multiple tag format coming for the joint TLS and Ratelimit dances. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D28357
|
#
19ecb5e8 |
|
29-Dec-2020 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Fix for IPoIB over lagg(4). Need to update both link layer address and broadcast address when active link changes for IP over infiniband. This is because the broadcast address contains the so-called P-key, which is interface dependent. Reviewed by: kib @ Differential Revision: https://reviews.freebsd.org/D27658 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking
|
#
5ee33a90 |
|
08-Dec-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Fixup r368446 with KERN_TLS.
|
#
e1074ed6 |
|
08-Dec-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
The list of ports in configuration path shall be protected by locks, epoch shall be used only for fast path. Thus use LAGG_XLOCK() in lagg_[un]register_vlan. This fixes sleeping in epoch panic. PR: 240609
|
#
87bf9b9c |
|
08-Dec-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Convert LAGG_RLOCK() to NET_EPOCH_ENTER(). No functional changes.
|
#
8732245d |
|
18-Nov-2020 |
Andrew Gallatin <gallatin@FreeBSD.org> |
LACP: When suppressing distributing, return ENOBUFS When links come and go, lacp goes into a "suppress distributing" mode where it drops traffic for 3 seconds. When in this mode, lagg/lacp historiclally drops traffic with ENETDOWN. That return value causes TCP to close any connection where it gets that value back from the lower parts of the stack. This means that any TCP connection with active traffic during a 3-second windown when an LACP link comes or goes would get closed. TCP treats return values of ENOBUFS as transient errors, and re-schedules transmission later. So rather than returning ENETDOWN, lets return ENOBUFS instead. This allows TCP connections to be preserved. I've tested this by repeatedly bouncing links on a Netlfix CDN server under a moderate (20Gb/s) load and overved ENOBUFS reported back to the TCP stack (as reported by a RACK TCP sysctl). Reviewed by: jhb, jtl, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27188
|
#
2f4ffa9f |
|
11-Nov-2020 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Fix possible NULL pointer dereference. lagg(4) replaces if_output method of its child interfaces and expects that this method can be called only by child interfaces. But it is possible that lagg_port_output() could be called by children of child interfaces. In this case ifnet's if_lagg field is NULL. Add check that lp is not NULL. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC
|
#
36e0a362 |
|
29-Oct-2020 |
John Baldwin <jhb@FreeBSD.org> |
Add m_snd_tag_alloc() as a wrapper around if_snd_tag_alloc(). This gives a more uniform API for send tag life cycle management. Reviewed by: gallatin, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27000
|
#
a92c4bb6 |
|
22-Oct-2020 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Add support for IP over infiniband, IPoIB, to lagg(4). Currently only the failover protocol is supported due to limitations in the IPoIB architecture. Refer to the lagg(4) manual page for how to configure and use this new feature. A new network interface type, IFT_INFINIBANDLAG, has been added, similar to the existing IFT_IEEE8023ADLAG . ifconfig(8) has been updated to accept a new laggtype argument when creating lagg(4) network interfaces. This new argument is used to distinguish between ethernet and infiniband type of lagg(4) network interface. The laggtype argument is optional and defaults to ethernet. The lagg(4) command line syntax is backwards compatible. Differential Revision: https://reviews.freebsd.org/D26254 Reviewed by: melifaro@ MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking
|
#
56fb710f |
|
06-Oct-2020 |
John Baldwin <jhb@FreeBSD.org> |
Store the send tag type in the common send tag header. Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with their own identical headers that stored the type that the type-specific tag structures inherited from, so in practice it seems drivers need this in the tag anyway. This permits removing these extra header indirections (struct cxgbe_snd_tag and struct mlx5e_snd_tag). In addition, this permits driver-independent code to query the type of a tag, e.g. to know what type of tag is being queried via if_snd_query. Reviewed by: gallatin, hselasky, np, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26689
|
#
662c1305 |
|
01-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
net: clean up empty lines in .c and .h files
|
#
3869d414 |
|
13-Aug-2020 |
Bryan Drewery <bdrewery@FreeBSD.org> |
lagg: Avoid adding a port to a lagg device being destroyed. The lagg_clone_destroy() handles detach and waiting for ifconfig callers to drain already. This narrows the race for 2 panics that the tests triggered. Both were a consequence of adding a port to the lagg device after it had already detached from all of its ports. The link state task would run after lagg_clone_destroy() free'd the lagg softc. kernel:trap_fatal+0xa4 kernel:trap_pfault+0x61 kernel:trap+0x316 kernel:witness_checkorder+0x6d kernel:_sx_xlock+0x72 if_lagg.ko:lagg_port_state+0x3b kernel:if_down+0x144 kernel:if_detach+0x659 if_tap.ko:tap_destroy+0x46 kernel:if_clone_destroyif+0x1b7 kernel:if_clone_destroy+0x8d kernel:ifioctl+0x29c kernel:kern_ioctl+0x2bd kernel:sys_ioctl+0x16d kernel:amd64_syscall+0x337 kernel:trap_fatal+0xa4 kernel:trap_pfault+0x61 kernel:trap+0x316 kernel:witness_checkorder+0x6d kernel:_sx_xlock+0x72 if_lagg.ko:lagg_port_state+0x3b kernel:do_link_state_change+0x9b kernel:taskqueue_run_locked+0x10b kernel:taskqueue_run+0x49 kernel:ithread_loop+0x19c kernel:fork_exit+0x83 PR: 244168 Reviewed by: markj MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D25284
|
#
2a73c8f5 |
|
11-Jun-2020 |
Ravi Pokala <rpokala@FreeBSD.org> |
Decode the "LACP Fast Timeout" LAGG option flag r286700 added the "lacp_fast_timeout" option to `ifconfig', but we forgot to include the new option in the string used to decode the option bits. Add "LACP_FAST_TIMO" to LAGG_OPT_BITS. Also, s/LAGG_OPT_LACP_TIMEOUT/LAGG_OPT_LACP_FAST_TIMO/g , to be clearer that the flag indicates "Fast Timeout" mode. Reported by: Greg Foster <gfoster at panasas dot com> Reviewed by: jpaetzel MFC after: 1 week Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D25239
|
#
bd673b99 |
|
13-Apr-2020 |
Andrew Gallatin <gallatin@FreeBSD.org> |
lagg: stop double-counting output errors and counting drops as errors Before this change, lagg double-counted errors from lagg members, and counted every drop by a lagg member as an error. Eg, if lagg sent a packet, and the underlying hardware driver dropped it, a counter would be incremented by both lagg and the underlying driver. This change attempts to fix that by incrementing lagg's counters only for errors that do not come from underlying drivers. Reviewed by: hselasky, jhb Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24331
|
#
98085bae |
|
09-Mar-2020 |
Andrew Gallatin <gallatin@FreeBSD.org> |
make lacp's use_numa hashing aware of send tags When I did the use_numa support, I missed the fact that there is a separate hash function for send tag nic selection. So when use_numa is enabled, ktls offload does not work properly, as it does not reliably allocate a send tag on the proper egress nic since different egress nics are selected for send-tag allocation and packet transmit. To fix this, this change: - refectors lacp_select_tx_port_by_hash() and lacp_select_tx_port() to make lacp_select_tx_port_by_hash() always called by lacp_select_tx_port() - pre-shifts flowids to convert them to hashes when calling lacp_select_tx_port_by_hash() - adds a numa_domain field to if_snd_tag_alloc_params - plumbs the numa domain into places where we allocate send tags In testing with NIC TLS setup on a NUMA machine, I see thousands of output errors before the change when enabling kern.ipc.tls.ifnet.permitted=1. After the change, I see no errors, and I see the NIC sysctl counters showing active TLS offload sessions. Reviewed by: rrs, hselasky, jhb Sponsored by: Netflix
|
#
7029da5c |
|
26-Feb-2020 |
Pawel Biernacki <kaktus@FreeBSD.org> |
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
|
#
84becee1 |
|
22-Jan-2020 |
Alexander Motin <mav@FreeBSD.org> |
Update route MTUs for bridge, lagg and vlan interfaces. Those interfaces may implicitly change their MTU on addition of parent interface in addition to normal SIOCSIFMTU ioctl path, where the route MTUs are updated normally. MFC after: 2 weeks Sponsored by: iXsystems, Inc.
|
#
2a4bd982 |
|
14-Jan-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Introduce NET_EPOCH_CALL() macro and use it everywhere where we free data based on the network epoch. The macro reverses the argument order of epoch_call(9) - first function, then its argument. NFC
|
#
97168be8 |
|
14-Jan-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Mechanically substitute assertion of in_epoch(net_epoch_preempt) to NET_EPOCH_ASSERT(). NFC
|
#
c23df8ea |
|
09-Jan-2020 |
Mark Johnston <markj@FreeBSD.org> |
lagg: Further cleanup of the rr_limit option. Add an option flag so that arbitrary updates to a lagg's configuration do not clear sc_stride. Preseve compatibility for old ifconfig binaries. Update ifconfig to use the new flag and improve the casting used when parsing the option parameter. Modify the RR transmit function to avoid locklessly reading sc_stride twice. Ensure that sc_stride is always 1 or greater. Reviewed by: hselasky MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23092
|
#
c104c299 |
|
22-Dec-2019 |
Mark Johnston <markj@FreeBSD.org> |
lagg: Clean up handling of the rr_limit option. - Don't allow an unprivileged user to set the stride. [1] - Only set the stride under the softc lock. - Rename the internal fields to accurately reflect their use. Keep ro_bkt to avoid changing the user API. - Simplify the implementation. The port index is just sc_seq / stride. - Document rr_limit in ifconfig.8. Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com> [1] MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22857
|
#
b2e60773 |
|
26-Aug-2019 |
John Baldwin <jhb@FreeBSD.org> |
Add kernel-side support for in-kernel TLS. KTLS adds support for in-kernel framing and encryption of Transport Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports offload of TLS for transmitted data. Key negotation must still be performed in userland. Once completed, transmit session keys for a connection are provided to the kernel via a new TCP_TXTLS_ENABLE socket option. All subsequent data transmitted on the socket is placed into TLS frames and encrypted using the supplied keys. Any data written to a KTLS-enabled socket via write(2), aio_write(2), or sendfile(2) is assumed to be application data and is encoded in TLS frames with an application data type. Individual records can be sent with a custom type (e.g. handshake messages) via sendmsg(2) with a new control message (TLS_SET_RECORD_TYPE) specifying the record type. At present, rekeying is not supported though the in-kernel framework should support rekeying. KTLS makes use of the recently added unmapped mbufs to store TLS frames in the socket buffer. Each TLS frame is described by a single ext_pgs mbuf. The ext_pgs structure contains the header of the TLS record (and trailer for encrypted records) as well as references to the associated TLS session. KTLS supports two primary methods of encrypting TLS frames: software TLS and ifnet TLS. Software TLS marks mbufs holding socket data as not ready via M_NOTREADY similar to sendfile(2) when TLS framing information is added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then called to schedule TLS frames for encryption. In the case of sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving the mbufs marked M_NOTREADY until encryption is completed. For other writes (vn_sendfile when pages are available, write(2), etc.), the PRUS_NOTREADY is set when invoking pru_send() along with invoking ktls_enqueue(). A pool of worker threads (the "KTLS" kernel process) encrypts TLS frames queued via ktls_enqueue(). Each TLS frame is temporarily mapped using the direct map and passed to a software encryption backend to perform the actual encryption. (Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if someone wished to make this work on architectures without a direct map.) KTLS supports pluggable software encryption backends. Internally, Netflix uses proprietary pure-software backends. This commit includes a simple backend in a new ktls_ocf.ko module that uses the kernel's OpenCrypto framework to provide AES-GCM encryption of TLS frames. As a result, software TLS is now a bit of a misnomer as it can make use of hardware crypto accelerators. Once software encryption has finished, the TLS frame mbufs are marked ready via pru_ready(). At this point, the encrypted data appears as regular payload to the TCP stack stored in unmapped mbufs. ifnet TLS permits a NIC to offload the TLS encryption and TCP segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS) is allocated on the interface a socket is routed over and associated with a TLS session. TLS records for a TLS session using ifnet TLS are not marked M_NOTREADY but are passed down the stack unencrypted. The ip_output_send() and ip6_output_send() helper functions that apply send tags to outbound IP packets verify that the send tag of the TLS record matches the outbound interface. If so, the packet is tagged with the TLS send tag and sent to the interface. The NIC device driver must recognize packets with the TLS send tag and schedule them for TLS encryption and TCP segmentation. If the the outbound interface does not match the interface in the TLS send tag, the packet is dropped. In addition, a task is scheduled to refresh the TLS send tag for the TLS session. If a new TLS send tag cannot be allocated, the connection is dropped. If a new TLS send tag is allocated, however, subsequent packets will be tagged with the correct TLS send tag. (This latter case has been tested by configuring both ports of a Chelsio T6 in a lagg and failing over from one port to another. As the connections migrated to the new port, new TLS send tags were allocated for the new port and connections resumed without being dropped.) ifnet TLS can be enabled and disabled on supported network interfaces via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported across both vlan devices and lagg interfaces using failover, lacp with flowid enabled, or lacp with flowid enabled. Applications may request the current KTLS mode of a connection via a new TCP_TXTLS_MODE socket option. They can also use this socket option to toggle between software and ifnet TLS modes. In addition, a testing tool is available in tools/tools/switch_tls. This is modeled on tcpdrop and uses similar syntax. However, instead of dropping connections, -s is used to force KTLS connections to switch to software TLS and -i is used to switch to ifnet TLS. Various sysctls and counters are available under the kern.ipc.tls sysctl node. The kern.ipc.tls.enable node must be set to true to enable KTLS (it is off by default). The use of unmapped mbufs must also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS. KTLS is enabled via the KERN_TLS kernel option. This patch is the culmination of years of work by several folks including Scott Long and Randall Stewart for the original design and implementation; Drew Gallatin for several optimizations including the use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records awaiting software encryption, and pluggable software crypto backends; and John Baldwin for modifications to support hardware TLS offload. Reviewed by: gallatin, hselasky, rrs Obtained from: Netflix Sponsored by: Netflix, Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21277
|
#
20abea66 |
|
01-Aug-2019 |
Randall Stewart <rrs@FreeBSD.org> |
This adds the third step in getting BBR into the tree. BBR and an updated rack depend on having access to the new ratelimit api in this commit. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D20953
|
#
ce7fb386 |
|
06-Jun-2019 |
Mark Johnston <markj@FreeBSD.org> |
Restore the comment removed in r348745. LAGG_RLOCK() enters an epoch section, so the comment wasn't stale. Reported by: jhb MFC with: r348745
|
#
9995dfd3 |
|
06-Jun-2019 |
Mark Johnston <markj@FreeBSD.org> |
Conditionalize an in_epoch() call on INVARIANTS. Its result is only used to determine whether to perform further INVARIANTS-only checks. Remove a stale comment while here. Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de> MFC after: 1 week
|
#
fb3bc596 |
|
24-May-2019 |
John Baldwin <jhb@FreeBSD.org> |
Restructure mbuf send tags to provide stronger guarantees. - Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on), in ip_output() and ip6_output(). This avoids sending packets with send tags to ifnet drivers that don't support send tags. Since we are now checking for ifp mismatches before invoking if_output, we can now try to allocate a new tag before invoking if_output sending the original packet on the new tag if allocation succeeds. To avoid code duplication for the fragment and unfragmented cases, add ip_output_send() and ip6_output_send() as wrappers around if_output and nd6_output_ifp, respectively. All of the logic for setting send tags and dealing with send tag-related errors is done in these wrapper functions. For pseudo interfaces that wrap other network interfaces (vlan and lagg), wrapper send tags are now allocated so that ip*_output see the wrapper ifp as the ifp in the send tag. The if_transmit routines rewrite the send tags after performing an ifp mismatch check. If an ifp mismatch is detected, the transmit routines fail with EAGAIN. - To provide clearer life cycle management of send tags, especially in the presence of vlan and lagg wrapper tags, add a reference count to send tags managed via m_snd_tag_ref() and m_snd_tag_rele(). Provide a helper function (m_snd_tag_init()) for use by drivers supporting send tags. m_snd_tag_init() takes care of the if_ref on the ifp meaning that code alloating send tags via if_snd_tag_alloc no longer has to manage that manually. Similarly, m_snd_tag_rele drops the refcount on the ifp after invoking if_snd_tag_free when the last reference to a send tag is dropped. This also closes use after free races if there are pending packets in driver tx rings after the socket is closed (e.g. from tcpdrop). In order for m_free to work reliably, add a new CSUM_SND_TAG flag in csum_flags to indicate 'snd_tag' is set (rather than 'rcvif'). Drivers now also check this flag instead of checking snd_tag against NULL. This avoids false positive matches when a forwarded packet has a non-NULL rcvif that was treated as a send tag. - cxgbe was relying on snd_tag_free being called when the inp was detached so that it could kick the firmware to flush any pending work on the flow. This is because the driver doesn't require ACK messages from the firmware for every request, but instead does a kind of manual interrupt coalescing by only setting a flag to request a completion on a subset of requests. If all of the in-flight requests don't have the flag when the tag is detached from the inp, the flow might never return the credits. The current snd_tag_free command issues a flush command to force the credits to return. However, the credit return is what also frees the mbufs, and since those mbufs now hold references on the tag, this meant that snd_tag_free would never be called. To fix, explicitly drop the mbuf's reference on the snd tag when the mbuf is queued in the firmware work queue. This means that once the inp's reference on the tag goes away and all in-flight mbufs have been queued to the firmware, tag's refcount will drop to zero and snd_tag_free will kick in and send the flush request. Note that we need to avoid doing this in the middle of ethofld_tx(), so the driver grabs a temporary reference on the tag around that loop to defer the free to the end of the function in case it sends the last mbuf to the queue after the inp has dropped its reference on the tag. - mlx5 preallocates send tags and was using the ifp pointer even when the send tag wasn't in use. Explicitly use the ifp from other data structures instead. - Sprinkle some assertions in various places to assert that received packets don't have a send tag, and that other places that overwrite rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer. Reviewed by: gallatin, hselasky, rgrimes, ae Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20117
|
#
35961dce |
|
03-May-2019 |
Andrew Gallatin <gallatin@FreeBSD.org> |
Select lacp egress ports based on NUMA domain This change creates an array of port maps indexed by numa domain for lacp port selection. If we have lacp interfaces in more than one domain, then we select the egress port by indexing into the numa port maps and picking a port on the appropriate numa domain. This is behavior is controlled by the new ifconfig use_numa flag and net.link.lagg.use_numa sysctl/tunable (both modeled after the existing use_flowid), which default to enabled. Reviewed by: bz, hselasky, markj (and scottl, earlier version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20060
|
#
841613dc |
|
28-Mar-2019 |
John Baldwin <jhb@FreeBSD.org> |
Use a dedicated malloc type for lagg(4)'s structures. Reviewed by: gallatin MFC after: 1 month Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19719
|
#
2f59b04a |
|
28-Mar-2019 |
John Baldwin <jhb@FreeBSD.org> |
Remove nested epochs from lagg(4). lagg_bcast_start appeared to have a bug in that was using the last lagg port structure after exiting the epoch that was keeping that structure alive. However, upon further inspection, the epoch was already entered by the caller (lagg_transmit), so the epoch enter/exit in lagg_bcast_start was actually unnecessary. This commit generally removes uses of the net epoch via LAGG_RLOCK to protect the list of ports when the list of ports was already protected by an existing LAGG_RLOCK in a caller, or the LAGG_XLOCK. It also adds a missing epoch enter/exit in lagg_snd_tag_alloc while accessing the lagg port structures. An ifp is still accessed via an unsafe reference after the epoch is exited, but that is true in the current code and will be fixed in a future change. Reviewed by: gallatin MFC after: 1 month Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19718
|
#
fa91f845 |
|
13-Feb-2019 |
Randall Stewart <rrs@FreeBSD.org> |
This commit adds the missing release mechanism for the ratelimiting code. The two modules (lagg and vlan) did have allocation routines, and even though they are indirect (and vector down to the underlying interfaces) they both need to have a free routine (that also vectors down to the actual interface). Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D19032
|
#
829c56fc |
|
31-Jan-2019 |
John Baldwin <jhb@FreeBSD.org> |
Don't set IFCAP_TXRTLMT during lagg_clone_create(). lagg_capabilities() will set the capability once interfaces supporting the feature are added to the lagg. Setting it on a lagg without any interfaces is pointless as the if_snd_tag_alloc call will always fail in that case. Reviewed by: hselasky, gallatin MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19040
|
#
fbd8c330 |
|
30-Oct-2018 |
Marcelo Araujo <araujo@FreeBSD.org> |
Allow changing lagg(4) MTU. Previously, changing the MTU would require destroying the lagg and creating a new one. Now it is allowed to change the MTU of the lagg interface and the MTU of the ports will be set to match. If any port cannot set the new MTU, all ports are reverted to the original MTU of the lagg. Additionally, when adding ports, the MTU of a port will be automatically set to the MTU of the lagg. As always, the MTU of the lagg is initially determined by the MTU of the first port added. If adding an interface as a port for some reason fails, that interface is reverted to its original MTU. Submitted by: Ryan Moeller <ryan@freqlabs.com> Reviewed by: mav Relnotes: Yes Sponsored by: iXsystems Inc. Differential Revision: https://reviews.freebsd.org/D17576
|
#
13c6ba6d |
|
09-Oct-2018 |
Jonathan T. Looney <jtl@FreeBSD.org> |
There are three places where we return from a function which entered an epoch section without exiting that epoch section. This is bad for two reasons: the epoch section won't exit, and we will leave the epoch tracker from the stack on the epoch list. Fix the epoch leak by making sure we exit epoch sections before returning. Reviewed by: ae, gallatin, mmacy Approved by: re (gjb, kib) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D17450
|
#
5ccac9f9 |
|
13-Aug-2018 |
Andrew Gallatin <gallatin@FreeBSD.org> |
lagg: allow lacp to manage the link state Lacp needs to manage the link state itself. Unlike other lagg protocols, the ability of lacp to pass traffic depends not only on the lagg members having link, but also on the lacp protocol converging to a distributing state with the link partner. If we prematurely mark the link as up, then we will send a gratuitous arp (via arp_handle_ifllchange()) before the lacp interface is capable of passing traffic. When this happens, the gratuitous arp is lost, and our link partner may cache a stale mac address (eg, when the base mac address for the lagg bundle changes, due to a BIOS change re-ordering NIC unit numbers) Reviewed by: jtl, hselasky Sponsored by: Netflix
|
#
5f901c92 |
|
24-Jul-2018 |
Andrew Turner <andrew@FreeBSD.org> |
Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147
|
#
6573d758 |
|
03-Jul-2018 |
Matt Macy <mmacy@FreeBSD.org> |
epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066
|
#
0f8d79d9 |
|
24-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
CK: update consumers to use CK macros across the board r334189 changed the fields to have names distinct from those in queue.h in order to expose the oversights as compile time errors
|
#
db5a36bd |
|
22-May-2018 |
Mark Johnston <markj@FreeBSD.org> |
Simplify lagg_input(). No functional change intended. MFC after: 2 weeks
|
#
46d0f824 |
|
18-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
net: fix set but not used
|
#
d7c5a620 |
|
18-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366
|
#
70398c2f |
|
18-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
epoch(9): Make epochs non-preemptible by default There are risks associated with waiting on a preemptible epoch section. Change the name to make them not be the default and document the issue under CAVEATS. Reported by: markj
|
#
99031b8f |
|
14-May-2018 |
Stephen Hurd <shurd@FreeBSD.org> |
Replace rmlock with epoch in lagg Use the new epoch based reclamation API. Now the hot paths will not block at all, and the sx lock is used for the softc data. This fixes LORs reported where the rwlock was obtained when the sxlock was held. Submitted by: mmacy Reported by: Harry Schmalzbauer <freebsd@omnilan.de> Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15355
|
#
fd3bb7aa |
|
04-Jan-2018 |
Steven Hartland <smh@FreeBSD.org> |
Disabled the use of flowid for lagg by default Disabled the use of RSS hash from the network card aka flowid for lagg(4) interfaces by default as it's currently incompatible with the lacp and loadbalance protocols. The incompatibility is due to the fact that the flowid isn't know for the first packet of a new outbound stream which can result in the hash calculation method changing and hence a stream being incorrectly split across multiple interfaces during normal operation. This can be re-enabled by setting the following in loader.conf: net.link.lagg.default_use_flowid="1" Discussed with: kmacy Sponsored by: Multiplay
|
#
51352d9d |
|
25-Jul-2017 |
Sean Bruno <sbruno@FreeBSD.org> |
Don't hold the RM lock during lagg_proto_addport() to avoid an LOR. Submitted by: Kevin Bowling <kevin.bowling@kev009.com> Reviewed by: mav MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11711
|
#
41cf0d54 |
|
26-May-2017 |
Alexander Motin <mav@FreeBSD.org> |
Call VLAN_CAPABILITIES() when LAGG capabilities change. This makes VLAN on top of LAGG to expose proper capabilities if they are changed after creation. MFC after: 1 week
|
#
8403ab79 |
|
26-May-2017 |
Alexander Motin <mav@FreeBSD.org> |
Improve applying unified capabilities to the lagg ports. Some NICs have some capabilities dependent, so that disabling one require disabling some other (TXCSUM/RXCSUM on em). This code tries to reach the consensus more insistently. PR: 219453 MFC after: 1 week
|
#
e3d90506 |
|
25-May-2017 |
Alexander Motin <mav@FreeBSD.org> |
Remove some code, dead from the day one.
|
#
bbfc32a6 |
|
05-May-2017 |
Alexander Motin <mav@FreeBSD.org> |
Relax r317696 locking to not drain taskqueue under the lock. MFC after: 11 days
|
#
e83177fb |
|
02-May-2017 |
Alexander Motin <mav@FreeBSD.org> |
Fix r317696 build without debug. MFC after: 2 weeks
|
#
2f86d4b0 |
|
02-May-2017 |
Alexander Motin <mav@FreeBSD.org> |
Introduce sleepable locks into if_lagg. Before this change if_lagg was using nonsleepable rmlocks to protect its internal state. This patch introduces another sx lock to protect code paths that require sleeping, while still uses old rmlock to protect hot nonsleepable data paths. This change allows to remove taskqueue decoupling used before to change interface addresses without holding the lock. Instead it uses sx lock to protect direct if_ioctl() calls. As another bonus, the new code synchronizes enabled capabilities of member interfaces, and allows to control them with ifconfig laggX, that was impossible before. This part should fix interoperation with if_bridge, that may need to disable some capabilities, such as TXCSUM or LRO, to allow bridging with noncapable interfaces. MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D10514
|
#
1e04441a |
|
22-Apr-2017 |
Alexander Motin <mav@FreeBSD.org> |
Remove unneeded conditions. MFC after: 2 weeks
|
#
b98b5ae8 |
|
21-Apr-2017 |
Alexander Motin <mav@FreeBSD.org> |
Add interface reference counting to if_lagg. Using plain ifunit() looks like request for troubles. MFC after: 2 weeks
|
#
13157b2b |
|
29-Jan-2017 |
Luiz Otavio O Souza <loos@FreeBSD.org> |
Do not update the lagg link layer address when destroying a lagg clone. This would enqueue an event to send the gratuitous arp on a dying lagg interface without any physical ports attached to it. Apart from that, the taskqueue_drain() on lagg_clone_destroy() runs too late, when the ifp data structure is already freed. Fix that too. Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)
|
#
f3e7afe2 |
|
18-Jan-2017 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Implement kernel support for hardware rate limited sockets. - Add RATELIMIT kernel configuration keyword which must be set to enable the new functionality. - Add support for hardware driven, Receive Side Scaling, RSS aware, rate limited sendqueues and expose the functionality through the already established SO_MAX_PACING_RATE setsockopt(). The API support rates in the range from 1 to 4Gbytes/s which are suitable for regular TCP and UDP streams. The setsockopt(2) manual page has been updated. - Add rate limit function callback API to "struct ifnet" which supports the following operations: if_snd_tag_alloc(), if_snd_tag_modify(), if_snd_tag_query() and if_snd_tag_free(). - Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT flag, which tells if a network driver supports rate limiting or not. - This patch also adds support for rate limiting through VLAN and LAGG intermediate network devices. - How rate limiting works: 1) The userspace application calls setsockopt() after accepting or making a new connection to set the rate which is then stored in the socket structure in the kernel. Later on when packets are transmitted a check is made in the transmit path for rate changes. A rate change implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the destination network interface, which then sets up a custom sendqueue with the given rate limitation parameter. A "struct m_snd_tag" pointer is returned which serves as a "snd_tag" hint in the m_pkthdr for the subsequently transmitted mbufs. 2) When the network driver sees the "m->m_pkthdr.snd_tag" different from NULL, it will move the packets into a designated rate limited sendqueue given by the snd_tag pointer. It is up to the individual drivers how the rate limited traffic will be rate limited. 3) Route changes are detected by the NIC drivers in the ifp->if_transmit() routine when the ifnet pointer in the incoming snd_tag mismatches the one of the network interface. The network adapter frees the mbuf and returns EAGAIN which causes the ip_output() to release and clear the send tag. Upon next ip_output() a new "snd_tag" will be tried allocated. 4) When the PCB is detached the custom sendqueue will be released by a non-blocking ifp->if_snd_tag_free() call to the currently bound network interface. Reviewed by: wblock (manpages), adrian, gallatin, scottl (network) Differential Revision: https://reviews.freebsd.org/D3687 Sponsored by: Mellanox Technologies MFC after: 3 months
|
#
8a73c85d |
|
20-Dec-2016 |
Alan Somers <asomers@FreeBSD.org> |
Remove stray debugging code from r310180 Reported by: rstone Pointy hat to: asomers MFC after: 3 weeks X-MFC-with: 310180 Sponsored by: Spectra Logic Corp
|
#
d9fa2d67 |
|
16-Dec-2016 |
Alan Somers <asomers@FreeBSD.org> |
Fix panic during lagg destruction with simultaneous status check If you run "ifconfig lagg0 destroy" and "ifconfig lagg0" at the same time a page fault may result. The first process will destroy ifp->if_lagg in lagg_clone_destroy (called by if_clone_destroy). Then the second process will observe that ifp->if_lagg is NULL at the top of lagg_port_ioctl and goto fallback: where it will promptly dereference ifp->if_lagg anyway. The solution is to repeat the NULL check for ifp->if_lagg MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D8512
|
#
89856f7e |
|
21-Jun-2016 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather larger and convoluted as the ordering requirements cannot be separated. Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC. Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet. For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along (or before) the higher layer protocol is shutdown. Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers. For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()). Provide or enahnce helper functions to do proper cleanup at a protocol rather than at an interface level. Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747
|
#
a4641f4e |
|
03-May-2016 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys/net*: minor spelling fixes. No functional change.
|
#
729a4cff |
|
05-Apr-2016 |
Ravi Pokala <rpokala@FreeBSD.org> |
Revert accidental submit of WIP as part of r297609 Pointyhat to: rpokala
|
#
06152bf0 |
|
05-Apr-2016 |
Ravi Pokala <rpokala@FreeBSD.org> |
Storage Controller Interface driver - typo in unimplemented macro in scic_sds_controller_registers.h s/contoller/controller/ PR: 207336 Submitted by: Tony Narlock <tony @ git-pull.com>
|
#
d931334b |
|
18-Feb-2016 |
Marcelo Araujo <araujo@FreeBSD.org> |
Fix regression introduced on 272446r. lagg(4) supports the protocol none, where it disables any traffic without disabling the lagg(4) interface itself. PR: 206921 Submitted by: Pushkar Kothavade <pushkarbk@gmail.com> Reviewed by: rpokala Approved by: bapt (mentor) MFC after: 3 weeks Sponsored by: gandi.net Differential Revision: https://reviews.freebsd.org/D5076
|
#
d62edc5e |
|
22-Jan-2016 |
Marcelo Araujo <araujo@FreeBSD.org> |
Add an IOCTL rr_limit to let users fine tuning the number of packets to be sent using roundrobin protocol and set a better granularity and distribution among the interfaces. Tuning the number of packages sent by interface can increase throughput and reduce unordered packets as well as reduce SACK. Example of usage: # ifconfig bge0 up # ifconfig bge1 up # ifconfig lagg0 create # ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \ 192.168.1.1 netmask 255.255.255.0 # ifconfig lagg0 rr_limit 500 Reviewed by: thompsa, glebius, adrian (old patch) Approved by: bapt (mentor) Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D540
|
#
d6e82913 |
|
17-Dec-2015 |
Steven Hartland <smh@FreeBSD.org> |
Revert r292275 & r292379 glebius has concerns about these changes so reverting those can be discussed and addressed. Sponsored by: Multiplay
|
#
52e53e2d |
|
15-Dec-2015 |
Steven Hartland <smh@FreeBSD.org> |
Fix lagg failover due to missing notifications When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited Neighbour Advertisements (IPv6) are sent to notify other nodes that the address may have moved. This results is slow failover, dropped packets and network outages for the lagg interface when the primary link goes down. We now use the new if_link_state_change_cond with the force param set to allow lagg to force through link state changes and hence fire a ifnet_link_event which are now monitored by rip and nd6. Upon receiving these events each protocol trigger the relevant notifications: * inet4 => Gratuitous ARP * inet6 => Unsolicited Neighbour Announce This also fixes the carp IPv6 NA's that stopped working after r251584 which added the ipv6_route__llma route. The new behavour can be controlled using the sysctls: * net.link.ether.inet.arp_on_link * net.inet6.icmp6.nd6_on_link Also removed unused param from lagg_port_state and added descriptions for the sysctls while here. PR: 156226 MFC after: 1 month Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4111
|
#
8ad43f2d |
|
14-Nov-2015 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Move iflladdr_event eventhandler invocation to if_setlladdr. Suggested by: glebius
|
#
bb3d23fd |
|
01-Nov-2015 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Fix lladdr change propagation for on vlans on top of it. Fix lladdr update when setting mac address manually. Fix lladdr_event for slave ports addition. MFC after: 4 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D4004
|
#
b7a581ea |
|
15-Oct-2015 |
Hiroki Sato <hrs@FreeBSD.org> |
Fix a panic when destroying a lagg interface. Differential Revision: https://reviews.freebsd.org/D3883
|
#
023d10cb |
|
07-Oct-2015 |
Hiroki Sato <hrs@FreeBSD.org> |
Fix a bug that caused reinitialization failure of MAC addresses on the lagg interface when removing the primary port. PR: 201916 Differential Revision: https://reviews.freebsd.org/D3301
|
#
973532fc |
|
04-Oct-2015 |
Marcelo Araujo <araujo@FreeBSD.org> |
Remove per complete the fec aggregation protocol. The remove began with revision r271733. NOTE: This patch must never be merge to 10-Stable Reviewed by: glebius Approved by: bapt (mentor) Relnotes: Yes Sponsored by: EuroBSDCon Sweden. Differential Revision: D3786
|
#
0e02b43a |
|
12-Aug-2015 |
Hiren Panchasara <hiren@FreeBSD.org> |
Make LAG LACP fast timeout tunable through IOCTL. Differential Revision: D3300 Submitted by: LN Sundararajan <lakshmi.n at msystechnologies> Reviewed by: wblock, smh, gnn, hiren, rpokala at panasas MFC after: 2 weeks Sponsored by: Panasas
|
#
a4b65afc |
|
26-Mar-2015 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Fix a possible mbuf leak on interface departure. Reported by: Alexandre Martins
|
#
b7ba031f |
|
11-Mar-2015 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Factor out mbuf hashing code from LAGG driver so that other network drivers can use it. This avoids some code duplication. Add missing default case to all switch statements while at it. Also move the hashing of the IPv6 flow field to layer 4 because the IPv6 flow field is constant on a per L4 connection basis and not on a per L3 network. Differential Revision: https://reviews.freebsd.org/D1987 Sponsored by: Mellanox Technologies MFC after: 1 month
|
#
0e5f55bb |
|
22-Jan-2015 |
Will Andrews <will@FreeBSD.org> |
Improve the distribution of LAGG port traffic. I edited the original change to retain the use of arc4random() as a seed for the hashing as a very basic defense against intentional lagg port selection. The author's original commit message (edited slightly): sys/net/ieee8023ad_lacp.c sys/net/if_lagg.c In lagg_hashmbuf, use the FNV hash instead of the old hash32_buf. The hash32 family of functions operate one octet at a time, and when run on a string s of length n, their output is equivalent to : ----- i=n-1 \ n \ (n-i-1) 32 ( seed^ + / 33^ * s[i] ) % 2^ / ----- i=0 The problem is that the last five bytes of input don't get multiplied by sufficiently many powers of 33 to rollover 2^32. That means that changing the last few bytes (but obviously not the very last) of input will always change the value of the hash by a multiple of 33. In the case of lagg_hashmbuf() with ipv4 input, the last four bytes are the TCP or UDP port numbers. Since the output of lagg_hashmbuf is always taken modulo the port count, and 3 is a common port count for a lagg, that's bad. It means that the UDP or TCP source port will never affect which lagg member is selected on a 3-port lagg. At 10Gbps, I was not able to measure any difference in CPU consumption between the old and new hash. Submitted by: asomers (original commit) Reviewed by: emaste, glebius MFC after: 1 week Sponsored by: Spectra Logic MFSpectraBSD: 1001723 on 2013/08/28 (original) 1114258 on 2015/01/22 (edit)
|
#
504289ea |
|
17-Jan-2015 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Fix condition and really sort ports. Also add comment describing the intent of this code. Reported by: sbruno MFC after: 1 week Sponsored by: Yandex LLC
|
#
c2529042 |
|
01-Dec-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but does no longer have any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file. This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows. "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before. Additional notes: - The SCTP code changes will be committed as a separate patch. - Removal of the "M_FLOWID" flag will also be done separately. - The FreeBSD version has been bumped. MFC after: 1 month Sponsored by: Mellanox Technologies
|
#
033074c4 |
|
09-Nov-2014 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Replace 'struct route *' if_output() argument with 'struct nhop_info *'. Leave 'struct route' as is for legacy routing api users. Remove most of rtalloc_ign*-derived functions.
|
#
bf6d3f0c |
|
17-Oct-2014 |
Hiroki Sato <hrs@FreeBSD.org> |
- Fix lladdr configuration which could prevent LACP mode from working. - Fix LORs when a laggport interface has an IPv6 LLA. PR: 194321
|
#
6d478167 |
|
04-Oct-2014 |
Hiroki Sato <hrs@FreeBSD.org> |
- Move L2 addr configuration for the primary port to a taskqueue. This fixes LOR of softc rmlock in iflladdr_event handlers. - Call if_delmulti_ifma() after LACP_UNLOCK(). This fixes another LOR. - Fix a panic in lacp_transit_expire(). - Fix a panic in lagg_input() upon shutting down a port.
|
#
9732189c |
|
02-Oct-2014 |
Hiroki Sato <hrs@FreeBSD.org> |
Separate option handling from SIOC[SG]LAGG to SIOC[SG]LAGGOPTS for backward compatibility with old ifconfig(8).
|
#
939a050a |
|
01-Oct-2014 |
Hiroki Sato <hrs@FreeBSD.org> |
Virtualize lagg(4) cloner. This change fixes a panic when tearing down if_lagg(4) interfaces which were cloned in a vnet jail. Sysctl nodes which are dynamically generated for each cloned interface (net.link.lagg.N.*) have been removed, and use_flowid and flowid_shift ifconfig(8) parameters have been added instead. Flags and per-interface statistics counters are displayed in "ifconfig -v". CR: D842
|
#
dee826ce |
|
01-Oct-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Fix off by one in lagg_port_destroy(). Reported by: "Max N. Boyarov" <zotrix bsd.by>
|
#
112f50ff |
|
28-Sep-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Finally, convert counters in struct ifnet to counter(9). Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
23575437 |
|
28-Sep-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Convert to if_inc_counter() last remnantes of bare access to struct ifnet counters.
|
#
7d6cc45c |
|
27-Sep-2014 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Use underlying ports counters to get lagg statistics instead of per-packet accounting. This introduce user-visible changes like aggregating error counters. Reviewed by: asomers (prev.version), glebius CR: D781 MFC after: 2 weeks Sponsored by: Yandex LLC
|
#
eade13f9 |
|
26-Sep-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove macros that hide access to struct ifnet fields.
|
#
38738d73 |
|
25-Sep-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Make all lagg protocol methods live in lagg_protos, not in softc. All interfaces of a same protocol, use the same methods. Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
30e5de48 |
|
25-Sep-2014 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Keep list of lagg ports sorted by if_index. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC
|
#
6900d0d3 |
|
25-Sep-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
- Whitespace. - Remove caddr_t.
|
#
16ca790e |
|
26-Sep-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
- Provide lagg_proto_attach(), lagg_proto_detach(). - Make detach a protocol method in lagg_protos. - Simplify code to lookup protocols. Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
09c7577e |
|
26-Sep-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
- When reconfiguring protocol on a lagg, first set it to LAGG_PROTO_NONE, then drop lock, run the attach routines, and then set it to specific proto. This removes tons of WITNESS warnings. - Make lagg protocol attach handlers not failing and allocate memory with M_WAITOK. Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
b1bbc5b3 |
|
26-Sep-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Make lagg protocols detach methods returning void. Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
9fd573c3 |
|
22-Sep-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Improve transmit sending offload, TSO, algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware is limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as input from other FreeBSD developers and will use defaults for values not set. Reviewed by: adrian, rmacklem Sponsored by: Mellanox Technologies MFC after: 1 week
|
#
99cdd961 |
|
17-Sep-2014 |
Marcelo Araujo <araujo@FreeBSD.org> |
Add laggproto broadcast, it allows sends frames to all ports of the lagg(4) group and receives frames on any port of the lagg(4). Phabric: D549 Reviewed by: glebius, thompsa Approved by: glebius Obtained from: OpenBSD Sponsored by: QNAP Systems Inc.
|
#
72f31000 |
|
13-Sep-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Revert r271504. A new patch to solve this issue will be made. Suggested by: adrian @
|
#
eb93b77a |
|
13-Sep-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Improve transmit sending offload, TSO, algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware is limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as input from other FreeBSD developers and will use defaults for values not set. MFC after: 1 week Sponsored by: Mellanox Technologies
|
#
13399157 |
|
10-Aug-2014 |
Marcelo Araujo <araujo@FreeBSD.org> |
- Remove unneeded include. Phabric: D563 Reviewed by: kevlo Approved by: kevlo
|
#
2d222cb7 |
|
03-Aug-2014 |
Alexander Motin <mav@FreeBSD.org> |
Improve locking of multicast addresses in VLAN and LAGG interfaces. This fixes several scenarios of reproducible panics, cause by races between multicast address changes and interface destruction. MFC after: 2 weeks
|
#
af3b2549 |
|
27-Jun-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Pull in r267961 and r267973 again. Fix for issues reported will follow.
|
#
37a107a4 |
|
27-Jun-2014 |
Glen Barber <gjb@FreeBSD.org> |
Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory
|
#
3da1cf1e |
|
27-Jun-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies
|
#
d092e11c |
|
15-Apr-2014 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix build for non-INET that was broken by r264469. MFC after: 2 weeks
|
#
9387570f |
|
14-Apr-2014 |
Rick Macklem <rmacklem@FreeBSD.org> |
Lagg did not set the value of if_hw_tsomax, so when lagg was stacked on top of network interfaces that set if_hw_tsomax, tcp_output() would see the default value instead of the value set by the network interface(s). This patch modifies lagg so that it sets if_hw_tsomax to the minimum of the value(s) for the underlying network interfaces. Reviewed by: glebius MFC after: 2 weeks
|
#
95fbe4d0 |
|
18-Jan-2014 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Simplify filling sockaddr_dl structure for if_resolvemulti() callback providers. link_init_sdl() function can be used to fill most of the parameters. Use caller stack instead of allocation / freing memory for each request. Do not drop support for extra-long (probably non-existing) link-layer protocols by introducing link_alloc_sdl() (used by if_resolvemulti() callback) and link_free_sdl() (used by caller). Since this change breaks KBI, MFC requires slightly different approach (link_init_sdl() auto-allocating buffer if necessary to handle cases with unmodified if_resolvemulti() callers). MFC after: 2 weeks
|
#
1a8959da |
|
29-Dec-2013 |
Scott Long <scottl@FreeBSD.org> |
Multi-queue NIC drivers and multi-port lagg tend to use the same lower bits of the flowid as each other, resulting in a poor distribution of packets among queues in certain cases. Work around this by adding a set of sysctls for controlling a bit-shift on the flowid when doing multi-port aggrigation in lagg and lacp. By default, lagg/lacp will now use bits 16 and higher instead of 0 and higher. Reviewed by: max Obtained from: Netflix MFC after: 3 days
|
#
4cdc1f54 |
|
09-Oct-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
There are some high performance NICs that count statistics in hardware, and there are ifnets, that do that via counter(9). Provide a flag that would skip cache line trashing '+=' operation in ether_input(). Sponsored by: Netflix Sponsored by: Nginx, Inc. Reviewed by: melifaro, adrian Approved by: re (marius)
|
#
310915a4 |
|
29-Aug-2013 |
Adrian Chadd <adrian@FreeBSD.org> |
Convert the if_lagg rwlock to an rmlock. We've been seeing lots of cache line contention (but not lock contention!) in our workloads between the various TX and RX threads going on. The write lock is only grabbed when configuration changes are made - which are infrequent. With this patch, the contention and cycles spent waiting for updates disappear. Sponsored by: Netflix, Inc.
|
#
49de4f22 |
|
26-Jul-2013 |
Adrian Chadd <adrian@FreeBSD.org> |
Break out the static, global LACP debug options into a per-lagg unit sysctl tree. * Create a net.link.lagg.X.lacp node * Add a debug node under that for tx_test and rx_test * Add lacp_strict_mode, defaulting to 1 tx_test and rx_test are still a bitmap of unit numbers for now. At some point it would be nice to create child nodes of the lagg bundle for each sub-interface, and then populate those with various knobs and statistics. Sponsored by: Netflix
|
#
31402c27 |
|
12-Jul-2013 |
Adrian Chadd <adrian@FreeBSD.org> |
Bring over some link aggregation / LACP protocol improvements and debugging additions. * Add some new tracing events to aid in debugging. * Add in a debugging mode to drop transmit and received frames, specifically to test whether seeing or hearing heartbeats correctly cause LACP to drop the port. * Add in (and make default) a strict LACP mode, which requires the heartbeat on a port to be heard before it's used. Sometimes vendor ports will hang but the link layer stays up, resulting in hung traffic. * Add logging the number of link status flaps, again to aid in debugging badly behaving switch ports. * Calculate the lagg interface port speed as the multiple of the configured ports, rather than the largest. Obtained from: Netflix MFC after: 2 weeks
|
#
af805644 |
|
02-Jul-2013 |
Hiroki Sato <hrs@FreeBSD.org> |
- Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal. To configure an autoconfigured link-local address (RFC 4862), the following rc.conf(5) configuration can be used: ifconfig_bridge0_ipv6="inet6 auto_linklocal" - if_bridge(4) now removes IPv6 addresses on a member interface to be added when the parent interface or one of the existing member interfaces has an IPv6 address. if_bridge(4) merges each link-local scope zone which the member interfaces form respectively, so it causes address scope violation. Removal of the IPv6 addresses prevents it. - if_lagg(4) now removes IPv6 addresses on a member interfaces unconditionally. - Set reasonable flags to non-IPv6-capable interfaces. [*] Submitted by: rpaulo [*] MFC after: 1 week
|
#
eda6cf02 |
|
17-Jun-2013 |
Xin LI <delphij@FreeBSD.org> |
Return ENETDOWN instead of ENOENT when all lagg(4) links are inactive when upper layer tries to transmit packet. This gives better feedback and meaningful errors for applications. MFC after: 2 weeks Reviewed by: thompsa
|
#
f8afe337 |
|
07-Jun-2013 |
Mikolaj Golub <trociny@FreeBSD.org> |
Properly set curvnet context in lagg_port_setlladdr() task handler. Reported by: Nikos Vassiliadis <nvass gmx.com> Submitted by: zec Tested by: Nikos Vassiliadis <nvass gmx.com> MFC after: 1 week
|
#
47e8d432 |
|
25-Apr-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Add const qualifier to the dst parameter of the ifnet if_output method.
|
#
b64478a1 |
|
15-Apr-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Switch lagg(4) statistics to counter(9). The lagg(4) is often used to bond high speed links, so basic per-packet += on statistics cause cache misses and statistics loss. Perfect solution would be to convert ifnet(9) to counters(9), but this requires much more work, and unfortunately ABI change, so temporarily patch lagg(4) manually. We store counters in the softc, and once per second push their values to legacy ifnet counters. Sponsored by: Nginx, Inc.
|
#
209dddb9 |
|
22-Mar-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove __FreeBSD_version ifdefs.
|
#
1d9797f1 |
|
21-Jan-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
If lagg(4) can't forward a packet due to underlying port problems, return much more meaningful ENETDOWN to the stack, instead of EBUSY.
|
#
6684e469 |
|
17-Oct-2012 |
Xin LI <delphij@FreeBSD.org> |
Fix build.
|
#
e9bbb44e |
|
16-Oct-2012 |
Maksim Yevmenkin <emax@FreeBSD.org> |
report total number of ports for each lagg(4) interface via net.link.lagg.X.count sysctl MFC after: 1 week
|
#
42a58907 |
|
16-Oct-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Make the "struct if_clone" opaque to users of the cloning API. Users now use function calls: if_clone_simple() if_clone_advanced() to initialize a cloner, instead of macros that initialize if_clone structure. Discussed with: brooks, bz, 1 year ago
|
#
9823d527 |
|
10-Oct-2012 |
Kevin Lo <kevlo@FreeBSD.org> |
Revert previous commit... Pointyhat to: kevlo (myself)
|
#
a10cee30 |
|
09-Oct-2012 |
Kevin Lo <kevlo@FreeBSD.org> |
Prefer NULL over 0 for pointers
|
#
3b7d677b |
|
20-Sep-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Convert lagg(4) to use if_transmit instead of if_start. In collaboration with: thompsa, sbruno, fabient
|
#
61587a84 |
|
30-Jun-2012 |
Andrew Thompson <thompsa@FreeBSD.org> |
Add the same check as vlan(4) where we ignore the ifnet departure event if the interface is just being renamed. PR: kern/169557 Submitted by: Mark Johnston MFC after: 3 days
|
#
f74d5a7a |
|
27-May-2012 |
Eygene Ryabinkin <rea@FreeBSD.org> |
if_lagg: allow to invoke SIOCSLAGGPORT multiple times in a row Currently, 'ifconfig laggX down' does not remove members from this lagg(4) interface. So, 'service netif stop laggX' followed by 'service netif start laggX' will choke, because "stop" will leave interfaces attached to the laggX and ifconfig from the "start" will refuse to add already-existing interfaces. The real-world case is when I am bundling together my Ethernet and WiFi interfaces and using multiple profiles for accessing network in different places: system being booted up with one profile, but later this profile being exchanged to another one, followed by 'service netif restart' will not add WiFi interface back to the lagg: the "stop" action from 'service netif restart' will shut down my main WiFi interface, so wlan0 that exists in the lagg0 will be destroyed and purged from lagg0; the "start" action will try to re-add both interfaces, but since Ethernet one is already in lagg0, ifconfig will refuse to add the wlan0 from WiFi interface. Since adding the interface to the lagg(4) when it is already here should be an idempotent action: we're really not changing anything, so this fix doesn't change the semantics of interface addition. Approved by: thompsa Reviewed by: emaste MFC after: 1 week
|
#
6107adc3 |
|
02-May-2012 |
Ed Maste <emaste@FreeBSD.org> |
Relax restriction on direct tx to child ports Lagg(4) restricts the type of packet that may be sent directly to a child port, to avoid undesired output from accidental misconfiguration. Previously only ETHERTYPE_PAE was permitted. BPF writes to a lagg(4) child port are presumably intentional, so just allow them, while still blocking other packets that should take the aggregation path. PR: kern/138620 Approved by: thompsa@
|
#
b517176a |
|
11-Apr-2012 |
Andrew Thompson <thompsa@FreeBSD.org> |
Set the proto to LAGG_PROTO_NONE before calling the detach routine so packets are discarded, this is an issue because lacp drops the lock which may allow network threads to access freed memory. Expand the lock coverage so the detach/attach happen atomically. Submitted by: Andrew Boyer (earlier version)
|
#
cd613b63 |
|
07-Mar-2012 |
Andrew Thompson <thompsa@FreeBSD.org> |
Move the vlan buffer space into the union which also fixes an unused variable warning with !INET & !INET6. Spotted by: pluknet
|
#
86f67641 |
|
06-Mar-2012 |
Andrew Thompson <thompsa@FreeBSD.org> |
Add the ability to set which packet layers are used for the load balance hash calculation.
|
#
3122b912 |
|
23-Feb-2012 |
Andrew Thompson <thompsa@FreeBSD.org> |
Add a sysctl/tunable default value for the use_flowid sysctl in r232008.
|
#
0bf97ae2 |
|
22-Feb-2012 |
Andrew Thompson <thompsa@FreeBSD.org> |
Using the flowid in the mbuf assumes the network card is giving a good hash for the traffic flow, this may not be the case giving poor traffic distribution. Add a sysctl which allows us to fall back to our own flow hash code. PR: kern/164901 Submitted by: Eugene Grosbein MFC after: 1 week
|
#
4b22573a |
|
11-Nov-2011 |
Brooks Davis <brooks@FreeBSD.org> |
In r191367 the need for if_free_type() was removed and a new member if_alloctype was used to store the origional interface type. Take advantage of this change by removing all existing uses of if_free_type() in favor of if_free(). MFC after: 1 Month
|
#
6472ac3d |
|
07-Nov-2011 |
Ed Schouten <ed@FreeBSD.org> |
Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
|
#
c94a66f8 |
|
01-Aug-2011 |
Sergey Kandaurov <pluknet@FreeBSD.org> |
Add missing MODULE_VERSION() definition to protect against duplicating module loads. PR: kern/159345 Reported by: Eugene Grosbein <egrosbein att rdtc ru> Tested by: Eugene Grosbein <egrosbein att rdtc ru> Approved by: re (kib) MFC after: 1 week
|
#
6069a2c0 |
|
07-Jul-2011 |
Andrew Thompson <thompsa@FreeBSD.org> |
Grab the rlock before checking if our interface is enabled, it could be possible to hit a dead pointer when changing interfaces. PR: kern/156978 Submitted by: Andrew Boyer MFC after: 1 week
|
#
627cecc5 |
|
30-Apr-2011 |
Andrew Thompson <thompsa@FreeBSD.org> |
LACP frames must not be send VLAN-tagged, check for that before processing. PR: kern/156743 Submitted by: Dmitrij Tejblum MFC after: 1 week
|
#
a0ae8f04 |
|
27-Apr-2011 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Make various (pseudo) interfaces compile without INET in the kernel adding appropriate #ifdefs. For module builds the framework needs adjustments for at least carp. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days
|
#
5f82cfdf |
|
04-Mar-2011 |
Ermal Luçi <eri@FreeBSD.org> |
Fix a panic that can happen when trying to destroy a lagg(4) with scheduler set to none. Approved by: thompsa(mentor) MFC after: 1 week
|
#
a7d5f7eb |
|
19-Oct-2010 |
Jamie Gritton <jamie@FreeBSD.org> |
A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
|
#
be4572c8 |
|
01-Sep-2010 |
Ed Maste <emaste@FreeBSD.org> |
Add a sysctl knob to accept input packets on any link in a failover lagg.
|
#
7c61d493 |
|
24-May-2010 |
Andrew Thompson <thompsa@FreeBSD.org> |
MFC r202588 Declare a new EVENTHANDLER called iflladdr_event which signals that the L2 address on an interface has changed. This lets stacked interfaces such as vlan(4) detect that their lower interface has changed and adjust things in order to keep working. Previously this situation broke at least vlan(4) and lagg(4) configurations. The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the risk of a loop. PR: kern/142927 Submitted by: Nikolay Denev MFC r202611 Do not hold the lock over if_setlladdr() as it calls into the interface driver init routine.
|
#
e546195f |
|
07-Apr-2010 |
Xin LI <delphij@FreeBSD.org> |
MFC r204901 Remove the check for IFF_DRV_OACTIVE right before adding a port into lagg interface. The check itself seems to be coming from OpenBSD but does not seem to be useful for our code. Discussed with: thomasa
|
#
13d85d43 |
|
08-Mar-2010 |
Xin LI <delphij@FreeBSD.org> |
Remove the check for IFF_DRV_OACTIVE right before adding a port into lagg interface. The check itself seems to be coming from OpenBSD but does not seem to be useful for our code. Discussed with: thomasa MFC after: 1 month
|
#
644da90d |
|
06-Feb-2010 |
Ermal Luçi <eri@FreeBSD.org> |
Propagate the vlan eventis to the underlying interfaces/members so they can do initialization of hw related features. PR: kern/141646 Reviewed by: thompsa Approved by: thompsa(co-mentor) MFC after: 2 weeks
|
#
ea4ca115 |
|
18-Jan-2010 |
Andrew Thompson <thompsa@FreeBSD.org> |
Declare a new EVENTHANDLER called iflladdr_event which signals that the L2 address on an interface has changed. This lets stacked interfaces such as vlan(4) detect that their lower interface has changed and adjust things in order to keep working. Previously this situation broke at least vlan(4) and lagg(4) configurations. The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the risk of a loop. PR: kern/142927 Submitted by: Nikolay Denev
|
#
22133b44 |
|
08-Jan-2010 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Stop GCC from complaining about lagg_port_checkstacking() being unused.
|
#
5c6026e9 |
|
30-Apr-2009 |
Andrew Thompson <thompsa@FreeBSD.org> |
Use the flowid if its available for selecting the tx port.
|
#
279aa3d4 |
|
16-Apr-2009 |
Kip Macy <kmacy@FreeBSD.org> |
Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Reviewed by: rwatson
|
#
f812e067 |
|
17-Dec-2008 |
Andrew Thompson <thompsa@FreeBSD.org> |
- Protect against sc->sc_primary being null - Initialise speed where its used
|
#
be07c180 |
|
17-Dec-2008 |
Andrew Thompson <thompsa@FreeBSD.org> |
Update the interface baudrate taking into account the max speed for the different aggregation protocols.
|
#
09efca80 |
|
16-Dec-2008 |
Andrew Thompson <thompsa@FreeBSD.org> |
Also propagate the if_hwassist value to the parent so that cksum offload works. Submitted by: Tom Hicks (thicks_averesys.com)
|
#
aea78d20 |
|
22-Nov-2008 |
Kip Macy <kmacy@FreeBSD.org> |
convert calls to IFQ_HANDOFF to if_transmit
|
#
d7f03759 |
|
19-Oct-2008 |
Ulf Lilleengen <lulf@FreeBSD.org> |
- Import the HEAD csup code which is the basis for the cvsmode work.
|
#
8e46f311 |
|
30-Sep-2008 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Do not mangle if_oerrors of the underlying interface. This counter belongs solely to the driver. We don't lose any statistics with this change, because in a error case the drop counter on the interface output queue is always incremented. Reviewed by: thompsa
|
#
149bac03 |
|
18-Sep-2008 |
Andrew Thompson <thompsa@FreeBSD.org> |
Move the protocol and port count checks to outside the loop, these conditions can not change while we have the lock so no point retesting.
|
#
96c41c08 |
|
17-Sep-2008 |
Andrew Thompson <thompsa@FreeBSD.org> |
Make sure there is at least one port to avoid divide by zero when choosing the tx port. PR: kern/122794 MFC after: 3 days
|
#
6729225f |
|
03-Jul-2008 |
Andrew Thompson <thompsa@FreeBSD.org> |
port % count will never be greater than LAGG_MAX_PORTS so nuke the test.
|
#
3de18008 |
|
16-Mar-2008 |
Andrew Thompson <thompsa@FreeBSD.org> |
Switch the LACP state machine over to its own mutex to protect the internals, this means that it no longer grabs the lagg rwlock. Use two port table arrays which list the active ports for Tx and switch between them with an atomic op. Now the lagg rwlock is only exclusively locked for management (ioctls) and queuing of lacp control frames isnt needed.
|
#
af0084c9 |
|
30-Dec-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Pass any unmatched slowprotocols frames up the stack instead of dropping them, there are more subtypes than just LACP.
|
#
1f019d83 |
|
17-Dec-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
- Use the macro to check the port status has it will also test if its administratively down (!IFF_UP) - Use the same parameters to lagg_link_active() to get the backup port as in the output path, this didnt actually matter in practice as sc_primary is always the first on the port list. MFC after: 3 days
|
#
f51133ee |
|
17-Dec-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Add myself to the copyright.
|
#
d3b28963 |
|
04-Dec-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Support monitor mode where the frame is discarded after bpf and stats processing.
|
#
80ddfb40 |
|
24-Nov-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Have the lagg interface generate link up/down events, the interface is marked as up if at least one of its ports also has a link up. This fixes using carp+lagg together and any other system that relies on linkstate events. PR: kern/113956 MFC after: 3 days
|
#
544f7141 |
|
19-Oct-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Use ETHER_BPF_MTAP so that the vlan tags are visible to bpf(4) when stacked under a vlan. MFC after: 3 days
|
#
960dab09 |
|
11-Oct-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Fix two panics in lagg. 1. The locking was changed to shared but roundrobin mode still updated a pointer in the softc with the next tx interface to use. This will panic under high load. Change this to an atomically incremented sequence number in order to choose the tx port in round robin. 2. IFQ_HANDOFF will free the mbuf if the queue is full, this will then be freed again by lagg_start() and panic. Reorganised the error handling and freeing to fix this. MFC after: 3 days
|
#
20745551 |
|
30-Aug-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Show the ACTIVE flag in ifconfig for the single interface that is actaully active in failover mode rather than all interfaces with a link. This makes it clear if the master interface is in use or one of the backup links. Found by: Writing the Handbook section Approved by: re (kensmith)
|
#
de75afe6 |
|
30-Jul-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
- Propagate the largest set of interface capabilities supported by all lagg ports to the lagg interface. - Use the MTU from the first interface as the lagg MTU, all extra interfaces must be the same. This fixes using a lagg interface for a vlan or enabling jumbo frames, etc. Approved by: re (kensmith) MFC After: 3 days
|
#
82056f42 |
|
26-Jul-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Avoid holding the softc lock when using copyout(). Reported by: dfr Approved by: re (rwatson)
|
#
b3d37ca5 |
|
05-Jul-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Allow the LACP state to be queried from userland which at the moment is the actor and partner peer info. Print out the active aggregator and per port data in verbose mode from ifconfig. Approved by: re (mux)
|
#
ec32b37e |
|
12-Jun-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
non-functional cleanup - remove dead code - use consistent variable names - gc unused defines - whitespace cleanup
|
#
6469e186 |
|
19-May-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
- packets on the input interface were counted twice - Use IFQ_HANDOFF instead of rolling our own
|
#
9bbba41e |
|
18-May-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Fix a mbuf leak where sc_start fails or the protocol is none.
|
#
3362a474 |
|
18-May-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Fix locking assert where we should hold the reader lock.
|
#
e2a77bb8 |
|
15-May-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Fix unused variable error with !INET6 Reported by: Artem Naluzhny, Frank Terhaar-Yonkers
|
#
7a04b0f6 |
|
15-May-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Feed ipv6 flowlabel to hash calculation. Obtained from: NetBSD
|
#
3bf517e3 |
|
15-May-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Change from a mutex to a read/write lock. This allows the tx port to be selected simultaneously by multiple senders and transmit/receive is not serialised between aggregated interfaces.
|
#
a5715cb2 |
|
07-May-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
- Correctly check if lp_ioctl is null - Remove lagg_ether_purgemulti as its no longer needed - Mark the interface as up if any ports are active rather than just the primary
|
#
efcd0965 |
|
06-May-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
The purgemulti call is not needed since all the ports have already been detached.
|
#
cdc6f95f |
|
06-May-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Call if_setlladdr() on the aggregation port from a taskqueue so the softc lock is not held. The short delay between aggregating the port and setting the MAC address is fine.
|
#
108fe96a |
|
06-May-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Avoid touching various unsafe parts if the interface is disappearing.
|
#
d74fd345 |
|
06-May-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Change from using if_delmulti() to if_delmulti_ifma() as it simplifies the code and is safe to use if the ifp has disappeared. Suggested by: bms
|
#
e3163ef6 |
|
03-May-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
- Add a disabled state for ports that can not be aggregated - Refine check for lacp links, set to disabled if not suitable
|
#
139722d4 |
|
02-May-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Set the master flag on the right variable.
|
#
18242d3b |
|
16-Apr-2007 |
Andrew Thompson <thompsa@FreeBSD.org> |
Rename the trunk(4) driver to lagg(4) as it is too similar to vlan trunking. The name trunk is misused as the networking term trunk means carrying multiple VLANs over a single connection. The IEEE standard for link aggregation (802.3 section 3) does not talk about 'trunk' at all while it is used throughout IEEE 802.1Q in describing vlans. The lagg(4) driver provides link aggregation, failover and fault tolerance. Discussed on: current@
|