
History log of /freebsd-current/sys/net/if_var.h
Revision	Date	Author	Comments
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# 2a371643	21-Jul-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Retire if_etherbpfmtap() and if_bpfmtap() Summary: These came in the original DrvAPI commits in 2014, and are obsoleted by bpf_mtap_if() and ether_bpf_mtap_if(). The `_if` suffix, rather than prefix, conveys that it's operating on the bpf of the interface, instead of the interface itself. Reviewed by: glebius Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D41146
# 2ff63af9	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .h pattern Remove /^\s\+\s\$FreeBSD\$.$\n/
# 79379355	16-Jun-2023	Alexander V. Chernikov <melifaro@FreeBSD.org>	netlink: convert to IfAPI. Convert to IfAPI everything except `IF_AFDATA_WLOCK` usage in neigh.c. Reviewed By: jhibbits Differential Revision: https://reviews.freebsd.org/D40577
# 848d5bb1	16-May-2023	Konstantin Belousov <kib@FreeBSD.org>	net/if_var.h: consistently use if_t over struct ifnet * Reviewed by: jhibbits Sponsored by: NVidia networking Differential revision: https://reviews.freebsd.org/D40125
# f766d1d5	10-Apr-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Add if_maddr_empty() to check for any maddrs. if_llmaddr_count() only counts link-level multicast addresses. hv_netvsc(4) needs to know if there are any multicast addresses. Since hv_netvsc(4) is the only instance where this would be used, make it a simple boolean. If others need an if_maddr_count(), that can be added in the future. Reviewed by: melifaro Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D39493
# 7814374b	07-Apr-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Hide the macros that touch ifnet members Nothing should be directly touching the ifnet members, which are hidden in <net/if_private.h>, so hide them in the same header to avoid errors from users. Sponsored by: Juniper Networks, Inc.
# 56d4550c	19-Apr-2023	Alexander V. Chernikov <melifaro@FreeBSD.org>	ifnet: factor out interface renaming into a separate function. This change is required to support interface renaming via Netlink. No functional changes intended. Reviewed by: zlei Differential Revision: https://reviews.freebsd.org/D39692 MFC after: 2 weeks
# e2427c69	16-Mar-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Add iterator to complement if_foreach() Summary: Sometimes an if_foreach() callback can be trivial, or need a lot of outer context. In this case a regular `for` loop makes more sense. To keep things hidden in the new API, use an opaque `if_iter` structure that can still be instantiated on the stack. The current implementation uses just a single pointer out of the 4 allotted to the opaque context, and the cleanup does nothing, but may be used in the future. Reviewed by: melifaro Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D39138
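The following is a minimal userspace sketch of the opaque, stack-allocated iterator pattern that this entry describes. It is not the kernel's API. Everything in it (`demo_ifnet`, `demo_iter`, `demo_iter_start()`, `demo_iter_next()`, `demo_iter_finish()`) is hypothetical. The sketch also makes simplifying assumptions: it walks a plain singly linked list instead of the VNET interface list and does no epoch or locking. As in the entry, four pointer slots are reserved but only one is used, and the cleanup call currently does no real work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an interface; the real struct ifnet is private. */
struct demo_ifnet {
	const char		*name;
	struct demo_ifnet	*next;
};

/* Opaque iterator: callers put it on the stack but never look inside.
 * Four slots are reserved; this sketch only uses slot 0. */
struct demo_iter {
	void	*context[4];
};

static struct demo_ifnet *
demo_iter_next(struct demo_iter *it)
{
	struct demo_ifnet *cur = it->context[0];

	if (cur != NULL)
		it->context[0] = cur->next;	/* remember where to resume */
	return (cur);
}

static struct demo_ifnet *
demo_iter_start(struct demo_iter *it, struct demo_ifnet *head)
{
	it->context[0] = head;
	return (demo_iter_next(it));
}

static void
demo_iter_finish(struct demo_iter *it)
{
	/* Nothing to release yet; kept so callers need no change later. */
	it->context[0] = NULL;
}

/* A consumer written as a plain for loop with access to outer locals. */
static int
demo_count(struct demo_ifnet *head)
{
	struct demo_iter it;
	struct demo_ifnet *ifp;
	int n = 0;

	for (ifp = demo_iter_start(&it, head); ifp != NULL;
	    ifp = demo_iter_next(&it))
		n++;
	demo_iter_finish(&it);
	return (n);
}
```

The loop body can use any local state directly. With a callback-style if_foreach(), that state would have to be packed into an argument structure.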
# df2b419a	04-Mar-2023	Alexander V. Chernikov <melifaro@FreeBSD.org>	ifnet: add if_foreach_sleep() to allow ifnet iterations with sleep. Subscribers: imp, ae, glebius Differential Revision: https://reviews.freebsd.org/D38904
# 66bdbcd5	03-Mar-2023	Alexander V. Chernikov <melifaro@FreeBSD.org>	net: unify mtu update code Subscribers: imp, ae, glebius Differential Revision: https://reviews.freebsd.org/D38893
# aac2d19d	10-Feb-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Style cleanup Summary: Clean up style issues from IfAPI additions. Casts to (struct ifnet *) made sense when `if_t` was a `void *`, but since it's a `struct ifnet *` it no longer makes sense. Fix whitespace errors, among others. Reviewed by: kib, glebius Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38499
# a3a76c3d	10-Feb-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Add capabilities2/capenable2 accessors Summary: As a stopgap measure add basic accessors for the if_capabilities2 and if_capenable2 members to further hide the ifnet details. Sponsored by: Juniper Networks, Inc. Reviewed by: glebius, kib Differential Revision: https://reviews.freebsd.org/D38487
# 189c3729	10-Feb-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: More accessors Summary: Add the following accessors needed by infiniband drivers: * if_getaddrlen() * if_setbroadcastaddr() * if_resolvemulti() With these accessors, and additional changes on the drivers' side, an amd64 kernel can be compiled with `struct ifnet` completely hidden. Reviewed by: melifaro Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38488
# 1e6131ba	01-Feb-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Add needed APIs for mbuf support Summary: Add 2 new APIs for supporting recent mbuf changes: * 36e0a362ac added the m_snd_tag_alloc() wrapper around if_snd_tag_alloc(). Push this down to the ifnet level. * 4d7a1361ef adds the m_rcvif_serialize()/m_rcvif_restore() KPIs to serialize and restore an ifnet pointer. Add the necessary wrapper to get the index generation for this. Reviewed By: jhb Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38340
# 2eeb8083	01-Feb-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Add iterator to loop over all interfaces Summary: Sometimes it's useful to iterate over all interfaces in the current VNET, as the linuxulator does in several places. Unlike other iterators in the IfAPI this propagates any error received up to the caller, instead of returning a count. Sponsored by: Juniper Networks, Inc. Reviewed by: glebius, melifaro Differential Revision: https://reviews.freebsd.org/D38348
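This hedged sketch shows the error-propagating iteration described in this entry. The walker stops at the first callback that returns non-zero and hands that error back to the caller, instead of returning a count. The interface array, `demo_foreach()`, and both callbacks are hypothetical stand-ins. They are not the IfAPI signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_ifnet {
	const char	*name;
	int		 mtu;
};

typedef int demo_foreach_cb_t(struct demo_ifnet *, void *);

/* Hypothetical list standing in for the interfaces of the current VNET. */
static struct demo_ifnet demo_ifnets[] = {
	{ "em0", 1500 }, { "em1", 9000 }, { "lo0", 16384 },
};

static int
demo_foreach(demo_foreach_cb_t *cb, void *arg)
{
	size_t i;
	int error;

	for (i = 0; i < sizeof(demo_ifnets) / sizeof(demo_ifnets[0]); i++) {
		error = cb(&demo_ifnets[i], arg);
		if (error != 0)
			return (error);	/* stop and propagate, not a count */
	}
	return (0);
}

/* Example callback: fail with EINVAL once an MTU exceeds *arg. */
static int
demo_check_mtu(struct demo_ifnet *ifp, void *arg)
{
	int limit = *(int *)arg;

	return (ifp->mtu > limit ? EINVAL : 0);
}

/* Callback that always succeeds, so the walk visits every entry. */
static int
demo_count_cb(struct demo_ifnet *ifp, void *arg)
{
	(void)ifp;
	(*(int *)arg)++;
	return (0);
}
```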
# d79539e6	25-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Add if_altq_is_enabled() interface. Summary: The only user of the ALTQ_IS_ENABLED() in a driver checks against the ifnet queue. Abstract that all out and present the interface to check if ALTQ is enabled on the interface. Sponsored by: Juniper Networks, Inc. Reviewed By: glebius Differential Revision: https://reviews.freebsd.org/D38204
# 31cfaf19	25-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Add l2com accessor for firewire. Summary: Firewire is the only device driver that accesses the l2com member, all other accesses are handled within the netstack itself. Sponsored by: Juniper Networks, Inc. Reviewed by: glebius, melifaro Differential Revision: https://reviews.freebsd.org/D38203
# 0d2684e1	24-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Add some more accessors Summary: * if_setreassignfn for wireguard. * if_getinputfn() and if_getstartfn() for various drivers. Use the function descriptor typedefs for these and the setters. * vlantrunk accessor. This is used by VLAN_CAPABILITIES() used by several drivers, as well as directly by mxge(4). * if_pcp member accessor, used by cxgbe. * accessors for netmap adapter. Sponsored by: Juniper Networks, Inc. Reviewed By: glebius Differential Revision: https://reviews.freebsd.org/D38202
# c255d1a4	23-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Add if_llsoftc member accessors for TOEDEV Summary: Keep TOEDEV() macro for backwards compatibility, and add a SETTOEDEV() macro to complement with the new accessors. Sponsored by: Juniper Networks, Inc. Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D38199
# 30af2c13	23-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Add if_get/setmaclabel() and use it. Summary: Port the MAC modules to use the IfAPI APIs as part of this. Sponsored by: Juniper Networks, Inc. Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D38197
# 113af4fd	24-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Add if_gettype() API and use it for vlan Sponsored by: Juniper Networks, Inc. Reviewed by: #network, glebius Differential Revision: https://reviews.freebsd.org/D38198
# 053a24d1	13-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	debugnet: Add ifnet accessor to set debugnet methods As part of the effort to hide the internals of the ifnet struct, convert the DEBUGNET_SET() macro to use an accessor instead of directly touching the methods member. Reviewed by: glebius (older version) Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38105
# 2c2b37ad	13-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	ifnet/API: Move struct ifnet definition to a <net/if_private.h> Hide the ifnet structure definition, no user serviceable parts inside, it's a netstack implementation detail. Include it temporarily in <net/if_var.h> until all drivers are updated to use the accessors exclusively. Reviewed by: glebius Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38046
# fa25dbfd	12-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	ifnet API: Change if_init() to take context argument Some drivers, like iflib drivers, take a 'context' argument instead of an ifnet argument, as a single interface may have multiple contexts. Follow this scheme by passing the context argument down. Most drivers will likely pass 'ifp' as the context. Reviewed by: glebius Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38102
# ae330108	12-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	Revert "ifnet/API: Move the IfAPI from if_var.h to if.h" <net/if.h> should be a fully user-facing header, so these APIs don't belong there. Revert and will find another approach. This reverts commit fe33e0ab83d1fbc3c5cd4a2591ba0036e47b1fec. Fixes: fe33e0ab83d1 Sponsored by: Juniper Networks, Inc.
# fe33e0ab	11-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	ifnet/API: Move the IfAPI from if_var.h to if.h Summary: The "public" KPI for ifnet belongs in net/if.h, with net/if_var.h being implementation details for the netstack. This is the next step in enforcing that separation. Reviewed by: melifaro Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38030
# be4315dc	21-Dec-2022	Justin Hibbits <jhibbits@FreeBSD.org>	ifnet/DrvAPI: Move if_t typedef to a better place Summary: <net/if_var.h> should really be used by the netstack only, not by drivers. Eventually all the accessors will be moved to <net/if.h> as well, but for now just move the typedef while the KPI gets sorted and drivers get converted. Sponsored by: Juniper Networks, Inc. Reviewed By: melifaro, glebius Differential Revision: https://reviews.freebsd.org/D37784
# eb1da3e5	09-Dec-2022	Justin Hibbits <jhibbits@FreeBSD.org>	DrvAPI: Extend driver KPI with more accessors Summary: Add the following accessors to hide some more netstack details: * if_get/setcapabilities2 and bits analogue * if_setdname * if_getxname * if_transmit - wrapper for call to ifp->if_transmit() - This required changing the existing if_transmit to if_transmit_default, since that's its purpose. * if_getalloctype * if_getindex * if_foreach_addr_type - Like if_foreach_lladdr() but for any address family type. Used by some drivers to iterate over all AF_INET addresses. * if_init() - wrapper for ifp->if_init() call * if_setinputfn * if_setsndtagallocfn * if_togglehwassist Reviewers: #transport, #network, glebius, melifaro Reviewed by: #network, melifaro Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D37664
# 984b27d8	30-Nov-2022	Alexander V. Chernikov <melifaro@FreeBSD.org>	net: add if_allocdescr() to permit updating iface description from the kernel Reviewed by: kp,zlei Differential Revision: https://reviews.freebsd.org/D37566 MFC after: 2 weeks
# 9a7c520a	24-Sep-2022	Alexander V. Chernikov <melifaro@FreeBSD.org>	ifp: add if_setdescr() / if_freedescr() methods Add methods for setting and removing the description from the interface, so the external users can manage it without using ioctl API. MFC after: 2 weeks
# b96549f0	11-Apr-2022	Konstantin Belousov <kib@FreeBSD.org>	struct ifnet: add if_capabilities2 and if_capenable2 bitmasks We are running out of bits in if_capabilities. Suggested by: jhb Reviewed by: hselasky, jhb, kp (previous version) Sponsored by: NVIDIA Networking MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D32551
# f2ab9160	19-May-2022	Andrey V. Elsukov <ae@FreeBSD.org>	[vlan + lagg] add IFNET_EVENT_UPDATE_BAUDRATE event use it to update if_baudrate for vlan interfaces created on the LACP lagg. Differential revision: https://reviews.freebsd.org/D33405
# 4d7a1361	26-Jan-2022	Gleb Smirnoff <glebius@FreeBSD.org>	ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif Supplement ifindex table with generation count and use it to serialize & restore an ifnet pointer. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33266 Fun note: git show e6abef09187a (cherry picked from commit e1882428dcbbafd2814d7e17b977a8f686784b39)
# 6c741ffb	03-May-2022	Marko Zec <zec@FreeBSD.org>	Revert "mbuf: do not restore dying interfaces" This reverts commit 703e533da5e2e4743d38bbf4605fec041bc69976. Revert "ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif" This reverts commit e1882428dcbbafd2814d7e17b977a8f686784b39. Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
# 964b8f8b	28-Jan-2022	Gleb Smirnoff <glebius@FreeBSD.org>	ifnet: garbage collect unused function ifaddr_byindex(). Last use was removed in 5adea417d49.
# e1882428	26-Jan-2022	Gleb Smirnoff <glebius@FreeBSD.org>	ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif Supplement ifindex table with generation count and use it to serialize & restore an ifnet pointer. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33266 Fun note: git show e6abef09187a
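The generation-count idea from this entry (reverted in 6c741ffb and re-landed as 4d7a1361) can be modeled in userspace. The table size, the cookie layout, and `demo_serialize()`/`demo_restore()` are hypothetical, and the model has no epoch protection. What it shows is that a stored (index, generation) pair resolves to a pointer only if the slot has not been vacated and reused since the pair was recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	DEMO_MAXIF	8

struct demo_ifnet {
	const char	*name;
};

/* Hypothetical index table: each slot pairs a pointer with a generation
 * counter that is bumped whenever the slot is vacated. */
static struct {
	struct demo_ifnet	*ifp;
	uint32_t		 gen;
} demo_index[DEMO_MAXIF];

/* Serialized form kept with a packet instead of a raw ifnet pointer. */
struct demo_rcvif_cookie {
	uint16_t	idx;
	uint32_t	gen;
};

static void
demo_attach(uint16_t idx, struct demo_ifnet *ifp)
{
	demo_index[idx].ifp = ifp;
}

static void
demo_detach(uint16_t idx)
{
	demo_index[idx].ifp = NULL;
	demo_index[idx].gen++;		/* invalidate outstanding cookies */
}

static struct demo_rcvif_cookie
demo_serialize(uint16_t idx)
{
	struct demo_rcvif_cookie c = { idx, demo_index[idx].gen };

	return (c);
}

static struct demo_ifnet *
demo_restore(struct demo_rcvif_cookie c)
{
	if (c.idx >= DEMO_MAXIF || demo_index[c.idx].gen != c.gen)
		return (NULL);	/* slot emptied or reused since serialization */
	return (demo_index[c.idx].ifp);
}
```

Without the generation check, a cookie taken before em0 detached would silently resolve to whichever interface later reused index 1.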
# c8f2c290	25-Jan-2022	Hans Petter Selasky <hselasky@FreeBSD.org>	Add definitions for TLS receive tags using the existing send tag infrastructure. Although send tags are strictly used for transmit, the name might be changed in the future to be more generic. The TLS receive tags support regular IPv4 and IPv6 traffic, and also over any VLAN. If prio-tagging (VLAN ID zero) is enabled, this must be checked in the network driver itself when creating the TLS RX decryption offload filter. TLS receive tags have a modify callback to tell the network driver about the progress of decryption. Currently decryption is done IP packet by IP packet, even if the IP packet contains a partial TLS record. The modify callback allows the network driver to keep track of TCP sequence numbers pointing to the beginning of TLS records after TCP packet reassembly. These callbacks only happen when encrypted or partially decrypted data is received and are used to verify the decryption's starting point for the hardware. Typically the hardware will guess where TLS headers start and needs help from the software to know if the guess was correct. This is the purpose of the modify callback. Differential Revision: https://reviews.freebsd.org/D32356 Discussed with: jhb@ MFC after: 1 week Sponsored by: NVIDIA Networking
# 7e0bba4d	04-Dec-2021	Gleb Smirnoff <glebius@FreeBSD.org>	ifnet: make V_if_index static to if.c This requires moving net.link.generic sysctl declaration from if_mib.c to if.c. Ideally if_mib.c needs just to be merged to if.c, but they have different license texts. Differential revision: https://reviews.freebsd.org/D33263
# d74b7bae	04-Dec-2021	Gleb Smirnoff <glebius@FreeBSD.org>	ifnet_byindex() actually requires network epoch Sweep over potentially unsafe calls to ifnet_byindex() and wrap them in epoch. Most of the code touched remains unsafe, as the returned pointer is being used after epoch exit. Mark that with a comment. Validate the index argument inside the function, reducing argument validation requirement from the callers and making V_if_index private to if.c. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33263
# 1e3ca25d	22-Nov-2021	Gleb Smirnoff <glebius@FreeBSD.org>	ifnet: make if_alloc_domain() static
# 8a6f38c8	22-Nov-2021	Gleb Smirnoff <glebius@FreeBSD.org>	ifnet: garbage collect drbr_*_drv(). They were left in 62d76917b8678 but after years proved not to be useful.
# c782ea8b	14-Sep-2021	John Baldwin <jhb@FreeBSD.org>	Add a switch structure for send tags. Move the type and function pointers for operations on existing send tags (modify, query, next, free) out of 'struct ifnet' and into a new 'struct if_snd_tag_sw'. A pointer to this structure is added to the generic part of send tags and is initialized by m_snd_tag_init() (which now accepts a switch structure as a new argument in place of the type). Previously, device driver ifnet methods switched on the type to call type-specific functions. Now, those type-specific functions are saved in the switch structure and invoked directly. In addition, this more gracefully permits multiple implementations of the same tag within a driver. In particular, NIC TLS for future Chelsio adapters will use a different implementation than the existing NIC TLS support for T6 adapters. Reviewed by: gallatin, hselasky, kib (older version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D31572
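Here is a small model of the switch-structure design. Each tag carries a pointer to a constant table of methods, set once at init time. Generic code calls through that table instead of switching on a type, and two implementations can share one tag type. All names (`demo_snd_tag_sw`, `demo_snd_tag_init()`, the rate-limit methods) are hypothetical. This is not the real layout of `struct if_snd_tag_sw`.

```c
#include <assert.h>
#include <stddef.h>

struct demo_snd_tag;

/* Hypothetical switch: per-implementation methods for a send tag. */
struct demo_snd_tag_sw {
	int	(*modify)(struct demo_snd_tag *, int rate);
	int	(*query)(struct demo_snd_tag *, int *rate);
	int	type;
};

struct demo_snd_tag {
	const struct demo_snd_tag_sw	*sw;	/* set once at init */
	int				 rate;
};

static void
demo_snd_tag_init(struct demo_snd_tag *mst, const struct demo_snd_tag_sw *sw)
{
	mst->sw = sw;
	mst->rate = 0;
}

static int
rl_modify(struct demo_snd_tag *mst, int rate)
{
	mst->rate = rate;
	return (0);
}

static int
rl_query(struct demo_snd_tag *mst, int *rate)
{
	*rate = mst->rate;
	return (0);
}

/* A second implementation of the same tag type (e.g. other hardware)
 * that clamps the rate; no type switch has to learn about it. */
static int
rl2_modify(struct demo_snd_tag *mst, int rate)
{
	mst->rate = rate > 100 ? 100 : rate;
	return (0);
}

static const struct demo_snd_tag_sw rl_sw = { rl_modify, rl_query, 1 };
static const struct demo_snd_tag_sw rl2_sw = { rl2_modify, rl_query, 1 };

/* Generic code dispatches through the tag's switch. */
static int
demo_snd_tag_modify(struct demo_snd_tag *mst, int rate)
{
	return (mst->sw->modify(mst, rate));
}

static int
demo_snd_tag_query(struct demo_snd_tag *mst, int *rate)
{
	return (mst->sw->query(mst, rate));
}
```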
# 9e5243d7	30-Mar-2021	Alexander V. Chernikov <melifaro@FreeBSD.org>	Enforce check for using the return result for ifa?_try_ref(). Suggested by: hps MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29504
# 25bfa448	21-Mar-2021	Adrian Chadd <adrian@FreeBSD.org>	Add device and ifnet logging methods, similar to device_printf / if_printf * device_printf() is effectively a printf * if_printf() is effectively a LOG_INFO This allows subsystems to log device/netif stuff using different log levels, rather than having to invent their own way to prefix unit/netif names. Differential Revision: https://reviews.freebsd.org/D29320 Reviewed by: imp
# 7563019b	22-Feb-2021	Alexander V. Chernikov <melifaro@FreeBSD.org>	Add if_try_ref() to simplify refcount handling inside epoch. When we have an ifp pointer and the code is running inside epoch, epoch guarantees the pointer will not be freed. However, the following case can still happen: * in thread 1 we drop to refcount=0 for ifp and schedule its deletion. * in thread 2 we use this ifp and reference it * destroy callout kicks in * unhappy user reports a bug This can happen with the current implementation of ifnet_byindex_ref(), as we're not holding any locks preventing ifnet deletion by a parallel thread. To address it, add if_try_ref(), which returns failure when referencing an ifp with refcount=0. Additionally, enforce the existing if_ref() with KASSERT to provide a cleaner error in such scenarios. Finally, fix ifnet_byindex_ref() by using if_try_ref() and returning NULL if the latter fails. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28836
# 600eade2	16-Feb-2021	Alexander V. Chernikov <melifaro@FreeBSD.org>	Add ifa_try_ref() to simplify ifa handling inside epoch. More and more code migrates from lock-based protection to the NET_EPOCH umbrella. It requires some logic changes, including, notably, refcount handling. When we have an `ifa` pointer and we're running inside epoch we're guaranteed that this pointer will not be freed. However, the following case can still happen: * in thread 1 we drop to 0 refcount for ifa and schedule its deletion. * in thread 2 we use this ifa and reference it * destroy callout kicks in * unhappy user reports a bug To address it, a new `ifa_try_ref()` function is added, which returns failure when we try to reference `ifa` with 0 refcount. Additionally, the existing `ifa_ref()` is enforced with `KASSERT` to provide a cleaner error in such scenarios. Reviewed By: rstone, donner Differential Revision: https://reviews.freebsd.org/D28639 MFC after: 1 week
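The try-ref semantics in this entry and in the if_try_ref() entry above it come down to "increment only if the count is not already zero". Below is a userspace sketch built on C11 atomics. The names `demo_obj`, `demo_ref()`, `demo_try_ref()`, and `demo_rele()` are hypothetical, and `assert()` plays the role of `KASSERT`. The kernel's own refcount primitives are not used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_obj {
	atomic_uint	refcount;
};

/* Unconditional acquire: only legal when the caller already holds a
 * reference, so catch misuse on a dying object. */
static void
demo_ref(struct demo_obj *o)
{
	unsigned old = atomic_fetch_add(&o->refcount, 1);

	assert(old > 0);
	(void)old;
}

/* Conditional acquire: succeed only if the count is not already zero. */
static bool
demo_try_ref(struct demo_obj *o)
{
	unsigned old = atomic_load(&o->refcount);

	do {
		if (old == 0)
			return (false);	/* deletion already scheduled */
	} while (!atomic_compare_exchange_weak(&o->refcount, &old, old + 1));
	return (true);
}

/* Drop a reference; returns true when the last one went away. */
static bool
demo_rele(struct demo_obj *o)
{
	return (atomic_fetch_sub(&o->refcount, 1) == 1);
}
```

The compare-and-swap loop is what closes the race in the entries' scenario. A plain increment on a zero count would resurrect an object whose destruction is already queued, whereas `demo_try_ref()` reports failure so the caller can treat the object as gone.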
# 1a714ff2	26-Jan-2021	Randall Stewart <rrs@FreeBSD.org>	This pulls over all the changes that are in the netflix tree that fix the ratelimit code. There were several bugs in tcp_ratelimit itself and we needed further work to support the multiple tag format coming for the joint TLS and Ratelimit dances. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D28357
# a60100fd	25-Nov-2020	Kristof Provost <kp@FreeBSD.org>	if: Remove ifnet_rwlock It no longer serves any purpose, as evidenced by the fact that we never take it without ifnet_sxlock. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D27278
# 521eac97	28-Oct-2020	John Baldwin <jhb@FreeBSD.org>	Support hardware rate limiting (pacing) with TLS offload. - Add a new send tag type for a send tag that supports both rate limiting (packet pacing) and TLS offload (mostly similar to D22669 but adds a separate structure when allocating the new tag type). - When allocating a send tag for TLS offload, check to see if the connection already has a pacing rate. If so, allocate a tag that supports both rate limiting and TLS offload rather than a plain TLS offload tag. - When setting an initial rate on an existing ifnet KTLS connection, set the rate in the TCP control block inp and then reset the TLS send tag (via ktls_output_eagain) to reallocate a TLS + ratelimit send tag. This allocates the TLS send tag asynchronously from a task queue, so the TLS rate limit tag alloc is always sleepable. - When modifying a rate on a connection using KTLS, look for a TLS send tag. If the send tag is only a plain TLS send tag, assume we failed to allocate a TLS ratelimit tag (either during the TCP_TXTLS_ENABLE socket option, or during the send tag reset triggered by ktls_output_eagain) and ignore the new rate. If the send tag is a ratelimit TLS send tag, change the rate on the TLS tag and leave the inp tag alone. - Lock the inp lock when setting sb_tls_info for a socket send buffer so that the routines in tcp_ratelimit can safely dereference the pointer without needing to grab the socket buffer lock. - Add an IFCAP_TXTLS_RTLMT capability flag and associated administrative controls in ifconfig(8). TLS rate limit tags are only allocated if this capability is enabled. Note that TLS offload (whether unlimited or rate limited) always requires IFCAP_TXTLS[46]. Reviewed by: gallatin, hselasky Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26691
# 4c7ba83f	12-Jul-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Switch inet6 default route subscription to the new rib subscription api. Old subscription model allowed only single customer. Switch inet6 to the new subscription api and eliminate the old model. Differential Revision: https://reviews.freebsd.org/D25615
# 4d2c2509	23-May-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Move <add|del|change>_route() functions to route_ctl.c in preparation for the multipath control plane changes described in D24141. Currently route.c contains core routing init/teardown functions, route table manipulation functions and various helper functions, resulting in >2KLOC file in total. This change moves most of the route table manipulation parts to a dedicated file, simplifying planned multipath changes and making route.c more manageable. Differential Revision: https://reviews.freebsd.org/D24870
# 74787ef4	29-Apr-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Add nhop to the ifa_rtrequest() callback. With the upcoming multipath changes described in D24141, rt->rt_nhop can potentially point to a nexthop group instead of an individual nhop. To simplify caller handling of such cases, change ifa_rtrequest() callback to pass changed nhop directly. Differential Revision: https://reviews.freebsd.org/D24604
# 67452942	17-Apr-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Finish r191148: replace rtentry with route in if_bridge if_output() callback. Generic if_output() callback signature was modified to use struct route instead of struct rtentry in r191148, back in 2009. Quoting commit message: Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Fix bridge_output() to match this signature and update the remaining comment in if_var.h. Reviewed by: kp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24394
# 98085bae	09-Mar-2020	Andrew Gallatin <gallatin@FreeBSD.org>	make lacp's use_numa hashing aware of send tags When I did the use_numa support, I missed the fact that there is a separate hash function for send tag nic selection. So when use_numa is enabled, ktls offload does not work properly, as it does not reliably allocate a send tag on the proper egress nic since different egress nics are selected for send-tag allocation and packet transmit. To fix this, this change: - refactors lacp_select_tx_port_by_hash() and lacp_select_tx_port() to make lacp_select_tx_port_by_hash() always called by lacp_select_tx_port() - pre-shifts flowids to convert them to hashes when calling lacp_select_tx_port_by_hash() - adds a numa_domain field to if_snd_tag_alloc_params - plumbs the numa domain into places where we allocate send tags In testing with NIC TLS setup on a NUMA machine, I see thousands of output errors before the change when enabling kern.ipc.tls.ifnet.permitted=1. After the change, I see no errors, and I see the NIC sysctl counters showing active TLS offload sessions. Reviewed by: rrs, hselasky, jhb Sponsored by: Netflix
# 8ad798ae	03-Mar-2020	Brooks Davis <brooks@FreeBSD.org>	Expose ifr_buffer_get_(buffer|length) outside if.c. This is a preparatory commit for D23933. Reviewed by: jhb
# d7313dc6	26-Feb-2020	Randall Stewart <rrs@FreeBSD.org>	This commit expands tcp_ratelimit to be able to handle cards like the mlx-c5 and c6 that require a "setup" routine before the tcp_ratelimit code can declare and use a rate. I add the setup routine to if_var as well as fix tcp_ratelimit to call it. I also revisit the rates so that in the case of a mlx card of type c5/6 we will use about 100 rates concentrated in the range where the most gain can be had (1-200Mbps). Note that I have tested these on a c5 and they work and perform well. In fact in an unloaded system they pace right to the correct rate (great job mlx!). There will be a further commit here from Hans that will add the respective changes to the mlx driver to support this work (which I was testing with). Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D23647
# 3264dcad	14-Jan-2020	Gleb Smirnoff <glebius@FreeBSD.org>	- Move global network epoch definition to epoch.h, as more different subsystems tend to need to know about it, and including if_var.h is huge header pollution for them. Polluting possible non-network users with single symbol seems much lesser evil. - Remove non-preemptible network epoch. Not used yet, and unlikely to get used in close future.
# 0839aa5c	29-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	There is a long standing problem with multicast programming for NICs and IPv6. With IPv6 we may call if_addmulti() in context of processing of an incoming packet. Usually this is interrupt context. While most of the NIC drivers are able to reprogram multicast filters without sleeping, some of them can't. An example is e1000 family of drivers. With iflib conversion the problem was somewhat hidden. Iflib processes packets in private taskqueue, so going to sleep doesn't trigger an assertion. However, the sleep would block operation of the driver and following incoming packets would fill the ring and eventually would start being dropped. Enabling epoch for the full time of a packet processing again started to trigger assertions for e1000. Fix this problem once and for all using a general taskqueue to call if_ioctl() method in all cases when if_addmulti() is called in a non sleeping context. Note that nobody cares about the returned value. Reviewed by: hselasky, kib Differential Revision: https://reviews.freebsd.org/D22154
# 19e09f44	21-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Remove obsoleted KPIs that were used to access interface address lists.
# 7790c8c1	17-Oct-2019	Conrad Meyer <cem@FreeBSD.org>	Split out a more generic debugnet(4) from netdump(4) Debugnet is a simplistic and specialized panic- or debug-time reliable datagram transport. It can drive a single connection at a time and is currently unidirectional (debug/panic machine transmit to remote server only). It is mostly a verbatim code lift from netdump(4). Netdump(4) remains the only consumer (until the rest of this patch series lands). The INET-specific logic has been extracted somewhat more thoroughly than previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, as much as possible as is protocol-independent, remains in debugnet.c. The separation is not perfect and future improvement is welcome. Supporting INET6 is a long-term goal. Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to 'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the generic module would be more confusing than the refactoring. The only functional change here is the mbuf allocation / tracking. Instead of initiating solely on netdump-configured interface(s) at dumpon(8) configuration time, we watch for any debugnet-enabled NIC for link activation and query it for mbuf parameters at that time. If they exceed the existing high-water mark allocation, we re-allocate and track the new high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone. In a future patch in this series, this will allow initiating netdump from panic ddb(4) without pre-panic configuration. No other functional change intended. Reviewed by: markj (earlier version) Some discussion with: emaste, jhb Objection from: marius Differential Revision: https://reviews.freebsd.org/D21421
# 270b83b9	14-Oct-2019	Hans Petter Selasky <hselasky@FreeBSD.org>	The two functions ifnet_byindex() and ifnet_byindex_locked() are exactly the same after the network stack was epochified. Merge the two into one function and cleanup all uses of ifnet_byindex_locked(). While at it: - Add branch prediction macros. - Make sure the ifnet pointer is only dereferenced once, also when code optimisation is disabled. Sponsored by: Mellanox Technologies
# fb3fc771	10-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Add two extra functions that basically give count of addresses on interface. Such functions could have been implemented on top of the if_foreach_llm?addr(), but several drivers need counting, so avoid copy-n-paste inside the drivers.
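This sketch shows how counting can sit on top of a callback-based address walker. Suppose the walker passes the running count to each callback and adds whatever the callback returns. Then a counter is a one-line callback, and a driver that fills a fixed-size filter table can stop counting once the table is full. The types, `demo_foreach_addr()`, and the callbacks are hypothetical and only loosely model the KPI in the next entry. As that entry intends, the address storage stays hidden behind the walker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical address list; only the walker knows its layout. */
struct demo_addr {
	unsigned char	mac[6];
};

struct demo_ifnet {
	struct demo_addr	*addrs;
	size_t			 naddrs;
};

/* Callback gets the running count and returns how much to add to it. */
typedef unsigned demo_addr_cb_t(void *arg, const struct demo_addr *a,
    unsigned cnt);

static unsigned
demo_foreach_addr(struct demo_ifnet *ifp, demo_addr_cb_t *cb, void *arg)
{
	unsigned count = 0;
	size_t i;

	for (i = 0; i < ifp->naddrs; i++)
		count += cb(arg, &ifp->addrs[i], count);
	return (count);
}

/* Counting is just a callback that adds one per address. */
static unsigned
demo_count_cb(void *arg, const struct demo_addr *a, unsigned cnt)
{
	(void)arg;
	(void)a;
	(void)cnt;
	return (1);
}

static unsigned
demo_addr_count(struct demo_ifnet *ifp)
{
	return (demo_foreach_addr(ifp, demo_count_cb, NULL));
}

/* Driver-style callback: copy addresses into a small filter table. */
struct demo_filter {
	unsigned char	tbl[4][6];
	unsigned	max;
};

static unsigned
demo_copy_cb(void *arg, const struct demo_addr *a, unsigned cnt)
{
	struct demo_filter *f = arg;

	if (cnt >= f->max)
		return (0);	/* table full: skip and do not count */
	memcpy(f->tbl[cnt], a->mac, sizeof(a->mac));
	return (1);
}
```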
# 826857c8	10-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Provide new KPI for network drivers to access lists of interface addresses. The KPI reveals neither how addresses are stored, nor how access to them is synchronized, nor struct ifaddr and struct ifmaddr themselves. Reviewed by: gallatin, erj, hselasky, philip, stevek Differential Revision: https://reviews.freebsd.org/D21943
# b2e60773	26-Aug-2019	John Baldwin <jhb@FreeBSD.org>	Add kernel-side support for in-kernel TLS. KTLS adds support for in-kernel framing and encryption of Transport Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports offload of TLS for transmitted data. Key negotiation must still be performed in userland. Once completed, transmit session keys for a connection are provided to the kernel via a new TCP_TXTLS_ENABLE socket option. All subsequent data transmitted on the socket is placed into TLS frames and encrypted using the supplied keys. Any data written to a KTLS-enabled socket via write(2), aio_write(2), or sendfile(2) is assumed to be application data and is encoded in TLS frames with an application data type. Individual records can be sent with a custom type (e.g. handshake messages) via sendmsg(2) with a new control message (TLS_SET_RECORD_TYPE) specifying the record type. At present, rekeying is not supported though the in-kernel framework should support rekeying. KTLS makes use of the recently added unmapped mbufs to store TLS frames in the socket buffer. Each TLS frame is described by a single ext_pgs mbuf. The ext_pgs structure contains the header of the TLS record (and trailer for encrypted records) as well as references to the associated TLS session. KTLS supports two primary methods of encrypting TLS frames: software TLS and ifnet TLS. Software TLS marks mbufs holding socket data as not ready via M_NOTREADY similar to sendfile(2) when TLS framing information is added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then called to schedule TLS frames for encryption. In the case of sendfile, sendfile_iodone() calls ktls_enqueue() instead of pru_ready(), leaving the mbufs marked M_NOTREADY until encryption is completed. For other writes (vn_sendfile when pages are available, write(2), etc.), the PRUS_NOTREADY is set when invoking pru_send() along with invoking ktls_enqueue(). 
A pool of worker threads (the "KTLS" kernel process) encrypts TLS frames queued via ktls_enqueue(). Each TLS frame is temporarily mapped using the direct map and passed to a software encryption backend to perform the actual encryption. (Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if someone wished to make this work on architectures without a direct map.) KTLS supports pluggable software encryption backends. Internally, Netflix uses proprietary pure-software backends. This commit includes a simple backend in a new ktls_ocf.ko module that uses the kernel's OpenCrypto framework to provide AES-GCM encryption of TLS frames. As a result, software TLS is now a bit of a misnomer as it can make use of hardware crypto accelerators. Once software encryption has finished, the TLS frame mbufs are marked ready via pru_ready(). At this point, the encrypted data appears as regular payload to the TCP stack stored in unmapped mbufs. ifnet TLS permits a NIC to offload the TLS encryption and TCP segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS) is allocated on the interface a socket is routed over and associated with a TLS session. TLS records for a TLS session using ifnet TLS are not marked M_NOTREADY but are passed down the stack unencrypted. The ip_output_send() and ip6_output_send() helper functions that apply send tags to outbound IP packets verify that the send tag of the TLS record matches the outbound interface. If so, the packet is tagged with the TLS send tag and sent to the interface. The NIC device driver must recognize packets with the TLS send tag and schedule them for TLS encryption and TCP segmentation. If the outbound interface does not match the interface in the TLS send tag, the packet is dropped. In addition, a task is scheduled to refresh the TLS send tag for the TLS session. If a new TLS send tag cannot be allocated, the connection is dropped. 
If a new TLS send tag is allocated, however, subsequent packets will be tagged with the correct TLS send tag. (This latter case has been tested by configuring both ports of a Chelsio T6 in a lagg and failing over from one port to another. As the connections migrated to the new port, new TLS send tags were allocated for the new port and connections resumed without being dropped.) ifnet TLS can be enabled and disabled on supported network interfaces via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported across both vlan devices and lagg interfaces using failover, lacp with flowid enabled, or lacp with flowid enabled. Applications may request the current KTLS mode of a connection via a new TCP_TXTLS_MODE socket option. They can also use this socket option to toggle between software and ifnet TLS modes. In addition, a testing tool is available in tools/tools/switch_tls. This is modeled on tcpdrop and uses similar syntax. However, instead of dropping connections, -s is used to force KTLS connections to switch to software TLS and -i is used to switch to ifnet TLS. Various sysctls and counters are available under the kern.ipc.tls sysctl node. The kern.ipc.tls.enable node must be set to true to enable KTLS (it is off by default). The use of unmapped mbufs must also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS. KTLS is enabled via the KERN_TLS kernel option. This patch is the culmination of years of work by several folks including Scott Long and Randall Stewart for the original design and implementation; Drew Gallatin for several optimizations including the use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records awaiting software encryption, and pluggable software crypto backends; and John Baldwin for modifications to support hardware TLS offload. Reviewed by: gallatin, hselasky, rrs Obtained from: Netflix Sponsored by: Netflix, Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21277
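The KTLS commit above notes that each ext_pgs mbuf carries the TLS record header and, for encrypted records, a trailer. As a rough illustration of what those bytes are for the AES-GCM cipher mentioned above, the following user-space C sketch lays out a TLS 1.2 AES-GCM record prefix following RFC 5246 and RFC 5288. It is not KTLS code: the function and macro names are hypothetical, and using the record sequence number as the explicit nonce is an assumption (a common choice, not something the commit states).

```c
#include <stddef.h>
#include <stdint.h>

/* Record layout constants for TLS 1.2 with AES-GCM (RFC 5246, RFC 5288). */
#define TLS_HDR_LEN        5     /* content type (1) + version (2) + length (2) */
#define TLS_GCM_NONCE_LEN  8     /* explicit per-record nonce */
#define TLS_GCM_TAG_LEN    16    /* authentication tag: the record trailer */
#define TLS_MAX_PLAINTEXT  16384 /* at most 2^14 payload bytes per record */
#define TLS_RT_APP_DATA    23    /* application_data content type */

/*
 * Hypothetical helper: write the 13-byte record prefix (header plus
 * explicit nonce) for a record carrying plain_len bytes of payload.
 * Returns the full on-wire record size, or 0 if plain_len is too large.
 */
size_t tls12_gcm_fill_prefix(uint8_t prefix[TLS_HDR_LEN + TLS_GCM_NONCE_LEN],
    uint8_t rec_type, uint64_t seqno, size_t plain_len)
{
	size_t body;
	int i;

	if (plain_len > TLS_MAX_PLAINTEXT)
		return 0;
	/* The length field covers the nonce, the ciphertext and the tag. */
	body = TLS_GCM_NONCE_LEN + plain_len + TLS_GCM_TAG_LEN;

	prefix[0] = rec_type;
	prefix[1] = 3;                 /* TLS 1.2 is protocol version 3.3 */
	prefix[2] = 3;
	prefix[3] = (uint8_t)(body >> 8);
	prefix[4] = (uint8_t)(body & 0xff);

	/* Assumption: the explicit nonce is the big-endian sequence number. */
	for (i = 0; i < TLS_GCM_NONCE_LEN; i++)
		prefix[TLS_HDR_LEN + i] = (uint8_t)(seqno >> (56 - 8 * i));

	return TLS_HDR_LEN + body;
}
```

For a 100-byte payload the record occupies 5 + 8 + 100 + 16 = 129 bytes on the wire; the 16-byte tag is the kind of trailer the commit refers to for encrypted records.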
# 20abea66	01-Aug-2019	Randall Stewart <rrs@FreeBSD.org>	This adds the third step in getting BBR into the tree. BBR and an updated rack depend on having access to the new ratelimit api in this commit. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D20953
# e2e050c8	19-May-2019	Conrad Meyer <cem@FreeBSD.org>	Extract eventfilter declarations to sys/_eventfilter.h This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
# 7687707d	22-Apr-2019	Andrew Gallatin <gallatin@FreeBSD.org>	Track device's NUMA domain in ifnet & alloc ifnet from NUMA local memory This commit adds new if_alloc_domain() and if_alloc_dev() methods to allocate ifnets. When called with a domain on a NUMA machine, if_alloc_domain() will record the NUMA domain in the ifnet, and it will allocate the ifnet struct from memory which is local to that NUMA node. Similarly, if_alloc_dev() is a wrapper for if_alloc_domain() which uses a driver-supplied device_t to call if_alloc_domain() with the appropriate domain. Note that the new if_numa_domain field fits in an alignment pad in struct ifnet, and so does not alter the size of the structure. Reviewed by: glebius, kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19930
# b252313f	31-Jan-2019	Gleb Smirnoff <glebius@FreeBSD.org>	New pfil(9) KPI together with newborn pfil API and control utility. The KPI has been reviewed and cleansed of features that were planned back 20 years ago and never implemented. The pfil(9) internals have been made opaque to protocols with only returned types and function declarations exposed. The KPI is made more strict, but at the same time more extensible, as the kernel uses the same command structures that the userland ioctl uses. In a nutshell, the [KA]PI is about declaring filtering points, declaring filters, and linking and unlinking them together. The new [KA]PI makes it possible to reconfigure pfil(9): change the order of hooks, rehook a filter from one filtering point to a different one, disconnect a hook on output leaving it on input only, prepend/append a filter to an existing list of filters. Now it is possible for a single packet filter to provide multiple rulesets that may be linked to different points. Think of per-interface ACLs in Cisco or Juniper. None of the existing packet filters support that yet; however, limited usage is already possible, e.g. the default ruleset can be moved to a single interface, as soon as interfaces provide their filtering points. Another future feature is the possibility to create pfil heads that provide not an mbuf pointer but just a memory pointer with length. That would allow filtering at very early stages of a packet lifecycle, e.g. when the packet has just been received by a NIC and no mbuf has yet been allocated. Differential Revision: https://reviews.freebsd.org/D18951
# a68cc388	08-Jan-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Mechanical cleanup of epoch(9) usage in network stack. - Remove macros that covertly create epoch_tracker on thread stack. Such macros are quite unsafe, e.g. they will produce buggy code if the same macro is used in nested scopes. Explicitly declare epoch_tracker always. - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as they are is very misleading. They are all named FOO_RLOCK(), while they no longer have lock semantics. They now allow recursion and, more importantly, no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has the same problems, but is not touched by this commit. This is a non-functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively. Discussed with: jtl, gallatin
# b79aa45e	13-Nov-2018	Gleb Smirnoff <glebius@FreeBSD.org>	For compatibility KPI functions like if_addr_rlock() that used to have mutexes but are now converted to epoch(9), use a thread-private epoch_tracker. Embedding the tracker into ifnet(9) or ifnet-derived structures creates a non-reentrant function that will fail miserably if called simultaneously from two different contexts. A thread-private tracker provides a single tracker that allows these functions to be called safely. It doesn't allow nested calls, but these are not expected from compatibility KPIs. Reviewed by: markj
# 64d63b1e	21-Oct-2018	Andrey V. Elsukov <ae@FreeBSD.org>	Add ifaddr_event_ext event. It is similar to ifaddr_event, but the handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL, and the pointer to ifaddr. Also ifaddr_event now is implemented using ifaddr_event_ext handler. MFC after: 3 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17100
# 1687b1ab	29-Sep-2018	Michael Tuexen <tuexen@FreeBSD.org>	For changing the MTU on tun/tap devices, it should not matter whether it is done using ifconfig, which issues a SIOCSIFMTU ioctl() command, or using a TUNSIFINFO/TAPSIFINFO ioctl() command. Without this patch, for IPv6 the new MTU is not used when creating routes. In particular, when initiating TCP connections after increasing the MTU, the old MTU is still used to compute the MSS. Thanks to ae@ and bz@ for helping to improve the patch. Reviewed by: ae@, bz@ Approved by: re (kib@) MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D17180
# b08d611d	20-Sep-2018	Matt Macy <mmacy@FreeBSD.org>	fix vlan locking to permit sx acquisition in ioctl calls - update vlan(9) to handle changes earlier this year in multicast locking Tested by: np@, darkfiberu at gmail.com PR: 230510 Reviewed by: mjoras@, shurd@, sbruno@ Approved by: re (gjb@) Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16808
# f9be0386	15-Aug-2018	Matt Macy <mmacy@FreeBSD.org>	Fix in6_multi double free This is actually several different bugs: - The code is not designed to handle inpcb deletion after interface deletion - add reference for inpcb membership - The multicast address has to be removed from interface lists when the refcount goes to zero OR when the interface goes away - decouple list disconnect from refcount (v6 only for now) - ifmultiaddr can exist past being on interface lists - add flag for tracking whether or not it's enqueued - deferring freeing moptions makes the inpcb cleanup code simpler but opens the door wider still to races - call inp_gcmoptions synchronously after dropping the inpcb lock Fundamentally multicast needs a rewrite - but keep applying band-aids for now. Tested by: kp Reported by: novel, kp, lwhsu
# 98a8fdf6	09-Jul-2018	Andrey V. Elsukov <ae@FreeBSD.org>	Deduplicate the code. Add generic function if_tunnel_check_nesting() that does check for allowed nesting level for tunneling interfaces and also does loop detection. Use it in gif(4), gre(4) and me(4) interfaces. Differential Revision: https://reviews.freebsd.org/D16162
# 6573d758	03-Jul-2018	Matt Macy <mmacy@FreeBSD.org>	epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error-prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066
# 0f8d79d9	24-May-2018	Matt Macy <mmacy@FreeBSD.org>	CK: update consumers to use CK macros across the board r334189 changed the fields to have names distinct from those in queue.h in order to expose the oversights as compile time errors
# 4f6c66cc	23-May-2018	Matt Macy <mmacy@FreeBSD.org>	UDP: further performance improvements on tx Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409
# fd04260d	20-May-2018	Matt Macy <mmacy@FreeBSD.org>	ck: simplify interface with libkvm consumers by defining ck_queue types as their queue.h equivalents if !_KERNEL
# d7c5a620	18-May-2018	Matt Macy <mmacy@FreeBSD.org>	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host is receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch:

InMpps OMpps InGbs OGbs     err TCP Est  %CPU syscalls     csw  irq GBfree
  4.98  0.00  4.42 0.00 4235592      33 83.80  4720653 2149771 1235 247.32
  4.73  0.00  4.20 0.00 4025260      33 82.99  4724900 2139833 1204 247.32
  4.72  0.00  4.20 0.00 4035252      33 82.14  4719162 2132023 1264 247.32
  4.71  0.00  4.21 0.00 4073206      33 83.68  4744973 2123317 1347 247.32
  4.72  0.00  4.21 0.00 4061118      33 80.82  4713615 2188091 1490 247.32
  4.72  0.00  4.21 0.00 4051675      33 85.29  4727399 2109011 1205 247.32
  4.73  0.00  4.21 0.00 4039056      33 84.65  4724735 2102603 1053 247.32

After the patch:

InMpps OMpps InGbs OGbs     err TCP Est  %CPU syscalls     csw  irq GBfree
  5.43  0.00  4.20 0.00 3313143      33 84.96  5434214 1900162 2656 245.51
  5.43  0.00  4.20 0.00 3308527      33 85.24  5439695 1809382 2521 245.51
  5.42  0.00  4.19 0.00 3316778      33 87.54  5416028 1805835 2256 245.51
  5.42  0.00  4.19 0.00 3317673      33 90.44  5426044 1763056 2332 245.51
  5.42  0.00  4.19 0.00 3314839      33 88.11  5435732 1792218 2499 245.52
  5.44  0.00  4.19 0.00 3293228      33 91.84  5426301 1668597 2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366
# 70398c2f	18-May-2018	Matt Macy <mmacy@FreeBSD.org>	epoch(9): Make epochs non-preemptible by default There are risks associated with waiting on a preemptible epoch section. Change the name to make them not be the default and document the issue under CAVEATS. Reported by: markj
# 5c30b378	10-May-2018	Matt Macy <mmacy@FreeBSD.org>	Allow different bridge types to coexist if_bridge has a lot of limitations that make it scale poorly to higher data rates. In my projects/VPC branch I leverage the bridge interface between layers for my high speed soft switch as well as for purposes of stacking in general. Reviewed by: sbruno@ Approved by: sbruno@ Differential Revision: https://reviews.freebsd.org/D15344
# 7bf272a6	10-May-2018	Matt Macy <mmacy@FreeBSD.org>	Allocate epoch for networking at startup Additionally add CK to include paths for modules Approved by: sbruno@
# b6f6f880	06-May-2018	Matt Macy <mmacy@FreeBSD.org>	r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl to sleep on commands to the NIC when updating multicast filters. More generally this permitted drivers to use an sx as a softc lock. Unfortunately this change introduced a race whereby a multicast update would still be queued for deletion when ifconfig deleted the interface, thus calling down into _purgemaddrs and synchronously deleting _all_ of the multicast addresses on the interface. Synchronously remove all external references to a multicast address before enqueueing for delete. Reported by: lwhsu Approved by: sbruno
# e5054602	05-May-2018	Mark Johnston <markj@FreeBSD.org>	Import the netdump client code. This is a component of a system which lets the kernel dump core to a remote host after a panic, rather than to a local storage device. The server component is available in the ports tree. netdump is particularly useful on diskless systems. The netdump(4) man page contains some details describing the protocol. Support for configuring netdump will be added to dumpon(8) in a future commit. To use netdump, the kernel must have been compiled with the NETDUMP option. The initial revision of netdump was written by Darrell Anderson and was integrated into Sandvine's OS, from which this version was derived. Reviewed by: bdrewery, cem (earlier versions), julian, sbruno MFC after: 1 month X-MFC note: use a spare field in struct ifnet Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15253
# 4a381a9e	26-Apr-2018	Hans Petter Selasky <hselasky@FreeBSD.org>	Add network device event for priority code point, PCP, changes. When the PCP is changed for either a VLAN network interface or when prio tagging is enabled for a regular ethernet network interface, broadcast the IFNET_EVENT_PCP event so applications like ibcore can update their GID tables accordingly. MFC after: 3 days Reviewed by: ae, kib Differential Revision: https://reviews.freebsd.org/D15040 Sponsored by: Mellanox Technologies
# 541d96aa	30-Mar-2018	Brooks Davis <brooks@FreeBSD.org>	Use an accessor function to access ifr_data. This fixes 32-bit compat (no ioctl command definitions are required as struct ifreq is the same size). This is believed to be sufficient to fully support ifconfig on 32-bit systems. Reviewed by: kib Obtained from: CheriBSD MFC after: 1 week Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14900
# f1379734	27-Mar-2018	Konstantin Belousov <kib@FreeBSD.org>	Allow to specify PCP on packets not belonging to any VLAN. According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be considered as untagged, and only PCP and DEI values from the VLAN tag are meaningful. See for instance https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html. Make it possible to specify PCP value for outgoing packets on an ethernet interface. When PCP is supplied, the tag is appended, VLAN id set to 0, and PCP is filled by the supplied value. The code to do VLAN tag encapsulation is refactored from the if_vlan.c and moved into if_ethersubr.c. Drivers might have issues with filtering VID 0 packets on receive. This bug should be fixed for each driver. Reviewed by: ae (previous version), hselasky, melifaro Sponsored by: Mellanox Technologies MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D14702
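The priority tag described above reuses the standard 802.1Q Tag Control Information (TCI) layout: a 3-bit PCP, a 1-bit DEI, and a 12-bit VLAN ID, with VID 0 meaning the frame belongs to no VLAN. A minimal C sketch of packing and inspecting that field follows; the helper names are hypothetical and are not the kernel's (FreeBSD keeps its own tag macros, such as EVL_MAKETAG(), in net/if_vlan_var.h).

```c
#include <stdint.h>

/*
 * Hypothetical helpers for the 16-bit 802.1Q TCI field:
 * bits 15-13 PCP, bit 12 DEI, bits 11-0 VLAN ID.
 */
uint16_t vlan_tci_make(unsigned pcp, unsigned dei, unsigned vid)
{
	return (uint16_t)(((pcp & 0x7u) << 13) | ((dei & 0x1u) << 12) |
	    (vid & 0xfffu));
}

unsigned vlan_tci_pcp(uint16_t tci)
{
	return (tci >> 13) & 0x7u;
}

unsigned vlan_tci_vid(uint16_t tci)
{
	return tci & 0xfffu;
}

/* VID 0: the frame is untagged for VLAN purposes; only PCP/DEI matter. */
int vlan_tci_is_priority_only(uint16_t tci)
{
	return vlan_tci_vid(tci) == 0;
}
```

For example, a priority-tagged frame with PCP 5 and VID 0 carries TCI 0xA000.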
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# 95ed5015	06-Sep-2017	Hans Petter Selasky <hselasky@FreeBSD.org>	Add support for generic backpressure indicator for ratelimited transmit queues as well as non-ratelimited ones. Add the required structure bits in order to support a backpressure indication with ratelimited connections as well as non-ratelimited ones. The backpressure indicator is a value between zero and 65535 inclusive, where zero indicates that the destination transmit queue is empty and 65535 that it is full. Applications can use this value as a decision point for when to stop transmitting data to avoid endless ENOBUFS error codes upon transmitting an mbuf. This indicator is also useful to reduce the latency for ratelimited queues. Reviewed by: gallatin, kib, gnn Differential Revision: https://reviews.freebsd.org/D11518 Sponsored by: Mellanox Technologies
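The commit defines only the scale of the indicator (0 for an empty queue, 65535 for a full one), not how a driver computes it or how an application acts on it. The C sketch below assumes a simple linear mapping from queue occupancy and a caller-chosen high-water mark; both functions and the threshold are hypothetical illustrations, not the kernel's implementation.

```c
#include <stdint.h>

/* Scale endpoints taken from the commit text: 0 = empty, 65535 = full. */
#define BP_EMPTY 0u
#define BP_FULL  65535u

/* Hypothetical: map queue occupancy linearly onto the 0..65535 scale. */
uint16_t bp_level(uint32_t used, uint32_t capacity)
{
	if (capacity == 0 || used >= capacity)
		return BP_FULL;    /* treat a zero-size or overfull queue as full */
	return (uint16_t)(((uint64_t)used * BP_FULL) / capacity);
}

/*
 * Hypothetical sender policy: pause once the level reaches a high-water
 * mark, instead of transmitting until ENOBUFS is returned.
 */
int bp_should_pause(uint16_t level, uint16_t high_water)
{
	return level >= high_water;
}
```

With a 1024-entry queue holding 1000 entries the level is 63999, so a sender using a 60000 high-water mark would back off before the queue actually overflows.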
# ddae5750	10-May-2017	Ravi Pokala <rpokala@FreeBSD.org>	Persistently store NIC's hardware MAC address, and add a way to retrieve it The MAC address reported by `ifconfig ${nic} ether' does not always match the address in the hardware, as reported by the driver during attach. In particular, NICs which are components of a lagg(4) interface all report the same MAC. When attaching, the NIC driver passes the MAC address it read from the hardware as an argument to ether_ifattach(). Keep a second copy of it, and create ioctl(SIOCGHWADDR) to return it. Teach `ifconfig' to report it along with the active MAC address. PR: 194386 Reviewed by: glebius MFC after: 1 week Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D10609
# fbbd9655	28-Feb-2017	Warner Losh <imp@FreeBSD.org>	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
# d0b2cad1	31-Jan-2017	Stephen J. Kiernan <stevek@FreeBSD.org>	Add the following set accessor functions for recently-added members of ifnet structure: if_gethwtsomax(), if_sethwtsomax() - if_hw_tsomax; if_gethwtsomaxsegcount(), if_sethwtsomaxsegcount() - if_hw_tsomaxsegcount; if_gethwtsomaxsegsize(), if_sethwtsomaxsegsize() - if_hw_tsomaxsegsize. Update em and vnic drivers which had already been converted to use accessor functions for the other ifnet structure members. Reviewed by: erj Approved by: sjg (mentor) Obtained from: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D8544
# 6597559e	28-Jan-2017	Dexuan Cui <dexuan@FreeBSD.org>	ifnet: move the new ifnet_event EVENTHANDLER_DECLARE to net/if_var.h Thank glebius for pointing this out: "The network stuff shall not be added to sys/eventhandler.h" Reviewed by: David_A_Bright_DELL.com, sephe, glebius Approved by: sephe (mentor) MFC after: 2 weeks Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D9345
# f3e7afe2	18-Jan-2017	Hans Petter Selasky <hselasky@FreeBSD.org>	Implement kernel support for hardware rate limited sockets. - Add RATELIMIT kernel configuration keyword which must be set to enable the new functionality. - Add support for hardware-driven, Receive Side Scaling (RSS) aware, rate-limited sendqueues and expose the functionality through the already established SO_MAX_PACING_RATE setsockopt(). The API supports rates in the range from 1 to 4Gbytes/s, which are suitable for regular TCP and UDP streams. The setsockopt(2) manual page has been updated. - Add rate limit function callback API to "struct ifnet" which supports the following operations: if_snd_tag_alloc(), if_snd_tag_modify(), if_snd_tag_query() and if_snd_tag_free(). - Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT flag, which tells if a network driver supports rate limiting or not. - This patch also adds support for rate limiting through VLAN and LAGG intermediate network devices. - How rate limiting works: 1) The userspace application calls setsockopt() after accepting or making a new connection to set the rate, which is then stored in the socket structure in the kernel. Later on, when packets are transmitted, a check is made in the transmit path for rate changes. A rate change implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the destination network interface, which then sets up a custom sendqueue with the given rate limitation parameter. A "struct m_snd_tag" pointer is returned which serves as a "snd_tag" hint in the m_pkthdr for the subsequently transmitted mbufs. 2) When the network driver sees the "m->m_pkthdr.snd_tag" different from NULL, it will move the packets into a designated rate limited sendqueue given by the snd_tag pointer. It is up to the individual drivers how the rate limited traffic will be rate limited. 
3) Route changes are detected by the NIC drivers in the ifp->if_transmit() routine when the ifnet pointer in the incoming snd_tag does not match that of the network interface. The network adapter frees the mbuf and returns EAGAIN, which causes ip_output() to release and clear the send tag. On the next ip_output() call, an attempt is made to allocate a new "snd_tag". 4) When the PCB is detached, the custom sendqueue will be released by a non-blocking ifp->if_snd_tag_free() call to the currently bound network interface. Reviewed by: wblock (manpages), adrian, gallatin, scottl (network) Differential Revision: https://reviews.freebsd.org/D3687 Sponsored by: Mellanox Technologies MFC after: 3 months
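Steps 1 to 3 of the rate-limiting commit amount to a small protocol between the output path and the driver. The following C sketch models only the route-change check in step 3: a driver refuses a packet whose send tag names a different interface by returning EAGAIN, and the output path then clears the cached tag so the next send can allocate a fresh one. All types and functions here are hypothetical simplifications; the real code uses struct ifnet, struct m_snd_tag and mbufs, and also frees the rejected mbuf.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct ifnet, struct m_snd_tag and an mbuf. */
struct model_ifnet { int index; };
struct model_snd_tag { struct model_ifnet *ifp; unsigned rate; };
struct model_pkt { struct model_snd_tag *snd_tag; };

/* Driver transmit path: reject packets tagged for another interface. */
int model_if_transmit(struct model_ifnet *ifp, struct model_pkt *pkt)
{
	if (pkt->snd_tag != NULL && pkt->snd_tag->ifp != ifp)
		return EAGAIN;   /* route changed; real drivers also free the mbuf */
	return 0;                /* would be queued on the rate-limited sendqueue */
}

/* Output path: on EAGAIN, drop the stale tag cached in the (model) PCB. */
int model_ip_output(struct model_ifnet *out_ifp, struct model_pkt *pkt,
    struct model_snd_tag **pcb_tag)
{
	int error;

	pkt->snd_tag = *pcb_tag;
	error = model_if_transmit(out_ifp, pkt);
	if (error == EAGAIN)
		*pcb_tag = NULL; /* real code releases the tag; a new one is allocated later */
	return error;
}
```

Sending on the tagged interface succeeds; after a route change to another interface the first send returns EAGAIN and clears the tag, and subsequent sends go out untagged until a new tag is allocated.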
# b95d46da	18-Oct-2016	Kevin Lo <kevlo@FreeBSD.org>	Fix typo in comment.
# c2b5ba76	05-Oct-2016	Kevin Lo <kevlo@FreeBSD.org>	Remove an alias if_list, use if_link consistently. Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D8075
# decb239d	28-Sep-2016	Kevin Lo <kevlo@FreeBSD.org>	Remove the compatibility macro if_addrlist. Since if_addrlist is used only for ipfilter(4), add a macro if_addrlist in ip_compat.h. Reviewed by: cy Differential Revision: https://reviews.freebsd.org/D8059
# c7641cd1	28-Sep-2016	Kevin Lo <kevlo@FreeBSD.org>	Remove ifa_list, use ifa_link (structure field) instead. While here, prefer if_addrhead (FreeBSD) to if_addrlist (BSD compat) naming for the interface address list in sctp_bsd_addr.c Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D8051
# 3ff511d3	27-Sep-2016	Kevin Lo <kevlo@FreeBSD.org>	Remove a comment about the size of the ifnet structure. Reviewed by: adrian Differential Revision: https://reviews.freebsd.org/D8036
# f22bfc72	23-Jun-2016	Navdeep Parhar <np@FreeBSD.org>	Add spares to struct ifnet and socket for packet pacing and/or general use. Update comments regarding the spare fields in struct inpcb. Bump __FreeBSD_version for the changes to the size of the structures. Reviewed by: gnn@ Approved by: re@ (gjb@) Sponsored by: Chelsio Communications
# ad4e9116	18-May-2016	Bjoern A. Zeeb <bz@FreeBSD.org>	Rather than having the if_vmove() code intermixed in the vnet_destroy() function in vnet.c move it to if.c where it logically belongs and put it under a VNET_SYSUNINIT() call. To not change the current behaviour make sure it runs first thing during teardown. In the future this will allow us more flexibility on changing the order on when we want to get rid of interfaces. Stop exporting if_vmove() and make it file static. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6438
# 4c7070db	17-May-2016	Scott Long <scottl@FreeBSD.org>	Import the 'iflib' API library for network drivers. From the author: "iflib is a library to eliminate the need for frequently duplicated device independent logic propagated (poorly) across many network drivers." Participation is purely optional. The IFLIB kernel config option is provided for drivers that want to transition between legacy and iflib modes of operation. ixl and ixgbe driver conversions will be committed shortly. We hope to see participation from the Broadcom and maybe Chelsio drivers in the near future. Submitted by: mmacy@nextbsd.org Reviewed by: gallatin Differential Revision: D5211
# 4fb3a820	30-Dec-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Implement interface link header precomputation API. Add if_requestencap() interface method which is capable of calculating various link headers for given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc..). Reshape 'struct route' to be able to pass additional data (with its length) to prepend to mbuf. These two changes permit routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack, eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes the layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers. ARP/ND changes: Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address changes are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces, thus simplifying if_output(). Move these lookups to a separate ether_resolve_addr() function which either returns an error or a fully-prepared link header. Add <arp|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data. BPF changes: Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of these have their header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. 
Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional. Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65). Flowtable changes: Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle. Differential Revision: https://reviews.freebsd.org/D4102
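The idea behind if_requestencap() and the reshaped 'struct route' is that the link header for a neighbor is built once, when the lle is installed or updated, and then simply copied in front of each outgoing packet. The C sketch below illustrates that for a 14-byte Ethernet header; the struct, macro and function names are hypothetical, and this is not the kernel's prepend code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETHER_ADDR_LEN 6
#define MODEL_ETHER_HDR_LEN  14   /* dst (6) + src (6) + EtherType (2) */

/* Hypothetical neighbor entry holding a precomputed L2 header. */
struct lle_model {
	uint8_t prepend[MODEL_ETHER_HDR_LEN];
	size_t prepend_len;
};

/* Build the header once, when the entry is installed or updated. */
void lle_model_update(struct lle_model *lle,
    const uint8_t dst[MODEL_ETHER_ADDR_LEN],
    const uint8_t src[MODEL_ETHER_ADDR_LEN], uint16_t ethertype)
{
	memcpy(lle->prepend, dst, MODEL_ETHER_ADDR_LEN);
	memcpy(lle->prepend + MODEL_ETHER_ADDR_LEN, src, MODEL_ETHER_ADDR_LEN);
	lle->prepend[12] = (uint8_t)(ethertype >> 8);   /* network byte order */
	lle->prepend[13] = (uint8_t)(ethertype & 0xff);
	lle->prepend_len = MODEL_ETHER_HDR_LEN;
}

/*
 * Fast path: copy the cached header in front of the payload with no
 * per-packet lookup. Returns the frame length, or 0 if frame is too small.
 */
size_t model_output(const struct lle_model *lle, const uint8_t *payload,
    size_t payload_len, uint8_t *frame, size_t frame_cap)
{
	if (lle->prepend_len + payload_len > frame_cap)
		return 0;
	memcpy(frame, lle->prepend, lle->prepend_len);
	memcpy(frame + lle->prepend_len, payload, payload_len);
	return lle->prepend_len + payload_len;
}
```

Calling lle_model_update() again corresponds to the recalculation the commit describes when the interface link address changes.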
# d6e82913	17-Dec-2015	Steven Hartland <smh@FreeBSD.org>	Revert r292275 & r292379 glebius has concerns about these changes, so revert them until they can be discussed and addressed. Sponsored by: Multiplay
# 52e53e2d	15-Dec-2015	Steven Hartland <smh@FreeBSD.org>	Fix lagg failover due to missing notifications When using lagg failover mode, neither Gratuitous ARP (IPv4) nor Unsolicited Neighbour Advertisements (IPv6) are sent to notify other nodes that the address may have moved. This results in slow failover, dropped packets and network outages for the lagg interface when the primary link goes down. We now use the new if_link_state_change_cond with the force param set to allow lagg to force through link state changes and hence fire an ifnet_link_event, which is now monitored by rip and nd6. Upon receiving these events each protocol triggers the relevant notifications: * inet4 => Gratuitous ARP * inet6 => Unsolicited Neighbour Announce This also fixes the carp IPv6 NA's that stopped working after r251584 which added the ipv6_route__llma route. The new behaviour can be controlled using the sysctls: * net.link.ether.inet.arp_on_link * net.inet6.icmp6.nd6_on_link Also removed unused param from lagg_port_state and added descriptions for the sysctls while here. PR: 156226 MFC after: 1 month Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4111
# ef91a976	25-Nov-2015	Andrey V. Elsukov <ae@FreeBSD.org>	Overhaul if_enc(4) and make it loadable at run time. Use hhook(9) framework to achieve ability of loading and unloading if_enc(4) kernel module. INET and INET6 code on initialization registers two helper hook points in the kernel. if_enc(4) module uses these helper hook points and registers its hooks. IPSEC code uses these hhook points to call helper hooks implemented in if_enc(4).
# 59c180c3	16-Sep-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Unify loopback route switching: * prepare gateway before insertion * use RTM_CHANGE instead of explicit find/change route * Remove fib argument from ifa_switch_loopback_route added in r264887: if the old ifp fib differs from the new one, then the caller is doing something wrong * Make ifa_*_loopback_route call single ifa_maintain_loopback_route().
# d76d4012	14-Sep-2015	Hans Petter Selasky <hselasky@FreeBSD.org>	Update TSO limits to include all headers. To make driver programming easier the TSO limits are changed to reflect the values used in the BUSDMA tag a network adapter driver is using. The TCP/IP network stack will subtract space for all linklevel and protocol level headers and ensure that the full mbuf chain passed to the network adapter fits within the given limits. Implementation notes: If a network adapter driver needs to fixup the first mbuf in order to support VLAN tag insertion, the size of the VLAN tag should be subtracted from the TSO limit. Else not. Network adapters which typically inline the complete header mbuf could technically transmit one more segment. This patch does not implement a mechanism to recover the last segment for data transmission. It is believed when sufficiently large mbuf clusters are used, the segment limit will not be reached and recovering the last segment will not have any effect. The current TSO algorithm tries to send MTU-sized packets, where the MTU typically is 1500 bytes, which gives 1448 bytes of TCP data payload per packet for IPv4. That means if the TSO length limitation is set to 65536 bytes, there will be a data payload remainder of (65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to recover total TSO length due to inlining mbuf header data will not have any effect, because adding or removing the ETH/IP/TCP headers to or from 324 bytes will not cause more or less TCP payload to be TSO'ed. Existing network adapter limits will be updated separately. Differential Revision: https://reviews.freebsd.org/D3458 Reviewed by: rmacklem MFC after: 2 weeks
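The payload and remainder figures quoted above can be checked with a few lines of arithmetic. The 1448-byte payload assumes 20-byte IPv4 and TCP headers plus the 12-byte TCP timestamp option, an assumption not stated in the commit; the remainder function simply evaluates the commit's expression (65536 - 1500) mod 1448. The constant names are hypothetical.

```c
/* Assumptions (not from the commit): IPv4, 20-byte IP and TCP headers,
   and the 12-byte TCP timestamp option. */
#define MTU		1500u
#define IP_HLEN		20u
#define TCP_HLEN	20u
#define TCP_TSOPT	12u
#define TSO_LIMIT	65536u

/* TCP payload carried by one MTU-sized packet: 1500 - 20 - 20 - 12. */
static unsigned
tcp_payload_per_packet(void)
{
	return MTU - IP_HLEN - TCP_HLEN - TCP_TSOPT;
}

/* Evaluates the commit's expression (limit - MTU) mod payload. */
static unsigned
tso_remainder(unsigned limit)
{
	return (limit - MTU) % tcp_payload_per_packet();
}
```

With these values the first function yields 1448 and `tso_remainder(TSO_LIMIT)` yields 324, matching the figures in the commit message.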
# 441f9243	04-Sep-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Constantify lookup key in ifa_ifwith* functions. Some places in our network stack already have const arguments (like if_output() routines and LLE functions). Code using ifa_ifwith (and similar functions) along with LLE/_output functions is currently bound to use tricks like __DECONST(). Provide a cleaner way by making sockaddr lookup key really constant. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3464
# 772e66a6	16-Apr-2015	Gleb Smirnoff <glebius@FreeBSD.org>	Move ALTQ from contrib to net/altq. The ALTQ code has been discontinued by its initial authors for many years. In FreeBSD the code was already slightly edited during the pf(4) SMP project. It is about to be edited more in the projects/ifnet. Moving out of contrib also allows us to remove several hacks in the make glue. Reviewed by: net@
# fec642ad	26-Feb-2015	Gleb Smirnoff <glebius@FreeBSD.org>	Hide struct ifmultiaddr under _KERNEL, too.
# e072c794	19-Feb-2015	Gleb Smirnoff <glebius@FreeBSD.org>	Now that all users of _WANT_IFADDR are fixed, remove this crutch and hide ifaddr, in_ifaddr and in6_ifaddr under _KERNEL. Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 73d77028	23-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Do more fine-grained lltable locking: use table runtime lock as rarely as we can.
# 7c066c18	22-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Use less-invasive approach for IF_AFDATA lock: convert into 2 locks: use rwlock accessible via external functions (IF_AFDATA_CFG_* -> if_afdata_cfg_()) for all control plane tasks use rmlock (IF_AFDATA_RUN_) for fast-path lookups.
# 27688dfe	22-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Temporarily revert r274774.
# 9883e41b	20-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Switch IF_AFDATA lock to rmlock
# 3c7c188c	10-Nov-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Fix some minor TSO issues: - Improve description of TSO limits. - Remove a not needed KASSERT() - Remove some not needed variable casts. Sponsored by: Mellanox Technologies Discussed with: lstewart @ MFC after: 1 week
# 033074c4	09-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Replace 'struct route *' if_output() argument with 'struct nhop_info *'. Leave 'struct route' as is for legacy routing api users. Remove most of rtalloc_ign*-derived functions.
# 4ea05db8	09-Nov-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Use standard mtx(9), rwlock(9), sx(9) system initialization macros instead of doing initialization manually. Sponsored by: Nginx, Inc. Sponsored by: Netflix
# 1a75e3b2	06-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Make checks for rt_mtu generic: Some virtual if drivers have (ab)used ifa ifa_rtrequest hook to enforce route MTU to be not bigger than interface MTU. While ifa_rtrequest hooking might be an option in some situation, it is not feasible to do MTU checks there: generic (or per-domain) routing code is perfectly capable of doing this. We currently have 3 places where MTU is altered: 1) route addition. In this case domain overrides radix _addroute callback (in[6]_addroute) and all necessary checks/fixes are/can be done there. 2) route change (especially, GW change). In this case, there are no explicit per-domain calls, but one can override rte by setting ifa_rtrequest hook to domain handler (inet6 does this). 3) ifconfig ifaceX mtu YYYY In this case, we have no callbacks, but ip[6]_output performs runtime checks and decreases rt_mtu if necessary. Generally, the goals are to be able to handle all MTU changes in control plane, not in runtime part, and properly deal with increased interface MTU. This commit changes the following: * removes hooks setting MTU from drivers side * adds proper per-domain MTU checks for case 1) * adds generic MTU check for case 2) * The latter is done by using new dom_ifmtu callback since if_mtu denotes L3 interface MTU, e.g. maximum transmitted _packet_ size. However, IPv6 mtu might be different from if_mtu one (e.g. default 1280) for some cases, so we need an abstract way to know maximum MTU size for given interface and domain. * moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies user-supplied data which must be checked. * removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to use these functions on new non-inserted rte. More changes will follow soon. MFC after: 1 month Sponsored by: Yandex LLC
# f8ca6199	03-Nov-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Clarify TSO segment limit comment and remove two TABs to make lines a bit shorter. Sponsored by: Mellanox Technologies
# cbaac009	28-Sep-2014	Bjoern A. Zeeb <bz@FreeBSD.org>	Move the unconditional #include of net/ifq.h to the very end of file. This seems to allow us to pass a universe with either clang or gcc after r272244 (and r272260) and probably makes it easier to untangle these chained #includes in the future.
# bd071d4d	28-Sep-2014	Gleb Smirnoff <glebius@FreeBSD.org>	- Remove empty wrappers ether_poll_[de]register_drv(). [1] - Move polling(9) declarations out of ifq.h back to if_var.h, since they are absolutely unrelated to queues. Submitted by: Mikhail <mp lenta.ru> [1]
# 112f50ff	28-Sep-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Finally, convert counters in struct ifnet to counter(9). Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 7d6cc45c	27-Sep-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Use underlying ports counters to get lagg statistics instead of per-packet accounting. This introduces user-visible changes like aggregating error counters. Reviewed by: asomers (prev.version), glebius CR: D781 MFC after: 2 weeks Sponsored by: Yandex LLC
# 9fd573c3	22-Sep-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Improve transmit sending offload, TSO, algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware are limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot be loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible, following input from other FreeBSD developers, and will use defaults for values not set. Reviewed by: adrian, rmacklem Sponsored by: Mellanox Technologies MFC after: 1 week
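A sketch of checking a chain against all three constraints (total bytes, fragment count, fragment length) rather than total bytes alone. The structs are hypothetical stand-ins for an mbuf chain and for the per-interface limits; in FreeBSD the corresponding ifnet fields are believed to be `if_hw_tsomax`, `if_hw_tsomaxsegcount` and `if_hw_tsomaxsegsize`, but this code does not use them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf chain: one length per fragment. */
struct chain_sketch {
	const size_t *frag_len;
	size_t nfrags;
};

/* Hypothetical mirror of the three per-interface limits. */
struct tso_limits_sketch {
	size_t max_bytes;	/* total bytes the DMA engine accepts */
	size_t max_segcount;	/* maximum number of DMA segments */
	size_t max_segsize;	/* maximum bytes per DMA segment */
};

/* True only if the chain satisfies every limit at once. */
static bool
chain_fits(const struct chain_sketch *c, const struct tso_limits_sketch *l)
{
	size_t total = 0;

	if (c->nfrags > l->max_segcount)
		return false;
	for (size_t i = 0; i < c->nfrags; i++) {
		if (c->frag_len[i] > l->max_segsize)
			return false;
		total += c->frag_len[i];
	}
	return total <= l->max_bytes;
}
```

A chain of many small fragments can pass a byte-only check yet fail the segment-count check, which is exactly the case the commit describes.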
# d2a707cd	18-Sep-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Remove a bunch of methods that are superseded by if_inc_counter(). Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 1b7fb1d9	18-Sep-2014	Gleb Smirnoff <glebius@FreeBSD.org>	While not too late rename 'ifnet_counter' to 'ift_counter'. One of the important moments that we discussed with Marcel and Anuranjan was that a converted driver should return false for 'grep ifnet if_driver.c' :) Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 35853c2c	18-Sep-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Add a function to set if_get_counter method for an ifnet. To be used in the drivers that are already converted to "Juniper drvapi". This can be revisited in future.
# 277e067a	18-Sep-2014	Gleb Smirnoff <glebius@FreeBSD.org>	While not too late rename if_get_counter_compat() to if_get_counter_default(). The compat counters will go away, but the function will remain in its place, and in all places where it is going to be called. Discussed with: melifaro
# 0b7b006c	18-Sep-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Add if_inc_counter(), a generic method to update ifnet(9) counter w/o dereferencing the struct. Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 72f31000	13-Sep-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Revert r271504. A new patch to solve this issue will be made. Suggested by: adrian @
# eb93b77a	13-Sep-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Improve transmit sending offload, TSO, algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware are limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot be loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible, following input from other FreeBSD developers, and will use defaults for values not set. MFC after: 1 week Sponsored by: Mellanox Technologies
# 4f8585e0	11-Sep-2014	Alan Somers <asomers@FreeBSD.org>	Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and ifa_ifwithdstaddr. For the sake of backwards compatibility, the new arguments were added to new functions named ifa_ifwithnet_fib and ifa_ifwithdstaddr_fib, while the old functions became wrappers around the new ones that passed RT_ALL_FIBS for the fib argument. However, the backwards compatibility is not desired for FreeBSD 11, because there are numerous other incompatible changes to the ifnet(9) API. We therefore decided to remove it from head but leave it in place for stable/9 and stable/10. In addition, this commit adds the fib argument to ifa_ifwithbroadaddr for consistency's sake. sys/sys/param.h Increment __FreeBSD_version sys/net/if.c sys/net/if_var.h sys/net/route.c Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute. sys/net/route.c sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_options.c sys/netinet/ip_output.c sys/netinet6/nd6.c Fixup calls of modified functions. share/man/man9/ifnet.9 Document changed API. CR: https://reviews.freebsd.org/D458 MFC after: Never Sponsored by: Spectra Logic
# ccbefc2d	31-Aug-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Toss fields so that no padding field is required to achieve alignment.
# 09a8241f	30-Aug-2014	Gleb Smirnoff <glebius@FreeBSD.org>	It is actually possible to have if_t a typedef to non-void type, and keep both converted to drvapi and non-converted drivers compilable. o Make if_t typedef to struct ifnet *. o Remove shim functions. Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 997d2d83	31-Aug-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Provide pointer from struct ifnet to struct netmap_adapter, instead of abusing spare field.
# e6485f73	31-Aug-2014	Gleb Smirnoff <glebius@FreeBSD.org>	o Remove struct if_data from struct ifnet. Now it is merely API structure for route(4) socket and ifmib(4) sysctl. o Move fields from if_data to ifnet, but keep all statistic counters separate, since they should disappear later. o Provide function if_data_copy() to fill if_data, utilize it in routing socket and ifmib handler. o Provide overridable ifnet(9) method to fetch counters. If none is provided, if_get_counters_compat() is used, which returns old counters. Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 9753faf5	29-Jul-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Garbage collect couple of unused fields from struct ifaddr: - ifa_claim_addr() unused since removal of NetAtalk - ifa_metric seems to be never utilized, always a copy of if_metric
# 62d76917	02-Jun-2014	Marcel Moolenaar <marcel@FreeBSD.org>	Introduce a procedural interface to the ifnet structure. The new interface allows the ifnet structure to be defined as an opaque type in NIC drivers. This then allows the ifnet structure to be changed without a need to change or recompile NIC drivers. Put differently, NIC drivers can be written and compiled once and be used with different network stack implementations, provided of course that those network stack implementations have an API and ABI compatible interface. This commit introduces the 'if_t' type to replace 'struct ifnet *' as the type of a network interface. The 'if_t' type is defined as 'void *' to enable the compiler to perform type conversion to 'struct ifnet *' and vice versa where needed and without warnings. The functions that implement the API are the only functions that need to have an explicit cast. The MII code has been converted to use the driver API to avoid unnecessary code churn. Code churn comes from having to work with both converted and unconverted drivers in correlation with having callback functions that take an interface. By converting the MII code first, the callback functions can be defined so that the compiler will perform the typecasts automatically. As soon as all drivers have been converted, the if_t type can be redefined as needed and the API functions can be fixed to not need an explicit cast. The immediate beneficiaries of this change are: 1. Juniper Networks - The network stack implementation in Junos is entirely different from FreeBSD's one and this change allows Juniper to build "stock" NIC drivers that can be used in combination with both the FreeBSD and Junos stacks. 2. FreeBSD - This change opens the door towards changing ifnet and implementing new features and optimizations in the network stack without it requiring a change in the many NIC drivers FreeBSD has. 
Submitted by: Anuranjan Shukla <anshukla@juniper.net> Reviewed by: glebius@ Obtained from: Juniper Networks, Inc.
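A minimal sketch of the opaque-handle pattern behind `if_t`. All names are hypothetical. Unlike the original `void *` definition, it uses a pointer to an incomplete struct (closer to the later typedef described in 09a8241f), so the compiler itself rejects field access in code that appears before the definition. Both halves share one file only so the example compiles standalone; in a real split the driver would include just the declarations.

```c
#include <stdlib.h>

/* ---- What a driver sees: an opaque handle and accessor prototypes. ---- */
struct ifnet_sketch;				/* incomplete type */
typedef struct ifnet_sketch *ift_sketch;	/* hypothetical stand-in for if_t */

ift_sketch	ifsk_alloc(void);
int		ifsk_getmtu(ift_sketch);
void		ifsk_setmtu(ift_sketch, int);

/* Driver code: writing ifp->mtu here would not compile, because the
   struct layout is not visible at this point. */
static void
driver_attach(ift_sketch ifp)
{
	ifsk_setmtu(ifp, 9000);
}

/* ---- What the stack owns: the real layout, free to change. ---- */
struct ifnet_sketch {
	int mtu;
};

ift_sketch
ifsk_alloc(void)
{
	return calloc(1, sizeof(struct ifnet_sketch));
}

int
ifsk_getmtu(ift_sketch ifp)
{
	return ifp->mtu;
}

void
ifsk_setmtu(ift_sketch ifp, int mtu)
{
	ifp->mtu = mtu;
}
```

Because drivers depend only on the prototypes, fields can be added to or reordered in `struct ifnet_sketch` without recompiling driver code, which is the ABI property the commit is after.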
# 2f308a34	29-May-2014	Alan Somers <asomers@FreeBSD.org>	Fix unintended KBI change from r264905. Add _fib versions of ifa_ifwithnet() and ifa_ifwithdstaddr(). The legacy functions will call the _fib() versions with RT_ALL_FIBS, preserving legacy behavior. sys/net/if_var.h sys/net/if.c Add legacy-compatible functions as described above. Ensure legacy behavior when RT_ALL_FIBS is passed as fibnum. sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/net/route.c sys/net/rtsock.c sys/netinet6/nd6.c Call the _fib() functions if we must use a specific fib, or the legacy functions otherwise. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c Improve the udp_dontroute test. The bug that this test exercises is that ifa_ifwithnet() will return the wrong address, if multiple interfaces have addresses on the same subnet but with different fibs. The previous version of the test only considered one possible failure mode: that ifa_ifwithnet_fib() might fail to find any suitable address at all. The new version also checks whether ifa_ifwithnet_fib() finds the correct address by checking where the ARP request goes. Reported by: bz, hrs Reviewed by: hrs MFC after: 1 week X-MFC-with: 264905 Sponsored by: Spectra Logic
# 0cfee0c2	24-Apr-2014	Alan Somers <asomers@FreeBSD.org>	Fix subnet and default routes on different FIBs on the same subnet. These two bugs are closely related. The root cause is that ifa_ifwithnet does not consider FIBs when searching for an interface address. sys/net/if_var.h sys/net/if.c Add a fib argument to ifa_ifwithnet and ifa_ifwithdstaddr. Those functions will only return an address whose interface fib equals the argument. sys/net/route.c Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib arguments. sys/netinet/in.c Update in_addprefix to consider the interface fib when adding prefixes. This will prevent it from skipping the subnet route when one already exists on a different fib. sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/netinet6/nd6.c Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet. In some cases there wasn't a clear specific fib number to use. In others, I was unable to test those functions so I chose RT_DEFAULT_FIB to minimize divergence from current behavior. I will fix some of the latter changes along with PR kern/187553. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c tests/sys/netinet/Makefile Revert r263738. The udp_dontroute test was right all along. However, bugs kern/187550 and kern/187553 cancelled each other out when it came to this test. Because of kern/187553, ifa_ifwithnet searched the default fib instead of the requested one, but because of kern/187550, there was an applicable subnet route on the default fib. The new test added in r263738 doesn't work right, however. I can verify with dtrace that ifa_ifwithnet returned the wrong address before I applied this commit, but route(8) miraculously found the correct interface to use anyway. I don't know how. Clear expected failure messages for kern/187550 and kern/187552. PR: kern/187550 PR: kern/187552 Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic
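The effect of a FIB argument on a lookup like `ifa_ifwithnet()` can be sketched as a filter in front of the subnet match. This is an illustration under assumptions, not the kernel code: addresses are plain IPv4 values with masks, the table is an array, and `FIB_ANY_SKETCH` is a hypothetical wildcard playing the role that RT_ALL_FIBS plays for the legacy wrappers in 2f308a34.

```c
#include <stddef.h>
#include <stdint.h>

#define FIB_ANY_SKETCH	(-1)	/* hypothetical wildcard, analogous to RT_ALL_FIBS */

/* Hypothetical interface address entry. */
struct ifaddr_sketch {
	uint32_t addr;	/* IPv4 address, host byte order for simplicity */
	uint32_t mask;	/* netmask */
	int fib;	/* FIB of the owning interface */
};

/* Return the first address whose subnet contains dst and whose FIB
   matches, or NULL. Without the FIB check, two interfaces on the same
   subnet in different FIBs would be indistinguishable. */
static const struct ifaddr_sketch *
ifa_withnet_sketch(const struct ifaddr_sketch *tab, size_t n, uint32_t dst,
    int fib)
{
	for (size_t i = 0; i < n; i++) {
		if (fib != FIB_ANY_SKETCH && tab[i].fib != fib)
			continue;	/* address belongs to another FIB */
		if ((dst & tab[i].mask) == (tab[i].addr & tab[i].mask))
			return &tab[i];
	}
	return NULL;
}
```

With the wildcard, the lookup falls back to first-match behavior, which is the ambiguity behind kern/187553.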
# 0489b891	24-Apr-2014	Alan Somers <asomers@FreeBSD.org>	Fix host and network routes for new interfaces when net.add_addr_allfibs=0 sys/net/route.c In rtinit1, use the interface fib instead of the process fib. The latter wasn't very useful because ifconfig(8) is usually invoked with the default process fib. Changing ifconfig(8) to use setfib(2) would be redundant, because it already sets the interface fib. tests/sys/netinet/fibs_test.sh Clear the expected ATF failure sys/net/if.c Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib sys/netinet/in.c sys/net/if_var.h Add a fibnum argument to ifa_switch_loopback_route, a subroutine of in_scrubprefix. Pass it the interface fib. PR: kern/187549 Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic Corporation
# 61b9f217	19-Mar-2014	Navdeep Parhar <np@FreeBSD.org>	Add a shorter alias for if_data.ifi_oqdrops.
# b245f96c	12-Mar-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit interface, in r241616 a crutch was provided. It didn't work well, and finally we decided that it is time to break ABI and simply make if_baudrate a 64-bit value. Meanwhile, the entire struct if_data was reviewed. o Remove the if_baudrate_pf crutch. o Make all fields of struct if_data fixed machine independent size. The notion of data (packet counters, etc) is by no means MD. And it is a bug that on amd64 we've got 64-bit counters, while on i386 32-bit, which at modern speeds overflow within a second. This also removes quite a lot of COMPAT_FREEBSD32 code. o Give 16 bits to the ifi_datalen field. This field was provided to make future changes to if_data less ABI breaking. Unfortunately the 8 bit size of it had effectively limited sizeof if_data to 256 bytes. o Give 32 bits to ifi_mtu and ifi_metric. o Give 64 bits to the rest of fields, since they are counters. __FreeBSD_version bumped. Discussed with: emax Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 9a6356bc	05-Nov-2013	Gleb Smirnoff <glebius@FreeBSD.org>	To complement ifa_add_loopback_route() and ifa_del_loopback_route(), provide function ifa_switch_loopback_route() that will be used when an interface address used for a loopback route goes away, but we have another interface address with the same address value and want to preserve the loopback route. Sponsored by: Netflix Sponsored by: Nginx, Inc.
# b1b9dcae	05-Nov-2013	Gleb Smirnoff <glebius@FreeBSD.org>	Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by default from the very beginning. It was placed in the wrong namespace net.link.ether; originally it had been in another wrong namespace. It was incorrectly documented in an incorrect manual page, arp(8). Since the new-ARP commit, the tunable has been consulted only on route addition, and ignored on route deletion. Behaviour of a system with the tunable turned off is not fully correct, and has no advantages compared to normal behavior.
# 5b74cfe4	31-Oct-2013	Andre Oppermann <andre@FreeBSD.org>	Make struct ifnet readable and comprehensible again by grouping and ordering related variables, fields and locks next to each other. Add more comments to variables. Over time 'ifnet' has accumulated a lot of additional pointers and functionality in an unstructured way making it quite hard to read and understand while obfuscating relationships between fields and variables. Quantify the structure size and how bloated it has become. This is only a mechanical change in preparation for upcoming work to make ifnet opaque to drivers and to separate out the interface queuing. Sponsored by: The FreeBSD Foundation
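To put the overflow claim in numbers: the time for an unsigned 32-bit byte counter to wrap is 2^32 divided by the byte rate. The helper below is a hypothetical illustration of that arithmetic only; at a sustained 10 Gbit/s a byte counter wraps in about 3.4 seconds, and at 40 Gbit/s in under one second.

```c
/* Seconds until an unsigned 32-bit byte counter wraps at a sustained
   line rate given in bits per second. */
static double
u32_byte_counter_wrap_seconds(double bits_per_second)
{
	const double wrap = 4294967296.0;	/* 2^32 */

	return wrap / (bits_per_second / 8.0);	/* bytes / (bytes per second) */
}
```

A 64-bit counter at the same 40 Gbit/s rate would take well over a century to wrap, which is why widening the fields removes the problem rather than merely postponing it.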
# ded7d20f	29-Oct-2013	Andre Oppermann <andre@FreeBSD.org>	Move all interface queue related structures, macros and definitions from net/if_var.h to its own new net/ifq.h. For now net/ifq.h is unconditionally included through net/if_var.h. This is a mechanical change in preparation for making struct ifnet and the individual interface queue mechanisms opaque. Discussed with: glebius Sponsored by: The FreeBSD Foundation
# eaeb0c13	28-Oct-2013	Gleb Smirnoff <glebius@FreeBSD.org>	Style: s/SYS_EVENTHANDLER_H/_SYS_EVENTHANDLER_H_/g Submitted by: bde
# c29e1ad9	28-Oct-2013	Gleb Smirnoff <glebius@FreeBSD.org>	- Make the prophecy from 1997 happen and remove if_var.h inclusion from if.h. - Remove unnecessary includes and declarations from if.h - Remove unnecessary includes and declarations from if_var.h [1] - Mark some declarations that are about to be removed in near future with comments, explaining why this declaration is still necessary. - Protect eventhandler declarations with #ifdef SYS_EVENTHANDLER_H. Obtained from: bdeBSD [1] Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 7caf4ab7	15-Oct-2013	Gleb Smirnoff <glebius@FreeBSD.org>	- Utilize counter(9) to accumulate statistics on interface addresses. Add four counters to struct ifaddr. This kills '+=' on variables shared between processors for every packet. - Nuke struct if_data from struct ifaddr. - In ip_input() do not put a reference on ifaddr, instead update statistics right now in place and do IN_IFADDR_RUNLOCK(). This removes atomic(9) for every packet. [1] - To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in rtsock.c fill if_data fields using counter_u64_fetch(). - Accidentally fix bug in COMPAT_32 version of NET_RT_IFLISTL, which took if_data not from the ifaddr, but from ifaddr's ifnet. [2] Submitted by: melifaro [1], pluknet[2] Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 3fffa8c8	15-Oct-2013	Gleb Smirnoff <glebius@FreeBSD.org>	Push some defines under _KERNEL, improve styling and comments.
# 67420bda	15-Oct-2013	Gleb Smirnoff <glebius@FreeBSD.org>	Remove ifa_mtx. It was used only in one place in kernel, and ifnet's ifaddr lock can substitute it there. Discussed with: melifaro, ae Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 46758960	15-Oct-2013	Gleb Smirnoff <glebius@FreeBSD.org>	Remove ifa_init() and provide ifa_alloc() that will allocate and setup struct ifaddr internally. Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 6ed910fa	15-Oct-2013	Gleb Smirnoff <glebius@FreeBSD.org>	Hide 'struct ifaddr' definition from userland. Two tools left that use it, namely ipftest(1) and ifmcstat(1). These sniff structure definition using _WANT_IFADDR define. Sponsored by: Netflix Sponsored by: Nginx, Inc.
# d36ed80a	05-Jul-2013	Colin Percival <cperciva@FreeBSD.org>	Fix typo: minmum -> minimum. Submitted by: @z3ndrag0n
# 3c914c54	02-Jun-2013	Andre Oppermann <andre@FreeBSD.org>	Allow drivers to specify a maximum TSO length in bytes if they are limited in the amount of data they can handle at once. Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to change the limit. The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything less wouldn't be very useful anymore. The upper limit is still at IP_MAXPACKET (65536 bytes). Raising it requires further auditing of the IPv4/v6 code paths as the length field in the IP header would overflow leading to confusion in firewalls and other packet handlers on the real size of the packet. The placement into "struct ifnet" is a bit hackish but the best place that was found. When the stack/driver boundary is updated it should be handled in a better way. Submitted by: cperciva (earlier version) Reviewed by: cperciva Tested by: cperciva MFC after: 1 week (using spare struct members to preserve ABI)
# f89d4c3a	06-May-2013	Andre Oppermann <andre@FreeBSD.org>	Back out r249318, r249320 and r249327 due to a heisenbug most likely related to a race condition in the ipi_hash_lock with the exact cause currently unknown but under investigation.
# 47e8d432	25-Apr-2013	Gleb Smirnoff <glebius@FreeBSD.org>	Add const qualifier to the dst parameter of the ifnet if_output method.
# 18ba072a	10-Apr-2013	Gleb Smirnoff <glebius@FreeBSD.org>	Fix build.
# e8b3186b	09-Apr-2013	Andre Oppermann <andre@FreeBSD.org>	Change certain heavily used network related mutexes and rwlocks to reside on their own cache line to prevent false sharing with other nearby structures, especially for those in the .bss segment. NB: Those mutexes and rwlocks with variables next to them that get changed on every invocation do not benefit from their own cache line. Actually it may be net negative because two cache misses would be incurred in those cases.
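A minimal C11 sketch of giving a lock its own cache line. The 64-byte line size is an assumption (FreeBSD uses a machine-dependent CACHE_LINE_SIZE), and `struct lock_sketch` is a hypothetical stand-in for a mutex. As the NB above says, this only pays off when the neighbouring fields are not themselves written on every lock acquisition.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SKETCH	64	/* assumption: typical x86-64 line size */

/* Hypothetical stand-in for a mutex word. */
struct lock_sketch {
	volatile int owner;
};

/* Each member starts on its own cache line, so writes to hot_counter by
   one CPU do not invalidate the line holding the lock on another CPU. */
struct padded_sketch {
	alignas(CACHE_LINE_SKETCH) struct lock_sketch lock;
	alignas(CACHE_LINE_SKETCH) unsigned long hot_counter;
};
```

The cost is memory: the struct grows to two full cache lines even though its payload is only a few bytes.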
# 24421c1c	11-Feb-2013	Gleb Smirnoff <glebius@FreeBSD.org>	Resolve source address selection in presence of CARP. Add a couple of helper functions: - carp_master() - boolean function which is true if an address is in the MASTER state. - ifa_preferred() - boolean function that compares two addresses, and is aware of CARP. Utilize ifa_preferred() in ifa_ifwithnet(). The previous version of patch also changed source address selection logic in jails using carp_master(), but we failed to negotiate this part with Bjoern. Maybe we will approach this problem again later. Reported & tested by: Anton Yuzhaninov <citrin citrin.ru> Sponsored by: Nginx, Inc
# ded5ea6a	07-Feb-2013	Randall Stewart <rrs@FreeBSD.org>	This fixes an out-of-order problem with several of the newer drivers. The basic problem was that the driver was pulling the mbuf off the drbr ring and then when sending with xmit(), encountering a full transmit ring. Thus the lower layer xmit() function would return an error, and the drivers would then append the data back on to the ring. For TCP this is a horrible scenario sure to bring on a fast-retransmit. The fix is to use drbr_peek() to pull the data pointer but not remove it from the ring. If it fails then we either call the new drbr_putback or drbr_advance method. Advance moves it forward (we do this sometimes when the xmit() function frees the mbuf). When we succeed we always call advance. The putback will always copy the mbuf back to the top of the ring. Note that putback cannot be used with drbr_dequeue(), only with drbr_peek(). Most of the time, in putback, we would not need to copy it back since most likely the mbuf is still the same, but sometimes xmit() functions will change the mbuf via a pullup or other call. So the optimal case for the single consumer is to always copy it back. If we ever do a multiple_consumer (for lagg?) we will need a test and atomic in the putback, possibly a separate putback_mc() in the ring buf. Reviewed by: jhb@freebsd.org, jlv@freebsd.org
# c9b652e3	18-Oct-2012	Andre Oppermann <andre@FreeBSD.org>	Mechanically remove the last stray remains of spl* calls from net/. They have been no-ops for a long time now.
# 608ae712	17-Oct-2012	Maksim Yevmenkin <emax@FreeBSD.org>	provide helper if_initbaudrate() to set if_baudrate and if_baudrate_pf. again, use ixgbe(4) as an example of how to use new helper function. Reviewed by: jhb MFC after: 1 week
# 0fef97fe	16-Oct-2012	Maksim Yevmenkin <emax@FreeBSD.org>	introduce concept of ifi_baudrate power factor. the idea is to work around the problem where high speed interfaces (such as ixgbe(4)) are not able to report real ifi_baudrate. basically, take a spare byte from struct if_data and use it to store ifi_baudrate power factor. in other words, real ifi_baudrate = ifi_baudrate * 10 ^ ifi_baudrate power factor. this should be backwards compatible with old binaries. use ixgbe(4) as an example on how drivers would set ifi_baudrate power factor Discussed with: kib, scottl, glebius MFC after: 1 week
# 063efed2	28-Sep-2012	Gleb Smirnoff <glebius@FreeBSD.org>	The drbr(9) API appeared to be so unclear that most drivers in tree used it incorrectly, which led to inaccurate, inflated if_obytes accounting. The drbr(9) used to update ifnet stats on drbr_enqueue(), which is not accurate since enqueuing doesn't imply successful processing by driver. Neither does dequeuing. Most drivers also called drbr_stats_update() which did accounting again, leading to doubled if_obytes statistics. And under heavy transmit load, when a packet could be enqueued and dequeued several times, it could have been accounted several times. o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between ALTQ queueing or buf_ring(9) queueing. - It doesn't touch the buf_ring stats any more. - It doesn't touch ifnet stats anymore. - drbr_stats_update() no longer exists. o buf_ring(9) handles its stats itself: - It handles br_drops itself. - br_prod_bytes stats are dropped. Rationale: no one ever reads them but update of a common counter on every packet negatively affects performance due to excessive cache invalidation. - buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since we no longer account bytes. o Drivers handle their stats themselves: if_obytes, if_omcasts. o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer use drbr_stats_update(), and update ifnet stats themselves. o bxe(4) was the most correct driver, it didn't call drbr_stats_update(), thus it was the only driver accurate under moderate load. Now it also maintains stats itself. o ixgbe(4) had already taken stats from hardware, so just - drop software stats updating. - take multicast packet count from hardware as well. o mxge(4) just no longer needs NO_SLOW_STATS define. o cxgb(4), cxgbe(4) need no change, since they obtain stats from hardware. Reviewed by: jfv, gnn
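A user-space sketch of the peek/advance/putback discipline described above. The ring (`struct ring_sketch`) and the transmit callback are hypothetical stand-ins; in the kernel the corresponding calls are `drbr_peek()`, `drbr_advance()` and `drbr_putback()`, whose exact signatures are not reproduced here. The point is the control flow: a buffer leaves the ring only after the hardware accepted it or the transmit routine freed it, so a full hardware ring never reorders packets.

```c
#include <stddef.h>

#define RING_SZ	8u

/* Hypothetical single-consumer ring standing in for buf_ring(9)/drbr. */
struct ring_sketch {
	void *slot[RING_SZ];
	unsigned head;		/* consumer index */
	unsigned tail;		/* producer index */
};

static int
ring_enqueue(struct ring_sketch *r, void *m)
{
	if (r->tail - r->head == RING_SZ)
		return -1;			/* ring full */
	r->slot[r->tail++ % RING_SZ] = m;
	return 0;
}

/* Return the oldest buffer without removing it. */
static void *
ring_peek(const struct ring_sketch *r)
{
	return r->head == r->tail ? NULL : r->slot[r->head % RING_SZ];
}

/* Consume the slot returned by the last peek. */
static void
ring_advance(struct ring_sketch *r)
{
	r->head++;
}

/* Overwrite the head slot; xmit may have replaced the buffer. */
static void
ring_putback(struct ring_sketch *r, void *m)
{
	r->slot[r->head % RING_SZ] = m;
}

/* Hypothetical transmit hook: 0 on success, nonzero if the hardware ring
   is full. It may replace *mp (e.g. after a pullup) or set it to NULL
   after freeing the buffer. */
typedef int (*xmit_fn)(void *ctx, void **mp);

static unsigned
drain_sketch(struct ring_sketch *r, xmit_fn xmit, void *ctx)
{
	unsigned sent = 0;
	void *m;

	while ((m = ring_peek(r)) != NULL) {
		if (xmit(ctx, &m) != 0) {
			if (m == NULL)
				ring_advance(r);	/* xmit freed it: drop the slot */
			else
				ring_putback(r, m);	/* keep it first in line */
			break;
		}
		ring_advance(r);
		sent++;
	}
	return sent;
}
```

Writing the possibly changed pointer back unconditionally is the single-consumer shortcut the commit mentions; a multi-consumer ring would need an atomic check instead.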
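The encoding real = ifi_baudrate * 10^pf, together with the scaling a helper like `if_initbaudrate()` (608ae712) is meant to perform, can be sketched as follows. The struct and function names are hypothetical and the 32-bit field width is an assumption modelling the i386 case; this is not the kernel's code. Rates that are not round numbers lose their low-order digits when scaled.

```c
#include <stdint.h>

/* Hypothetical mirror of the two fields: a 32-bit rate plus a
   power-of-ten factor taken from a spare byte. */
struct baud_sketch {
	uint32_t baudrate;
	uint8_t pf;
};

/* Divide by ten until the rate fits in 32 bits, counting the steps. */
static void
baud_init_sketch(struct baud_sketch *b, uint64_t baud)
{
	b->pf = 0;
	while (baud > UINT32_MAX) {
		baud /= 10;
		b->pf++;
	}
	b->baudrate = (uint32_t)baud;
}

/* real rate = baudrate * 10^pf */
static uint64_t
baud_real_sketch(const struct baud_sketch *b)
{
	uint64_t r = b->baudrate;

	for (uint8_t i = 0; i < b->pf; i++)
		r *= 10;
	return r;
}
```

For a 10 Gbit/s interface this stores 1000000000 with pf = 1; a 1 Gbit/s rate already fits and keeps pf = 0, so old binaries that ignore the factor still read a correct value there.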
# 73c23f3b	04-Sep-2012	Alexander V. Chernikov <melifaro@FreeBSD.org>	Fix the build broken by r240099. Hide link_pfil_hook under _KERNEL macro. MFC after: 3 weeks
# 7d4317bd	04-Sep-2012	Alexander V. Chernikov <melifaro@FreeBSD.org>	Introduce new link-layer PFIL hook V_link_pfil_hook. Merge ether_ipfw_chk() and part of bridge_pfil() into unified ipfw_check_frame() function called by PFIL. This change was suggested by rwatson? @ DevSummit. Remove ipfw headers from ether/bridge code since they are unneeded now. Note this change introduces some (temporary) performance penalty since PFIL read lock has to be acquired for every link-level packet. MFC after: 3 weeks
# ea537929	02-Aug-2012	Gleb Smirnoff <glebius@FreeBSD.org>	Fix races between in_lltable_prefix_free(), lla_lookup(), llentry_free() and arptimer(): o Use callout_init_rw() for lle timeout, this allows us to safely disestablish them. - This allows us to simplify the arptimer() and make it race safe. o Consistently use ifp->if_afdata_lock to lock access to linked lists in the lle hashes. o Introduce new lle flag LLE_LINKED, which marks an entry that is attached to the hash. - Use LLE_LINKED to avoid double unlinking via subsequent calls to llentry_free(). - Mark lle with LLE_DELETED via |= operation instead of =, so that other flags won't be lost. o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more consistent and provide more informative KASSERTs. The patch is a collaborative work of all submitters and myself. PR: kern/165863 Submitted by: Andrey Zonov <andrey zonov.org> Submitted by: Ryan Stone <rysto32 gmail.com> Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>
# 09fe6320	19-Jun-2012	Navdeep Parhar <np@FreeBSD.org>	- Updated TOE support in the kernel. - Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs. These are available as t3_tom and t4_tom modules that augment cxgb(4) and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as usual with or without these extra features. - iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP is in the works and will follow soon. Build-tested with make universe. 30s overview: What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the capabilities of an interface: # ifconfig -m | grep TOE Enable/disable TCP offload on an interface (just like any other ifnet capability): # ifconfig cxgbe0 toe # ifconfig cxgbe0 -toe Which connections are offloaded? Look for toe4 and/or toe6 in the output of netstat and sockstat: # netstat -np tcp | grep toe # sockstat -46c | grep toe Reviewed by: bz, gnn Sponsored by: Chelsio communications. MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)
# 02ed02af	19-Mar-2012	John Baldwin <jhb@FreeBSD.org>	Retire the IF_ADDR_LOCK() and IF_ADDR_UNLOCK() compat macros from HEAD. The new [RW]LOCK macros are merged back to 8.x so should be suitable for new code in HEAD even if it is to be MFC'd.
# 4ecf274b	08-Feb-2012	Sergey Kandaurov <pluknet@FreeBSD.org>	g/c last bit of old ipv6 prefix management. Reviewed by: bz Obtained from: NetBSD, net/if.h, rev 1.80
# fbcebf7f	09-Jan-2012	John Baldwin <jhb@FreeBSD.org>	Convert the per-interface address list lock from a mutex to a reader/writer lock. Reviewed by: bz
# a2cb1d52	05-Jan-2012	John Baldwin <jhb@FreeBSD.org>	Add new variants of the IF_ADDR_LOCK() macros used for protecting interface address lists that distinguish read locks from write locks. To preserve the KPI, the previous operations are mapped to the write lock macros. The lock is still kept as a mutex for now. Reviewed by: bz MFC after: 2 weeks
# 08b68b0e	15-Dec-2011	Gleb Smirnoff <glebius@FreeBSD.org>	A major overhaul of the CARP implementation. The ip_carp.c was started from scratch, copying needed functionality from the old implementation on demand, with a thorough review of all code. The main change is that interface layer has been removed from the CARP. Now redundant addresses are configured directly on the interfaces they run on. The CARP configuration itself is, as before, configured and read via SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant. ifconfig(8) semantics has been changed too: now one doesn't need to clone a carpXX interface; instead, one configures a vhid directly on an Ethernet interface. To supply vhid data from the kernel to an application the getifaddrs(3) function has been changed to pass ifam_data with each address. [1] The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows running a single redundant IP per interface. Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for idea on using ifam_data and for several rounds of reviewing! PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448 Reviewed by: bz Submitted by: bz [1]
# f26fa169	09-Dec-2011	Brooks Davis <brooks@FreeBSD.org>	Remove the unused if_free_type() function. X-MFC after: never
# a0af7c3e	27-Oct-2011	Gleb Smirnoff <glebius@FreeBSD.org>	Add macro IF_DEQUEUE_ALL(ifq, m), that takes the entire mbuf chain off the queue. It can be utilized in queue processing to avoid multiple locking/unlocking.
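The idea behind IF_DEQUEUE_ALL() (detach the whole chain in one step rather than dequeuing packet by packet) can be sketched as below. The types `struct pkt` and `struct pktq` and the function `pktq_dequeue_all()` are hypothetical stand-ins, not kernel definitions. The kernel macro presumably takes the queue lock once around this operation; that locking is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an mbuf and an ifqueue (not kernel types). */
struct pkt {
	struct pkt *next;
	int id;
};

struct pktq {
	struct pkt *head;
	struct pkt *tail;
	int len;
};

/*
 * Take the entire chain off the queue in one operation and leave the
 * queue empty. The caller then walks the returned chain without
 * touching the queue (or its lock) again.
 */
struct pkt *
pktq_dequeue_all(struct pktq *q)
{
	struct pkt *chain = q->head;

	q->head = NULL;
	q->tail = NULL;
	q->len = 0;
	return (chain);
}
```

The caller then iterates over `chain` via `next` pointers. Per the entry above, this is how repeated locking and unlocking during queue processing is avoided.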
# d9a36286	17-Jul-2011	Bjoern A. Zeeb <bz@FreeBSD.org>	Add spares to the network stack for FreeBSD-9: - TCP keep* timers - TCP UTO (adjust from what was there already) - netmap - route caching - user cookie (temporary to allow for the real fix) Slightly re-shuffle struct ifnet moving fields out of the middle of spares and to better align. Discussed with: rwatson (slightly earlier version)
# 43deddcd	03-Jul-2011	Bjoern A. Zeeb <bz@FreeBSD.org>	Remove extra white space to comply with style for the rest of the struct. MFC after: 2 weeks
# 35fd7bc0	02-Jul-2011	Bjoern A. Zeeb <bz@FreeBSD.org>	Add infrastructure to allow all frames/packets received on an interface to be assigned to a non-default FIB instance. You may need to recompile world or ports due to the change of struct ifnet. Submitted by: cjsp Submitted by: Alexander V. Chernikov (melifaro ipfw.ru) (original versions) Reviewed by: julian Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru) MFC after: 2 weeks X-MFC: use spare in struct ifnet
# e4cd31dd	21-Mar-2011	Jeff Roberson <jeff@FreeBSD.org>	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# dd62f5c0	25-Jun-2010	Qing Li <qingli@FreeBSD.org>	MFC r208553 This patch fixes the problem where proxy ARP entries cannot be added over the if_ng interface. Approved by: re (bz)
# 0ed6142b	25-May-2010	Qing Li <qingli@FreeBSD.org>	This patch fixes the problem where proxy ARP entries cannot be added over the if_ng interface. MFC after: 3 days
# 7c61d493	24-May-2010	Andrew Thompson <thompsa@FreeBSD.org>	MFC r202588 Declare a new EVENTHANDLER called iflladdr_event which signals that the L2 address on an interface has changed. This lets stacked interfaces such as vlan(4) detect that their lower interface has changed and adjust things in order to keep working. Previously this situation broke at least vlan(4) and lagg(4) configurations. The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the risk of a loop. PR: kern/142927 Submitted by: Nikolay Denev MFC r202611 Do not hold the lock over if_setlladdr() as it calls into the interface driver init routine.
# 29f2c008	18-Mar-2010	Max Laier <mlaier@FreeBSD.org>	MFC r203834 and r205197: Make ALTQ work for drbr consumers.
# 4c71aa58	15-Mar-2010	Max Laier <mlaier@FreeBSD.org>	Fix a small bug in drbr_dequeue_cond spotted while preparing MFC of r203834. MFC after: 3 days
# a5a931b3	25-Feb-2010	Xin LI <delphij@FreeBSD.org>	MFC 203052: Add interface description capability as inspired by OpenBSD. Thanks to rwatson@, jhb@, brooks@ and others for feedback to the old implementation! Sponsored by: iXsystems, Inc.
# 193cbc4d	13-Feb-2010	Max Laier <mlaier@FreeBSD.org>	Fix drbr and altq interaction: - introduce drbr_needs_enqueue that returns whether the interface/br needs an enqueue operation: returns true if altq is enabled or there are already packets in the ring (as we need to maintain packet order) - update all drbr consumers - fix drbr_flush - avoid using the driver queue (IFQ_DRV_*) in the altq case as the multiqueue consumer does not provide enough protection, serialize altq interaction with the main queue lock - make drbr_dequeue_cond work with altq Discussed with: kmacy, yongari, jfv MFC after: 4 weeks
# 3ddba633	31-Jan-2010	Shteryana Shopova <syrinx@FreeBSD.org>	MFC r202935: While flushing the multicast filter of an interface, do not zero the relevant ifmultiaddr structures' reference to the parent interface, unless the parent interface is really detaching. While here, program only link layer multicast filters to a wlan's hardware parent interface. PR: kern/142391, kern/142392 Reviewed by: sam, rpaulo, bms
# 215940b3	26-Jan-2010	Xin LI <delphij@FreeBSD.org>	Revised revision 199201 (add interface description capability as inspired by OpenBSD), based on comments from many, including rwatson, jhb, brooks and others. Sponsored by: iXsystems, Inc. MFC after: 1 month
# 93ec7edc	24-Jan-2010	Shteryana Shopova <syrinx@FreeBSD.org>	While flushing the multicast filter of an interface, do not zero the relevant ifmultiaddr structures' reference to the parent interface, unless the parent interface is really detaching. While here, program only link layer multicast filters to a wlan's hardware parent interface. PR: kern/142391, kern/142392 Reviewed by: sam, rpaulo, bms MFC after: 1 week
# ea4ca115	18-Jan-2010	Andrew Thompson <thompsa@FreeBSD.org>	Declare a new EVENTHANDLER called iflladdr_event which signals that the L2 address on an interface has changed. This lets stacked interfaces such as vlan(4) detect that their lower interface has changed and adjust things in order to keep working. Previously this situation broke at least vlan(4) and lagg(4) configurations. The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the risk of a loop. PR: kern/142927 Submitted by: Nikolay Denev
# 130fd3bc	05-Jan-2010	Qing Li <qingli@FreeBSD.org>	MFC r201319 Remove a deleted comment line that was brought back by my previous commit.
# 32c53401	05-Jan-2010	Qing Li <qingli@FreeBSD.org>	MFC r201282, r201543 r201282 ------- The proxy arp entries could not be added into the system over the IFF_POINTOPOINT link types. The reason was that the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations because multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of loopback route to local end would be either incremented or decremented. The first instantiation would create the entry and the last removal would delete the route entry. r201543 ------- The IFA_RTSELF address flag marks that a loopback route has been installed for the interface address. This marker is necessary to properly support PPP types of links where multiple links can have the same local end IP address. The IFA_RTSELF flag bit maps to the RTF_HOST value, which was combined into the route flag bits during prefix installation in IPv6. This inclusion caused the prefix route to be unusable. This patch fixes this bug by excluding the IFA_RTSELF flag during route installation. PR: ports/141342, kern/141134
# 9f140905	30-Dec-2009	Qing Li <qingli@FreeBSD.org>	Remove a deleted comment line that was brought back by my previous commit. MFC after: 5 days
# c7ab6602	30-Dec-2009	Qing Li <qingli@FreeBSD.org>	The proxy arp entries could not be added into the system over the IFF_POINTOPOINT link types. The reason was that the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations because multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of loopback route to local end would be either incremented or decremented. The first instantiation would create the entry and the last removal would delete the route entry. MFC after: 5 days
# 8e968376	21-Dec-2009	John Baldwin <jhb@FreeBSD.org>	Remove commented out prototype for ifinit(). This prototype has been commented out since 1.1 and has not been present in <sys/systm.h> since at least 1.1 of that file. It is also not needed in FreeBSD due to SYSINIT().
# 34605f85	30-Nov-2009	John Baldwin <jhb@FreeBSD.org>	Remove if_timer/if_watchdog now that they are no longer used. The space used by if_timer is reserved for expanding if_index to an int in the future. Reviewed by: rwatson, brooks
# 1a9d4dda	12-Nov-2009	Xin LI <delphij@FreeBSD.org>	Revert revision 199201 for now as it has introduced a kernel vulnerability and requires more polishing.
# 41c8c6e8	11-Nov-2009	Xin LI <delphij@FreeBSD.org>	Add interface description capability as inspired by OpenBSD. MFC after: 3 months
# 553a7dec	15-Sep-2009	Qing Li <qingli@FreeBSD.org>	MFC r197227 Self pointing routes are installed for configured interface addresses and address aliases. After an interface is brought down and brought back up again, those self pointing routes disappeared. This patch ensures after an interface is brought back up, the loopback routes are reinstalled properly. Reviewed by: bz Approved by: re
# 9bb7d0f4	15-Sep-2009	Qing Li <qingli@FreeBSD.org>	Self pointing routes are installed for configured interface addresses and address aliases. After an interface is brought down and brought back up again, those self pointing routes disappeared. This patch ensures after an interface is brought back up, the loopback routes are reinstalled properly. Reviewed by: bz MFC after: immediately
# b569420a	28-Aug-2009	Robert Watson <rwatson@FreeBSD.org>	Merge r196510 from head to stable/8: Make if_grow static -- it's not used outside of if.c, and with the internals destined to change, it's better if it remains that way. Approved by: re (kib)
# 3ef94f2b	28-Aug-2009	Robert Watson <rwatson@FreeBSD.org>	Merge r196481 from head to stable/8: Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stabilize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian Approved by: re (kib)
# 8e937462	23-Aug-2009	Robert Watson <rwatson@FreeBSD.org>	Make if_grow static -- it's not used outside of if.c, and with the internals destined to change, it's better if it remains that way. MFC after: 3 days
# 77dfcdc4	23-Aug-2009	Robert Watson <rwatson@FreeBSD.org>	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stabilize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days
# aeba9e80	20-Aug-2009	Robert Watson <rwatson@FreeBSD.org>	Merge r196263 from head to stable/8: Remove unused if_rawoutput() macro; it has been unused since at least FreeBSD 2. Approved by: re (kib)
# d931ea09	15-Aug-2009	Robert Watson <rwatson@FreeBSD.org>	Remove unused if_rawoutput() macro; it has been unused since at least FreeBSD 2. Approved by: re (kib)
# df813b7e	27-Jul-2009	Qing Li <qingli@FreeBSD.org>	This patch does the following: - Allow loopback route to be installed for address assigned to interface of IFF_POINTOPOINT type. - Install loopback route for an IPv4 interface address when the "useloopback" sysctl variable is enabled. Similarly, install loopback route for an IPv6 interface address when the sysctl variable "nd6_useloopback" is enabled. Deleting loopback routes for interface addresses is unconditional in case these sysctl variables were disabled after an interface address has been assigned. Reviewed by: bz Approved by: re
# 1e77c105	16-Jul-2009	Robert Watson <rwatson@FreeBSD.org>	Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)
# eddfbb76	14-Jul-2009	Robert Watson <rwatson@FreeBSD.org>	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. 
Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
# 6cb7f168	29-Jun-2009	Brooks Davis <brooks@FreeBSD.org>	Remove support for the /dev/net/* per-interface devices. They serve little purpose and are unused in the base system. The IOCTL functionality is entirely duplicated and routing sockets provide a richer interface than the kqueue functionality. Further, it is not practical for these devices to be made sensible in the face of VIMAGE. Bump __FreeBSD_version on the off chance that there is any code out there that actually uses this stuff. Reviewed by: rwatson Discussed with: bz, zec Approved by: re@ (kensmith)
# f9ef96ca	25-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Define four wrapper functions for interface address locking, if_addr_rlock() and if_addr_runlock() for regular address lists, and if_maddr_rlock() and if_maddr_runlock() for multicast address lists. We will use these in various kernel modules to avoid encoding specific type and locking strategy information into modules that currently use IF_ADDR_LOCK() and IF_ADDR_UNLOCK() directly. MFC after: 6 weeks
# 8896f83a	22-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Add a new function, ifa_ifwithaddr_check(), which rather than returning a pointer to an ifaddr matching the passed socket address, returns a boolean indicating whether one was present. In the (near) future, ifa_ifwithaddr() will return a referenced ifaddr rather than a raw ifaddr pointer, and the new wrapper will allow callers that care only about the boolean condition to avoid having to free that reference. MFC after: 3 weeks
# 1099f828	21-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Clean up common ifaddr management: - Unify reference count and lock initialization in a single function, ifa_init(). - Move tear-down from a macro (IFAFREE) to a function ifa_free(). - Move reference count bump from a macro (IFAREF) to a function ifa_ref(). - Use refcount(9) for reference count management instead of a u_int protected by a mutex. The ifa_mtx is now used for exactly one ioctl, and possibly should be removed. MFC after: 3 weeks
# d49cd9a1	19-Jun-2009	Kip Macy <kmacy@FreeBSD.org>	add helper function for flushing software queues
# d659538f	15-Jun-2009	Sam Leffler <sam@FreeBSD.org>	r193336 moved ifq_detach to if_free which broke if_alloc followed by if_free (w/o doing if_attach); move ifq_attach to if_alloc and rename ifq_attach/detach to ifq_init/ifq_delete to better identify their purpose Reviewed by: jhb, kmacy
# a913be09	09-Jun-2009	Kip Macy <kmacy@FreeBSD.org>	- add drbr routines for accessing #qentries and conditionally dequeueing - track bytes enqueued in buf_ring
# bc29160d	08-Jun-2009	Marko Zec <zec@FreeBSD.org>	Introduce an infrastructure for dismantling vnet instances. Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many such issues are already known and will be incrementally fixed over the next weeks in smaller commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should be in general compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)
# 1abcdbd1	30-May-2009	Attilio Rao <attilio@FreeBSD.org>	When user_frac in the polling subsystem is low it keeps the CPU busy for longer than necessary. Additionally, interfaces are kept polled (in the tick) even if no more packets are available. In order to avoid such situations a new generic mechanism can be implemented in a proactive way, keeping track of the time spent on any packet and fragmenting the time for any tick, stopping the processing as soon as possible. In order to implement such mechanism, the polling handler needs to change, returning the number of packets processed. While the intended logic is not part of this patch, the polling KPI is broken by this commit, adding an int return value and the new flag IFCAP_POLLING_NOCOUNT (which will signal that the return value is meaningless for the installed handler and checking should be skipped). Bump __FreeBSD_version in order to signal such situation. Reviewed by: emaste Sponsored by: Sandvine Incorporated
# e0c14af9	22-May-2009	Marko Zec <zec@FreeBSD.org>	Introduce the if_vmove() function, which will be used in the future for reassigning ifnets from one vnet to another. if_vmove() works by calling a restricted subset of actions normally executed by if_detach() on an ifnet in the current vnet, and then switches to the target vnet and executes an appropriate subset of if_attach() actions there. if_attach() and if_detach() have become wrapper functions around if_attach_internal() and if_detach_internal(), where the latter variants have an additional argument, a flag indicating whether a full attach or detach sequence is to be executed, or only a restricted subset suitable for moving an ifnet from one vnet to another. Hence, if_vmove() will not call if_detach() and if_attach() directly, but will call the if_detach_internal() and if_attach_internal() variants instead, with the vmove flag set. While here, staticize ifnet_setbyindex() since it is not referenced from outside of sys/net/if.c. Also rename ifccnt field in struct vimage to ifcnt, and do some minor whitespace garbage collection where appropriate. This change should have no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz, rwatson, brooks? Approved by: julian (mentor)
# 21ca7b57	05-May-2009	Marko Zec <zec@FreeBSD.org>	Change the curvnet variable from a global const struct vnet *, previously always pointing to the default vnet context, to a dynamically changing thread-local one. The curvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_* macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, with an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typical cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)
# f6dfe47a	30-Apr-2009	Marko Zec <zec@FreeBSD.org>	Permit building kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_*() macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with two additional fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_* family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. 
Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)
# 6064c5d3	23-Apr-2009	Robert Watson <rwatson@FreeBSD.org>	Add ifunit_ref(), a version of ifunit(), that returns not just an interface pointer, but also a reference to it. Modify ifioctl() to use ifunit_ref(), holding the reference until all ioctls, etc, have completed. This closes a class of reader-writer races in which interfaces could be removed during long-running ioctls, leading to crashes. Many other consumers of ifunit() should now use ifunit_ref() to avoid similar races. MFC after: 3 weeks
# 111c6b61	23-Apr-2009	Robert Watson <rwatson@FreeBSD.org>	During if_detach(), invoke if_dead() to set the ifnet's function pointers to "dead" implementations that no-op rather than invoking the device driver. This would generally be unexpected and possibly quite badly handled by most device drivers after if_detach() has completed. Reviewed by: bms MFC after: 3 weeks
# 27d37320	21-Apr-2009	Robert Watson <rwatson@FreeBSD.org>	Start to address a number of races relating to use of ifnet pointers after the corresponding interface has been destroyed: (1) Add an ifnet refcount, ifp->if_refcount. Initialize it to 1 in if_alloc(), and modify if_free_type() to decrement and check the refcount. (2) Add new if_ref() and if_rele() interfaces to allow kernel code walking global interface lists to release IFNET_[RW]LOCK() yet keep the ifnet stable. Currently, if_rele() is a no-op wrapper around if_free(), but this may change in the future. (3) Add new ifnet field, if_alloctype, which caches the type passed to if_alloc(), but unlike if_type, won't be changed by drivers. This allows asynchronous frees of the interface after the driver has released it to still use the right type. Use that instead of the type passed to if_free_type(), but assert that they are the same (might have to rethink this if that doesn't work out). (4) Add a new ifnet_byindex_ref(), which looks up an interface by index and returns a reference rather than a pointer to it. (5) Fix if_alloc() to fully initialize the if_addr_mtx before hooking up the ifnet to global lists. (6) Modify sysctls in if_mib.c to use ifnet_byindex_ref() and release the ifnet when done. When this change is MFC'd, it will need to replace if_ispare fields rather than adding new fields in order to avoid breaking the binary interface. Once this change is MFC'd, if_free_type() should be removed, as its 'type' argument is now optional. This refcount is not appropriate for counting mbuf pkthdr references, and also not for counting entry into the device driver via ifnet function pointers. An rmlock may be appropriate for the latter. Rather, this is about ensuring data structure stability when reaching an ifnet via global ifnet lists and tables followed by copy in or out of userspace. MFC after: 3 weeks Reported by: mdtancsa Reviewed by: brooks
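The reference-count lifecycle described in points (1) and (2) above (start at 1 on allocation, bump with a "ref" call, decrement on release and free on the last reference) can be sketched generically with C11 atomics. This is a hypothetical illustration: `struct obj`, `obj_alloc()`, `obj_ref()`, `obj_rele()` and the `obj_frees` counter are invented for the example and are not the kernel's if_ref()/if_rele() code, which uses the kernel's own primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical object standing in for struct ifnet. */
struct obj {
	atomic_uint refcount;
};

/* Counts objects actually freed, so the lifecycle can be observed. */
int obj_frees = 0;

struct obj *
obj_alloc(void)
{
	struct obj *o = malloc(sizeof(*o));

	/* The allocator's caller holds the first reference. */
	if (o != NULL)
		atomic_init(&o->refcount, 1);
	return (o);
}

/* Take an extra reference, e.g. before dropping a global list lock. */
void
obj_ref(struct obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
}

/* Drop a reference; whoever drops the last one frees the object. */
void
obj_rele(struct obj *o)
{
	if (atomic_fetch_sub(&o->refcount, 1) == 1) {
		free(o);
		obj_frees++;
	}
}
```

A list walker would call `obj_ref()` while still holding the list lock, release the lock, use the object, then call `obj_rele()`. The object stays valid even if its owner drops its own reference in the meantime.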
# 7cc5b47f	16-Apr-2009	Kip Macy <kmacy@FreeBSD.org>	export if_qflush for use by driver if_qflush routines; only set ifp->if_{transmit, qflush} if not already set; KASSERT that neither or both are set
# 279aa3d4	16-Apr-2009	Kip Macy <kmacy@FreeBSD.org>	Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Reviewed by: rwatson
# 3efea337	13-Apr-2009	Kip Macy <kmacy@FreeBSD.org>	Adapt buf_ring abstraction interface to allow consumers to interoperate with ALTQ
# e5adda3d	15-Mar-2009	Robert Watson <rwatson@FreeBSD.org>	Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced in FreeBSD 5.x to allow network device drivers to run with Giant despite the network stack being Giant-free. This significantly simplifies calls into ioctl() on network interfaces, especially in the multicast code, as well as eliminates deferred invocation of interface if_start routines. Disable the build on device drivers still depending on IFF_NEEDSGIANT as they no longer compile. They will be removed in a few weeks if they haven't been made MPSAFE in that time. Disabled drivers: if_ar if_axe if_aue if_cdce if_cue if_kue if_ray if_rue if_rum if_sr if_udav if_ural if_zyd Drivers that were already disabled because of tty changes: if_ppp if_sl Discussed on: arch@
# 3055e123	28-Feb-2009	Robert Watson <rwatson@FreeBSD.org>	Do a bit of struct ifnet cleanup in preparation for 8.0: group function pointers together, move padding to the bottom of the structure, and add two new integer spares due to attrition over time. Remove unused spare "flags" field, we can use one of the spare ints if we need it later. This change requires a rebuild of device driver modules that depend on the layout of ifnet for binary compatibility reasons. Discussed with: kmacy
# 64c44e5d	17-Dec-2008	Kip Macy <kmacy@FreeBSD.org>	Keep stats in drbr_enqueue Discussed with: ps
# 1635d917	16-Dec-2008	Kip Macy <kmacy@FreeBSD.org>	merge in 2 buf_ring helper routines for enqueueing and freeing buf_rings
# 991f8615	16-Dec-2008	Kip Macy <kmacy@FreeBSD.org>	convert ifnet and afdata locks from mutexes to rwlocks
# 6e6b3f7c	14-Dec-2008	Qing Li <qingli@FreeBSD.org>	The main goals of this project are: 1. separating L2 tables (ARP, NDP) from the L3 routing tables 2. removing as much locking dependencies among these layers as possible to allow for some parallelism in the search operations 3. simplify the logic in the routing code. The most notable end result is the obsolescence of the route cloning (RTF_CLONING) concept, which translated into code reduction in both IPv4 ARP and IPv6 NDP related modules, and size reduction in struct rtentry{}. The change in design obsoletes the semantics of RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland applications such as "arp" and "ndp" have been modified to reflect those changes. The output from "netstat -r" shows only the routing entries. Quite a few developers have contributed to this project in the past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and Andre Oppermann. And most recently: - Kip Macy revised the locking code completely, thus completing the last piece of the puzzle. Kip has also been conducting active functional testing - Sam Leffler has helped me improving/refactoring the code, and provided valuable reviews - Julian Elischer set up the perforce tree for me and has helped me maintaining that branch before the svn conversion
# 1b193af6	13-Dec-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Second round of putting global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL. Put the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themselves are already. This will help by the time when we are going to remove the globals entirely. Sponsored by: The FreeBSD Foundation
# 4b79449e	02-Dec-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Rather than using hidden includes (with circular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation
# db7f0b97	21-Nov-2008	Kip Macy <kmacy@FreeBSD.org>	- bump __FreeBSD_version to reflect added buf_ring, memory barriers, and ifnet functions - add memory barriers to <machine/atomic.h> - update drivers to only conditionally define their own - add lockless producer / consumer ring buffer - remove ring buffer implementation from cxgb and update its callers - add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to allow drivers to efficiently manage multiple hardware queues (i.e. not serialize all packets through one ifq) - expose if_qflush to allow drivers to flush any driver managed queues This work was supported by Bitgravity Inc. and Chelsio Inc.
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 8b615593	02-Oct-2008	Marko Zec <zec@FreeBSD.org>	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparison between pre- and post-change object files(*). (*) netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
# 516993d4	19-Aug-2008	Andrew Thompson <thompsa@FreeBSD.org>	ifnet_setbyindex() is only used locally, go back to being static.
# 1887d35f	19-Aug-2008	Kip Macy <kmacy@FreeBSD.org>	Fix build
# 02f4879d	26-Jun-2008	Robert Watson <rwatson@FreeBSD.org>	Introduce locking around use of ifindex_table, whose use was previously unsynchronized. While races were extremely rare, we've now had a couple of reports of panics in environments involving large numbers of IPSEC tunnels being added very quickly on an active system. - Add accessor functions ifnet_byindex(), ifaddr_byindex(), ifdev_byindex() to replace existing accessor macros. These functions now acquire the ifnet lock before dereferencing the table. - Add IFNET_WLOCK_ASSERT(). - Add static accessor functions ifnet_setbyindex(), ifdev_setbyindex(), which set values in the table either asserting or acquiring the ifnet lock. - Use accessor functions throughout if.c to modify and read ifindex_table. - Rework ifnet attach/detach to lock around ifindex_table modification. Note that these changes simply close races around use of ifindex_table, and make no attempt to solve the problem of disappearing ifnets. Further refinement of this work, including with respect to ifindex_table resizing, is still required. In a future change, the ifnet lock should be converted from a mutex to an rwlock in order to reduce contention. Reviewed and tested by: brooks
# 8b07e49a	09-May-2008	Julian Elischer <julian@FreeBSD.org>	Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x). Currently the only protocol that can make use of the multiple tables is IPv4. Similar functionality exists in OpenBSD and Linux. From my notes: ----- One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address. Constraints: ------------ I want to make some form of this available in the 6.x tree (and by extension 7.x), but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need. One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing". One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as an EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled within the timespan of the branch. This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). 
The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs. To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (one per protocol family), each of which in turn points to the trie of routes available to that family. The basic change in the ABI compatible version of the change is to extend that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X], where Y is always 0 for all protocol families except ipv4. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before. The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row. In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). 
These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later. One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it. This brings us to how the correct FIB is selected for an outgoing IPV4 packet. Firstly, all packets have a FIB associated with them. If nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways. Packets fall into one of a number of classes. 1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility called setfib that acts a bit like nice.. setfib -3 ping target.example.com # will use fib 3 for ping. It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands. 2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.) 3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. 
A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2). 4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib. 5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being responded to. 6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect with the process that set up the tunnel. Thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1. Routing messages would be associated with their process, and thus select one FIB or another. Messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented) In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB. In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process. Early testing experience: ------------------------- Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks. For example, it can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done. Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routed accordingly. ipfw has grown 2 new keywords: setfib N ip from any to any count ip from any to any fib N In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. 
I am not sure if it is required. SCTP has, interestingly enough, built-in support for this, called VRFs in Cisco parlance. It will be interesting to see how that handles it when it suddenly actually does something. Where to next: -------------------- After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. This will result in some roto-tilling in the routing code. Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code. My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. Instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it. When the ABI can be changed it raises the possibility of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the destination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry. Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already. This work was sponsored by Ironport Systems/Cisco Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco
# fb27dd1d	25-Mar-2008	Sam Leffler <sam@FreeBSD.org>	expose if_purgemaddrs, it will be used by the vap code unless someone redesigns the mcast support code in the next few weeks MFC after: 3 weeks
# 2de2af32	06-Dec-2007	Kip Macy <kmacy@FreeBSD.org>	Add padding for anticipated functionality - vimage - TOE - multiq - host rtentry caching Rename spare used by 80211 to if_llsoftc Reviewed by: rwatson, gnn MFC after: 1 day
# bec59525	16-May-2007	Brooks Davis <brooks@FreeBSD.org>	The struct if_data members ifi_recvquota and ifi_xmitquota have been unused for ages. Rename them to ifi_spare_char1 and ifi_spare_char2 respectively to indicate this fact.
# 18242d3b	16-Apr-2007	Andrew Thompson <thompsa@FreeBSD.org>	Rename the trunk(4) driver to lagg(4) as it is too similar to vlan trunking. The name trunk is misused as the networking term trunk means carrying multiple VLANs over a single connection. The IEEE standard for link aggregation (802.3 section 3) does not talk about 'trunk' at all while it is used throughout IEEE 802.1Q in describing vlans. The lagg(4) driver provides link aggregation, failover and fault tolerance. Discussed on: current@
# b47888ce	09-Apr-2007	Andrew Thompson <thompsa@FreeBSD.org>	Add the trunk(4) driver for providing link aggregation, failover and fault tolerance. This driver allows aggregation of multiple network interfaces as one virtual interface using a number of different protocols/algorithms. failover - Sends traffic through the secondary port if the master becomes inactive. fec - Supports Cisco Fast EtherChannel. lacp - Supports the IEEE 802.3ad Link Aggregation Control Protocol (LACP) and the Marker Protocol. loadbalance - Static loadbalancing using an outgoing hash. roundrobin - Distributes outgoing traffic using a round-robin scheduler through all active ports. This code was obtained from OpenBSD and this also includes 802.3ad LACP support from agr(4) in NetBSD.
# 5896d124	19-Mar-2007	Bruce M Simpson <bms@FreeBSD.org>	Fix tinderbox; ng_ether needs to see if_findmulti().
# ec002fee	19-Mar-2007	Bruce M Simpson <bms@FreeBSD.org>	Implement reference counting for ifmultiaddr, in_multi, and in6_multi structures. Detect when ifnet instances are detached from the network stack and perform appropriate cleanup to prevent memory leaks. This has been implemented in such a way as to be backwards ABI compatible. Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti() is unable to detect interface removal by design, as it performs searches on structures which are removed with the interface. With this architectural change, the panics FreeBSD users have experienced with carp and pfsync should be resolved. Obtained from: p4 branch bms_netdev Reviewed by: andre Sponsored by: Garance A Drosehn Idea from: NetBSD MFC after: 1 month
# 60d4ab7a	06-Sep-2006	Andre Oppermann <andre@FreeBSD.org>	Improve description of if_capabilities, if_capenable and ifi_hwassist. Sponsored by: TCP/IP Optimization Fundraise 2005
# 773725a2	06-Sep-2006	Andre Oppermann <andre@FreeBSD.org>	Fix the socket option IP_ONESBCAST by giving it its own case in ip_output() and skip over the normal IP processing. Add a supporting function ifa_ifwithbroadaddr() to verify and validate the supplied subnet broadcast address. PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days
# 43bc7a9c	04-Aug-2006	Brooks Davis <brooks@FreeBSD.org>	With exception of the if_name() macro, all definitions in net_osdep.h were unused or already in if_var.h so add if_name() to if_var.h and remove net_osdep.h along with all references to it. Longer term we may want to kill off if_name() entirely since all modern BSDs have if_xname variables rendering it unnecessary.
# 0dad3f0e	19-Jun-2006	Max Laier <mlaier@FreeBSD.org>	Import interface groups from OpenBSD. This allows to group interfaces in order to - for example - apply firewall rules to a whole group of interfaces. This is required for importing pf from OpenBSD 3.9 Obtained from: OpenBSD (with changes) Discussed on: -net (back in April)
# 75ee267c	30-Jan-2006	Gleb Smirnoff <glebius@FreeBSD.org>	Merge the //depot/user/yar/vlan branch into CVS. It contains some collective work by yar, thompsa and myself. The checksum offloading part also involves work done by Mihail Balikov. The most important changes: o Instead of global linked list of all vlan softc use a per-trunk hash. The size of hash is dynamically adjusted, depending on number of entries. This changes struct ifnet, replacing counter of vlans with a pointer to trunk structure. This change is an improvement for setups with big number of VLANs, several interfaces and several CPUs. It is a small regression for a setup with a single VLAN interface. An alternative to dynamic hash is a per-trunk static array with 4096 entries, which is a compile time option - VLAN_ARRAY. In my experiments the array is not an improvement, probably because such a big trunk structure doesn't fit into CPU cache. o Introduce an UMA zone for VLAN tags. Since drivers depend on it, the zone is declared in kern_mbuf.c, not in optional vlan(4) driver. This change is a big improvement for any setup utilizing vlan(4). o Use rwlock(9) instead of mutex(9) for locking. We are the first ones to do this! :) o Some drivers can do hardware VLAN tagging + hardware checksum offloading. Add an infrastructure for this. Whenever vlan(4) is attached to a parent or parent configuration is changed, the flags on vlan(4) interface are updated. In collaboration with: yar, thompsa In collaboration with: Mihail Balikov <mihail.balikov interbgc.com>
# 4a0d6638	11-Nov-2005	Ruslan Ermilov <ru@FreeBSD.org>	- Store pointer to the link-level address right in "struct ifnet" rather than in ifindex_table[]; all (except one) accesses are through ifp anyway. IF_LLADDR() works faster, and all (except one) ifaddr_byindex() users were converted to use ifp->if_addr. - Stop storing a (pointer to) Ethernet address in "struct arpcom", and drop the IFP2ENADDR() macro; all users have been converted to use IF_LLADDR() instead.
# 4e7e0183	08-Nov-2005	Andrew Thompson <thompsa@FreeBSD.org>	Move the cloned interface list management into if_clone. For some drivers the softc lists and associated mutex are now unused so these have been removed. Calling if_clone_detach() will now destroy all the cloned interfaces for the driver and in most cases is all that's needed to unload. Idea by: brooks Reviewed by: brooks
# 40929967	01-Oct-2005	Gleb Smirnoff <glebius@FreeBSD.org>	Big polling(4) cleanup. o Axe poll in trap. o Axe IFF_POLLING flag from if_flags. o Rework revision 1.21 (Giant removal), in such a way that poll_mtx is not dropped during call to polling handler. This fixes problem with idle polling. o Make registration and deregistration from polling in a functional way, instead of next tick/interrupt. o Obsolete kern.polling.enable. Polling is turned on/off with ifconfig. Detailed kern_poll.c changes: - Remove polling handler flags, introduced in 1.21. They are not needed now. - Forget and do not check if_flags, if_capenable and if_drv_flags. - Call all registered polling handlers unconditionally. - Do not drop poll_mtx, when entering polling handlers. - In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx. - In netisr_poll() axe the block, where polling code asks drivers to unregister. - In netisr_poll() and ether_poll() do polling always, if any handlers are present. - In ether_poll_[de]register() remove a lot of error hiding code. Assert that arguments are correct, instead. - In ether_poll_[de]register() use standard return values in case of error or success. - Introduce poll_switch() that is a sysctl handler for kern.polling.enable. poll_switch() goes through interface list and enables/disables polling. A message that kern.polling.enable is deprecated is printed. Detailed driver changes: - On attach driver announces IFCAP_POLLING in if_capabilities, but not in if_capenable. - On detach driver calls ether_poll_deregister() if polling is enabled. - In polling handler driver obtains its lock and checks IFF_DRV_RUNNING flag. If it is not set, then unlocks and returns. - In ioctl handler driver checks for IFCAP_POLLING flag requested to be set or cleared. Driver first calls ether_poll_[de]register(), then obtains driver lock and [dis/en]ables interrupts. - In interrupt handler driver checks IFCAP_POLLING flag in if_capenable. 
If present, then returns. This is important to protect against spurious interrupts. Reviewed by: ru, sam, jhb
# 292ee7be	09-Aug-2005	Robert Watson <rwatson@FreeBSD.org>	Rename IFF_RUNNING to IFF_DRV_RUNNING, IFF_OACTIVE to IFF_DRV_OACTIVE, and move both flags from ifnet.if_flags to ifnet.if_drv_flags, making and documenting the locking of these flags the responsibility of the device driver, not the network stack. The flags for these two fields will be mutually exclusive so that they can be exposed to user space as though they were stored in the same variable. Provide #defines to provide the old names #ifndef _KERNEL, so that user applications (such as ifconfig) can use the old flag names. Using the old names in a device driver will result in a compile error in order to help device driver writers adopt the new model. When exposing the interface flags to user space, via interface ioctls or routing sockets, or the two fields together. Since the driver flags cannot currently be set for user space, no new logic is currently required to handle this case. Add some assertions that general purpose network stack routines, such as if_setflags(), are not improperly used on driver-owned flags. With this change, a large number of very minor network stack races are closed, subject to correct device driver locking. Most were likely never triggered. Driver sweep to follow; many thanks to pjd and bz for the line-by-line review they gave this patch. Reviewed by: pjd, bz MFC after: 7 days
# c3b31afd	02-Aug-2005	Robert Watson <rwatson@FreeBSD.org>	Protect link layer network interface multicast address list manipulation using ifp->if_addr_mtx: - Initialize if_addr_mtx when ifnet is initialized. - Destroy if_addr_mtx when ifnet is torn down. - Rename ifmaof_ifpforaddr() to if_findmulti(); assert if_addr_mtx. Staticize. - Extract ifmultiaddr allocation and initialization into if_allocmulti(); accept a 'mflags' argument to indicate whether or not sleeping is permitted. This centralizes error handling and address duplication. - Extract ifmultiaddr tear-down and deallocation in if_freemulti(). - Re-structure if_addmulti() to hold if_addr_mtx around manipulation of the ifnet multicast address list and reference count manipulation. Make use of non-sleeping allocations. Annotate the fact that we only generate routing socket events for explicit address addition, not implicit link layer address addition. - Re-structure if_delmulti() to hold if_addr_mtx around manipulation of the ifnet multicast address list and reference count manipulation. Annotate the lack of a routing socket event for implicit link layer address removal. - De-spl all and sundry. Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca> MFC after: 1 week
# de6073aa	02-Aug-2005	Robert Watson <rwatson@FreeBSD.org>	Add if_addr_mtx to struct ifnet, a mutex to protect ifnet-related address lists. Add accessor macros. This changes the size of struct ifnet, but ideally, all ifnet consumers are now using if_alloc() to allocate these structures rather than embedding them into device driver softc's, so this won't modify the network device driver ABI. MFC after: 1 week
# 638ccea0	21-Jul-2005	Robert Watson <rwatson@FreeBSD.org>	Allocate one of the spare ifnet integer fields to hold if_drv_flags, which in the future will hold IFF_OACTIVE and IFF_RUNNING, and have its access synchronized by the device driver rather than the protocol stack. This will avoid potential races in the management of flags in if_flags. Discussed with: various (scottl, jhb, ...) MFC after: 1 week
# fc74a9f9	10-Jun-2005	Brooks Davis <brooks@FreeBSD.org>	Stop embedding struct ifnet at the top of driver softcs. Instead the struct ifnet or the layer 2 common structure it was embedded in have been replaced with a struct ifnet pointer to be filled by a call to the new function, if_alloc(). The layer 2 common structure is also allocated via if_alloc() based on the interface type. It is hung off the new struct ifnet member, if_l2com. This change removes the size of these structures from the kernel ABI and will allow us to better manage them as interfaces come and go. Other changes of note: - Struct arpcom is no longer referenced in normal interface code. Instead the Ethernet address is accessed via the IFP2ENADDR() macro. To enforce this ac_enaddr has been renamed to _ac_enaddr. - The second argument to ether_ifattach is now always the mac address from driver private storage rather than sometimes being ac_enaddr. Reviewed by: sobomax, sam
# 8f867517	04-Jun-2005	Andrew Thompson <thompsa@FreeBSD.org>	Add hooks into the networking layer to support if_bridge. This changes struct ifnet so a buildworld is necessary. Approved by: mlaier (mentor) Obtained from: NetBSD
# 45778b37	25-May-2005	Peter Edwards <peadar@FreeBSD.org>	Separate out address-detaching part of if_detach into if_purgeaddrs, so if_tap doesn't need to rely on locally-rolled code to do same. The observable symptom of if_tap's bzero'ing the address details was a crash in "ifconfig tap0" after an if_tap device was closed. Reported By: Matti Saarinen (mjsaarin at cc dot helsinki dot fi)
# 68a3482f	20-Apr-2005	Gleb Smirnoff <glebius@FreeBSD.org>	Do not call all link state callbacks directly, but schedule a taskqueue(9) task. This fixes LORs and adds possibility to serve such events pseudorecursively, when link state change of interface causes subsequent change on other interfaces. Sponsored by: Rambler Reviewed by: sam, brooks, mux
# 3a84d72a	01-Mar-2005	Gleb Smirnoff <glebius@FreeBSD.org>	Revert change to struct ifnet. Use ifnet pointer in softc. Embedding ifnet into smth will soon be removed. Requested by: brooks
# e8c34a71	26-Feb-2005	Gleb Smirnoff <glebius@FreeBSD.org>	Remove carp_softc.sc_ifp member in favor of union pointers in struct ifnet. Obtained from: OpenBSD
# a9771948	22-Feb-2005	Gleb Smirnoff <glebius@FreeBSD.org>	Add CARP (Common Address Redundancy Protocol), which allows multiple hosts to share an IP address, providing high availability and load balancing. Original work on CARP done by Michael Shalayeff, with many additions by Marco Pfatschbacher and Ryan McBride. FreeBSD port done solely by Max Laier. Patch by: mlaier Obtained from: OpenBSD (mickey, mcbride)
# c398230b	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for license, minor formatting changes
# 94f5c9cf	07-Dec-2004	Sam Leffler <sam@FreeBSD.org>	Cleanup link state change notification: o add new if_link_state_change routine that deals with link state changes o change mii to use if_link_state_change
# 0b39ef4d	09-Nov-2004	Max Laier <mlaier@FreeBSD.org>	Remove the #if 0 wrapping around !ALTQ stuff that can't be used due to ABI stability anyway.
# 0b762445	30-Oct-2004	Robert Watson <rwatson@FreeBSD.org>	Move if_handoff() from an inline in if_var.h to a function to if.c in order to harden the ABI for 5.x; this will permit us to modify the locking in the ifnet packet dispatch without requiring drivers to be recompiled. MFC after: 3 days Discussed at: EuroBSDCon Developer's Summit
# b4d4574a	30-Oct-2004	Robert Watson <rwatson@FreeBSD.org>	Add additional "spare" fields to 'struct ifnet' in order to improve the resistance of the network driver ABI to changes that will be required as we optimize locking. MFC after: 3 days Discussed at: Developer Summit
# 2f27e151	25-Oct-2004	John-Mark Gurney <jmg@FreeBSD.org>	use NULL instead of 0 when casting/comparing w/ a pointer...
# 31302ebf	19-Oct-2004	Robert Watson <rwatson@FreeBSD.org>	Define IFF_LOCKGIANT() and IFF_UNLOCKGIANT() macros, which conditionally acquire Giant if the passed interface has IFF_NEEDSGIANT set on it. Modify calls into (ifp)->if_ioctl() in if.c to use these macros in order to ensure that Giant is held. MFC after: 3 days Bumped into by: jmg
# ad3b9257	15-Aug-2004	John-Mark Gurney <jmg@FreeBSD.org>	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowledge of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using its filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to acquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)
# de0332d4	07-Aug-2004	Max Laier <mlaier@FreeBSD.org>	Add a "void *if_carp" placeholder to struct ifnet with prospect to bring in the "Common address redundancy protocol" (CARP) during the 5-STABLE cycle. Hence doing the ABI break now. Approved by: re (scottl)
# af5e59bf	27-Jul-2004	Robert Watson <rwatson@FreeBSD.org>	Add a new network interface flag, IFF_NEEDSGIANT, which will allow device drivers to declare that the ifp->if_start() method implemented by the driver requires Giant in order to operate correctly. Add a 'struct task' to 'struct ifnet' that can be used to execute a deferred ifp->if_start() in the event that if_start needs to be called in a Giant-free environment. To do this, introduce if_start(), a wrapper function for ifp->if_start(). If the interface can run MPSAFE, it directly dispatches into the interface start routine. If it can't run MPSAFE, we're running with debug.mpsafenet != 0, and Giant isn't currently held, the task is queued to execute in a swi holding Giant via if_start_deferred(). Modify if_handoff() to use if_start() instead of direct dispatch. Modify 802.11 to use if_start() instead of direct dispatch. This is intended to provide increased compatibility for non-MPSAFE network device drivers in the presence of Giant-free operation via asynchronous dispatch. However, this commit does not mark any network interfaces as IFF_NEEDSGIANT.
# bfe46415	14-Jul-2004	Max Laier <mlaier@FreeBSD.org>	Fix a copy-and-paste-o in IFQ_DRV_PREPEND - all pointyhats to me. While here also fix a (not less stupid) braino in IFQ_DRV_PURGE. Reported-by: clement Tested-by: clement (_PREPEND in sis(4))
# f889d2ef	22-Jun-2004	Brooks Davis <brooks@FreeBSD.org>	Major overhaul of pseudo-interface cloning. Highlights include: - Split the code out into if_clone.[ch]. - Locked struct if_clone. [1] - Add a per-cloner match function rather than simply matching names of the form <name><unit> and <name>. - Use the match function to allow creation of <interface>.<tag> vlan interfaces. The old way is preserved unchanged! - Also use the match function to allow creation of stf(4) interfaces named stf0, stf, or 6to4. This is the only major user visible change in that "ifconfig stf" creates the interface stf rather than stf0 and does not print "stf0" to stdout. - Allow destroy functions to fail so they can refuse to delete interfaces. Currently, we forbid the deletion of interfaces which were created in the init function, particularly lo0, pflog0, and pfsync0. In the case of lo0 this was a panic implementation so it does not count as a user visible change. :-) - Since most interfaces do not need the new functionality, a family of wrapper functions, ifc_simple_*(), were created to wrap old style cloner functions. - The IF_CLONE_INITIALIZER macro is replaced with a new incompatible IFC_CLONE_INITIALIZER and ifc_simple consumers use IFC_SIMPLE_DECLARE instead. Submitted by: Maurycy Pawlowski-Wieronski <maurycy at fouk.org> [1] Reviewed by: andre, mlaier Discussed on: net
# 89c9c53d	16-Jun-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.
# 62d7f46e	14-Jun-2004	Max Laier <mlaier@FreeBSD.org>	Fix a typo in IFQ_HANDOFF.
# 4cb655c0	14-Jun-2004	Max Laier <mlaier@FreeBSD.org>	Transform tbr_dequeue into a function pointer in order to build drivers with ALTQ enabled versions of IFQ_* macros by default, as requested by several others. This is a follow-up to the quick fix I committed yesterday which turned off the ALTQ checks for non-ALTQ kernels.
# 930e2cfa	13-Jun-2004	Max Laier <mlaier@FreeBSD.org>	Unbreak non-ALTQ kernel linking. I forgot about tbr_dequeue. In the end drivers should be building with ALTQ checks by default, but for now build them with the old macros for non-ALTQ kernels. Note: Check new features w/ LINT and w/ LINT minus the new feature. Found-by: rwatson
# 02b199f1	13-Jun-2004	Max Laier <mlaier@FreeBSD.org>	Link ALTQ to the build and break with ABI for struct ifnet. Please recompile your (network) modules as well as any userland that might make sense of sizeof(struct ifnet). This does not change the queueing yet. These changes will follow in a separate commit. Same with the driver changes, which need case by case evaluation. __FreeBSD_version bump will follow. Tested-by: (i386)LINT
# 127d7b2d	03-May-2004	Andre Oppermann <andre@FreeBSD.org>	Link state change notification of ethernet media to the routing socket. o Extend the if_data structure with an ifi_link_state field and provide the corresponding defines for the valid states. o The mii_linkchg() callback updates the ifi_link_state field and calls rt_ifmsg() to notify listeners on the routing socket in addition to the kqueue KNOTE. o If vlans are configured on a physical interface notify and update all vlan pseudo devices as well with the vlan_link_state() callback. No objections by: sam, wpaul, ru, bms Brucification by: bde
# 8614fb12	18-Apr-2004	Max Laier <mlaier@FreeBSD.org>	Make if_(un)route static in if.c as they are called from if_up/if_down only. This is also cleanup to make locking easier. Reviewed by: luigi Approved by: bms(mentor)
# 212b6d52	17-Apr-2004	Luigi Rizzo <luigi@FreeBSD.org>	+ rename and document an unused field in struct arpcom (field is still there so there are no ABI changes); + replace 5 redefinitions of the IPF2AC macro with one in if_arp.h Eventually (but before freezing the ABI) we need to get rid of struct arpcom (initially with the help of some smart #defines to avoid having to touch each and every driver, see below). Apart from the struct ifnet, struct arpcom now only stores a copy of the MAC address (ac_enaddr, but we already have another copy in the struct ifnet -- if_addrhead), and a netgraph-specific field which is _always_ accessed through the ifp, so it might well go into the struct ifnet too (where, besides, there is already an entry for AF_NETGRAPH data...) Too bad ac_enaddr is widely referenced by all drivers. But this can be fixed as follows: #define ac_enaddr ac_if.the_original_ac_enaddr_in_struct_ifnet (note that the right hand side would likely be a pointer rather than the base address of an array.)
# d65d2351	16-Apr-2004	Luigi Rizzo <luigi@FreeBSD.org>	Documented the intended usage of if_addrhead and ifaddr_byindex() This commit only changes comments. Nothing to recompile.
# 621b79c4	15-Apr-2004	Luigi Rizzo <luigi@FreeBSD.org>	Document the way if_addrhead and struct ifaddr are used. Remove a member from 'struct ifaddr' which has been in an #ifdef notdef block since rev 1.1 No ABI changes -- no need to recompile anything.
# 307c58e2	12-Apr-2004	Ruslan Ermilov <ru@FreeBSD.org>	Count outgoing link-level broadcast packets in if_omcasts. I'm not sure this is completely correct but at least this is consistent with the accounting of incoming broadcasts. PR: kern/65273 Submitted by: David J Duchscher <daved@tamu.edu>
# 41a76b48	11-Apr-2004	Robert Watson <rwatson@FreeBSD.org>	In 4.x, if_ipending is used to track network interrupt state. In 5.x, it is no longer used, so GC the ifnet.if_ipending field.
# f36cfd49	07-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
# f7c5baa1	03-Apr-2004	Luigi Rizzo <luigi@FreeBSD.org>	+ arpresolve(): remove an unused argument + struct ifnet: remove unused fields, move ipv6-related field close to each other, add a pointer to l3<->l2 translation tables (arp,nd6, etc.) for future use. + struct route: remove an unused field, move close to each other some fields that might likely go away in the future
# 196f7f54	12-Mar-2004	Brooks Davis <brooks@FreeBSD.org>	Remove if_withname. It came in with the KAME import, but never got used. Should someone need its functionality, it's a really expensive implementation of: ifnet_byindex(sdl->sdl_index) Reviewed by: bde, ume
# 25a4adce	25-Feb-2004	Max Laier <mlaier@FreeBSD.org>	Bring eventhandler callbacks for pf. This enables pf to track dynamic address changes on interfaces (dialup) with the "on (<ifname>)"-syntax. This also brings hooks in anticipation of tracking cloned interfaces, which will be in future versions of pf. Approved by: bms(mentor)
# 2ab763ec	06-Dec-2003	Warner Losh <imp@FreeBSD.org>	Make the if_broadcastaddr const. All the drivers in the tree which violated the constness were corrected before the freeze. This was suggested by mdodd@, I think, and sam@ and others have signed off on this if I recall my conversations with them correctly.
# eca8a663	11-Nov-2003	Robert Watson <rwatson@FreeBSD.org>	Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) due to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 9bf40ede	31-Oct-2003	Brooks Davis <brooks@FreeBSD.org>	Replace the if_name and if_unit members of struct ifnet with new members if_xname, if_dname, and if_dunit. if_xname is the name of the interface and if_dname/unit are the driver name and instance. This change paves the way for interface renaming and enhanced pseudo device creation and configuration semantics. Approved By: re (in principle) Reviewed By: njl, imp Tested On: i386, amd64, sparc64 Obtained From: NetBSD (if_xname)
# 234a35c7	24-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	Since dp->dom_ifattach calls malloc() with M_WAITOK, we cannot use mutex lock directly here. Protect ifp->if_afdata instead. Reported by: grehan
# 31b1bfe1	17-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	- add dom_if{attach,detach} framework. - transition to use ifp->if_afdata. Obtained from: KAME
# 9d5abbdd	01-Jan-2003	Jens Schweikhardt <schweikh@FreeBSD.org>	Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup, especially in troff files.
# d64ada50	30-Dec-2002	Jens Schweikhardt <schweikh@FreeBSD.org>	Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/ Add FreeBSD Id tag where missing.
# c919b1e2	26-Dec-2002	Jeffrey Hsu <hsu@FreeBSD.org>	Long chain of calls starting with bridge_on(), going through IPv6, and ending up at ifa_ifwithdstaddr() could lead to a recursive lock of the ifnet list mutex.
# b30a244c	21-Dec-2002	Jeffrey Hsu <hsu@FreeBSD.org>	SMP locking for ifnet list.
# 68eec1f8	17-Dec-2002	Jeffrey Hsu <hsu@FreeBSD.org>	Switch to the conventional reference counting scheme.
# 19fc74fb	18-Dec-2002	Jeffrey Hsu <hsu@FreeBSD.org>	Lock up ifaddr reference counts.
# 76cfd300	14-Nov-2002	Sam Leffler <sam@FreeBSD.org>	o add if_nvlans member to track the number of vlans active on an interface o add if_input member for interface drivers to call through to pass packets "up" o remove ethernet-specific function decls (moved to ethernet.h) Reviewed by: many Approved by: re
# a7f62430	28-Sep-2002	Bruce Evans <bde@FreeBSD.org>	Fixed some of the namespace pollution in rev.1.33. <sys/systm.h> was included here because it was once a prerequisite of <sys/mutex.h> although that bug was fixed long ago.
# fa882e87	24-Sep-2002	Brooks Davis <brooks@FreeBSD.org>	Add a new helper function if_printf() modeled on device_printf(). The function takes a struct ifnet pointer followed by the usual printf arguments and prints "<interfacename>: " before the results of printf. Since this is the primary form of printf calls in network device drivers and accounts for most uses of the ifnet member if_unit, this significantly simplifies many printf()s.
# 62f76486	18-Aug-2002	Maxim Sobolev <sobomax@FreeBSD.org>	Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's. Reviewed by: -hackers, -net
# c44d8405	13-Aug-2002	Robert Watson <rwatson@FreeBSD.org>	Move to nested include of _label.h instead of mac.h, reducing namespace pollution. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs Suggested by: bde
# 19930ae5	30-Jul-2002	Robert Watson <rwatson@FreeBSD.org>	Introduce support for Mandatory Access Control and extensible kernel access control. Label network interface structures, permitting security features to be maintained on those objects. if_label will be used to authorize data flow using the network interface. if_label will be protected using the same synchronization primitives as other mutable entries in struct ifnet. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# f0a8d5cb	07-May-2002	Warner Losh <imp@FreeBSD.org>	Minor style nit
# 929ddbbb	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove __P.
# effa274e	14-Dec-2001	Jonathan Lemon <jlemon@FreeBSD.org>	whitespace fixes.
# e4fc250c	14-Dec-2001	Luigi Rizzo <luigi@FreeBSD.org>	Device Polling code for -current. Non-SMP, i386-only, no polling in the idle loop at the moment. To use this code you must compile a kernel with options DEVICE_POLLING and at runtime enable polling with sysctl kern.polling.enable=1 The percentage of CPU reserved to userland can be set with sysctl kern.polling.user_frac=NN (default is 50) while the remainder is used by polling device drivers and netisr's. These are the only two variables that you should need to touch. There are a few more parameters in kern.polling but the default values are adequate for all purposes. See the code in kern_poll.c for more details on them. Polling in the idle loop will be implemented shortly by introducing a kernel thread which does the job. Until then, the amount of CPU dedicated to polling will never exceed (100-user_frac). The equivalent (actually, better) code for -stable is at http://info.iet.unipi.it/~luigi/polling/ and also supports polling in the idle loop. NOTE to Alpha developers: There is really nothing in this code that is i386-specific. If you move the 2 lines supporting the new option from sys/conf/{files,options}.i386 to sys/conf/{files,options} I am pretty sure that this should work on the Alpha as well, just that I do not have a suitable test box to try it. If someone feels like trying it, I would appreciate it. NOTE to other developers: sure some things could be done better, and as always I am open to constructive criticism, which a few of you have already given and I greatly appreciated. However, before proposing radical architectural changes, please take some time to possibly try out this code, or at the very least read the comments in kern_poll.c, especially re. the reason why I am using a soft netisr and cannot (I believe) replace it with a simple timeout. 
Quick description of files touched by this commit: sys/conf/files.i386 new file kern/kern_poll.c sys/conf/options.i386 new option sys/i386/i386/trap.c poll in trap (disabled by default) sys/kern/kern_clock.c initialization and hardclock hooks. sys/kern/kern_intr.c minor swi_net changes sys/kern/kern_poll.c the bulk of the code. sys/net/if.h new flag sys/net/if_var.h declaration for functions used in device drivers. sys/net/netisr.h NETISR_POLL sys/dev/fxp/if_fxp.c sys/dev/fxp/if_fxpvar.h sys/pci/if_dc.c sys/pci/if_dcreg.h sys/pci/if_sis.c sys/pci/if_sisreg.h device driver modifications
# 985fbf6b	22-Nov-2001	Luigi Rizzo <luigi@FreeBSD.org>	Expand the comment on the layout of softc, arpcom and ifnet structures, and list the places where the assumption is used.
# 99efe4f0	14-Nov-2001	John Baldwin <jhb@FreeBSD.org>	Remove ifnet.if_mpsafe for now. If this is needed, it won't be needed until much later when the network stack locking is farther along. Approved by: jlemon
# 8071913d	17-Oct-2001	Ruslan Ermilov <ru@FreeBSD.org>	Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2. Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *'' as the argument. Pass rt_addrinfo all the way down to rtrequest1 and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now ``rt_addrinfo *'' instead of ``sockaddr *'' (almost no one is using it anyways). Benefit: the following command now works. Previously we needed two route(8) invocations, "add" then "change". # route add -inet6 default ::1 -ifp gif0 Remove unsafe typecast in rtrequest(), from ``rtentry *'' to ``sockaddr *''. It was introduced by 4.3BSD-Reno and never corrected. Obtained from: BSD/OS, NetBSD MFC after: 1 month PR: kern/28360
# 322dcb8d	14-Oct-2001	Max Khon <fjoe@FreeBSD.org>	bring in ARP support for variable length link level addresses Reviewed by: jdp Approved by: jdp Obtained from: NetBSD MFC after: 6 weeks
# 8ec410b5	02-Oct-2001	Matt Jacob <mjacob@FreeBSD.org>	Documentation comment: note that each NIC's softc is assumed to start with an ifnet structure. MFC after: 1 week
# 016da741	18-Sep-2001	Jonathan Lemon <jlemon@FreeBSD.org>	Add two fields to the ifnet structure indicating what extra capabilities a network device has, and which ones are enabled.
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# f9132ceb	05-Sep-2001	Jonathan Lemon <jlemon@FreeBSD.org>	Wrap array accesses in macros, which also happen to be lvalues: ifnet_addrs[i - 1] -> ifaddr_byindex(i) ifindex2ifnet[i] -> ifnet_byindex(i) This is intended to ease the conversion to SMPng.
# 30aad87d	02-Jul-2001	Brooks Davis <brooks@FreeBSD.org>	Add kernel infrastructure for network device cloning. Reviewed by: ru, ume Obtained from: NetBSD MFC after: 1 week
# f34fa851	28-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Catch up to header include changes: - <sys/mutex.h> now requires <sys/systm.h> - <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>
# 9ed346ba	08-Feb-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarly, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. 
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# 6817526d	06-Feb-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Convert if_multiaddrs from LIST to TAILQ so that it can be traversed backwards in the three drivers which want to do that. Reviewed by: mikeh
# 90d9802f	29-Jan-2001	Peter Wemm <peter@FreeBSD.org>	Make the number of loopback interfaces dynamically tunable. Why one would want to is a different story, but it used to be able to be done statically. Get rid of #include "loop.h" and struct ifnet loif[NLOOP]; This could be used as an example of how to do this in other drivers, for example: ccd.
# 9ba20c31	26-Nov-2000	Jonathan Lemon <jlemon@FreeBSD.org>	Unbreak world; #include <sys/mutex.h> instead of <machine/mutex.h> Only include <sys/mbuf.h> when building kernel sources. This should probably be changed to require callers to include it themselves.
# df5e1987	25-Nov-2000	Jonathan Lemon <jlemon@FreeBSD.org>	Lock down the network interface queues. The queue mutex must be obtained before adding/removing packets from the queue. Also, the if_obytes and if_omcasts fields should only be manipulated under protection of the mutex. IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on the queue. An IF_LOCK macro is provided, as well as the old (mutex-less) versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which needs them, but their use is discouraged. Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF, which takes care of locking/enqueue, and also statistics updating/start if necessary.
# 5da9f8fa	19-Oct-2000	Josef Karthauser <joe@FreeBSD.org>	Augment the 'ifaddr' structure with a 'struct if_data' to keep statistics on a per network address basis. Teach the IPv4 and IPv6 input/output routines to log packets/bytes against the network address connected to the flow. Teach netstat to display the per-address stats for IP protocols when 'netstat -i' is invoked, instead of displaying the per-interface stats.
# 66ce51ce	14-Aug-2000	Archie Cobbs <archie@FreeBSD.org>	Export the functionality of SIOCSIFLLADDR with if_setlladdr() and add some more rigorous sanity checking in the process. Reviewed by: freebsd-net
# 21b8ebd9	13-Jul-2000	Archie Cobbs <archie@FreeBSD.org>	Make all Ethernet drivers attach using ether_ifattach() and detach using ether_ifdetach(). The former consolidates the operations of if_attach(), ng_ether_attach(), and bpfattach(). The latter consolidates the corresponding detach operations. Reviewed by: julian, freebsd-net
# 6ec86086	29-Jun-2000	Archie Cobbs <archie@FreeBSD.org>	Fix kernel build breakage when 'device ether' was not included.
# e1e1452d	26-Jun-2000	Archie Cobbs <archie@FreeBSD.org>	Make the ng_ether(4) node type dynamically loadable like the rest. This means 'options NETGRAPH' is no longer necessary in order to get netgraph-enabled Ethernet interfaces. This supports loading/unloading the ng_ether.ko and attaching/detaching the Ethernet interface in any order. Add two new hooks 'upper' and 'lower' to allow access to the protocol demux engine and the raw device, respectively. This enables bridging to be defined as a netgraph node, if so desired. Reviewed by: freebsd-net@freebsd.org
# e3975643	25-May-2000	Jake Burkholder <jake@FreeBSD.org>	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others
# 06a429a3	24-May-2000	Archie Cobbs <archie@FreeBSD.org>	Just need to pass the address family to if_simloop(), not the whole sockaddr.
# 740a1973	23-May-2000	Jake Burkholder <jake@FreeBSD.org>	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
# db4f9cc7	27-Mar-2000	Jonathan Lemon <jlemon@FreeBSD.org>	Add support for offloading IP/TCP/UDP checksums to NIC hardware which supports them.
# 664a31e4	28-Dec-1999	Peter Wemm <peter@FreeBSD.org>	Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistent with the other BSD's who made this change quite some time ago. More commits to come.
# 82cd038d	21-Nov-1999	Yoshinobu Inoue <shin@FreeBSD.org>	KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP for IPv6 yet) With this patch, you can assign IPv6 addr automatically, and can reply to IPv6 ping. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
# 76429de4	05-Nov-1999	Yoshinobu Inoue <shin@FreeBSD.org>	KAME related header files additions and merges. (only those which don't affect c source files so much) Reviewed by: cvs-committers Obtained from: KAME project
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# aab3beee	06-Aug-1999	Brian Somers <brian@FreeBSD.org>	Define IF_MAXMTU and IF_MINMTU and don't allow MTUs with out-of-range values. ``comparison is always 0'' warnings are silly ! Ok'd by: wollman, dg Advised by: bde
# e3c1388b	16-May-1999	Pierre Beyssac <pb@FreeBSD.org>	PR: kern/10570 Submitted by: adrian@freebsd.org Change reference count in struct ifaddr to a u_int, to be able to handle more than 2^16 routes to the same interface. Fix suggested by Andrew Bangs <andrewb@demon.net> in PR kern/10570. Tested by <adrian@freebsd.org> and me under -current.
# dfd5dee1	06-May-1999	Peter Wemm <peter@FreeBSD.org>	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.
# 6182fdbd	16-Apr-1999	Peter Wemm <peter@FreeBSD.org>	Bring the 'new-bus' to the i386. This extensively changes the way the i386 platform boots, it is no longer ISA-centric, and is fully dynamic. Most old drivers compile and run without modification via 'compatibility shims' to enable a smoother transition. eisa, isapnp and pccard* are not yet using the new resource manager. Once fully converted, all drivers will be loadable, including PCI and ISA. (Some other changes appear to have snuck in, including a port of Soren's ATA driver to the Alpha. Soren, back this out if you need to.) This is a checkpoint of work-in-progress, but is quite functional. The bulk of the work was done over the last few years by Doug Rabson and Garrett Wollman. Approved by: core
# e8c2601d	16-Dec-1998	Poul-Henning Kamp <phk@FreeBSD.org>	Generalize the if_up() and if_down() functions under the names if_route() and if_unroute(). This is first step towards sanitizing IFF_UP and IFF_RUNNING
# ed7509ac	11-Jun-1998	Julian Elischer <julian@FreeBSD.org>	Go through the loopback code with a broom.. Remove lots'o'hacks. looutput is now static. Other callers who want to use loopback to allow shortcutting should call the special entrypoint for this, if_simloop(), which is specifically designed for this purpose. Using looutput for this purpose was problematic, particularly with bpf and trying to keep track of whether one should be using the characteristics of the loopback interface or the interface (e.g. if_ethersubr.c) that was requesting the loopback. There was a whole class of errors due to this mis-use each of which had hacks to cover them up. Consists largely of hack removal :-)
# ecbb00a2	07-Jun-1998	Doug Rabson <dfr@FreeBSD.org>	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.
# c1087c13	15-Apr-1998	Bruce Evans <bde@FreeBSD.org>	Support compiling with `gcc -ansi'.
# 7ed8f465	27-Aug-1997	Julian Elischer <julian@FreeBSD.org>	Add a per-interface-address pointer to a function that can be supplied by a protocol, to determine if an address matches the net this address is part of. This is needed by protocols for which netmasks "just don't work", for example appletalk. Also add the code in appletalk to make use of this new feature. This fixes one of the longest standing bugs in appletalk. The inability to talk to machines to which the path is via a router which is on a different net, but the same netrange, as your interface. Protocols that do not supply this function (e.g. IP) should not be affected.
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 373f88ed	08-Jan-1997	Garrett Wollman <wollman@FreeBSD.org>	Fix a few oversights in the new multicast membership interface.
# 1158dfb7	07-Jan-1997	Garrett Wollman <wollman@FreeBSD.org>	Checkpoint the beginnings of the new kernel interface for multicast group memberships. This is not actually operative at the moment (a lot of other code still needs to be changed), but this seemed like a useful reference point to check in so that others (i.e. Bill Fenner) have fair warning of where we are going.
# 19ff91c6	03-Jan-1997	Garrett Wollman <wollman@FreeBSD.org>	Separate kernel-internal data structures from exposed user interface to interfaces. (Amazing nobody had done this!) More commits to fix up user-land to follow.