History log of /freebsd-current/sys/net/netisr.c
Revision	Date	Author	Comments
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
# 4d846d26	10-May-2023	Warner Losh <imp@FreeBSD.org>	spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
# 2c2b37ad	13-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	ifnet/API: Move struct ifnet definition to a <net/if_private.h> Hide the ifnet structure definition, no user serviceable parts inside, it's a netstack implementation detail. Include it temporarily in <net/if_var.h> until all drivers are updated to use the accessors exclusively. Reviewed by: glebius Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38046
# 028ecc7a	03-Sep-2022	Gordon Bergling <gbe@FreeBSD.org>	netisr(9): Fix a typo in a source code comment - s/overriden/overridden/ MFC after: 3 days
# 51f798e7	26-Jan-2022	Gleb Smirnoff <glebius@FreeBSD.org>	netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33268 (cherry picked from commit 6871de9363e559fef6765f0e49acc47f77544999)
# 0fa56369	03-May-2022	Marko Zec <zec@FreeBSD.org>	Revert "netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs" This reverts commit 6871de9363e559fef6765f0e49acc47f77544999. Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
# 6871de93	26-Jan-2022	Gleb Smirnoff <glebius@FreeBSD.org>	netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33268
# 662c1305	01-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	net: clean up empty lines in .c and .h files
# 9033ad5f	16-May-2020	Pawel Biernacki <kaktus@FreeBSD.org>	sysctl: fix setting net.isr.dispatch during early boot Fix another collateral damage of r357614: netisr is initialised way before malloc() is available hence it can't use sysctl_handle_string() that allocates temporary buffer. Handle that internally in sysctl_netisr_dispatch_policy(). PR: 246114 Reported by: delphij Reviewed by: kib Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D24858
# 7029da5c	26-Feb-2020	Pawel Biernacki <kaktus@FreeBSD.org>	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
# bacb11c9	17-Feb-2020	Hans Petter Selasky <hselasky@FreeBSD.org>	Fix kernel panic while trying to read multicast stream. When VIMAGE is enabled make sure the "m_pkthdr.rcvif" pointer is set for all mbufs being input by the IGMP/MLD6 code. Else there will be a NULL-pointer dereference in the netisr code when trying to set the VNET based on the incoming mbuf. Add an assert to catch this when queueing mbufs on a netisr to make debugging of similar cases easier. Found by: Vladislav V. Prodan PR: 244002 Reviewed by: bz@ MFC after: 1 week Sponsored by: Mellanox Technologies
# 977b9472	31-Jan-2020	Hans Petter Selasky <hselasky@FreeBSD.org>	Revert r357293. The netisr uses rm_ locks not rms_ locks as noted by jeff@ . Sponsored by: Mellanox Technologies
# 780c568f	29-Jan-2020	Hans Petter Selasky <hselasky@FreeBSD.org>	Widen EPOCH(9) usage in netisr. Software interrupt handlers are allowed to sleep. In swi_net() there is a read lock behind NETISR_RLOCK() which in turn ends up calling msleep() which means the whole of swi_net() cannot be protected by an EPOCH(9) section. By default the NETISR_LOCKING feature is disabled. This issue was introduced by r357004. This is a preparation step for replacing the functionality provided by r357004. Found by: kib@ Sponsored by: Mellanox Technologies
# 6ed3e187	22-Jan-2020	Gleb Smirnoff <glebius@FreeBSD.org>	Mark swi_net() as INTR_TYPE_NET and stop entering epoch there.
# b8a6e03f	07-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Widen NET_EPOCH coverage. When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111
# fb3bc596	24-May-2019	John Baldwin <jhb@FreeBSD.org>	Restructure mbuf send tags to provide stronger guarantees. - Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on), in ip_output() and ip6_output(). This avoids sending packets with send tags to ifnet drivers that don't support send tags. Since we are now checking for ifp mismatches before invoking if_output, we can now try to allocate a new tag before invoking if_output sending the original packet on the new tag if allocation succeeds. To avoid code duplication for the fragment and unfragmented cases, add ip_output_send() and ip6_output_send() as wrappers around if_output and nd6_output_ifp, respectively. All of the logic for setting send tags and dealing with send tag-related errors is done in these wrapper functions. For pseudo interfaces that wrap other network interfaces (vlan and lagg), wrapper send tags are now allocated so that ip*_output see the wrapper ifp as the ifp in the send tag. The if_transmit routines rewrite the send tags after performing an ifp mismatch check. If an ifp mismatch is detected, the transmit routines fail with EAGAIN. - To provide clearer life cycle management of send tags, especially in the presence of vlan and lagg wrapper tags, add a reference count to send tags managed via m_snd_tag_ref() and m_snd_tag_rele(). Provide a helper function (m_snd_tag_init()) for use by drivers supporting send tags. m_snd_tag_init() takes care of the if_ref on the ifp meaning that code allocating send tags via if_snd_tag_alloc no longer has to manage that manually. Similarly, m_snd_tag_rele drops the refcount on the ifp after invoking if_snd_tag_free when the last reference to a send tag is dropped. This also closes use after free races if there are pending packets in driver tx rings after the socket is closed (e.g. from tcpdrop). In order for m_free to work reliably, add a new CSUM_SND_TAG flag in csum_flags to indicate 'snd_tag' is set (rather than 'rcvif'). Drivers now also check this flag instead of checking snd_tag against NULL. This avoids false positive matches when a forwarded packet has a non-NULL rcvif that was treated as a send tag. - cxgbe was relying on snd_tag_free being called when the inp was detached so that it could kick the firmware to flush any pending work on the flow. This is because the driver doesn't require ACK messages from the firmware for every request, but instead does a kind of manual interrupt coalescing by only setting a flag to request a completion on a subset of requests. If all of the in-flight requests don't have the flag when the tag is detached from the inp, the flow might never return the credits. The current snd_tag_free command issues a flush command to force the credits to return. However, the credit return is what also frees the mbufs, and since those mbufs now hold references on the tag, this meant that snd_tag_free would never be called. To fix, explicitly drop the mbuf's reference on the snd tag when the mbuf is queued in the firmware work queue. This means that once the inp's reference on the tag goes away and all in-flight mbufs have been queued to the firmware, tag's refcount will drop to zero and snd_tag_free will kick in and send the flush request. Note that we need to avoid doing this in the middle of ethofld_tx(), so the driver grabs a temporary reference on the tag around that loop to defer the free to the end of the function in case it sends the last mbuf to the queue after the inp has dropped its reference on the tag. - mlx5 preallocates send tags and was using the ifp pointer even when the send tag wasn't in use. Explicitly use the ifp from other data structures instead. - Sprinkle some assertions in various places to assert that received packets don't have a send tag, and that other places that overwrite rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer. Reviewed by: gallatin, hselasky, rgrimes, ae Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20117
# 5f901c92	24-Jul-2018	Andrew Turner <andrew@FreeBSD.org>	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147
# fe267a55	27-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. No functional change intended.
# e2a8d178	18-Feb-2017	Jason A. Harmening <jah@FreeBSD.org>	Bring back r313037, with fixes for mips: Implement get_pcpu() for amd64/sparc64/mips/powerpc, and use it to replace pcpu_find(curcpu) in MI code. Reviewed by: andreast, kan, lidl Tested by: lidl(mips, sparc64), andreast(powerpc) Differential Revision: https://reviews.freebsd.org/D9587
# ad62ba6e	03-Feb-2017	Jason A. Harmening <jah@FreeBSD.org>	Revert r313037 The switch to get_pcpu() in MI code seems to cause hangs on MIPS. Back out until we can get a better idea of what's happening there. Reported by: kan, lidl
# 65ed4836	31-Jan-2017	Jason A. Harmening <jah@FreeBSD.org>	Implement get_pcpu() for the remaining architectures and use it to replace pcpu_find(curcpu) in MI code.
# fdf95c0b	17-Aug-2016	Andrey V. Elsukov <ae@FreeBSD.org>	Teach netisr_get_cpuid() to limit a given value to the range supported by netisr. Use netisr_get_cpuid() in netisr_select_cpuid() to limit the cpuid value returned by a protocol, to be sure that it is not greater than nws_count. PR: 211836 Reviewed by: adrian MFC after: 3 days
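The limiting described in the entry above could be as simple as wrapping the value with a modulo over the workstream count. The following is a minimal userland sketch, not the code from netisr.c: the `ws_cpu` table, `ws_count`, and `sketch_get_cpuid()` are hypothetical stand-ins for the kernel's workstream-to-CPU mapping.

```c
#include <assert.h>

/* Hypothetical stand-in for the netisr workstream-to-CPU table. */
static const unsigned int ws_cpu[] = { 0, 2, 4, 6 };
static const unsigned int ws_count = sizeof(ws_cpu) / sizeof(ws_cpu[0]);

/*
 * Map an arbitrary protocol-supplied value onto a CPU that actually
 * runs a workstream by wrapping it into the range [0, ws_count).
 */
static unsigned int
sketch_get_cpuid(unsigned int value)
{
	assert(ws_count > 0);
	return (ws_cpu[value % ws_count]);
}
```

With four workstreams, a protocol that returns 5 is mapped to slot 1 of the table rather than indexing past its end.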
# 8c636a11	11-Jul-2016	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	Remove assumptions in MI code that the BSP is CPU 0. MFC after: 2 weeks
# 484149de	03-Jun-2016	Bjoern A. Zeeb <bz@FreeBSD.org>	Introduce a per-VNET flag to enable/disable netisr processing on that VNET. Add accessor functions to toggle the state per VNET. The base system (vnet0) will always enable itself with the normal registration. We will share the registered protocol handlers in all VNETs minimising duplication and management. Upon disabling netisr processing for a VNET drain the netisr queue from packets for that VNET. Update netisr consumers to (de)register on a per-VNET start/teardown using VNET_SYS(UN)INIT functionality. The change should be transparent for non-VIMAGE kernels. Reviewed by: gnn (, hiren) Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6691
# fdce57a0	14-May-2016	John Baldwin <jhb@FreeBSD.org>	Add an EARLY_AP_STARTUP option to start APs earlier during boot. Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot. This change moves SI_SUB_SMP up to just before software interrupt threads are created allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed. This last feature fixes a problem with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu. Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case. As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code. These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling). PR: kern/199321 Reviewed by: markj, gnn, kib Sponsored by: Netflix
# 8dfea464	21-Apr-2016	Pedro F. Giffuni <pfg@FreeBSD.org>	Remove slightly used const values that can be replaced with nitems(). Suggested by: jhb
# 2f9b9f9c	04-Apr-2016	John Baldwin <jhb@FreeBSD.org>	Remove an unneeded check. CPUs with valid per-CPU data are not absent. Sponsored by: Netflix
# 8ec07310	01-Feb-2016	Gleb Smirnoff <glebius@FreeBSD.org>	These files were getting sys/malloc.h and vm/uma.h with header pollution via sys/mbuf.h
# a9467c3c	25-Apr-2015	Hiren Panchasara <hiren@FreeBSD.org>	Currently there is no easy way to specify net.isr.maxthreads = all cpus. We need to specify the exact number of cpus in loader.conf, which gets annoying when you have a mix of machines which don't have an equal number of total cpus. I propose "-1" as that value. When loader.conf has net.isr.maxthreads = -1, netisr will use all available cpus. In collaboration with: davide Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D2318 MFC after: 2 weeks Sponsored by: Limelight Networks
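The "-1 means all CPUs" convention from the entry above can be illustrated with a small userland sketch. The function name `sketch_resolve_maxthreads()` is hypothetical. The handling of other out-of-range values (falling back to one thread, capping at the CPU count) is an assumption of this sketch, not something the commit message states.

```c
#include <assert.h>

/*
 * Resolve a requested netisr thread count against the number of CPUs.
 * -1 means "use every CPU", as the commit message proposes. Other
 * values below 1 fall back to a single thread and values above the
 * CPU count are capped; both of those choices are assumptions.
 */
static int
sketch_resolve_maxthreads(int requested, int ncpus)
{
	assert(ncpus >= 1);
	if (requested == -1)
		return (ncpus);
	if (requested < 1)
		return (1);
	if (requested > ncpus)
		return (ncpus);
	return (requested);
}
```

The same loader.conf value then works unchanged on an 8-CPU and a 32-CPU machine, which is the motivation given in the entry.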
# 51d4054e	09-Apr-2015	George V. Neville-Neil <gnn@FreeBSD.org>	Revert 281276 as unnecessary. Proper change to be committed to the base polling code in a subsequent commit. Pointed out by: glebius Sponsored by: Rubicon Communications (NetGate)
# 8a7ad101	08-Apr-2015	George V. Neville-Neil <gnn@FreeBSD.org>	Add support for a netisr polling tunable, which allows run time switching of device polling rather than having it only be controlled by the compile time option. Summary: Rubicon Communications (Netgate) Reviewers: #network, hiren Reviewed By: #network, hiren Subscribers: hiren Differential Revision: https://reviews.freebsd.org/D2258
# c2529042	01-Dec-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but no longer has any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file. This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows. "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before. Additional notes: - The SCTP code changes will be committed as a separate patch. - Removal of the "M_FLOWID" flag will also be done separately. - The FreeBSD version has been bumped. MFC after: 1 month Sponsored by: Mellanox Technologies
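The shift described above, from a separate validity flag to a hash type that doubles as the validity marker, can be modeled in userland as follows. This is only a sketch: `struct sketch_pkthdr` and the `SKETCH_HASH_*` names are hypothetical and merely mirror the roles of "m_pkthdr.rsstype", "M_HASHTYPE_NONE", and "M_HASHTYPE_OPAQUE" named in the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hash types; NONE means the flow ID carries no value. */
enum sketch_hashtype {
	SKETCH_HASH_NONE = 0,
	SKETCH_HASH_OPAQUE,	/* valid flow ID of no particular type */
	SKETCH_HASH_TYPED	/* stands in for any specific hash type */
};

/* Hypothetical packet header holding a flow ID and its hash type. */
struct sketch_pkthdr {
	unsigned int flowid;
	enum sketch_hashtype rsstype;
};

/* The flow ID is usable whenever the hash type is anything but NONE. */
static bool
sketch_flowid_valid(const struct sketch_pkthdr *hdr)
{
	assert(hdr != NULL);
	return (hdr->rsstype != SKETCH_HASH_NONE);
}
```

A sender can then assign the flow ID and hash type unconditionally and leave the validity check to the consumer, which matches the "direct variable assignment" the entry mentions.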
# af3b2549	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Pull in r267961 and r267973 again. Fix for issues reported will follow.
# 37a107a4	27-Jun-2014	Glen Barber <gjb@FreeBSD.org>	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory
# 3da1cf1e	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, since some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, since malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies
# da162ca8	26-Nov-2013	Sergey Kandaurov <pluknet@FreeBSD.org>	Fix macro name in comment.
# 933e681d	06-Sep-2013	Davide Italiano <davide@FreeBSD.org>	Retire netisr.netisr_direct and netisr.netisr_direct_force sysctls. These were used to control/export dispatch policy but they're not anymore. This commit cannot be MFC'ed to 9 because old netstat(9) binary relies on such sysctl to work. On the other hand, there's no real reason to keep'em around in 10.
# 6472ac3d	07-Nov-2011	Ed Schouten <ed@FreeBSD.org>	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
# d098f930	31-May-2011	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb
# f2d2d694	23-May-2011	Robert Watson <rwatson@FreeBSD.org>	Rework netisr policy mechanism so that per-protocol dispatch policies can be represented: - A single policy namespace is defined, consisting of four possible policies: "default" to use the global default, "deferred" to force deferred dispatch, "direct" to employ direct dispatch where possible, and "hybrid" which makes a dynamic decision based on CPU affinity, ordering, etc. Routines are implemented to convert between strings and an integer namespace. - A new global variable, netisr_dispatch_policy, subsumes existing global variables for direct dispatch, forced direct dispatch, etc, and is used for explicit policy interpretation and composition. Old variables remain so that they can be exported by legacy sysctls for use by old netstat(1) binaries. A new sysctl and tunable, netisr.dispatch.policy, accepts the above strings for specifying a global policy default. - The protocol registration structure, netisr_handler, grows an nh_dispatch field, which accepts a per-protocol policy override. The default value is '0', which corresponds to "default", meaning that protocols will accept the global default policy unless otherwise specified. - Policies are now interpreted and composed explicitly at various points in packet dispatch; protocol policies override global policies. - Protocols grow the ability to express a non-opinion about affinity even when implementing m2cpuid by returning NETISR_CPUID_NONE. In that case, the framework falls back on source ordering, rather than simply using the current CPU. These changes are in support of allowing link layer re-dispatch based on RSS or similar hashes provided by NICs, especially in the case where the number of hardware receive queues matches hardware core count, rather than hardware thread count, requiring further software redistribution (i.e., on RMI XLR). MFC after: 3 weeks Reviewed by: bz Sponsored by: Juniper Networks, Inc.
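The policy namespace and the "protocol overrides global" rule from the entry above can be sketched in userland C. The four policy strings and the fact that 0 means "default" come from the commit message. The enum names, the numeric values of the other three policies, and the function names are hypothetical and do not claim to match netisr.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical policy values; only "default == 0" is from the entry. */
enum sketch_policy {
	SKETCH_DEFAULT = 0,
	SKETCH_DEFERRED,
	SKETCH_DIRECT,
	SKETCH_HYBRID
};

static const char *const sketch_names[] = {
	"default", "deferred", "direct", "hybrid"
};

/* Convert a policy string to its integer value, or -1 if unknown. */
static int
sketch_policy_from_string(const char *s)
{
	size_t n = sizeof(sketch_names) / sizeof(sketch_names[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(s, sketch_names[i]) == 0)
			return ((int)i);
	return (-1);
}

/*
 * Compose the effective policy: a protocol's own policy wins unless it
 * is "default", in which case the global policy applies. The global
 * policy itself must be a concrete one.
 */
static enum sketch_policy
sketch_effective_policy(enum sketch_policy global, enum sketch_policy proto)
{
	assert(global != SKETCH_DEFAULT);
	return (proto == SKETCH_DEFAULT ? global : proto);
}
```

A protocol registered with policy 0 therefore follows whatever global policy the administrator sets, while one registered as "deferred" stays deferred even under a global "direct" policy.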
# 0028e524	11-Feb-2011	Bjoern A. Zeeb <bz@FreeBSD.org>	Mfp4 CH=177255: Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS. Change the syntax to match KASSERT() to allow more flexible panic messages rather than having a printf with hardcoded arguments before panic. Adjust the few assertions we have to the new format (and enhance the output). Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb MFC after: 2 weeks
# f88910cd	12-Jan-2011	Matthew D Fleming <mdf@FreeBSD.org>	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the net* piece.
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 3aa6d94e	11-Jun-2010	John Baldwin <jhb@FreeBSD.org>	Update several places that iterate over CPUs to use CPU_FOREACH().
# 25d3931a	31-May-2010	Robert Watson <rwatson@FreeBSD.org>	Merge r200899 from head to stable/8: When warning about possible netisr configuration problems during boot, report using "netisr_init" rather than "netisr2", which was the development name for the project. Approved by: re (kib)
# 938448cd	28-Feb-2010	Robert Watson <rwatson@FreeBSD.org>	Changes to support crashdump analysis of netisr: - Rename the netisr protocol registration array, 'np' to 'netisr_proto', in order to reduce the chances of symbol name collisions. It remains statically defined, but it will be looked up by netstat(1). - Move certain internal structure definitions from netisr.c to netisr_internal.h so that netstat(1) can find them. They remain private, and should not be used for any other purpose (for example, they should not be used by kernel modules, which must instead use the public interfaces in netisr.h). - Store a kernel-compiled version of NETISR_MAXPROT in the global variable netisr_maxprot, and export via a sysctl, so that it is available for use by netstat(1). This is especially important for crashdump interpretation, where the size of the workstream structure is determined by the maximum number of protocols compiled into the kernel. MFC after: 1 week Sponsored by: Juniper Networks
# 7f450feb	25-Feb-2010	Robert Watson <rwatson@FreeBSD.org>	Fix edge cases in several KASSERTs: use <= rather than < when testing that counters have not gone above MAXCPU or NETISR_MAXPROT. These problems caused panics on UP kernels with INVARIANTS when using sysctl -a, but would also have caused problems for 32-core boxes or if the netisr protocol vector was fully populated. Reported by: nwhitehorn, Neel Natu <neelnatu@gmail.com> MFC after: 4 days
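The off-by-one described above arises when a counter is incremented first and checked afterwards: a count equal to the capacity is still legal. A minimal userland sketch of the pattern, using the standard assert() in place of KASSERT and a hypothetical function name:

```c
#include <assert.h>

/*
 * Count entries one at a time and check the bound after each
 * increment. With "<" instead of "<=" the assertion would fire as
 * soon as present == capacity, for example on a uniprocessor kernel
 * where both values are 1.
 */
static unsigned int
sketch_count_entries(unsigned int present, unsigned int capacity)
{
	unsigned int count = 0;

	for (unsigned int i = 0; i < present; i++) {
		count++;
		assert(count <= capacity);
	}
	return (count);
}
```

This matches the failure cases the entry lists: one CPU with a limit of one, or a fully populated protocol vector.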
# 2d22f334	22-Feb-2010	Robert Watson <rwatson@FreeBSD.org>	Export netisr configuration and statistics to userspace via sysctl(9). MFC after: 1 week Sponsored by: Juniper Networks
# 78494902	15-Feb-2010	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Mark various sysctls also as tunables. Reviewed by: rwatson MFC after: 1 week
# 912f6323	22-Dec-2009	Robert Watson <rwatson@FreeBSD.org>	When warning about possible netisr configuration problems during boot, report using "netisr_init" rather than "netisr2", which was the development name for the project. MFC after: 3 days
# 0a32e29f	22-Dec-2009	Robert Watson <rwatson@FreeBSD.org>	Refine netisr.c comments a bit.
# 530c0060	01-Aug-2009	Robert Watson <rwatson@FreeBSD.org>	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
# ba3b25b3	29-Jun-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	In case we cannot queue a packet reaching the queue limit, retain the semantics netisr_queue() always had and free the mbuf along with returning the error. Reviewed by: rwatson Approved by: re (kensmith)
# 9e6e01eb	26-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	In light of DPCPU use by netisr, revise various for loops from using MAXCPU to mp_maxid, and handling and reporting of requests to use more threads than we have CPUs to run them on. Reviewed by: bz Approved by: re (kib) MFC after: 6 weeks
# 53402767	25-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Convert netisr to use dynamic per-CPU storage (DPCPU) instead of sizing arrays to [MAXCPU], offering moderate memory savings. In some places, this requires using CPU_ABSENT() to handle less common platforms with sparse CPU IDs. In several places, assert that the selected CPUID for work placement or statistics is not CPU_ABSENT() to be on the safe side. Discussed with: bz, jeff
# ed655c8c	14-Jun-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	Add an optional callback function that will be invoked when a per-CPU queue was drained. It will never fire for a directly dispatched packet. You will most likely never want to use this for any ordinary netisr usage and you will never blame netisr in case you try to use it and it does not work as expected. Reviewed by: rwatson
# d363c617	01-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Revert a recent netisr2 change: when billing packets to the current CPU, don't lock the workstream, as its mutexes may not have been initialized if there are fewer workstreams than CPUs. Run into by: hps, ps
# ed54411c	01-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Garbage collect NETISR_POLL and NETISR_POLLMORE, which are no longer required for options DEVICE_POLLING. De-fragment the NETISR_ constant space and lower NETISR_MAXPROT from 32 to 16 -- when sizing queue arrays using this compile-time constant, significant amounts of memory are saved. Warn on the console when tunable values for netisr are automatically adjusted during boot due to exceeding limits, invalid values, or as a result of DEVICE_POLLING.
# d4b5cae4	01-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Reimplement the netisr framework in order to support parallel netisr threads: - Support up to one netisr thread per CPU, each processings its own workstream, or set of per-protocol queues. Threads may be bound to specific CPUs, or allowed to migrate, based on a global policy. In the future it would be desirable to support topology-centric policies, such as "one netisr per package". - Allow each protocol to advertise an ordering policy, which can currently be one of: NETISR_POLICY_SOURCE: packets must maintain ordering with respect to an implicit or explicit source (such as an interface or socket). NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work, as well as allowing protocols to provide a flow generation function for mbufs without flow identifers (m2flow). Falls back on NETISR_POLICY_SOURCE if now flow ID is available. NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for each packet handled by netisr (m2cpuid). - Provide utility functions for querying the number of workstreams being used, as well as a mapping function from workstream to CPU ID, which protocols may use in work placement decisions. - Add explicit interfaces to get and set per-protocol queue limits, and get and clear drop counters, which query data or apply changes across all workstreams. - Add a more extensible netisr registration interface, in which protocols declare 'struct netisr_handler' structures for each registered NETISR_ type. These include name, handler function, optional mbuf to flow ID function, optional mbuf to CPU ID function, queue limit, and ordering policy. Padding is present to allow these to be expanded in the future. If no queue limit is declared, then a default is used. - Queue limits are now per-workstream, and raised from the previous IFQ_MAXLEN default of 50 to 256. 
- All protocols are updated to use the new registration interface, and with the exception of netnatm, default queue limits. Most protocols register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use NETISR_POLICY_FLOW, and will therefore take advantage of driver-generated flow IDs if present. - Formalize a non-packet based interface between interface polling and the netisr, rather than having polling pretend to be two protocols. Provide two explicit hooks in the netisr worker for start and end events for runs: netisr_poll() and netisr_pollmore(), as well as a function, netisr_sched_poll(), to allow the polling code to schedule netisr execution. DEVICE_POLLING still embeds single-netisr assumptions in its implementation, so for now if it is compiled into the kernel, a single and un-bound netisr thread is enforced regardless of tunable configuration. In the default configuration, the new netisr implementation maintains the same basic assumptions as the previous implementation: a single, un-bound worker thread processes all deferred work, and direct dispatch is enabled by default wherever possible. Performance measurement shows a marginal performance improvement over the old implementation due to the use of batched dequeue. An rmlock is used to synchronize use and registration/unregistration using the framework; currently, synchronized use is disabled (replicating current netisr policy) due to a measurable 3%-6% hit in ping-pong micro-benchmarking. It will be enabled once further rmlock optimization has taken place. However, in practice, netisrs are rarely registered or unregistered at runtime. A new man page for netisr will follow, but since one doesn't currently exist, it hasn't been updated. This change is not appropriate for MFC, although the polling shutdown handler should be merged to 7-STABLE. Bump __FreeBSD_version. Reviewed by: bz
# 2f120c90	13-May-2009	Robert Watson <rwatson@FreeBSD.org>	Garbage collect now-unused NETISR_FORCEQUEUE, which overrode the global direct dispatch policy for specific protocols (NETISR_USB). We leave the additional 'flags' argument to netisr_register() for the time being, even though it is no longer required.
# 21ca7b57	05-May-2009	Marko Zec <zec@FreeBSD.org>	Change the curvnet variable from a global const struct vnet *, previously always pointing to the default vnet context, to a dynamically changing thread-local one. The curvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, with an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typical cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 59dd72d0	03-Jul-2008	Robert Watson <rwatson@FreeBSD.org>	Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific netisr handlers to always be dispatched via a queue (deferred). Mark the usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly acquire Giant in those handlers. Previously, any netisr handler not marked NETISR_MPSAFE would necessarily run deferred and with Giant acquired. This change removes Giant scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows non-MPSAFE handlers to continue to force deferred dispatch so as to avoid lock order reversals between their acquisition of Giant and any calling context. It is likely we will be able to remove NETISR_FORCEQUEUE once IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no longer be supported. Reviewed by: bz MFC after: 1 month X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons, but the rest can go back.
# 237fdd78	16-Mar-2008	Robert Watson <rwatson@FreeBSD.org>	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
# 0bf686c1	06-Aug-2007	Robert Watson <rwatson@FreeBSD.org>	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminating quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)
# 33d2bb9c	27-Jul-2007	Robert Watson <rwatson@FreeBSD.org>	First in a series of changes to remove the now-unused Giant compatibility framework for non-MPSAFE network protocols: - Remove debug_mpsafenet variable, sysctl, and tunable. - Remove NET_NEEDS_GIANT() and associated SYSINITs used by it to force debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel. - Remove logic to automatically flag interrupt handlers as non-MPSAFE if debug.mpsafenet is set for an INTR_TYPE_NET handler. - Remove logic to automatically flag netisr handlers as non-MPSAFE if debug.mpsafenet is set. - Remove references in a few subsystems, including NFS and Cronyx drivers, which keyed off debug_mpsafenet to determine various aspects of their own locking behavior. - Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT() into no-ops, as their entire behavior was determined by the value in debug_mpsafenet. - Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE. Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still present in subsystems, and will be removed in followup commits. Reviewed by: bz, jhb Approved by: re (kensmith)
# 1f87450e	28-Nov-2006	Robert Watson <rwatson@FreeBSD.org>	Change net.isr.direct from defaulting to 0 to 1 in 7-CURRENT. This enables direct dispatch of the network stack from the device driver ithread, enabling input path parallelism by default when multiple interfaces are present. The strategy for network stack parallelism is something being actively discussed, and this is just one of several possible (and perfectly reasonable) strategies, but has the distinct advantage of reducing the number of context switches and preemptions significantly, resulting in higher efficiency in many cases. In some cases, this may reduce network stack parallelism due to work not being deferred from the ithread to the netisr. Therefore, the strategy may change in the future, but this offers a reasonable first pass at enabling parallelism while maintaining strong ordering. Hopefully this will trigger lots of nice new bugs. This change is not intended for MFC.
# f0796cd2	05-Oct-2005	Gleb Smirnoff <glebius@FreeBSD.org>	- Don't pollute opt_global.h with DEVICE_POLLING and introduce opt_device_polling.h - Include opt_device_polling.h into appropriate files. - Embrace with HAVE_KERNEL_OPTION_HEADERS the include in the files that can be compiled as loadable modules. Reviewed by: bde
# cea2165b	04-Oct-2005	Robert Watson <rwatson@FreeBSD.org>	Rename net.isr.enable to net.isr.dispatch. No compatibility code is provided, as this will be the production name as of 6.0. MFC after: 3 days Requested by: scottl
# de10fe70	11-Oct-2004	Andre Oppermann <andre@FreeBSD.org>	Correctly unregister a netisr by clearing the ni->ni_queue field to NULL as well. This field is actually used by various netisr functions to determine the availability of the specified netisr. This incomplete unregister leads directly to a crash when the KLD unregistering the netisr is unloaded. Submitted by: Sam <sah@softcardsystems.com> MFC after: 3 days
# ccaae37a	02-Sep-2004	Robert Watson <rwatson@FreeBSD.org>	Correct a comment typo: s/Note/Not/. Pointed out by: kensmith
# ace437c3	28-Aug-2004	Robert Watson <rwatson@FreeBSD.org>	Correct typo in printf() warning. Submitted by: Pawel Worach <pawel.worach at telia.com>
# 1d8cd39e	28-Aug-2004	Robert Watson <rwatson@FreeBSD.org>	Change the default disposition of debug.mpsafenet from 0 to 1, which will cause the network stack to operate without the Giant lock by default. This change has the potential to improve performance by increasing parallelism and decreasing latency in network processing. Due to the potential exposure of existing or new bugs, the following compatibility functionality is maintained: - It is still possible to disable Giant-free operation by setting debug.mpsafenet to 0 in loader.conf. - Add "options NET_WITH_GIANT", which will restore the default value of debug.mpsafenet to 0, and is intended for use on systems compiled with known unsafe components, or where a more conservative configuration is desired. - Add a new declaration, NET_NEEDS_GIANT("componentname"), which permits kernel components to declare dependence on Giant over the network stack. If the declaration is made by a preloaded module or a compiled in component, the disposition of debug.mpsafenet will be set to 0 and a warning concerning performance degraded operation printed to the console. If it is declared by a loadable kernel module after boot, a warning is displayed but the disposition cannot be changed. This is implemented by defining a new SYSINIT() value, SI_SUB_SETTINGS, which is intended for the processing of configuration choices after tunables are read in and the console is available to generate errors, but before much else gets going. This compatibility behavior will go away when we've finished the last of the locking work and are confident that operation is correct.
# 3161f583	27-Aug-2004	Andre Oppermann <andre@FreeBSD.org>	Apply error and success logic consistently to the function netisr_queue() and its users. netisr_queue() now returns (0) on success and ERRNO on failure. At the moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full) are supported. Previously it would return (1) on success but the return value of IF_HANDOFF() was interpreted wrongly and (0) was actually returned on success. Due to this schednetisr() was never called to kick the scheduling of the isr. However this was masked by other normal packets coming through netisr_dispatch() causing the dequeueing of waiting packets. PR: kern/70988 Found by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp> MFC after: 3 days
# 08f85b08	18-Jul-2004	Robert Watson <rwatson@FreeBSD.org>	Comment clarifying debug_mpsafenet.
# 7902224c	08-Nov-2003	Sam Leffler <sam@FreeBSD.org>	o add a flags parameter to netisr_register that is used to specify whether or not the isr needs to hold Giant when running; Giant-less operation is also controlled by the setting of debug_mpsafenet o mark all netisr's except NETISR_IP as needing Giant o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant o pickup Giant (when debug_mpsafenet is 1) inside ip_input before calling up with a packet o change netisr handling so swi_net runs w/o Giant; instead we grab Giant before invoking handlers based on whether the handler needs Giant o change netisr handling so that netisr's that are marked MPSAFE may have multiple instances active at a time o add netisr statistics for packets dropped because the isr is inactive Supported by: FreeBSD Foundation
# d3be1471	05-Nov-2003	Sam Leffler <sam@FreeBSD.org>	o make debug_mpsafenet globally visible o move it from subr_bus.c to netisr.c where it more properly belongs o add NET_PICKUP_GIANT and NET_DROP_GIANT macros that will be used to grab Giant as needed when MPSAFE operation is enabled Supported by: FreeBSD Foundation
# 5fd04e38	03-Oct-2003	Robert Watson <rwatson@FreeBSD.org>	When direct dispatching a netisr (net.isr.enable=1), if there are already any queued packets for the isr, process those packets before the newly submitted packet, maintaining ordering of all packets being delivered to the netisr. Remove the bypass counter since we don't bypass anymore. Leave the comment about possible problems and options since later performance optimization may change the strategy for addressing ordering problems here. Specifically, this maintains the strong isr ordering guarantee; additional parallelism and lower latency may be possible by moving to weaker guarantees (per-interface, for example). We will probably at some point also want to remove the one instance netisr dispatch limit currently enforced by a mutex, but it's not clear that's 100% safe yet, even in the netperf branch. Reviewed by: sam, others
# e590eca2	01-Oct-2003	Robert Watson <rwatson@FreeBSD.org>	Create a tunable for net.isr.enable so that it may be set from inception, rather than having to wait for the boot to finish.
# 3164565d	01-Oct-2003	Robert Watson <rwatson@FreeBSD.org>	Temporarily turn net.isr.enable back off again until patches to correct potential nits in packet ordering are resolved.
# 19288f73	01-Oct-2003	Robert Watson <rwatson@FreeBSD.org>	Enable net.isr.enable by default, causing "delivery to completion" (direct dispatch) in interrupt threads when the netisr in question isn't already active. If a netisr is already active, or direct dispatch is already in progress, we queue the packet for later delivery. Previously, this option was disabled by default. I have measured 20%+ performance improvements in IP packet forwarding with this enabled. Please report any problems ASAP, especially relating to stack depth or out-of-order packet processing. Discussed with: jlemon, peter Sponsored by: DARPA, Network Associates Laboratories
# fb68148f	08-Mar-2003	Jonathan Lemon <jlemon@FreeBSD.org>	Discard the packet if the netisr queue is null instead of panicking, for the benefit of modules which are compiled differently than the kernel.
# 1cafed39	04-Mar-2003	Jonathan Lemon <jlemon@FreeBSD.org>	Update netisr handling; Each SWI now registers its queue, and all queue drain routines are done by swi_net, which allows for better queue control at some future point. Packets may also be directly dispatched to a netisr instead of queued, this may be of interest at some installations, but currently defaults to off. Reviewed by: hsu, silby, jayanth, sam Sponsored by: DARPA, NAI Labs
# e3b6e33c	21-Sep-2002	Jake Burkholder <jake@FreeBSD.org>	Moved netisr code from kern/kern_intr.c to net/netisr.c as threatened in a comment.