Cross Reference: /freebsd-current/sys/net/if

History log of /freebsd-current/sys/net/if_stf.c
Revision	Date	Author	Comments
# 71625ec9	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c comment pattern Remove /^/[/]\s\$FreeBSD\$.*\n/
# c373e1d6	22-Apr-2023	Zhenlei Huang <zlei@FreeBSD.org>	if_stf: Delete unreachable code As the flag M_WAITOK is passed to ip_encap_attach(), then the function will never return NULL, and the following code within NULL check branch will be unreachable. No functional change intended. Reviewed by: kp Fixes: 6d8fdfa9d5e7d Rework IP encapsulation handling code MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39746
# 2c2b37ad	13-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	ifnet/API: Move struct ifnet definition to a <net/if_private.h> Hide the ifnet structure definition, no user serviceable parts inside, it's a netstack implementation detail. Include it temporarily in <net/if_var.h> until all drivers are updated to use the accessors exclusively. Reviewed by: glebius Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38046
# 91ebcbe0	21-Sep-2022	Alexander V. Chernikov <melifaro@FreeBSD.org>	if_clone: migrate some consumers to the new KPI. Convert most of the cloner customers who require custom params to the new if_clone KPI. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D36636 MFC after: 2 weeks
# 439da7f0	30-Nov-2021	Kristof Provost <kp@FreeBSD.org>	if_stf: KASAN fix In in_stf_input() we grabbed a pointer to the IPv4 header and later did an m_pullup() before we look at the IPv6 header. However, m_pullup() could rearrange the mbuf chain and potentially invalidate the pointer to the IPv4 header. Avoid this issue by copying the IP header rather than getting a pointer to it. Reported by: markj, Jenkins (KASAN job) Reviewed by: markj MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33192
# b46512f7	09-Nov-2021	Kristof Provost <kp@FreeBSD.org>	if_stf: add dtrace probe points Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33038
# 19dc6445	08-Nov-2021	Kristof Provost <kp@FreeBSD.org>	if_stf: add 6rd support Implement IPv6 Rapid Deployment (RFC5969) on top of the existing 6to4 (RFC3056) if_stf code. PR: 253328 Reviewed by: hrs Obtained from: pfSense Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33037
# 8e45fed3	04-Nov-2021	Kristof Provost <kp@FreeBSD.org>	if_stf: enable use in vnet jails The cloner must be per-vnet so that cloned interfaces get destroyed when the vnet goes away. Otherwise we fail assertions in vnet_if_uninit(): panic: vnet_if_uninit:475 tailq &V_ifnet=0xfffffe01665fe070 not empty cpuid = 19 time = 1636107064 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015d0cac60 vpanic() at vpanic+0x187/frame 0xfffffe015d0cacc0 panic() at panic+0x43/frame 0xfffffe015d0cad20 vnet_if_uninit() at vnet_if_uninit+0x7b/frame 0xfffffe015d0cad30 vnet_destroy() at vnet_destroy+0x170/frame 0xfffffe015d0cad60 prison_deref() at prison_deref+0x9b0/frame 0xfffffe015d0cadd0 sys_jail_remove() at sys_jail_remove+0x119/frame 0xfffffe015d0cae00 amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe015d0caf30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe015d0caf30 --- syscall (508, FreeBSD ELF64, sys_jail_remove), rip = 0x8011e920a, rsp = 0x7fffffffe788, rbp = 0x7fffffffe810 --- KDB: enter: panic MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32849
# 3576121c	05-Nov-2021	Kristof Provost <kp@FreeBSD.org>	if_stf: style(9) pass As stated in style(9): "Values in return statements should be enclosed in parentheses." MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32848
# c8ee75f2	10-Oct-2021	Gleb Smirnoff <glebius@FreeBSD.org>	Use network epoch to protect local IPv4 addresses hash. The modification to the hash are already naturally locked by in_control_sx. Convert the hash lists to CK lists. Remove the in_ifaddr_rmlock. Assert the network epoch where necessary. Most cases when the hash lookup is done the epoch is already entered. Cover a few cases, that need entering the epoch, which mostly is initial configuration of tunnel interfaces and multicast addresses. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D32584
# 2144431c	08-Oct-2021	Gleb Smirnoff <glebius@FreeBSD.org>	Remove in_ifaddr_lock acquisiton to access in_ifaddrhead. An IPv4 address is embedded into an ifaddr which is freed via epoch. And the in_ifaddrhead is already a CK list. Use the network epoch to protect against use after free. Next step would be to CK-ify the in_addr hash and get rid of the... Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D32434
# bb4a7d94	04-Mar-2021	Kristof Provost <kp@FreeBSD.org>	net: Introduce IPV6_DSCP(), IPV6_ECN() and IPV6_TRAFFIC_CLASS() macros Introduce convenience macros to retrieve the DSCP, ECN or traffic class bits from an IPv6 header. Use them where appropriate. Reviewed by: ae (previous version), rscheff, tuexen, rgrimes MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29056
# 6ad7446c	02-Jul-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Complete conversions from fib<4\|6>_lookup_nh_<basic\|ext> to fib<4\|6>_lookup(). fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. With no callers remaining, remove fib[46]_lookup_nh_ functions. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25445
# 7029da5c	26-Feb-2020	Pawel Biernacki <kaktus@FreeBSD.org>	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
# 4b24e5b1	10-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Interface output method must be executed in network epoch, so if_addr_rlock() isn't needed here.
# b8a6e03f	07-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Widen NET_EPOCH coverage. When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111
# ca1163bd	30-Mar-2019	Mark Johnston <markj@FreeBSD.org>	Do not perform DAD on stf(4) interfaces. stf(4) interfaces are not multicast-capable so they can't perform DAD. They also did not set IFF_DRV_RUNNING when an address was assigned, so the logic in nd6_timer() would periodically flag such an address as tentative, resulting in interface flapping. Fix the problem by setting IFF_DRV_RUNNING when an address is assigned, and do some related cleanup: - In in6if_do_dad(), remove a redundant check for !UP \|\| !RUNNING. There is only one caller in the tree, and it only looks at whether the return value is non-zero. - Have in6if_do_dad() return false if the interface is not multicast-capable. - Set ND6_IFF_NO_DAD when an address is assigned to an stf(4) interface and the interface goes UP as a result. Note that this is not sufficient to fix the problem because the new address is marked as tentative and DAD is started before in6_ifattach() is called. However, setting no_dad is formally correct. - Change nd6_timer() to not flag addresses as tentative if no_dad is set. This is based on a patch from Viktor Dukhovni. Reported by: Viktor Dukhovni <ietf-dane@dukhovni.org> Reviewed by: ae MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19751
# 232485a1	26-Oct-2018	Eugene Grosbein <eugen@FreeBSD.org>	Prevent stf(4) from panicing due to unprotected access to INADDR_HASH. PR: 220078 MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12457 Tested-by: Cassiano Peixoto and others
# 6d8fdfa9	05-Jun-2018	Andrey V. Elsukov <ae@FreeBSD.org>	Rework IP encapsulation handling code. Currently it has several disadvantages: - it uses single mutex to protect internal structures. It is used by data- and control- path, thus there are no parallelism at all. - it uses single list to keep encap handlers for both INET and INET6 families. - struct encaptab keeps unneeded information (src, dst, masks, protosw), that isn't used by code in the source tree. - matches are prioritized and when many tunneling interfaces are registered, encapcheck handler of each interface is invoked for each packet. The search takes O(n) for n interfaces. All this work is done with exclusive lock held. What this patch includes: - the datapath is converted to be lockless using epoch(9) KPI. - struct encaptab now linked using CK_LIST. - all unused fields removed from struct encaptab. Several new fields addedr: min_length is the minimum packet length, that encapsulation handler expects to see; exact_match is maximum number of bits, that can return an encapsulation handler, when it wants to consume a packet. - IPv6 and IPv4 handlers are stored in separate lists; - added new "encap_lookup_t" method, that will be used later. It is targeted to speedup lookup of needed interface, when gif(4)/gre(4) have many interfaces. - the need to use protosw structure is eliminated. The only pr_input method was used from this structure, so I don't see the need to keep using it. - encap_input_t method changed to avoid using mbuf tags to store softc pointer. Now it is passed directly trough encap_input_t method. encap_getarg() funtions is removed. - all sockaddr structures and code that uses them removed. We don't have any code in the tree that uses them. All consumers use encap_attach_func() method, that relies on invoking of encapcheck() to determine the needed handler. - introduced struct encap_config, it contains parameters of encap handler that is going to be registered by encap_attach() function. - encap handlers are stored in lists ordered by exact_match value, thus handlers that need more bits to match will be checked first, and if encapcheck method returns exact_match value, the search will be stopped. - all current consumers changed to use new KPI. Reviewed by: mmacy Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15617
# 46d0f824	18-May-2018	Matt Macy <mmacy@FreeBSD.org>	net: fix set but not used
# d7c5a620	18-May-2018	Matt Macy <mmacy@FreeBSD.org>	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# f227f64a	27-Jul-2017	Luiz Otavio O Souza <loos@FreeBSD.org>	Remove the unused mutex since r273220. MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate)
# 350e6227	08-Mar-2017	Andrey V. Elsukov <ae@FreeBSD.org>	Remove now unneded cast.
# d177868c	29-Jan-2017	Luiz Otavio O Souza <loos@FreeBSD.org>	The stf(4) interface name does not conform with the default naming convention for interfaces, because only one stf(4) interface can exist in the system. This disallow the use of unit numbers different than 0, however, it is possible to create the clone without specify the unit number (wildcard). In the wildcard case we must update the interface name before return. This fix an infinite recursion in pf code that keeps track of network interfaces and groups: 1 - a group for the cloned type of the interface is added (stf in this case); 2 - the system will now try to add an interface named stf (instead of stf0) to stf group; 3 - when pfi_kif_attach() tries to search for an already existing 'stf' interface, the 'stf' group is returned and thus the group is added as an interface of itself; This will now cause a crash at the first attempt to traverse the groups which the stf interface belongs (which loops over itself). Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)
# 0792bcbb	16-Dec-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Convert if_stf(4) to new routing api.
# 926381e1	31-Jul-2015	Andrey V. Elsukov <ae@FreeBSD.org>	Ansify if_stf.c
# a5965d15	30-Jul-2015	Andrey V. Elsukov <ae@FreeBSD.org>	Build if_stf(4) module only when both INET and INET6 support are enabled.
# cc0a3c8c	29-Jul-2015	Andrey V. Elsukov <ae@FreeBSD.org>	Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock. Both are used to protect access to IP addresses lists and they can be acquired for reading several times per packet. To reduce lock contention it is better to use rmlock here. Reviewed by: gnn (previous version) Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3149
# 06cd035a	23-Dec-2014	Andrey V. Elsukov <ae@FreeBSD.org>	Remove if_stf.h. It contains only one function declaration used by if_stf(4). Also make in_stf_protosw structure static.
# 2dfcd0ae	01-Dec-2014	Andrey V. Elsukov <ae@FreeBSD.org>	Remove unneded check. No need to do m_pullup to the size that we prepended. MFC after: 1 week Sponsored by: Yandex LLC
# f9723c77	20-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Simplify API: use new NHOP_LOOKUP_AIFP flag to select what ifp we need to return. Rename fib[64]_lookup_nh_basic to fib[64]_lookup_nh, add flags fields for all relevant functions.
# 36f34ac7	09-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Fix nd6_output_flush() prototype. Remove 'net/route_internal.h' header from stf.
# 033074c4	09-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Replace 'struct route ' if_output() argument with 'struct nhop_info '. Leave 'struct route' as is for legacy routing api users. Remove most of rtalloc_ign*-derived functions.
# 1a75e3b2	06-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Make checks for rt_mtu generic: Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking might be an option in some situation, it is not feasible to do MTU checks there: generic (or per-domain) routing code is perfectly capable of doing this. We currrently have 3 places where MTU is altered: 1) route addition. In this case domain overrides radix _addroute callback (in[6]_addroute) and all necessary checks/fixes are/can be done there. 2) route change (especially, GW change). In this case, there are no explicit per-domain calls, but one can override rte by setting ifa_rtrequest hook to domain handler (inet6 does this). 3) ifconfig ifaceX mtu YYYY In this case, we have no callbacks, but ip[6]_output performes runtime checks and decreases rt_mtu if necessary. Generally, the goals are to be able to handle all MTU changes in control plane, not in runtime part, and properly deal with increased interface MTU. This commit changes the following: * removes hooks setting MTU from drivers side * adds proper per-doman MTU checks for case 1) * adds generic MTU check for case 2) * The latter is done by using new dom_ifmtu callback since if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size. However, IPv6 mtu might be different from if_mtu one (e.g. default 1280) for some cases, so we need an abstract way to know maximum MTU size for given interface and domain. * moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies user-supplied data which must be checked. * removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to use this functions on new non-inserted rte. More changes will follow soon. MFC after: 1 month Sponsored by: Yandex LLC
# 69b74805	04-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Convert gif and stf to use new routing api.
# 8c3cfe0b	04-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Hide 'struct rtentry' and all its macro inside new header: net/route_internal.h The goal is to make its opaque for all code except route/rtsock and proto domain _rmx.
# b4e8f808	19-Oct-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Switch IPv4 output path to use new routing api. The goals of the new API is to provide consumers with minimal needed information, but as fast as possible. So we provide full nexthop info copied into alighed on-cache structure instead of rte/ia pointers, their refcounts and locks. This does not provide solution for protecting from egress ifp destruction, but does not make it any worse. Current changes: nhops: Add fib4_lookup_prepend() function which stores either full L2+L3 prepend info (e.g. MAC header in case of plain IPv4) or L3 info with NH_FLAGS_L2_INCOMPLETE flag indicating that no valid L2 info exists and we have to take "slow" path. ip_output: Currently ip[ 46]_output consumers use 'struct route' for the following purposes: 1) double lookup avoidance(route caching) 2) plain route caching 3) get path MTU to be able to notify source. The former pattern is mostly used by various tunnels (gif, gre, stf). (Actually, gre is the only remaining, others were already converted. Their locking model did not scale good enogh to benefit from such caching, so we have (temporarily) removed it without any performance loss). Plain route caching used by SCTP is simply wrong and should be removed. Temporary break it for now just to be able to compile. Optimize path mtu reporting by providing it in new 'route_info' stucture. Minimize games with @ia locking/refcounting for route lookup: add special nhop[46]_extended structure to store more route attributes. Pointer to given structure can be passed to fib4_lookup_prepend() to indicate we want this info (we actually needs it for UDP and raw IP). ether_output: Provide light-weight ether_output2() call to deal with transmitting L2 frame (e.g. properly handle broadcast/simloop/bridge/ other L2 hooks before actually transmitting frame by if_transmit()). Add a hack based on new RT_NHOP ro_flag to distinguish which version should we call. Better way is probably to add a new "if_output_frame" driver callbacks. Next steps: * Convert ip_fastfwd part * Implement auto-growing array for per-radix nexthops * Implement LLE tracking for nexthop calculations to be able to immediately provide all necessary info in single route lookup for gateway routes * Switch radix locking scheme to runtime/cfg lock * Implement multipath support for rtsock * Implement "tracked nexthops" for tunnels (e.g. _proper_ nexthop caching) * Add IPv6 support for remaining parts (postponed not to interfere with user/ae/inet6 branch) * Consider adding "if_output_frame" driver call to ease logical frame pushing.
# d74b9a2c	17-Oct-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	* Remove route caching in if_stf. * Copy necessary in6_ifa on stack instead of playing with refcounts.
# 3751dddb	19-Sep-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Mechanically convert to if_inc_counter().
# 73d76e77	14-Aug-2014	Kevin Lo <kevlo@FreeBSD.org>	Change pr_output's prototype to avoid the need for explicit casts. This is a follow up to r269699. Phabric: D564 Reviewed by: jhb
# 8f5a8818	07-Aug-2014	Kevin Lo <kevlo@FreeBSD.org>	Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb
# af3b2549	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Pull in r267961 and r267973 again. Fix for issues reported will follow.
# 37a107a4	27-Jun-2014	Glen Barber <gjb@FreeBSD.org>	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory
# 3da1cf1e	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies
# e3a7aa6f	04-Mar-2014	Gleb Smirnoff <glebius@FreeBSD.org>	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 76039bc8	26-Oct-2013	Gleb Smirnoff <glebius@FreeBSD.org>	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 47e8d432	25-Apr-2013	Gleb Smirnoff <glebius@FreeBSD.org>	Add const qualifier to the dst parameter of the ifnet if_output method.
# e37e7917	27-Dec-2012	Andrey V. Elsukov <ae@FreeBSD.org>	Add an ability to set net.link.stf.permit_rfc1918 from the loader. MFC after: 2 weeks
# 51743c5f	27-Dec-2012	Andrey V. Elsukov <ae@FreeBSD.org>	Add net.link.stf.permit_rfc1918 sysctl variable. It can be used to allow the use of private IPv4 addresses with stf(4). MFC after: 2 weeks
# eb1b1807	05-Dec-2012	Gleb Smirnoff <glebius@FreeBSD.org>	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually
# 8f134647	22-Oct-2012	Gleb Smirnoff <glebius@FreeBSD.org>	Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet. After this change a packet processed by the stack isn't modified at all[2] except for TTL. After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack. [1] One exception still remains. The raw sockets convert host byte order before pass a packet to an application. Probably this would remain for ages for compatibility. [2] The ip_input() still subtructs header len from ip->ip_len, but this is planned to be fixed soon. Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
# 42a58907	16-Oct-2012	Gleb Smirnoff <glebius@FreeBSD.org>	Make the "struct if_clone" opaque to users of the cloning API. Users now use function calls: if_clone_simple() if_clone_advanced() to initialize a cloner, instead of macros that initialize if_clone structure. Discussed with: brooks, bz, 1 year ago
# 9823d527	10-Oct-2012	Kevin Lo <kevlo@FreeBSD.org>	Revert previous commit... Pointyhat to: kevlo (myself)
# a10cee30	09-Oct-2012	Kevin Lo <kevlo@FreeBSD.org>	Prefer NULL over 0 for pointers
# 2541fcd9	17-Aug-2012	John Baldwin <jhb@FreeBSD.org>	Unexpand a couple of TAILQ_FOREACH()s.
# 99ab4b12	15-Jul-2012	Alexander V. Chernikov <melifaro@FreeBSD.org>	Permit changing MTU in 6to4 relay. This behavior is recommended by RFC 4213 clause 3.2. Sometimes fragmentation is the least evil. For example, some Linux IPVS kernels forwards ICMPv6 checksums to real servers incorrectly. Reviewed by: hrs(previous version) Approved by: kib(mentor) MFC after: 1 week
# 6472ac3d	07-Nov-2011	Ed Schouten <ed@FreeBSD.org>	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
# a34c6aeb	03-Jul-2011	Bjoern A. Zeeb <bz@FreeBSD.org>	Tag mbufs of all incoming frames or packets with the interface's FIB setting (either default or if supported as set by SIOCSIFFIB, e.g. from ifconfig). Submitted by: Alexander V. Chernikov (melifaro ipfw.ru) Reviewed by: julian MFC after: 2 weeks
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# e50d35e6	03-May-2010	Maxim Sobolev <sobomax@FreeBSD.org>	Add new tunable 'net.link.ifqmaxlen' to set default send interface queue length. The default value for this parameter is 50, which is quite low for many of today's uses and the only way to modify this parameter right now is to edit if_var.h file. Also add read-only sysctl with the same name, so that it's possible to retrieve the current value. MFC after: 1 month
# 530c0060	01-Aug-2009	Robert Watson <rwatson@FreeBSD.org>	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
# eddfbb76	14-Jul-2009	Robert Watson <rwatson@FreeBSD.org>	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
# 3893212d	25-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Update if_stf and if_tun to use if_addr_rlock()/if_addr_runlock() rather than IF_ADDR_LOCK()/IF_ADDR_UNLOCK() when iterating ifp->if_addrhead. MFC after: 6 weeks
# 2d9cfaba	25-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz
# fe0ecfd6	24-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Make stf_getsrcifa6() return a reference to an in6_ifaddr rather than a pointer, and dispose of the references when no longer needed. MFC after: 6 weeks
# bcf11e8d	05-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
# 82324826	20-Apr-2009	Robert Watson <rwatson@FreeBSD.org>	Prefer ifa_link (structure field) to ifa_list (macro alias for it). MFC after: 2 weeks
# 989c0cb5	20-Apr-2009	Robert Watson <rwatson@FreeBSD.org>	Prefer if_addrhead (FreeBSD) to if_addrlist (BSD compat) naming for the interface address list in if_stf.c. Acquire interface address list locks around address list access. MFC after: 2 months
# 279aa3d4	16-Apr-2009	Kip Macy <kmacy@FreeBSD.org>	Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Reviewed by: rwatson
# 4b79449e	02-Dec-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 8b615593	02-Oct-2008	Marko Zec <zec@FreeBSD.org>	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(). () netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
# 0dae32f2	24-Sep-2008	David Malone <dwmalone@FreeBSD.org>	Some people's 6to4 routers seem to have been blowing up because of the unlocked route caching in if_stf. Add a mutex that protects access to cached route. This seemed to fix problems for Pekka Savola. Nick Sayer had similar problems, and in his case completly disabling the route cache seemed to help. Add a sysctl net.link.stf.route_cache that can be used to turn off route caching in if_stf. PR: 122283 MFC after: 2 weeks Tested by: Pekka Savola, Nick Sayer.
# 603724d3	17-Aug-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
# 8b07e49a	09-May-2008	Julian Elischer <julian@FreeBSD.org>	Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x) Currently the only protocol that can make use of the multiple tables is IPv4 Similar functionality exists in OpenBSD and Linux. From my notes: ----- One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address. Constraints: ------------ I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need. One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing". One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as a EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in timespan of the branch. This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs. To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family. The basic change in the ABI compatible version of the change is to extent that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X] when for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before. The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row. In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later. One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it. This brings us as to how the correct FIB is selected for an outgoing IPV4 packet. Firstly, all packets have a FIB associated with them. if nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways. Packets fall into one of a number of classes. 1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility call setfib that acts a bit like nice.. setfib -3 ping target.example.com # will use fib 3 for ping. It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands. 2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.) 3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2). 4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib. 5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being reponded to. 6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect withthe proces that set up the tunnel. thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1. Routing messages would be associated with their process, and thus select one FIB or another. messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented) In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB. In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process. Early testing experience: ------------------------- Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks. For example, It can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done. Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routes accordingly. ipfw has grown 2 new keywords: setfib N ip from anay to any count ip from any to any fib N In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. I am not sure if it is required. SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. it will be interesting to see how that handles it when it suddenly actually does something. Where to next: -------------------- After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. this will result in some roto-tilling in the routing code. Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code. My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it. When the ABI can be changed it raises the possibilty of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the desination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry. Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already. This work was sponsored by Ironport Systems/Cisco Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco
# 30d239bc	24-Oct-2007	Robert Watson <rwatson@FreeBSD.org>	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
# bc60490a	23-Sep-2007	Christian S.J. Peron <csjp@FreeBSD.org>	Certain consumers of rtalloc like gif(4) and if_stf(4) lookup the route and once they are done with it, call rtfree(). rtfree() should only be used when we are certain we hold the last reference to the route. This bug results in console messages like the following: rtfree: 0xc40f7000 has 1 refs This patch switches the rtfree() to use RTFREE_LOCKED() instead, which should handle the reference counting on the route better. Approved by: re@ (gnn) Reviewed by: bms Reported by: many via net@ and current@ Tested by: many
# aed55708	22-Oct-2006	Robert Watson <rwatson@FreeBSD.org>	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
# 43bc7a9c	04-Aug-2006	Brooks Davis <brooks@FreeBSD.org>	With exception of the if_name() macro, all definitions in net_osdep.h were unused or already in if_var.h so add if_name() to if_var.h and remove net_osdep.h along with all references to it. Longer term we may want to kill off if_name() entierly since all modern BSDs have if_xname variables rendering it unnecessicary.
# 6b7330e2	09-Jul-2006	Sam Leffler <sam@FreeBSD.org>	Revise network interface cloning to take an optional opaque parameter that can specify configuration parameters: o rev cloner api's to add optional parameter block o add SIOCCREATE2 that accepts parameter data o rev vlan support to use new api (maintain old code) Reviewed by: arch@
# 4b97d7af	29-Jun-2006	Yaroslav Tykhiy <ytykhiy@gmail.com>	There is a consensus that ifaddr.ifa_addr should never be NULL, except in places dealing with ifaddr creation or destruction; and in such special places incomplete ifaddrs should never be linked to system-wide data structures. Therefore we can eliminate all the superfluous checks for "ifa->ifa_addr != NULL" and get ready to the system crashing honestly instead of masking possible bugs. Suggested by: glebius, jhb, ru
# 06dc090f	29-Jun-2006	Yaroslav Tykhiy <ytykhiy@gmail.com>	Use TAILQ_FOREACH.
# 16d878cc	02-Jun-2006	Christian S.J. Peron <csjp@FreeBSD.org>	Fix the following bpf(4) race condition which can result in a panic: (1) bpf peer attaches to interface netif0 (2) Packet is received by netif0 (3) ifp->if_bpf pointer is checked and handed off to bpf (4) bpf peer detaches from netif0 resulting in ifp->if_bpf being initialized to NULL. (5) ifp->if_bpf is dereferenced by bpf machinery (6) Kaboom This race condition likely explains the various different kernel panics reported around sending SIGINT to tcpdump or dhclient processes. But really this race can result in kernel panics anywhere you have frequent bpf attach and detach operations with high packet per second load. Summary of changes: - Remove the bpf interface's "driverp" member - When we attach bpf interfaces, we now set the ifp->if_bpf member to the bpf interface structure. Once this is done, ifp->if_bpf should never be NULL. [1] - Introduce bpf_peers_present function, an inline operation which will do a lockless read bpf peer list associated with the interface. It should be noted that the bpf code will pickup the bpf_interface lock before adding or removing bpf peers. This should serialize the access to the bpf descriptor list, removing the race. - Expose the bpf_if structure in bpf.h so that the bpf_peers_present function can use it. This also removes the struct bpf_if; hack that was there. - Adjust all consumers of the raw if_bpf structure to use bpf_peers_present Now what happens is: (1) Packet is received by netif0 (2) Check to see if bpf descriptor list is empty (3) Pickup the bpf interface lock (4) Hand packet off to process From the attach/detach side: (1) Pickup the bpf interface lock (2) Add/remove from bpf descriptor list Now that we are storing the bpf interface structure with the ifnet, there is is no need to walk the bpf interface list to locate the correct bpf interface. We now simply look up the interface, and initialize the pointer. This has a nice side effect of changing a bpf interface attach operation from O(N) (where N is the number of bpf interfaces), to O(1). [1] From now on, we can no longer check ifp->if_bpf to tell us whether or not we have any bpf peers that might be interested in receiving packets. In collaboration with: sam@ MFC after: 1 month
# 303989a2	09-Nov-2005	Ruslan Ermilov <ru@FreeBSD.org>	Use sparse initializers for "struct domain" and "struct protosw", so they are easier to follow for the human being.
# 4e7e0183	08-Nov-2005	Andrew Thompson <thompsa@FreeBSD.org>	Move the cloned interface list management in to if_clone. For some drivers the softc lists and associated mutex are now unused so these have been removed. Calling if_clone_detach() will now destroy all the cloned interfaces for the driver and in most cases is all thats needed to unload. Idea by: brooks Reviewed by: brooks
# febd0759	12-Oct-2005	Andrew Thompson <thompsa@FreeBSD.org>	Change the reference counting to count the number of cloned interfaces for each cloner. This ensures that ifc->ifc_units is not prematurely freed in if_clone_detach() before the clones are destroyed, resulting in memory modified after free. This could be triggered with if_vlan. Assert that all cloners have been destroyed when freeing the memory. Change all simple cloners to destroy their clones with ifc_simple_destroy() on module unload so the reference count is properly updated. This also cleans up the interface destroy routines and allows future optimisation. Discussed with: brooks, pjd, -current Reviewed by: brooks
# 01399f34	26-Jun-2005	David Malone <dwmalone@FreeBSD.org>	Fix some long standing bugs in writing to the BPF device attached to a DLT_NULL interface. In particular: 1) Consistently use type u_int32_t for the header of a DLT_NULL device - it continues to represent the address family as always. 2) In the DLT_NULL case get bpf_movein to store the u_int32_t in a sockaddr rather than in the mbuf, to be consistent with all the DLT types. 3) Consequently fix a bug in bpf_movein/bpfwrite which only permitted packets up to 4 bytes less than the MTU to be written. 4) Fix all DLT_NULL devices to have the code required to allow writing to their bpf devices. 5) Move the code to allow writing to if_lo from if_simloop to looutput, because it only applies to DLT_NULL devices but was being applied to other devices that use if_simloop possibly incorrectly. PR: 82157 Submitted by: Matthew Luckie <mjl@luckie.org.nz> Approved by: re (scottl)
# b03965dd	13-Jun-2005	Brooks Davis <brooks@FreeBSD.org>	Initialze ifp->if_softc. Submitted by: ume
# fc74a9f9	10-Jun-2005	Brooks Davis <brooks@FreeBSD.org>	Stop embedding struct ifnet at the top of driver softcs. Instead the struct ifnet or the layer 2 common structure it was embedded in have been replaced with a struct ifnet pointer to be filled by a call to the new function, if_alloc(). The layer 2 common structure is also allocated via if_alloc() based on the interface type. It is hung off the new struct ifnet member, if_l2com. This change removes the size of these structures from the kernel ABI and will allow us to better manage them as interfaces come and go. Other changes of note: - Struct arpcom is no longer referenced in normal interface code. Instead the Ethernet address is accessed via the IFP2ENADDR() macro. To enforce this ac_enaddr has been renamed to _ac_enaddr. - The second argument to ether_ifattach is now always the mac address from driver private storage rather than sometimes being ac_enaddr. Reviewed by: sobomax, sam
# 89bc9a31	23-Feb-2005	Sam Leffler <sam@FreeBSD.org>	the rt parameter to ifa_rtrequest callbacks should always be non-null; eliminate grauitous ptr checks that follow ptr deref's Noticed by: Coverity Prevent analysis tool
# 529ed56f	11-Jan-2005	Hajimu UMEMOTO <ume@FreeBSD.org>	don't see NBPFILTER.
# 2d106a00	11-Jan-2005	Hajimu UMEMOTO <ume@FreeBSD.org>	remove HAVE_OLD_BPF part.
# 9b1a7076	11-Jan-2005	Hajimu UMEMOTO <ume@FreeBSD.org>	fix typo.
# c398230b	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for license, minor formatting changes
# 3e019dea	15-Jul-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Do a pass over all modules in the kernel and make them return EOPNOTSUPP for unknown events. A number of modules return EINVAL in this instance, and I have left those alone for now and instead taught MOD_QUIESCE to accept this as "didn't do anything".
# f889d2ef	22-Jun-2004	Brooks Davis <brooks@FreeBSD.org>	Major overhaul of pseudo-interface cloning. Highlights include: - Split the code out into if_clone.[ch]. - Locked struct if_clone. [1] - Add a per-cloner match function rather then simply matching names of the form <name><unit> and <name>. - Use the match function to allow creation of <interface>.<tag> vlan interfaces. The old way is preserved unchanged! - Also the match function to allow creation of stf(4) interfaces named stf0, stf, or 6to4. This is the only major user visible change in that "ifconfig stf" creates the interface stf rather then stf0 and does not print "stf0" to stdout. - Allow destroy functions to fail so they can refuse to delete interfaces. Currently, we forbid the deletion of interfaces which were created in the init function, particularly lo0, pflog0, and pfsync0. In the case of lo0 this was a panic implementation so it does not count as a user visiable change. :-) - Since most interfaces do not need the new functionality, an family of wrapper functions, ifc_simple_*(), were created to wrap old style cloner functions. - The IF_CLONE_INITIALIZER macro is replaced with a new incompatible IFC_CLONE_INITIALIZER and ifc_simple consumers use IFC_SIMPLE_DECLARE instead. Submitted by: Maurycy Pawlowski-Wieronski <maurycy at fouk.org> [1] Reviewed by: andre, mlaier Discussed on: net
# 5dba30f1	30-May-2004	Poul-Henning Kamp <phk@FreeBSD.org>	add missing #include <sys/module.h>
# 1861b710	18-Apr-2004	Brooks Davis <brooks@FreeBSD.org>	Use an tempory struct ifnet *ifp instead of sc->sc_if to access the ifnet in stf_clone_create. Also use if_printf() instead of printf().
# bb2bfb4f	13-Apr-2004	Brooks Davis <brooks@FreeBSD.org>	Staticize <if>_clone_{create,destroy} functions. Reviewed by: mlaier
# 15db03a0	09-Mar-2004	Robert Watson <rwatson@FreeBSD.org>	Introduce stf_mtx to protect global softc list in if_stf. Add stf_destroy() to handle the common softc destruction path for the two destruction sources: interface cloning destroy, and module unload. NOTE: sc_ro, the cached route for stf conversion, is not synchronized against concurrent access in this change, that will follow in a future change. Reviewed by: pjd
# 591cf7ce	06-Mar-2004	Robert Watson <rwatson@FreeBSD.org>	Const-poison ip_stf_ttl to make it clear that the variable is not modified at run-time.
# 437ffe18	27-Dec-2003	Sam Leffler <sam@FreeBSD.org>	o eliminate widespread on-stack mbuf use for bpf by introducing a new bpf_mtap2 routine that does the right thing for an mbuf and a variable-length chunk of data that should be prepended. o while we're sweeping the drivers, use u_int32_t uniformly when when prepending the address family (several places were assuming sizeof(int) was 4) o return M_ASSERTVALID to BPF_MTAP* now that all stack-allocated mbufs have been eliminated; this may better be moved to the bpf routines Reviewed by: arch@ and several others
# 9bf40ede	31-Oct-2003	Brooks Davis <brooks@FreeBSD.org>	Replace the if_name and if_unit members of struct ifnet with new members if_xname, if_dname, and if_dunit. if_xname is the name of the interface and if_dname/unit are the driver name and instance. This change paves the way for interface renaming and enhanced pseudo device creation and configuration symantics. Approved By: re (in principle) Reviewed By: njl, imp Tested On: i386, amd64, sparc64 Obtained From: NetBSD (if_xname)
# d1dd20be	03-Oct-2003	Sam Leffler <sam@FreeBSD.org>	Locking for updates to routing table entries. Each rtentry gets a mutex that covers updates to the contents. Note this is separate from holding a reference and/or locking the routing table itself. Other/related changes: o rtredirect loses the final parameter by which an rtentry reference may be returned; this was never used and added unwarranted complexity for locking. o minor style cleanups to routing code (e.g. ansi-fy function decls) o remove the logic to bump the refcnt on the parent of cloned routes, we assume the parent will remain as long as the clone; doing this avoids a circularity in locking during delete o convert some timeouts to MPSAFE callouts Notes: 1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level applications cannot/do-no know about mutex's. Doing this requires that the mutex be the last element in the structure. A better solution is to introduce an externalized version of struct rtentry but this is a major task because of the intertwining of rtentry and other data structures that are visible to user applications. 2. There are known LOR's that are expected to go away with forthcoming work to eliminate many held references. If not these will be resolved prior to release. 3. ATM changes are untested. Sponsored by: FreeBSD Foundation Obtained from: BSD/OS (partly)
# 1cafed39	04-Mar-2003	Jonathan Lemon <jlemon@FreeBSD.org>	Update netisr handling; Each SWI now registers its queue, and all queue drain routines are done by swi_net, which allows for better queue control at some future point. Packets may also be directly dispatched to a netisr instead of queued, this may be of interest at some installations, but currently defaults to off. Reviewed by: hsu, silby, jayanth, sam Sponsored by: DARPA, NAI Labs
# a163d034	18-Feb-2003	Warner Losh <imp@FreeBSD.org>	Back out M_* changes, per decision of the TRB. Approved by: trb
# 44956c98	21-Jan-2003	Alfred Perlstein <alfred@FreeBSD.org>	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# 8d95b0ce	15-Jan-2003	SUZUKI Shinsuke <suz@FreeBSD.org>	sync with KAME to simplify rev 1.28's patch (no functional changes) Obtained from: KAME Reviewd by: fenner Approved by: re (jhb)
# 2b0e5976	05-Jan-2003	Bill Fenner <fenner@FreeBSD.org>	Fix alignment problems -- the embedded v4 address is guaranteed to be only 16-bit aligned, so only do byte operations to compare with it.
# 6fc32a24	14-Nov-2002	Sam Leffler <sam@FreeBSD.org>	network interface and link layer changes: o on input don't strip the Ethernet header from packets o input packet handling is now done with if_input o track changes to ether_ifattach/ether_ifdetach API o track changes to bpf tapping o call ether_ioctl for default handling of ioctl's o use constants from net/ethernet.h where possible Reviewed by: many Approved by: re
# 6b459e49	20-Oct-2002	Robert Watson <rwatson@FreeBSD.org>	When packets pass in and out of six-to-four (STF) tunnels, perform labeling checks and operations as with other network interfaces. Eventually, if it proves desirable, we might want to offer special casing of this or other tunnel interfaces where we have an existing label of interest, rather than treating it as though it's an entirely fresh mbuf in the incoming/outgoing encapsulation directions. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 5d846453	15-Oct-2002	Sam Leffler <sam@FreeBSD.org>	Replace aux mbufs with packet tags: o instead of a list of mbufs use a list of m_tag structures a la openbsd o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit ABI/module number cookie o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and use this in defining openbsd-compatible m_tag_find and m_tag_get routines o rewrite KAME use of aux mbufs in terms of packet tags o eliminate the most heavily used aux mbufs by adding an additional struct inpcb parameter to ip_output and ip6_output to allow the IPsec code to locate the security policy to apply to outbound packets o bump __FreeBSD_version so code can be conditionalized o fixup ipfilter's call to ip_output based on __FreeBSD_version Reviewed by: julian, luigi (silent), -arch, -net, darren Approved by: julian, silence from everyone else Obtained from: openbsd (mostly) MFC after: 1 month
# f26b2d5b	17-Sep-2002	Hajimu UMEMOTO <ume@FreeBSD.org>	- increment interface output counter. sync w/ netbsd-current - increase if_oerrors. sync w/netbsd Obtained from: KAME
# ce9d7b2f	17-Sep-2002	Hajimu UMEMOTO <ume@FreeBSD.org>	- reject SIOCSIFADDR if embedded address is in private address range - reject packets from private address range. from hitachi Obtained from: KAME
# ae5a19be	25-May-2002	Brooks Davis <brooks@FreeBSD.org>	Move all unit number management cloned interfaces into the cloning code. The reverts the API change which made the <if>_clone_destory() functions return an int instead of void bringing us into closer alignment with NetBSD. Reviewed by: net (a long time ago)
# 88ff5695	18-Apr-2002	SUZUKI Shinsuke <suz@FreeBSD.org>	just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD. (based on freebsd4-snap-20020128) Reviewed by: ume MFC after: 1 week
# 929ddbbb	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove __P.
# 3b16e7b2	11-Mar-2002	Maxime Henrion <mux@FreeBSD.org>	Simplify the interface cloning framework by handling unit unit allocation with a bitmap in the generic layer. This allows us to get rid of the duplicated rman code in every clonable interface. Reviewed by: brooks Approved by: phk
# b75496fe	04-Mar-2002	Brooks Davis <brooks@FreeBSD.org>	Change the network interface cloning API so the destroy function returns an int errorcode instead of void in preperation for merging cloning of the loopback device. Submitted by: mux MFC after: 2 weeks
# 777b9faa	27-Feb-2002	Peter Wemm <peter@FreeBSD.org>	Fix warnings.
# e8783c4d	08-Jan-2002	Mike Smith <msmith@FreeBSD.org>	Staticise private interface lists.
# 1ed4b9fe	06-Dec-2001	Andrew R. Reiter <arr@FreeBSD.org>	- malloc should be passed M_WAITOK, not M_WAIT (a mbuf flag) - make use of M_ZERO to remove a call to bzero()
# 8071913d	17-Oct-2001	Ruslan Ermilov <ru@FreeBSD.org>	Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2. Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo '' as the argument. Pass rt_addrinfo all the way down to rtrequest1 and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now ``rt_addrinfo '' instead of ``sockaddr '' (almost noone is using it anyways). Benefit: the following command now works. Previously we needed two route(8) invocations, "add" then "change". # route add -inet6 default ::1 -ifp gif0 Remove unsafe typecast in rtrequest(), from ``rtentry '' to ``sockaddr *''. It was introduced by 4.3BSD-Reno and never corrected. Obtained from: BSD/OS, NetBSD MFC after: 1 month PR: kern/28360
# cf912c89	28-Sep-2001	Jonathan Lemon <jlemon@FreeBSD.org>	Use in_ifaddrhashtbl instead of in_ifaddrhead to look up IP address.
# abb64706	18-Sep-2001	Brooks Davis <brooks@FreeBSD.org>	Make stf a clonable device. Yes this really is rather silly and the implementation is overkill given that you are only allowed one of them, but NetBSD implements cloning on this device and it's a less cluttered example of cloning then most.
# ff265614	07-Sep-2001	Julian Elischer <julian@FreeBSD.org>	Patches from KAME to remove usage of Varargs in existing IPV4 code. For now they will still have some in the developing stuff (IPv6) Submitted by: Keiichi SHIMA / <keiichi@iij.ad.jp> Obtained from: KAME
# f0ffb944	03-Sep-2001	Julian Elischer <julian@FreeBSD.org>	Patches from Keiichi SHIMA <keiichi@iij.ad.jp> to make ip use the standard protosw structure again. Obtained from: Well, KAME I guess.
# 53dab5fe	02-Jul-2001	Brooks Davis <brooks@FreeBSD.org>	gif(4) and stf(4) modernization: - Remove gif dependencies from stf. - Make gif and stf into modules - Make gif cloneable. PR: kern/27983 Reviewed by: ru, ume Obtained from: NetBSD MFC after: 1 week
# 8acb2290	24-Jun-2001	Hajimu UMEMOTO <ume@FreeBSD.org>	inject outbound packet to BPF. Submitted by: itojun Obtained from: KAME MFC after: 10 days
# 33841545	10-Jun-2001	Hajimu UMEMOTO <ume@FreeBSD.org>	Sync with recent KAME. This work was based on kame-20010528-freebsd43-snap.tgz and some critical problem after the snap was out were fixed. There are many many changes since last KAME merge. TODO: - The definitions of SADB_* in sys/net/pfkeyv2.h are still different from RFC2407/IANA assignment because of binary compatibility issue. It should be fixed under 5-CURRENT. - ip6po_m member of struct ip6_pktopts is no longer used. But, it is still there because of binary compatibility issue. It should be removed under 5-CURRENT. Reviewed by: itojun Obtained from: KAME MFC after: 3 weeks
# 22f29826	03-Feb-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Use <sys/queue.h> macro api rather than fondle its implementation detals. Created with: /usr/bin/sed Reviewed by: /sbin/md5
# 2a0c503e	21-Dec-2000	Bosko Milekic <bmilekic@FreeBSD.org>	* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT. This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while. * Fix a typo in a comment in mbuf.h * Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have became a big problem.
# df5e1987	25-Nov-2000	Jonathan Lemon <jlemon@FreeBSD.org>	Lock down the network interface queues. The queue mutex must be obtained before adding/removing packets from the queue. Also, the if_obytes and if_omcasts fields should only be manipulated under protection of the mutex. IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on the queue. An IF_LOCK macro is provided, as well as the old (mutex-less) versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which needs them, but their use is discouraged. Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF, which takes care of locking/enqueue, and also statistics updating/start if necessary.
# 46aa3347	27-Oct-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Convert all users of fldoff() to offsetof(). fldoff() is bad because it only takes a struct tag which makes it impossible to use unions, typedefs etc. Define __offsetof() in <machine/ansi.h> Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h> Remove myriad of local offsetof() definitions. Remove includes of <stddef.h> in kernel code. NB: Kernelcode should never include from /usr/include ! Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API. Deprecate <struct.h> with a warning. The warning turns into an error on 01-12-2000 and the file gets removed entirely on 01-01-2001. Paritials reviews by: various. Significant brucifications by: bde
# d1d1144b	15-Aug-2000	Jun-ichiro itojun Hagino <itojun@FreeBSD.org>	repair endianness issue in IN_MULTICAST(). again, *BSD difference... From: Nick Sayer <nsayer@quack.kfu.com>
# 686cdd19	04-Jul-2000	Jun-ichiro itojun Hagino <itojun@FreeBSD.org>	sync with kame tree as of july00. tons of bug fixes/improvements. API changes: - additional IPv6 ioctls - IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8). (also syntax change)