Cross Reference: /freebsd-current/sys/netinet6/ip6

History log of /freebsd-current/sys/netinet6/ip6_output.c
Revision	Date	Author	Comments
# 530c2c30	20-Mar-2024	Andrew Gallatin <gallatin@FreeBSD.org>	ip6_output: Reduce cache misses on pktopts When profiling an IP6 heavy workload, I noticed that we were getting a lot of cache misses in ip6_output() around ip6_pktopts. This was happening because the TCP stack passes inp->in6p_outputopts even if all options are unused. So in the common case of no options present, pkt_opts is not null, and is checked repeatedly for different options. Since ip6_pktopts is large (4 cachelines), and every field is checked, we take 4 cache misses (2 of which tend to be hidden by the adjacent line prefetcher). To fix this common case, I introduced a new flag in ip6_pktopts (ip6po_valid) which tracks which options have been set. In the common case where nothing is set, this causes just a single cache miss to load. It also eliminates a test for some options (if (opt != NULL && opt->val >= const) vs if ((optvalid & flag) !=0 ) To keep the struct the same size in 64-bit kernels, and to keep the integer values (like ip6po_hlim, ip6po_tclass, etc) on the same cacheline, I moved them to the top. As suggested by zlei, the null check in MAKE_EXTHDR() becomes redundant, and can be removed. For our web server workload (with the ip6po_tclass option set), this drops the CPI from 2.9 to 2.4 for ip6_output Differential Revision: https://reviews.freebsd.org/D44204 Reviewed by: bz, glebius, zlei No Objection from: melifaro Sponsored by: Netflix Inc.
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# e3ba0d6a	26-Jul-2023	Gleb Smirnoff <glebius@FreeBSD.org>	inpcb: do not copy so_options into inp_flags2 Since f71cb9f74808 socket stays connnected with inpcb through latter's lifetime and there is no reason to complicate things and copy these flags. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D41198
# bc310a95	20-Jul-2023	Konstantin Belousov <kib@FreeBSD.org>	ip output: ensure that mbufs are mapped if ipsec is enabled Ipsec needs access to packet headers to determine if a policy is applicable. It seems that typically IP headers are mapped, but the code is arguably needs to check this before blindly accessing them. Then, operations like m_unshare() and m_makespace() are not yet ready for unmapped mbufs. Ensure that the packet is mapped before calling into IPSEC_OUTPUT(). PR: 272616 Reviewed by: jhb, markj Sponsored by: NVidia networking MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41112
# 317fa516	28-Feb-2023	Mark Johnston <markj@FreeBSD.org>	netinet: Remove the IP(V6)_RSS_LISTEN_BUCKET socket option It has no effect, and an exp-run revealed that it is not in use. PR: 261398 (exp-run) Reviewed by: mjg, glebius Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38822
# 3aff4ccd	27-Feb-2023	Mark Johnston <markj@FreeBSD.org>	netinet: Remove IP(V6)_BINDMULTI This option was added in commit 0a100a6f1ee5 but was never completed. In particular, there is no logic to map flowids to different listening sockets, so it accomplishes basically the same thing as SO_REUSEPORT. Meanwhile, we've since added SO_REUSEPORT_LB, which at least tries to balance among listening sockets using a hash of the 4-tuple and some optional NUMA policy. The option was never documented or completed, and an exp-run revealed nothing using it in the ports tree. Moreover, it complicates the already very complicated in_pcbbind_setup(), and the checking in in_pcbbind_check_bindmulti() is insufficient. So, let's remove it. PR: 261398 (exp-run) Reviewed by: glebius Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38574
# 3d0d5b21	23-Jan-2023	Justin Hibbits <jhibbits@FreeBSD.org>	IfAPI: Explicitly include <net/if_private.h> in netstack Summary: In preparation of making if_t completely opaque outside of the netstack, explicitly include the header. <net/if_var.h> will stop including the header in the future. Sponsored by: Juniper Networks, Inc. Reviewed by: glebius, melifaro Differential Revision: https://reviews.freebsd.org/D38200
# 21cc0918	16-Aug-2021	Elliott Mitchell <ehem+freebsd@m5p.com>	sys: Nuke double-semicolons A distinct number of double-semicolons have ended up in FreeBSD. Take a pass at getting rid of many of these harmless typos. Reviewed by: emaste, rrs Pull Request: https://github.com/freebsd/freebsd-src/pull/609 Differential Revision: https://reviews.freebsd.org/D31716
# 2e0e2739	13-Oct-2022	Gleb Smirnoff <glebius@FreeBSD.org>	netinet6: trim overly long lines in GET_PKTOPT_VAR(), fit into 80 chars
# 53af6903	06-Oct-2022	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: remove INP_TIMEWAIT flag Mechanically cleanup INP_TIMEWAIT from the kernel sources. After 0d7445193ab, this commit shall not cause any functional changes. Note: this flag was very often checked together with INP_DROPPED. If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we will be able to remove most of this checks and turn them to assertions. Some of them can be turned into assertions right now, but that should be carefully done on a case by case basis. Differential revision: https://reviews.freebsd.org/D36400
# 46ddeb6b	03-Oct-2022	Gleb Smirnoff <glebius@FreeBSD.org>	netinet6: retire ip6protosw.h The netinet/ipprotosw.h and netinet6/ip6protosw.h were KAME relics, with the former removed in f0ffb944d25 in 2001 and the latter survived until today. It has been reduced down to only one useful declaration that moves to ip6_var.h Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36726
# dda6376b	08-Sep-2022	Mateusz Guzik <mjg@FreeBSD.org>	net: employ newly added pfil_mbuf_{in,out} where approriate Reviewed by: glebius Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D36454
# 74ed2e8a	02-Sep-2022	Gleb Smirnoff <glebius@FreeBSD.org>	raw ip: fix regression with multicast and RSVP With 61f7427f02a raw sockets protosw has wildcard pr_protocol. Protocol of a specific pcb is stored in inp_ip_p. Reviewed by: karels Reported by: karels Differential revision: https://reviews.freebsd.org/D36429 Fixes: 61f7427f02a307d28af674a12c45dd546e3898e4
# 50fa27e7	09-Jul-2022	Alexander V. Chernikov <melifaro@FreeBSD.org>	netinet6: fix interface handling for loopback traffic Currently, processing of IPv6 local traffic is partially broken: link-local connection fails and global unicast connect() takes 3 seconds to complete. This happens due to the combination of multiple factors. IPv6 code passes original interface "origifp" when passing traffic via loopack to retain the scope that is mandatory for the correct hadling of link-local traffic. First problem is that the logic of passing source interface is not working correcly for TCP connections, resulting in passing "origifp" on the first 2 connection attempts and lo0 on the subsequent ones. Second problem is that source address validation logic skips its checks iff the source interface is loopback, which doesn't cover "origifp" case. More detailed description is available at https://reviews.freebsd.org/D35732 Fix the first problem by untangling&simplifying ifp/origifp logic. Fix the second problem by switching source address validation check to using M_LOOP mbuf flag instead of interface type. PR: 265089 Reviewed by: ae, bz(previous version) Differential Revision: https://reviews.freebsd.org/D35732 MFC after: 2 weeks
# 7d98cc09	01-Apr-2022	Andrey V. Elsukov <ae@FreeBSD.org>	Fix ipfw fwd that doesn't work in some cases For IPv4 use dst pointer as destination address in fib4_lookup(). It keeps destination address from IPv4 header and can be changed when PACKET_TAG_IPFORWARD tag was set by packet filter. For IPv6 override destination address with address from dst_sa.sin6_addr, that was set from PACKET_TAG_IPFORWARD tag. Reviewed by: eugen MFC after: 1 week PR: 256828, 261697, 255705 Differential Revision: https://reviews.freebsd.org/D34732
# 9ba11796	27-Jan-2022	Andrew Gallatin <gallatin@FreeBSD.org>	Fix a memory leak when ip_output_send() returns EAGAIN due to send tag issues When ip_output_send() returns EAGAIN due to issues with send tags (route change, lagg failover, etc), it must free the mbuf. This is because ip_output_send() was written as a wrapper/replacement for a direct call to if_output(), and the contract with if_output() has historically been that it owns the mbufs once called. When ip_output_send() failed to free mbufs, it violated this assumption and lead to leaked mbufs. This was noticed when using NIC TLS in combination with hardware rate-limited connections. When seeing lots of NIC output drops triggered ratelimit send tag changes, we noticed we were leaking ktls_sessions, send tags and mbufs. This was due ip_output_send() leaking mbufs which held references to ktls_sessions, which in turn held references to send tags. Many thanks to jbh, rrs, hselasky and markj for their help in debugging this. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34054 Reviewed by: hselasky, jhb, rrs MFC after: 2 weeks
# 9f5432d5	15-Dec-2021	Kristof Provost <kp@FreeBSD.org>	netinet6: ip6_setpktopt() requires NET_EPOCH ip6_setpktopt() can call ifnet_byindex() which requires epoch. Mark the function as requiring NET_EPOCH, and ensure we enter it priot to calling it. Reported-by: syzbot+92526116441688fea8a3@syzkaller.appspotmail.com Reviewed by: glebius Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33462
# d74b7bae	04-Dec-2021	Gleb Smirnoff <glebius@FreeBSD.org>	ifnet_byindex() actually requires network epoch Sweep over potentially unsafe calls to ifnet_byindex() and wrap them in epoch. Most of the code touched remains unsafe, as the returned pointer is being used after epoch exit. Mark that with a comment. Validate the index argument inside the function, reducing argument validation requirement from the callers and making V_if_index private to if.c. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33263
# 44775b16	24-Nov-2021	Mark Johnston <markj@FreeBSD.org>	netinet: Remove unneeded mb_unmapped_to_ext() calls in_cksum_skip() now handles unmapped mbufs on platforms where they're permitted. Reviewed by: glebius, jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33097
# a8d54fc9	09-Aug-2021	Michael Tuexen <tuexen@FreeBSD.org>	ipv6: Fix getsockopt() for some IPPROTO_IPV6 level socket options Fix getsockopt() for the IPPROTO_IPV6 level socket options with the following names: IPV6_HOPOPTS, IPV6_RTHDR, IPV6_RTHDRDSTOPTS, IPV6_DSTOPTS, and IPV6_NEXTHOP. Reviewed by: markj MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D31458
# 2290dfb4	19-May-2021	Ryan Stone <rstone@FreeBSD.org>	Enter the net epoch before calling ip6_setpktopts ip6_setpktopts() can look up ifnets via ifnet_by_index(), which is only safe in the net epoch. Ensure that callers are in the net epoch before calling this function. Sponsored by: Dell EMC Isilon MFC after: 4 weeks Reviewed by: donner, kp Differential Revision: https://reviews.freebsd.org/D30630
# bb4a7d94	04-Mar-2021	Kristof Provost <kp@FreeBSD.org>	net: Introduce IPV6_DSCP(), IPV6_ECN() and IPV6_TRAFFIC_CLASS() macros Introduce convenience macros to retrieve the DSCP, ECN or traffic class bits from an IPv6 header. Use them where appropriate. Reviewed by: ae (previous version), rscheff, tuexen, rgrimes MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29056
# 1bd44b11	14-Feb-2021	Alexander V. Chernikov <melifaro@FreeBSD.org>	Do not reference returned ifa in in6_ifawithifp(). The only place where in6_ifawithifp() is used is ip6_output(), which uses the returned ifa to bump traffic counters. Given ifa stability guarantees is provided by epoch, do not refcount ifa. This eliminates 2 atomic ops from IPv6 fast path. Reviewed By: rstone MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28649
# 3f43ada9	28-Jan-2021	Gleb Smirnoff <glebius@FreeBSD.org>	Catch up with 6edfd179c86: mechanically rename IFCAP_NOMAP to IFCAP_MEXTPG. Originally IFCAP_NOMAP meant that the mbuf has external storage pointer that points to unmapped address. Then, this was extended to array of such pointers. Then, such mbufs were augmented with header/trailer. Basically, extended mbufs are extended, and set of features is subject to change. The new name should be generic enough to avoid further renaming.
# 0c325f53	18-Oct-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Implement flowid calculation for outbound connections to balance connections over multiple paths. Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code fills mbuf flowid from inp_flowid for connection-oriented sockets. However, inp_flowid is currently not calculated for outbound connections. This change creates simple hashing functions and starts calculating hashes for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the system. Reviewed by: glebius (previous version),ae Differential Revision: https://reviews.freebsd.org/D26523
# 868aabb4	08-Oct-2020	Richard Scheffenegger <rscheff@FreeBSD.org>	Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow. This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt) option IP(V6)_VLAN_PCP, which can be set to -1 (interface default), or explicitly to any priority between 0 and 7. Note that for untagged traffic, explicitly adding a priority will insert a special 801.1Q vlan header with vlan ID = 0 to carry the priority setting Reviewed by: gallatin, rrs MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26409
# b092fd6c	17-Sep-2020	Navdeep Parhar <np@FreeBSD.org>	if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS. This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx and rx), TSO, and RSS if the NIC is capable of performing these operations on inner VXLAN traffic. A VXLAN interface inherits the capabilities of its vxlandev interface if one is specified or of the interface that hosts the vxlanlocal address. If other interfaces will carry traffic for that VXLAN then they must have the same hardware capabilities. On transmit, if_vxlan verifies that the outbound interface has the required capabilities and then translates the CSUM_ flags to their inner equivalents. This tells the hardware ifnet that it needs to operate on the inner frame and not the outer VXLAN headers. An event is generated when a VXLAN ifnet starts. This allows hardware drivers to configure their devices to expect VXLAN traffic on the specified incoming port. On receive, the hardware does RSS and checksum verification on the inner frame. if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is not very clear why it didn't do this already. Future work: Rx: it should be possible to avoid the first trip up the protocol stack to get the frame to if_vxlan just so it can decapsulate and requeue for a second trip up the stack. The hardware NIC driver could directly call an if_vxlan receive routine for VXLAN traffic instead. Rx: LRO. depends on what happens with the previous item. There will have to to be a mechanism to indicate that it's time for if_vxlan to flush its LRO state. Reviewed by: kib@ Relnotes: Yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873
# 662c1305	01-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	net: clean up empty lines in .c and .h files
# 19afc65a	30-Jul-2020	Mark Johnston <markj@FreeBSD.org>	ip6_output(): Check the return value of in6_getlinkifnet(). If the destination address has an embedded scope ID, make sure that it corresponds to a valid ifnet before proceeding. Otherwise a sendto() with a bogus link-local address can trigger a NULL pointer dereference. Reported by: syzkaller Reviewed by: ae Fixes: r358572 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25887
# 95033af9	18-Jun-2020	Mark Johnston <markj@FreeBSD.org>	Add the SCTP_SUPPORT kernel option. This is in preparation for enabling a loadable SCTP stack. Analogous to IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured in order to support a loadable SCTP implementation. Discussed with: tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
# 1483c1c5	28-May-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Replace ip6_ouput fib6_lookup_nh_<ext\|basic> calls with fib6_lookup(). fib6_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. Conversion is straight-forwarded, as the only 2 differences are requirement of running in network epoch and the need to handle RTF_GATEWAY case in the caller code. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24973
# d7452d89	12-May-2020	Andrew Gallatin <gallatin@FreeBSD.org>	IPv6: sync IP_NO_SND_TAG_RL support from IPv4 The IP_NO_SND_TAG_RL flag to ip{,6}_output() means that the packets being sent should bypass hardware rate limiting. This is typically used by modern TCP stacks for rexmits. This support was added to IPv4 in r352657, but never added to IPv6, even though rack and bbr call ip6_output() with this flag. Reviewed by: rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24822
# 84af4cc1	11-May-2020	Andrew Gallatin <gallatin@FreeBSD.org>	Fix the build Back out the IPv6 portion of r360903, as the stamp_tag param is apparently not supported in upstream FreeBSD. Sponsored by: Netflix Pointy hat to: gallatin
# 6043ac20	11-May-2020	Andrew Gallatin <gallatin@FreeBSD.org>	Ktls: never skip stamping tags for NIC TLS The newer RACK and BBR TCP stacks have added a mechanism to disable hardware packet pacing for TCP retransmits. This mechanism works by skipping the send-tag stamp on rate-limited connections when the TCP stack calls ip_output() with the IP_NO_SND_TAG_RL flag set. When doing NIC TLS, we must ignore this flag, as NIC TLS packets must always be stamped. Failure to stamp a NIC TLS packet will result in crypto issues. Reviewed by: hselasky, rrs Sponsored by: Netflix, Mellanox
# 7b6c99d0	02-May-2020	Gleb Smirnoff <glebius@FreeBSD.org>	Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf within m_epg namespace. All edits except the 'struct mbuf' declaration and mb_dupcl() were done mechanically with sed: s/->m_ext_pgs.nrdy/->m_epg_nrdy/g s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g s/->m_ext_pgs.trail_len/->m_epg_trllen/g s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g s/->m_ext_pgs.flags/->m_epg_flags/g s/->m_ext_pgs.record_type/->m_epg_record_type/g s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g s/->m_ext_pgs.tls/->m_epg_tls/g s/->m_ext_pgs.so/->m_epg_so/g s/->m_ext_pgs.seqno/->m_epg_seqno/g s/->m_ext_pgs.stailq/->m_epg_stailq/g Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598
# 983066f0	25-Apr-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Convert route caching to nexthop caching. This change is build on top of nexthop objects introduced in r359823. Nexthops are separate datastructures, containing all necessary information to perform packet forwarding such as gateway interface and mtu. Nexthops are shared among the routes, providing more pre-computed cache-efficient data while requiring less memory. Splitting the LPM code and the attached data solves multiple long-standing problems in the routing layer, drastically reduces the coupling with outher parts of the stack and allows to transparently introduce faster lookup algorithms. Route caching was (re)introduced to minimise (slow) routing lookups, allowing for notably better performance for large TCP senders. Caching works by acquiring rtentry reference, which is protected by per-rtentry mutex. If the routing table is changed (checked by comparing the rtable generation id) or link goes down, cache record gets withdrawn. Nexthops have the same reference counting interface, backed by refcount(9). This change merely replaces rtentry with the actual forwarding nextop as a cached object, which is mostly mechanical. Other moving parts like cache cleanup on rtable change remains the same. Differential Revision: https://reviews.freebsd.org/D24340
# 23feb563	14-Apr-2020	Andrew Gallatin <gallatin@FreeBSD.org>	KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself. While the original implementation of unmapped mbufs was a large step forward in terms of reducing cache misses by enabling mbufs to carry more than a single page for sendfile, they are rather cache unfriendly when accessing the ext_pgs metadata and data. This is because the ext_pgs part of the mbuf is allocated separately, and almost guaranteed to be cold in cache. This change takes advantage of the fact that unmapped mbufs are never used at the same time as pkthdr mbufs. Given this fact, we can overlap the ext_pgs metadata with the mbuf pkthdr, and carry the ext_pgs meta directly in the mbuf itself. Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array of pages) directly after the existing m_ext. In order to be able to carry 5 pages (which is the minimum required for a 16K TLS record which is not perfectly aligned) on LP64, I've had to steal ext_arg2. The only user of this in the xmit path is sendfile, and I've adjusted it to use arg1 when using unmapped mbufs. This change is almost entirely mechanical, except that we change mb_alloc_ext_pgs() to no longer allow allocating pkthdrs, the change to avoid ext_arg2 as mentioned above, and the removal of the ext_pgs zone, This change saves roughly 2% "raw" CPU (~59% -> 57%), or over 3% "scaled" CPU on a Netflix 100% software kTLS workload at 90+ Gb/s on Broadwell Xeons. In a follow-on commit, I plan to remove some hacks to avoid access ext_pgs fields of mbufs, since they will now be in cache. Many thanks to glebius for helping to make this better in the Netflix tree. Reviewed by: hselasky, jhb, rrs, glebius (early version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24213
# e02582d1	19-Mar-2020	Mark Johnston <markj@FreeBSD.org>	Fix synchronization in the IPV6_2292PKTOPTIONS set handler. The inpcb needs to be locked when we update output packet options. Otherwise it is possible for the IPV6_2292PKTOPTIONS handler to free packet option structures while another thread is reading or updating them. Note that the option handler is still kind of broken. For instance it frees all options before performing privilege checks for individual options. However, this can be fixed separately. Reported by: syzbot+52eb0fd4ddc119787f9d@syzkaller.appspotmail.com Reviewed by: bz, tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24125
# 8483fce6	03-Mar-2020	Bjoern A. Zeeb <bz@FreeBSD.org>	ip6: retire in6_selectroute_fib() as promised 8 years ago In r231852 I added in6_selectroute_fib() as a compat function with the fibnum as an extra argument compared to in6_selectroute() to keep the KPI stable. Way too late retire this function again and add the fib to in6_selectroute() which also only has a single consumer now and was an orphan function before.
# 000c42fa	03-Mar-2020	Bjoern A. Zeeb <bz@FreeBSD.org>	ip6_output: use new routing KPI when not passed a cached route Implement the equivalent of r347375 (IPv4) for the IPv6 output path. In IPv6 we get passed a cached route (and inp) by udp6_output() depending on whether we acquired a write lock on the INP. In case we neither bind nor connect a first UDP packet would come in with a cached route (wlocked) and all further packets would not. In case we bind and do not connect we never write-lock the inp. When we do not pass in a cached route, rather than providing the storage for a route locally and pass it over the old lookup code and down the stack, use the new route lookup KPI and acquire all details we need to send the packet. Compared to the IPv4 code the IPv6 code has a couple of possible complications: given an option with a routing hdr/caching route there, and path mtu (ro_pmtu) case which now equally has to deal with the possibility of having a route which is NULL passed in, and the fwd_tag in case a firewall changes the next hop (something to factor out in the future). Sponsored by: Netflix Reviewed by: glebius MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D23886
# 3db60531	25-Feb-2020	Bjoern A. Zeeb <bz@FreeBSD.org>	ip6_output: fix regression introduced in r358167 for ipv6 fragmentation When moving the calculations for the optlen into the if (opt) block which deals with possible extension headers I failed to initialise unfragpartlen to the ipv6 header length if there were no extension headers present. Correct that mistake to make IPv6 fragment length calculcations work again. Reported by: hselasky, kp OKed by: hselasky, kp MFC after: 3 days X-MFC with: r358167 PR: 244393
# 3459050c	24-Feb-2020	Bjoern A. Zeeb <bz@FreeBSD.org>	Fix IPv6 checksums when exthdrs are present. In two places in ip6_output we are doing (delayed) checksum calculations. The initial logic came from SCTP in r205075,205104 and later I copied and adjusted it for the TCP\|UDP case in r235958. The problem was that the original SCTP offsets were already wrong for any case with extension headers present given IPv6 extension headers are not part of the pseudo checksum calculations. The later changes do not help in case there is checksum offloading as for extension headers (incl. fragments) we do currrently never offload as we have no infrastructure to know whether the NIC can handle these cases. Correct the offsets for delayed checksum calculations and properly handle mbuf flags. In addition harmonize the almost identical duplicate code. While here eliminate the now unneeded variable hlen and add an always missing mtod() call in the 1-b and 3 cases after the introduction of the mb_unmapped_to_ext() calls. Reported by: Francis Dupont (fdupont isc.org) PR: 243675 MFC after: 6 days Reviewed by: markj (earlier version), gallatin Differential Revision: https://reviews.freebsd.org/D23760
# a1a6c01e	20-Feb-2020	Bjoern A. Zeeb <bz@FreeBSD.org>	ip6_output: improve extension header handling Move IPv6 source address checks from after extension header heandling to the top of the function. If we do not pass these checks there is no reason to do a lot of work upfront. Fold extension header preparations and length calculations together into a single branch and macro rather than doing them sequentially. Likewise move extension header concatination into a single branch block only doing it if we recorded any extension header length length. Reviewed by: melifaro (earlier version), markj, gallatin Sponsored by: Netflix (partially, originally) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23740
# 7c1daefe	18-Feb-2020	Bjoern A. Zeeb <bz@FreeBSD.org>	ip6_output: update comments. Clear up some comments and improve to panic messages. No functional changes. MFC after: 3 days
# b9555453	21-Jan-2020	Gleb Smirnoff <glebius@FreeBSD.org>	Make ip6_output() and ip_output() require network epoch. All callers that before may called into these functions without network epoch now must enter it.
# e00ee1a9	01-Jan-2020	Gleb Smirnoff <glebius@FreeBSD.org>	In r343631 error code for a packet blocked by a firewall was changed from EACCES to EPERM. This change was not intentional, so fix that. Return EACCESS if a firewall forbids sending. Noticed by: ae
# 1e4f4e56	09-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	ip6_output() has a complex set of gotos, and some can jump out of the epoch section towards return statement. Since entering epoch is cheap, it is easier to cover the whole function with epoch, rather than try to properly maintain its state.
# cb49ec54	07-Oct-2019	Mark Johnston <markj@FreeBSD.org>	Improve locking in the IPV6_V6ONLY socket option handler. Acquire the inp lock before checking whether the socket is already bound, and around updates to the inp_vflag field. MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21867
# b8a6e03f	07-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Widen NET_EPOCH coverage. When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111
# b2e60773	26-Aug-2019	John Baldwin <jhb@FreeBSD.org>	Add kernel-side support for in-kernel TLS. KTLS adds support for in-kernel framing and encryption of Transport Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports offload of TLS for transmitted data. Key negotation must still be performed in userland. Once completed, transmit session keys for a connection are provided to the kernel via a new TCP_TXTLS_ENABLE socket option. All subsequent data transmitted on the socket is placed into TLS frames and encrypted using the supplied keys. Any data written to a KTLS-enabled socket via write(2), aio_write(2), or sendfile(2) is assumed to be application data and is encoded in TLS frames with an application data type. Individual records can be sent with a custom type (e.g. handshake messages) via sendmsg(2) with a new control message (TLS_SET_RECORD_TYPE) specifying the record type. At present, rekeying is not supported though the in-kernel framework should support rekeying. KTLS makes use of the recently added unmapped mbufs to store TLS frames in the socket buffer. Each TLS frame is described by a single ext_pgs mbuf. The ext_pgs structure contains the header of the TLS record (and trailer for encrypted records) as well as references to the associated TLS session. KTLS supports two primary methods of encrypting TLS frames: software TLS and ifnet TLS. Software TLS marks mbufs holding socket data as not ready via M_NOTREADY similar to sendfile(2) when TLS framing information is added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then called to schedule TLS frames for encryption. In the case of sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving the mbufs marked M_NOTREADY until encryption is completed. For other writes (vn_sendfile when pages are available, write(2), etc.), the PRUS_NOTREADY is set when invoking pru_send() along with invoking ktls_enqueue(). A pool of worker threads (the "KTLS" kernel process) encrypts TLS frames queued via ktls_enqueue(). Each TLS frame is temporarily mapped using the direct map and passed to a software encryption backend to perform the actual encryption. (Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if someone wished to make this work on architectures without a direct map.) KTLS supports pluggable software encryption backends. Internally, Netflix uses proprietary pure-software backends. This commit includes a simple backend in a new ktls_ocf.ko module that uses the kernel's OpenCrypto framework to provide AES-GCM encryption of TLS frames. As a result, software TLS is now a bit of a misnomer as it can make use of hardware crypto accelerators. Once software encryption has finished, the TLS frame mbufs are marked ready via pru_ready(). At this point, the encrypted data appears as regular payload to the TCP stack stored in unmapped mbufs. ifnet TLS permits a NIC to offload the TLS encryption and TCP segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS) is allocated on the interface a socket is routed over and associated with a TLS session. TLS records for a TLS session using ifnet TLS are not marked M_NOTREADY but are passed down the stack unencrypted. The ip_output_send() and ip6_output_send() helper functions that apply send tags to outbound IP packets verify that the send tag of the TLS record matches the outbound interface. If so, the packet is tagged with the TLS send tag and sent to the interface. The NIC device driver must recognize packets with the TLS send tag and schedule them for TLS encryption and TCP segmentation. If the the outbound interface does not match the interface in the TLS send tag, the packet is dropped. In addition, a task is scheduled to refresh the TLS send tag for the TLS session. If a new TLS send tag cannot be allocated, the connection is dropped. If a new TLS send tag is allocated, however, subsequent packets will be tagged with the correct TLS send tag. (This latter case has been tested by configuring both ports of a Chelsio T6 in a lagg and failing over from one port to another. As the connections migrated to the new port, new TLS send tags were allocated for the new port and connections resumed without being dropped.) ifnet TLS can be enabled and disabled on supported network interfaces via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported across both vlan devices and lagg interfaces using failover, lacp with flowid enabled, or lacp with flowid enabled. Applications may request the current KTLS mode of a connection via a new TCP_TXTLS_MODE socket option. They can also use this socket option to toggle between software and ifnet TLS modes. In addition, a testing tool is available in tools/tools/switch_tls. This is modeled on tcpdrop and uses similar syntax. However, instead of dropping connections, -s is used to force KTLS connections to switch to software TLS and -i is used to switch to ifnet TLS. Various sysctls and counters are available under the kern.ipc.tls sysctl node. The kern.ipc.tls.enable node must be set to true to enable KTLS (it is off by default). The use of unmapped mbufs must also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS. KTLS is enabled via the KERN_TLS kernel option. This patch is the culmination of years of work by several folks including Scott Long and Randall Stewart for the original design and implementation; Drew Gallatin for several optimizations including the use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records awaiting software encryption, and pluggable software crypto backends; and John Baldwin for modifications to support hardware TLS offload. Reviewed by: gallatin, hselasky, rrs Obtained from: Netflix Sponsored by: Netflix, Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21277
# 0ecd976e	02-Aug-2019	Bjoern A. Zeeb <bz@FreeBSD.org>	IPv6 cleanup: kernel Finish what was started a few years ago and harmonize IPv6 and IPv4 kernel names. We are down to very few places now that it is feasible to do the change for everything remaining with causing too much disturbance. Remove "aliases" for IPv6 names which confusingly could indicate that we are talking about a different data structure or field or have two fields, one for each address family. Try to follow common conventions used in FreeBSD. * Rename sin6p to sin6 as that is how it is spelt in most places. * Remove "aliases" (#defines) for: - in6pcb which really is an inpcb and nothing separate - sotoin6pcb which is sotoinpcb (as per above) - in6p_sp which is inp_sp - in6p_flowinfo which is inp_flow * Try to use ia6 for in6_addr rather than in6p. * With all these gone also rename the in6p variables to inp as that is what we call it in most of the network stack including parts of netinet6. The reasons behind this cleanup are that we try to further unify netinet and netinet6 code where possible and that people will less ignore one or the other protocol family when doing code changes as they may not have spotted places due to different names for the same thing. No functional changes. Discussed with: tuexen (SCTP changes) MFC after: 3 months Sponsored by: Netflix
# 82334850	28-Jun-2019	John Baldwin <jhb@FreeBSD.org>	Add an external mbuf buffer type that holds multiple unmapped pages. Unmapped mbufs allow sendfile to carry multiple pages of data in a single mbuf, without mapping those pages. It is a requirement for Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web serving workloads when used by sendfile, due to effectively compressing socket buffers by an order of magnitude, and hence reducing cache misses. For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer now points to a struct mbuf_ext_pgs structure instead of a data buffer. This structure contains an array of physical addresses (this reduces cache misses compared to an earlier version that stored an array of vm_page_t pointers). It also stores additional fields needed for in-kernel TLS such as the TLS header and trailer data that are currently unused. To more easily detect these mbufs, the M_NOMAP flag is set in m_flags in addition to M_EXT. Various functions like m_copydata() have been updated to safely access packet contents (using uiomove_fromphys()), to make things like BPF safe. NIC drivers advertise support for unmapped mbufs on transmit via a new IFCAP_NOMAP capability. This capability can be toggled via the new 'nomap' and '-nomap' ifconfig(8) commands. For NIC drivers that only transmit packet contents via DMA and use bus_dma, adding the capability to if_capabilities and if_capenable should be all that is required. If a NIC does not support unmapped mbufs, they are converted to a chain of mapped mbufs (using sf_bufs to provide the mapping) in ip_output or ip6_output. If an unmapped mbuf requires software checksums, it is also converted to a chain of mapped mbufs before computing the checksum. Submitted by: gallatin (earlier version) Reviewed by: gallatin, hselasky, rrs Discussed with: ae, kp (firewalls) Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20616
# 77a01441	11-Jun-2019	John Baldwin <jhb@FreeBSD.org>	Sort opt_foo.h #includes and add a missing blank line in ip_output().
# fb3bc596	24-May-2019	John Baldwin <jhb@FreeBSD.org>	Restructure mbuf send tags to provide stronger guarantees. - Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on), in ip_output() and ip6_output(). This avoids sending packets with send tags to ifnet drivers that don't support send tags. Since we are now checking for ifp mismatches before invoking if_output, we can now try to allocate a new tag before invoking if_output sending the original packet on the new tag if allocation succeeds. To avoid code duplication for the fragment and unfragmented cases, add ip_output_send() and ip6_output_send() as wrappers around if_output and nd6_output_ifp, respectively. All of the logic for setting send tags and dealing with send tag-related errors is done in these wrapper functions. For pseudo interfaces that wrap other network interfaces (vlan and lagg), wrapper send tags are now allocated so that ip*_output see the wrapper ifp as the ifp in the send tag. The if_transmit routines rewrite the send tags after performing an ifp mismatch check. If an ifp mismatch is detected, the transmit routines fail with EAGAIN. - To provide clearer life cycle management of send tags, especially in the presence of vlan and lagg wrapper tags, add a reference count to send tags managed via m_snd_tag_ref() and m_snd_tag_rele(). Provide a helper function (m_snd_tag_init()) for use by drivers supporting send tags. m_snd_tag_init() takes care of the if_ref on the ifp meaning that code alloating send tags via if_snd_tag_alloc no longer has to manage that manually. Similarly, m_snd_tag_rele drops the refcount on the ifp after invoking if_snd_tag_free when the last reference to a send tag is dropped. This also closes use after free races if there are pending packets in driver tx rings after the socket is closed (e.g. from tcpdrop). In order for m_free to work reliably, add a new CSUM_SND_TAG flag in csum_flags to indicate 'snd_tag' is set (rather than 'rcvif'). Drivers now also check this flag instead of checking snd_tag against NULL. This avoids false positive matches when a forwarded packet has a non-NULL rcvif that was treated as a send tag. - cxgbe was relying on snd_tag_free being called when the inp was detached so that it could kick the firmware to flush any pending work on the flow. This is because the driver doesn't require ACK messages from the firmware for every request, but instead does a kind of manual interrupt coalescing by only setting a flag to request a completion on a subset of requests. If all of the in-flight requests don't have the flag when the tag is detached from the inp, the flow might never return the credits. The current snd_tag_free command issues a flush command to force the credits to return. However, the credit return is what also frees the mbufs, and since those mbufs now hold references on the tag, this meant that snd_tag_free would never be called. To fix, explicitly drop the mbuf's reference on the snd tag when the mbuf is queued in the firmware work queue. This means that once the inp's reference on the tag goes away and all in-flight mbufs have been queued to the firmware, tag's refcount will drop to zero and snd_tag_free will kick in and send the flush request. Note that we need to avoid doing this in the middle of ethofld_tx(), so the driver grabs a temporary reference on the tag around that loop to defer the free to the end of the function in case it sends the last mbuf to the queue after the inp has dropped its reference on the tag. - mlx5 preallocates send tags and was using the ifp pointer even when the send tag wasn't in use. Explicitly use the ifp from other data structures instead. - Sprinkle some assertions in various places to assert that received packets don't have a send tag, and that other places that overwrite rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer. Reviewed by: gallatin, hselasky, rgrimes, ae Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20117
# c9d33708	10-May-2019	John Baldwin <jhb@FreeBSD.org>	Apply r280991 to ip6_fragment. This uses m_dup_pkthdr() to copy all of the metadata about a packet to each of its fragments including VLAN tags, mbuf tags, etc. instead of hand-copying a few fields. Reviewed by: bz MFC after: 1 month Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20117
# 50575ce1	25-Apr-2019	Andrew Gallatin <gallatin@FreeBSD.org>	Track TCP connection's NUMA domain in the inpcb Drivers can now pass up numa domain information via the mbuf numa domain field. This information is then used by TCP syncache_socket() to associate that information with the inpcb. The domain information is then fed back into transmitted mbufs in ip{6}_output(). This mechanism is nearly identical to what is done to track RSS hash values in the inp_flowid. Follow on changes will use this information for lacp egress port selection, binding TCP pacers to the appropriate NUMA domain, etc. Reviewed by: markj, kib, slavash, bz, scottl, jtl, tuexen Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20028
# 2f041b74	19-Apr-2019	Michael Tuexen <tuexen@FreeBSD.org>	Improve input validation for the socket option IPV6_CHECKSUM. When using the IPPROTO_IPV6 level socket option IPV6_CHECKSUM on a raw IPv6 socket, ensure that the value is either -1 or a non-negative even number. Reviewed by: bz@, thj@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D19966
# b252313f	31-Jan-2019	Gleb Smirnoff <glebius@FreeBSD.org>	New pfil(9) KPI together with newborn pfil API and control utility. The KPI have been reviewed and cleansed of features that were planned back 20 years ago and never implemented. The pfil(9) internals have been made opaque to protocols with only returned types and function declarations exposed. The KPI is made more strict, but at the same time more extensible, as kernel uses same command structures that userland ioctl uses. In nutshell [KA]PI is about declaring filtering points, declaring filters and linking and unlinking them together. New [KA]PI makes it possible to reconfigure pfil(9) configuration: change order of hooks, rehook filter from one filtering point to a different one, disconnect a hook on output leaving it on input only, prepend/append a filter to existing list of filters. Now it possible for a single packet filter to provide multiple rulesets that may be linked to different points. Think of per-interface ACLs in Cisco or Juniper. None of existing packet filters yet support that, however limited usage is already possible, e.g. default ruleset can be moved to single interface, as soon as interface would pride their filtering points. Another future feature is possiblity to create pfil heads, that provide not an mbuf pointer but just a memory pointer with length. That would allow filtering at very early stages of a packet lifecycle, e.g. when packet has just been received by a NIC and no mbuf was yet allocated. Differential Revision: https://reviews.freebsd.org/D18951
# ef0111fd	09-Jan-2019	Hans Petter Selasky <hselasky@FreeBSD.org>	Fix loopback traffic when using non-lo0 link local IPv6 addresses. The loopback interface can only receive packets with a single scope ID, namely the scope ID of the loopback interface itself. To mitigate this packets which use the scope ID are appearing as received by the real network interface, see "origifp" in the patch. The current code would drop packets which are designated for loopback which use a link-local scope ID in the destination address or source address, because they won't match the lo0's scope ID. To fix this restore the network interface pointer from the scope ID in the destination address for the problematic cases. See comments added in patch for a more detailed description. This issue was introduced with route caching (ae@). Reviewed by: bz (network) Differential Revision: https://reviews.freebsd.org/D18769 MFC after: 1 week Sponsored by: Mellanox Technologies
# cc426dd3	11-Dec-2018	Mateusz Guzik <mjg@FreeBSD.org>	Remove unused argument to priv_check_cred. Patch mostly generated with cocinnelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation
# ec86402e	03-Sep-2018	Bjoern A. Zeeb <bz@FreeBSD.org>	Replicate r328271 from legacy IP to IPv6 using a single macro to clear L2 and L3 route caches. Also mark one function argument as __unused. Reviewed by: karels, ae Approved by: re (rgrimes) Differential Revision: https://reviews.freebsd.org/D17007
# 56713d16	14-Jul-2018	Matt Macy <mmacy@FreeBSD.org>	acquire inp lock around ip6_pcbopt to fix IPV6_TCLASS panic Simple fix to address panics relating to setting IPV6_TCLASS with setsockopt(). The premise of this change is that it is ok to call malloc with M_NOWAIT while holding a lock on the in6p. If it later turns out that it is not ok, then major surgery will be required, as ip6_setpktopt() will have to be fixed (as it also calls malloc with M_NOWAIT) which pulls in the ip6_pcbopts(), ip6_setpktopts(), ip6_setpktopt() call chain. Submitted by: Jason Eggnet Reviewed by: rrs, transport, sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16201
# 1a43cff9	06-Jun-2018	Sean Bruno <sbruno@FreeBSD.org>	Load balance sockets with new SO_REUSEPORT_LB option. This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple programs or threads to bind to the same port and incoming connections will be load balanced using a hash function. Most of the code was copied from a similar patch for DragonflyBSD. However, in DragonflyBSD, load balancing is a global on/off setting and can not be set per socket. This patch allows for simultaneous use of both the current SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system. Required changes to structures: Globally change so_options from 16 to 32 bit value to allow for more options. Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets. Limitations: As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or threads sharing the same socket). This is a substantially different contribution as compared to its original incarnation at svn r332894 and reverted at svn r332967. Thanks to rwatson@ for the substantive feedback that is included in this commit. Submitted by: Johannes Lundberg <johalun0@gmail.com> Obtained from: DragonflyBSD Relnotes: Yes Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11003
# 4a089e6b	06-Jun-2018	Andrey V. Elsukov <ae@FreeBSD.org>	Use m_copyback() function to write delayed checksum when it isn't located in the first mbuf of the chain. MFC after: 1 week
# 7875017c	24-Apr-2018	Sean Bruno <sbruno@FreeBSD.org>	Revert r332894 at the request of the submitter. Submitted by: Johannes Lundberg <johalun0_gmail.com> Sponsored by: Limelight Networks
# 7b7796ee	23-Apr-2018	Sean Bruno <sbruno@FreeBSD.org>	Load balance sockets with new SO_REUSEPORT_LB option This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple programs or threads to bind to the same port and incoming connections will be load balanced using a hash function. Most of the code was copied from a similar patch for DragonflyBSD. However, in DragonflyBSD, load balancing is a global on/off setting and can not be set per socket. This patch allows for simultaneous use of both the current SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system. Required changes to structures Globally change so_options from 16 to 32 bit value to allow for more options. Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets. Limitations As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or threads sharing the same socket). Submitted by: Johannes Lundberg <johanlun0@gmail.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11003
# c187c034	23-Mar-2018	Jonathan T. Looney <jtl@FreeBSD.org>	Remove some unneccessary variable sets in IPv6 code, as detected by clang's static analyzer. Reviewed by: bz MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D10940
# 72bfa0bf	23-Mar-2018	Sean Bruno <sbruno@FreeBSD.org>	Revert r331379 as the "simple" lock changes have revealed a deeper problem and need for a rethink. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks
# effaab88	23-Mar-2018	Kristof Provost <kp@FreeBSD.org>	netpfil: Introduce PFIL_FWD flag Forwarded packets passed through PFIL_OUT, which made it difficult for firewalls to figure out if they were forwarding or producing packets. This in turn is an issue for pf for IPv6 fragment handling: it needs to call ip6_output() or ip6_forward() to handle the fragments. Figuring out which was difficult (and until now, incorrect). Having pfil distinguish the two removes an ugly piece of code from pf. Introduce a new variant of the netpfil callbacks with a flags variable, which has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if a packet is forwarded. Reviewed by: ae, kevans Differential Revision: https://reviews.freebsd.org/D13715
# 06b479a6	22-Mar-2018	Sean Bruno <sbruno@FreeBSD.org>	Refactor ip6_getpcbopt() for better locking and memory management Created GET_PKTOPT_EXT_HDR() and GET_PKTOPT_SOCKADDR() macros to handle safely fetching options from in6p_outputopts, including properly dealing with in6p locking and preparing memory for sooptcopyout(). Changed the function signature of ip6_getpcbopt() to allow the function to acquire and release locks on in6p as needed. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14619
# 2a499acf	22-Mar-2018	Sean Bruno <sbruno@FreeBSD.org>	Simple locking fixes in ip_ctloutput, ip6_ctloutput, rip_ctloutput. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14624
# 5cbeca44	22-Mar-2018	Sean Bruno <sbruno@FreeBSD.org>	Handle locking and memory safety for IPV6_PATHMTU in ip6_ctloutput(). Submitted by: Jason Eggleston <jason@eggnet.com> Reviewed by: ae Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14622
# 37d4fc1e	22-Mar-2018	Sean Bruno <sbruno@FreeBSD.org>	Improve write locking in ip6_ctloutput() with macros. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14620
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# ae69ad88	27-Jul-2017	Bjoern A. Zeeb <bz@FreeBSD.org>	After inpcb route caching was put back in place there is no need for flowtable anymore (as flowtable was never considered to be useful in the forwarding path). Reviewed by: np Differential Revision: https://reviews.freebsd.org/D11448
# 8b07e00e	30-May-2017	Jonathan T. Looney <jtl@FreeBSD.org>	Fix an unnecessary/incorrect check in the PKTOPT_EXTHDRCPY macro. This macro allocates memory and, if malloc does not return NULL, copies data into the new memory. However, it doesn't just check whether malloc returns NULL. It also checks whether we called malloc with M_NOWAIT. That is not necessary. While it may be that malloc() will only return NULL when the M_NOWAIT flag is set, we don't need to check for this when checking malloc's return value. Further, in this case, the check was not completely accurate, because it checked for flags == M_NOWAIT, rather than treating it as a bit field and checking for (flags & M_NOWAIT). Reviewed by: ae MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D10942
# ce9ac139	09-May-2017	Navdeep Parhar <np@FreeBSD.org>	ip6_output runs with the inp lock held, just like ip_output.
# d78c0804	22-Apr-2017	Kristof Provost <kp@FreeBSD.org>	Rename variable for clarity Rename the mtu variable in ip6_fragment(), because mtu is misleading. The variable actually holds the fragment length. No functional change. Suggested by: ae
# 00eab743	20-Apr-2017	Kristof Provost <kp@FreeBSD.org>	pf: Fix possible incorrect IPv6 fragmentation When forwarding pf tracks the size of the largest fragment in a fragmented packet, and refragments based on this size. It failed to ensure that this size was a multiple of 8 (as is required for all but the last fragment), so it could end up generating incorrect fragments. For example, if we received an 8 byte and 12 byte fragment pf would emit a first fragment with 12 bytes of payload and the final fragment would claim to be at offset 8 (not 12). We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so other users won't make the same mistake. Reported by: Antonios Atlasis <aatlasis at secfu net> MFC after: 3 days
# 8c1960d5	25-Mar-2017	Mike Karels <karels@FreeBSD.org>	Fix reference count leak with L2 caching. ip_forward, TCP/IPv6, and probably SCTP leaked references to L2 cache entry because they used their own routes on the stack, not in_pcb routes. The original model for route caching was callers that provided a route structure to ip{,6}input() would keep the route, and this model was used for L2 caching as well. Instead, change L2 caching to be done by default only when using a route structure in the in_pcb; the pcb deallocation code frees L2 as well as L3 cacches. A separate change will add route caching to TCP/IPv6. Another suggestion was to have the transport protocols indicate willingness to use L2 caching, but this approach keeps the changes in the network level Reviewed by: ae gnn MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D10059
# dce33a45	05-Mar-2017	Ermal Luçi <eri@FreeBSD.org>	The patch provides the same socket option as Linux IP_ORIGDSTADDR. Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD. The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application. This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets. Reviewed by: adrian, aw Approved by: ae (mentor) Sponsored by: rsync.net Differential Revision: D9235
# fbbd9655	28-Feb-2017	Warner Losh <imp@FreeBSD.org>	Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
# c10c5b1e	11-Feb-2017	Ermal Luçi <eri@FreeBSD.org>	Committed without approval from mentor. Reported by: gnn
# ed55edce	09-Feb-2017	Ermal Luçi <eri@FreeBSD.org>	The patch provides the same socket option as Linux IP_ORIGDSTADDR. Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD. The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application. This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets. Sponsored-by: rsync.net Differential Revision: D9235 Reviewed-by: adrian
# fcf59617	06-Feb-2017	Andrey V. Elsukov <ae@FreeBSD.org>	Merge projects/ipsec into head/. Small summary ------------- o Almost all IPsec releated code was moved into sys/netipsec. o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of ipsec.ko and tcpmd5.ko kernel modules. o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA. o New network pseudo interface if_ipsec(4) added. For now it is build as part of ipsec.ko module (or with IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs. o The network stack now invokes IPsec functions using special methods. The only one header file <netipsec/ipsec_support.h> should be included to declare all the needed things to work with IPsec. o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods. o TCP_SIGNATURE support was reworked to be more close to RFC. o PF_KEY SADB was reworked: - now all security associations stored in the single SPI namespace, and all SAs MUST have unique SPI. - several hash tables added to speed up lookups in SADB. - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups in the same time. - many PF_KEY message handlers were reworked to reflect changes in SADB. - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by IKE daemon to change SA addresses. o ipsecrequest and secpolicy structures were cardinally changed to avoid locking protection for ipsecrequest. Now we support only limited number (4) of bundled SAs, but they are supported for both INET and INET6. o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid SP lookup for each packet. o For inbound security policies added the mode, when the kernel does check for full history of applied IPsec transforms. o References counting rules for security policies and security associations were changed. The proper SA locking added into xform code. o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting. Reviewed by: gnn, wblock Obtained from: Yandex LLC Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9352
# f3e7afe2	18-Jan-2017	Hans Petter Selasky <hselasky@FreeBSD.org>	Implement kernel support for hardware rate limited sockets. - Add RATELIMIT kernel configuration keyword which must be set to enable the new functionality. - Add support for hardware driven, Receive Side Scaling, RSS aware, rate limited sendqueues and expose the functionality through the already established SO_MAX_PACING_RATE setsockopt(). The API support rates in the range from 1 to 4Gbytes/s which are suitable for regular TCP and UDP streams. The setsockopt(2) manual page has been updated. - Add rate limit function callback API to "struct ifnet" which supports the following operations: if_snd_tag_alloc(), if_snd_tag_modify(), if_snd_tag_query() and if_snd_tag_free(). - Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT flag, which tells if a network driver supports rate limiting or not. - This patch also adds support for rate limiting through VLAN and LAGG intermediate network devices. - How rate limiting works: 1) The userspace application calls setsockopt() after accepting or making a new connection to set the rate which is then stored in the socket structure in the kernel. Later on when packets are transmitted a check is made in the transmit path for rate changes. A rate change implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the destination network interface, which then sets up a custom sendqueue with the given rate limitation parameter. A "struct m_snd_tag" pointer is returned which serves as a "snd_tag" hint in the m_pkthdr for the subsequently transmitted mbufs. 2) When the network driver sees the "m->m_pkthdr.snd_tag" different from NULL, it will move the packets into a designated rate limited sendqueue given by the snd_tag pointer. It is up to the individual drivers how the rate limited traffic will be rate limited. 3) Route changes are detected by the NIC drivers in the ifp->if_transmit() routine when the ifnet pointer in the incoming snd_tag mismatches the one of the network interface. The network adapter frees the mbuf and returns EAGAIN which causes the ip_output() to release and clear the send tag. Upon next ip_output() a new "snd_tag" will be tried allocated. 4) When the PCB is detached the custom sendqueue will be released by a non-blocking ifp->if_snd_tag_free() call to the currently bound network interface. Reviewed by: wblock (manpages), adrian, gallatin, scottl (network) Differential Revision: https://reviews.freebsd.org/D3687 Sponsored by: Mellanox Technologies MFC after: 3 months
# aec9c8d5	17-Oct-2016	George V. Neville-Neil <gnn@FreeBSD.org>	Limit the number of mbufs that can be allocated for IPV6_2292PKTOPTIONS (and IPV6_PKTOPTIONS). PR: 100219 Submitted by: Joseph Kong MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D5157
# cc94f0c2	13-Oct-2016	Gleb Smirnoff <glebius@FreeBSD.org>	- Revert r300854, r303657 which tried to fix regression from r297225. - Fix the regression proper way using RO_RTFREE(). Submitted by: ae
# c3bef61e	15-Sep-2016	Kevin Lo <kevlo@FreeBSD.org>	Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead. Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D7878
# 0f5687f2	23-Aug-2016	Mike Karels <karels@FreeBSD.org>	Fix L2 caching for UDP over IPv6 ip6_output() was missing cache invalidation code analougous to ip_output.c. r304545 disabled L2 caching for UDP/IPv6 as a workaround. This change adds the missing cache invalidation code and reverts r304545. Reviewed by: gnn Approved by: gnn (mentor) Tested by: peter@, Mike Andrews MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D7591
# 723758b7	01-Aug-2016	Andrey V. Elsukov <ae@FreeBSD.org>	Fix NULL pointer dereference. ro pointer can be NULL when IPSec consumes mbuf. PR: 211486 MFC after: 3 days
# d4c22202	01-Aug-2016	Andrew Gallatin <gallatin@FreeBSD.org>	Rework IPV6 TCP path MTU discovery to match IPv4 - Re-write tcp_ctlinput6() to closely mimic the IPv4 tcp_ctlinput() - Now that tcp_ctlinput6() updates t_maxseg, we can allow ip6_output() to send TCP packets without looking at the tcp host cache for every single transmit. - Make the icmp6 code mimic the IPv4 code & avoid returning PRC_HOSTDEAD because it is so expensive. Without these changes in place, every TCP6 pmtu discovery or host unreachable ICMP resulted in a call to in6_pcbnotify() which walks the tcbinfo table with the write lock held. Because the tcbinfo table is shared between IPv4 and IPv6, this causes huge scalabilty issues on servers with lots of (~100K) TCP connections, to the point where even a small percent of IPv6 traffic had a disproportionate impact on overall throughput. Reviewed by: bz, rrs, ae (all earlier versions), lstewart (in Netflix's tree) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D7272
# be393491	13-Jul-2016	Dimitry Andric <dim@FreeBSD.org>	Fix a page fault in ip6_setpktopt(), occurring when the pflog module is loaded, and syncthing is started, which uses setsockopt(IPV6_PKGINFO). This is because pflog interfaces do not normally have an IPv6 address, causing the ND_IFINFO() macro to dereference a NULL pointer. Reviewed by: ae PR: 210943 MFC after: 3 days
# 4c105402	08-Jun-2016	Andrey V. Elsukov <ae@FreeBSD.org>	Cleanup unneded include "opt_ipfw.h". It was used for conditional build IPFIREWALL_FORWARD support. But IPFIREWALL_FORWARD option was removed a long time ago.
# 6d768226	02-Jun-2016	George V. Neville-Neil <gnn@FreeBSD.org>	This change re-adds L2 caching for TCP and UDP, as originally added in D4306 but removed due to other changes in the system. Restore the llentry pointer to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as appropriate. Submitted by: Mike Karels Differential Revision: https://reviews.freebsd.org/D6262
# 6351b385	27-May-2016	Gleb Smirnoff <glebius@FreeBSD.org>	Plug route reference underleak that happens with FLOWTABLE after r297225. Submitted by: Mike Karels <mike karels.net>
# 1d4d43c0	19-May-2016	Andrey V. Elsukov <ae@FreeBSD.org>	Remove ip6 adjusting from the place where pointer couldn't be changed. And add comment after calling PFIL hooks, where it could be changed.
# 9fdab4c0	19-May-2016	Andrey V. Elsukov <ae@FreeBSD.org>	Remove ip6 pointer initialization and strange check from the beginning of ip6_output(). It isn't used until the first time adjusted. Remove the comment about adjusting where it is actually initialized.
# 5e0a6f31	19-May-2016	Mark Johnston <markj@FreeBSD.org>	Move IPv6 malloc tag definitions into the IPv6 code.
# f0937b2c	18-May-2016	Andrey V. Elsukov <ae@FreeBSD.org>	Since PFIL can change destination address, use its always actual value from mbuf when calculating path mtu. Remove now unused finaldst variable. Also constify dst argument in ip6_getpmtu() and ip6_getpmtu_ctl(). Reviewed by: melifaro Obtained from: Yandex LLC Sponsored by: Yandex LLC
# 4ee7e5a6	17-May-2016	Andrey V. Elsukov <ae@FreeBSD.org>	Call RO_RTFREE() when we have detected the change of destination address, otherwise the old route will be used with new destination. MFC after: 1 week
# 155d72c4	15-Apr-2016	Pedro F. Giffuni <pfg@FreeBSD.org>	sys/net* : for pointers replace 0 with NULL. Mostly cosmetical, no functional change. Found with devel/coccinelle.
# 84cc0778	24-Mar-2016	George V. Neville-Neil <gnn@FreeBSD.org>	FreeBSD previously provided route caching for TCP (and UDP). Re-add route caching for TCP, with some improvements. In particular, invalidate the route cache if a new route is added, which might be a better match. The cache is automatically invalidated if the old route is deleted. Submitted by: Mike Karels Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4306
# 56a5f52e	29-Feb-2016	Gleb Smirnoff <glebius@FreeBSD.org>	New way to manage reference counting of mbuf external storage. The m_ext.ext_cnt pointer becomes a union. It can now hold the refcount value itself. To tell that m_ext.ext_flags flag EXT_FLAG_EMBREF is used. The first mbuf to attach a cluster stores the refcount. The further mbufs to reference the cluster point at refcount in the first mbuf. The first mbuf is freed only when the last reference is freed. The benefit over refcounts stored in separate slabs is that now refcounts of different, unrelated mbufs do not share a cache line. For EXT_EXTREF mbufs the zone_ext_refcnt is no longer needed, and m_extadd() becomes void, making widely used M_EXTADD macro safe. For EXT_SFBUF mbufs the sf_ext_ref() is removed, which was an optimization exactly against the cache aliasing problem with regular refcounting. Discussed with: rrs, rwatson, gnn, hiren, sbruno, np Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D5396 Sponsored by: Netflix
# bacf6684	04-Jan-2016	Alexander V. Chernikov <melifaro@FreeBSD.org>	Finish r293098: make ip6_getpmtu() and ip6_getpmtu_ctl() use new routing API
# 0d4df029	03-Jan-2016	Alexander V. Chernikov <melifaro@FreeBSD.org>	Handle IPV6_PATHMTU option by spliting ip6_getpmtu_ctl() from ip6_getpmtu(). Add ro_mtu field to 'struct route' to be able to pass lookup MTU back to the caller. Currently, ip6_getpmtu() has 2 totally different use cases: 1) control plane (IPV6_PATHMTU req), where we just need to calculate MTU and return it, w/o any reusability. 2) Actual ip6_output() data path where we (nearly) always use the provided route lookup data. If this data is not 'valid' we need to perform another lookup and save the result (which cannot be re-used by ip6_output()). Given that, handle 1) by calling separate function doing rte lookup itself. Resulting MTU is calculated by (newly-added) ip6_calcmtu() used by both ip6_getpmtu_ctl() and ip6_getpmtu(). For 2) instead of storing ref'ed rte, store mtu (the only needed data from the lookup result) inside newly-added ro_mtu field. 'struct route' was shrinked by 8(or 4 bytes) in r292978. Grow it again by 4 bytes. New ro_mtu field will be used in other places like ip/tcp_output (EMSGSIZE handling from output routines). Reviewed by: ae
# 912568c8	30-Dec-2015	Jonathan T. Looney <jtl@FreeBSD.org>	Add the appropriate case statement for IPV6_BINDMULTI so the option can be retrieved with getsockopt(). CID: 1229928 Differential Revision: https://reviews.freebsd.org/D4737 Reviewed by: adrian Sponsored by: Juniper Networks
# 637670e7	15-Nov-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Bring back the ability of passing cached route via nd6_output_ifp().
# 1fe201c3	16-Sep-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Simplify the way of attaching IPv6 link-layer header. Problem description: How do we currently perform layer 2 resolution and header imposition: For IPv4 we have the following chain: ip_output() -> (ether\|atm\|whatever)_output() -> arpresolve() Lookup is done in proper place (link-layer output routine) and it is possible to provide cached lle data. For IPv6 situation is more complex: ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_storelladdr() We have ip6_ouput() which calls nd6_output() instead of link output routine. nd6_output() does the following: * checks if lle exists, creates it if needed (similar to arpresolve()) * performes lle state transitions (similar to arpresolve()) * calls nd6_output_ifp() which pushes packets to link output routine along with running SeND/MAC hooks regardless of lle state (e.g. works as run-hooks placeholder). After that, iface output routine like ether_output() calls nd6_storelladdr() which performs lle lookup once again. As a result, we perform lookup twice for each outgoing packet for most types of interfaces. We also need to maintain runtime-checked table of 'nd6-free' interfaces (see nd6_need_cache()). Fix this behavior by eliminating first ND lookup. To be more specific: * make all nd6_output() consumers use nd6_output_ifp() instead * rename nd6_output[_slow]() to nd6_resolve_[slow]() * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics, e.g. copy L2 address to buffer instead of pushing packet towards lower layers * Make all nd6_storelladdr() users use nd6_resolve() * eliminate nd6_storelladdr() The resulting callchain is the following: ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve() Error handling: Currently sending packet to non-existing la results in ip6_<output\|forward> -> nd6_output() -> nd6_output _lle() which returns 0. In new scenario packet is propagated to <ether\|whatever>_output() -> nd6_resolve() which will return EWOULDBLOCK, and that result will be converted to 0. (And EWOULDBLOCK is actually used by IB/TOE code). Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D1469
# 68bb8d62	06-Sep-2015	Adrian Chadd <adrian@FreeBSD.org>	Add support for receiving flowtype, flowid and RSS bucket information as part of recvmsg(). Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3562
# 331dff07	08-Aug-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Simplify ip[6] simploop: Do not pass 'dst' sockaddr to ip[6]_mloopback: - We have explicit check for AF_INET in ip_output() - We assume ip header inside passed mbuf in ip_mloopback - We assume ip6 header inside passed mbuf in ip6_mloopback
# cb207f93	03-Jul-2015	Andrey V. Elsukov <ae@FreeBSD.org>	Keep IPv6 address specified by IPV6_PKTINFO socket option in kernel internal form to be able handle link-local IPv6 addresses. Reported by: kp Tested by: kp
# 654bdb5a	07-May-2015	Andrey V. Elsukov <ae@FreeBSD.org>	Mark data checksum as valid for multicast packets, that we send back to myself via simloop. Also remove duplicate check under #ifdef DIAGNOSTIC. PR: 180065 MFC after: 1 week
# 79831849	31-Mar-2015	Kristof Provost <kp@FreeBSD.org>	Preserve IPv6 fragment IDs accross reassembly and refragmentation When forwarding fragmented IPv6 packets and filtering with PF we reassemble and refragment. That means we generate new fragment headers and a new fragment ID. We already save the fragment IDs so we can do the reassembly so it's straightforward to apply the incoming fragment ID on the refragmented packets. Differential Revision: https://reviews.freebsd.org/D2188 Approved by: gnn (mentor)
# 8f1beb88	04-Mar-2015	Andrey V. Elsukov <ae@FreeBSD.org>	Fix deadlock in IPv6 PCB code. When several threads are trying to send datagram to the same destination, but fragmentation is disabled and datagram size exceeds link MTU, ip6_output() calls pfctlinput2(PRC_MSGSIZE). It does notify all sockets wanted to know MTU to this destination. And since all threads hold PCB lock while sending, taking the lock for each PCB in the in6_pcbnotify() leads to deadlock. RFC 3542 p.11.3 suggests notify all application wanted to receive IPV6_PATHMTU ancillary data for each ICMPv6 packet too big message. But it doesn't require this, when we don't receive ICMPv6 message. Change ip6_notify_pmtu() function to be able use it directly from ip6_output() to notify only one socket, and to notify all sockets when ICMPv6 packet too big message received. PR: 197059 Differential Revision: https://reviews.freebsd.org/D1949 Reviewed by: no objection from #network Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC
# 6c269f69	15-Feb-2015	Gleb Smirnoff <glebius@FreeBSD.org>	Factor out ip6_fragment() function, to be used in IPv6 stack and pf(4). Submitted by: Kristof Provost Differential Revision: D1766
# e5ee7060	15-Feb-2015	Gleb Smirnoff <glebius@FreeBSD.org>	Move ip6_deletefraghdr() to frag6.c. Suggested by: bz
# 0b438b0f	15-Feb-2015	Gleb Smirnoff <glebius@FreeBSD.org>	Factor out ip6_deletefraghdr() function, to be shared between IPv6 stack and pf(4). Submitted by: Kristof Provost Reviewed by: ae Differential Revision: D1764
# b2bdc62a	18-Jan-2015	Adrian Chadd <adrian@FreeBSD.org>	Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific bits. The motivation here is to eventually teach netisr and potentially other networking subsystems a bit more about how RSS work queues / buckets are configured so things have a hope of auto-configuring in the future. * net/rss_config.[ch] takes care of the generic bits for doing configuration, hash function selection, etc; * topelitz.[ch] is now in net/ rather than netinet/; * (and would be in libkern if it didn't directly include RSS_KEYSIZE; that's a later thing to fix up.) * netinet/in_rss.[ch] now just contains the IPv4 specific methods; * and netinet/in6_rss.[ch] now just contains the IPv6 specific methods. This should have no functional impact on anyone currently using the RSS support. Differential Revision: D1383 Reviewed by: gnn, jfv (intel driver bits)
# ffec6ee5	12-Jan-2015	Gleb Smirnoff <glebius@FreeBSD.org>	Do not go one layer down to check ifqueue length. First, not all drivers use ifqueue at all. Second, there is no point in this lockless check. Either positive or negative result of the check could be incorrect after a tick. Sponsored by: Nginx, Inc.
# ed6a66ca	05-Jan-2015	Robert Watson <rwatson@FreeBSD.org>	To ease changes to underlying mbuf structure and the mbuf allocator, reduce the knowledge of mbuf layout, and in particular constants such as M_EXT, MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single M_ALIGN() macro, implemented by a now-inlined m_align() function: - Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline. - Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align(). - Update consumers around the tree to simply use M_ALIGN(). This change eliminates a number of cases where mbuf consumers must be aware of whether or not mbufs returned by the allocator use external storage, but also assumptions about the size of the returned mbuf. This will make it easier to introduce changes in how we use external storage, as well as features such as variable-size mbufs. Differential Revision: https://reviews.freebsd.org/D1436 Reviewed by: glebius, trasz, gnn, bz Sponsored by: EMC / Isilon Storage Division
# 0275b2e3	11-Dec-2014	Andrey V. Elsukov <ae@FreeBSD.org>	Remove flag/flags argument from the following functions: ipsec_getpolicybyaddr() ipsec4_checkpolicy() ip_ipsec_output() ip6_ipsec_output() The only flag used here was IP_FORWARDING. Obtained from: Yandex LLC Sponsored by: Yandex LLC
# c2529042	01-Dec-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but does no longer have any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file. This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows. "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before. Additional notes: - The SCTP code changes will be committed as a separate patch. - Removal of the "M_FLOWID" flag will also be done separately. - The FreeBSD version has been bumped. MFC after: 1 month Sponsored by: Mellanox Technologies
# f9723c77	20-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Simplify API: use new NHOP_LOOKUP_AIFP flag to select what ifp we need to return. Rename fib[64]_lookup_nh_basic to fib[64]_lookup_nh, add flags fields for all relevant functions.
# 7f948f12	16-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Finish r274175: do control plane MTU tracking. Update route MTU in case of ifnet MTU change. Add new RTF_FIXEDMTU to track explicitly specified MTU. Old behavior: ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU. User has to manually update all routes. ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU. However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu gets updated. New behavior: ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs gets updated with new MTU unless RTF_FIXEDMTU flag set on them. ifconfig em0 mtu 9000->1500 -> all routes in all fibs gets updated with new MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu. route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag. route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag. PR: 194238 MFC after: 1 month CR: D1125
# 603eaf79	09-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Renove faith(4) and faithd(8) from base. It looks like industry have chosen different (and more traditional) stateless/statuful NAT64 as translation mechanism. Last non-trivial commits to both faith(4) and faithd(8) happened more than 12 years ago, so I assume it is time to drop RFC3142 in FreeBSD. No objections from: net@
# 257480b8	04-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Convert netinet6/ to use new routing API. * Remove &ifpp from ip6_output() in favor of ri->ri_nh_info * Provide different wrappers to in6_selectsrc: Currently it is used by 2 differenct type of customers: - socket-based one, which all are unsure about provided address scope and - in-kernel ones (ND code mostly), which don't have any sockets, options, crededentials, etc. So, we provide two different wrappers to in6_selectsrc() returning select source. * Make different versions of selectroute(): Currenly selectroute() is used in two scenarios: - SAS, via in6_selecsrc() -> in6_selectif() -> selectroute() - output, via in6_output -> wrapper -> selectroute() Provide different versions for each customer: - fib6_lookup_nh_basic()-based in6_selectif() which is capable of returning interface only, without MTU/NHOP/L2 calculations - full-blown fib6_selectroute() with cached route/multipath/ MTU/L2 * Stop using routing table for link-local address lookups * Add in6_ifawithifp_lla() to make for-us check faster for link-local * Add in6_splitscope / in6_setllascope for faster embed/deembed scopes
# f0cace5d	12-Oct-2014	Robert Watson <rwatson@FreeBSD.org>	When deciding whether to call m_pullup() even though there is adequate data in an mbuf, use M_WRITABLE() instead of a direct test of M_EXT; the latter both unnecessarily exposes mbuf-allocator internals in the protocol stack and is also insufficient to catch all cases of non-writability. (NB: m_pullup() does not actually guarantee that a writable mbuf is returned, so further refinement of all of these code paths continues to be required.) Reviewed by: bz MFC after: 3 days Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D900
# 9c57a5b6	01-Oct-2014	Hiroki Sato <hrs@FreeBSD.org>	Add an additional routing table lookup when m->m_pkthdr.fibnum is changed at a PFIL hook in ip{,6}_output(). IPFW setfib rule did not perform a routing table lookup when the destination address was not changed. CR: D805
# 9196891f	10-Sep-2014	Andrey V. Elsukov <ae@FreeBSD.org>	Add additional checks for IPV6_PKTINFO handling (RFC 3542): * Return ENETDOWN when interface specified by ipi6_ifindex is not enabled for IPv6 use. * Return EADDRNOTAVAIL when ipi6_ifindex specifies an interface, but the address ipi6_addr is not available for use on that interface. * Return EINVAL when ipi6_addr is multicast address. Obtained from: Yandex LLC Sponsored by: Yandex LLC
# b174de32	08-Sep-2014	Adrian Chadd <adrian@FreeBSD.org>	Add IP_NODEFAULTFLOWID awareness to ip6_output(). Differential Revision: https://reviews.freebsd.org/D527
# c7c0d948	11-Jul-2014	Adrian Chadd <adrian@FreeBSD.org>	Add IPv6 flowid, bindmulti and RSS awareness.
# aaf2cfc0	27-May-2014	VANHULLEBUS Yvan <vanhu@FreeBSD.org>	Fixed IPv4-in-IPv6 and IPv6-in-IPv4 IPsec tunnels. For IPv6-in-IPv4, you may need to do the following command on the tunnel interface if it is configured as IPv4 only: ifconfig <interface> inet6 -ifdisabled Code logic inspired from NetBSD. PR: kern/169438 Submitted by: emeric.poupon@netasq.com Reviewed by: fabient, ae Obtained from: NETASQ
# e3a7aa6f	04-Mar-2014	Gleb Smirnoff <glebius@FreeBSD.org>	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 0ff96b4f	17-Feb-2014	Gleb Smirnoff <glebius@FreeBSD.org>	o Remove at compile time the HASH_ALL code, that was never tested and is unfinished. However, I've tested my version, it works okay. As before it is unfinished: timeout aren't driven by TCP session state. To enable the HASH_ALL mode, one needs in kernel config: options FLOWTABLE_HASH_ALL o Reduce the alignment on flentry to 64 bytes. Without the FLOWTABLE_HASH_ALL option, twice less memory would be consumed by flows. o API to ip_output()/ip6_output() got even more thin: 1 liner. o Remove unused unions. Simply use fle->f_key[]. o Merge all IPv4 code into flowtable_lookup_ipv4(), and do same flowtable_lookup_ipv6(). Stop copying data to on stack sockaddr structures, simply use key[] on stack. o Move code from flowtable_lookup_common() that actually works on insertion into flowtable_insert(). Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 5d6d7e75	07-Feb-2014	Gleb Smirnoff <glebius@FreeBSD.org>	o Revamp API between flowtable and netinet, netinet6. - ip_output() and ip_output6() simply call flowtable_lookup(), passing mbuf and address family. That's the only code under #ifdef FLOWTABLE in the protocols code now. o Revamp statistics gathering and export. - Remove hand made pcpu stats, and utilize counter(9). - Snapshot of statistics is available via 'netstat -rs'. - All sysctls are moved into net.flowtable namespace, since spreading them over net.inet isn't correct. o Properly separate at compile time INET and INET6 parts. o General cleanup. - Remove chain of multiple flowtables. We simply have one for IPv4 and one for IPv6. - Flowtables are allocated in flowtable.c, symbols are static. - With proper argument to SYSINIT() we no longer need flowtable_ready. - Hash salt doesn't need to be per-VNET. - Removed rudimentary debugging, which use quite useless in dtrace era. The runtime behavior of flowtable shouldn't be changed by this commit. Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 76039bc8	26-Oct-2013	Gleb Smirnoff <glebius@FreeBSD.org>	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 7caf4ab7	15-Oct-2013	Gleb Smirnoff <glebius@FreeBSD.org>	- Utilize counter(9) to accumulate statistics on interface addresses. Add four counters to struct ifaddr. This kills '+=' on a variables shared between processors for every packet. - Nuke struct if_data from struct ifaddr. - In ip_input() do not put a reference on ifaddr, instead update statistics right now in place and do IN_IFADDR_RUNLOCK(). These removes atomic(9) for every packet. [1] - To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in rtsock.c fill if_data fields using counter_u64_fetch(). - Accidentially fix bug in COMPAT_32 version of NET_RT_IFLISTL, which took if_data not from the ifaddr, but from ifaddr's ifnet. [2] Submitted by: melifaro [1], pluknet[2] Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 3fa98cf9	15-Oct-2013	Gleb Smirnoff <glebius@FreeBSD.org>	Remove unsigned < 0 check.
# 1b4381af	24-Aug-2013	Andre Oppermann <andre@FreeBSD.org>	Restructure the mbuf pkthdr to make it fit for upcoming capabilities and features. The changes in particular are: o Remove rarely used "header" pointer and replace it with a 64bit protocol/ layer specific union PH_loc for local use. Protocols can flexibly overlay their own 8 to 64 bit fields to store information while the packet is worked on. o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc instead of pkthdr.header. o Extend csum_flags to 64bits to allow for additional future offload information to be carried (e.g. iSCSI, IPsec offload, and others). o Move the RSS hash type enumerator from abusing m_flags to its own 8bit rsstype field. Adjust accessor macros. o Add cosqos field to store Class of Service / Quality of Service information with the packet. It is not yet supported in any drivers but allows us to get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with a modernized ALTQ. o Add four 8 bit fields l[2-5]hlen to store the relative header offsets from the start of the packet. This is important for various offload capabilities and to relieve the drivers from having to parse the packet and protocol headers to find out location of checksums and other information. Header parsing in drivers is a lot of copy-paste and unhandled corner cases which we want to avoid. o Add another flexible 64bit union to map various additional persistent packet information, like ether_vtag, tso_segsz and csum fields. Depending on the csum_flags settings some fields may have different usage making it very flexible and adaptable to future capabilities. o Restructure the CSUM flags to better signify their outbound (down the stack) and inbound (up the stack) use. The CSUM flags used to be a bit chaotic and rather poorly documented leading to incorrect use in many places. Bring clarity into their use through better naming. Compatibility mappings are provided to preserve the API. The drivers can be corrected one by one and MFC'd without issue. o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures). Sponsored by: The FreeBSD Foundation
# efdf104b	04-Jul-2013	Mikolaj Golub <trociny@FreeBSD.org>	In r227207, to fix the issue with possible NULL inp_socket pointer dereferencing, when checking for SO_REUSEPORT option (and SO_REUSEADDR for multicast), INP_REUSEPORT flag was introduced to cache the socket option. It was decided then that one flag would be enough to cache both SO_REUSEPORT and SO_REUSEADDR: when processing SO_REUSEADDR setsockopt(2), it was checked if it was called for a multicast address and INP_REUSEPORT was set accordingly. Unfortunately that approach does not work when setsockopt(2) is called before binding to a multicast address: the multicast check fails and INP_REUSEPORT is not set. Fix this by adding INP_REUSEADDR flag to unconditionally cache SO_REUSEADDR. PR: 179901 Submitted by: Michael Gmelin freebsd grem.de (initial version) Reviewed by: rwatson MFC after: 1 week
# 4871fc4a	16-May-2013	Julian Elischer <julian@FreeBSD.org>	Finally change the mbuf to have its own fib field instead of stealing 4 flag bits. This was supposed to happen in 8.0, and again in 2012.. MFC after: never
# 9cb8d207	09-Apr-2013	Andrey V. Elsukov <ae@FreeBSD.org>	Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats. MFC after: 1 week
# 10e5acc3	15-Mar-2013	Gleb Smirnoff <glebius@FreeBSD.org>	- Use m_getcl() instead of hand allocating. - Do not calculate constant length values at run time, CTASSERT() their sanity. - Remove superfluous cleaning of mbuf fields after allocation. - Replace compat macros with function calls. Sponsored by: Nginx, Inc.
# 7b07d1be	14-Mar-2013	Gleb Smirnoff <glebius@FreeBSD.org>	- Use m_getcl() instead of hand allocating. - Use m_get()/m_gethdr() instead of macros. - Remove superfluous cleaning of mbuf fields after allocation. Sponsored by: Nginx, Inc.
# f8fe3dc9	19-Dec-2012	Andrey V. Elsukov <ae@FreeBSD.org>	When we have some address to forward (e.g. it was specified with ipfw fwd), we should pass it as first argument into in6_selectroute_fib function to initiate new route lookup. MFC after: 1 week
# 16607317	19-Dec-2012	Andrey V. Elsukov <ae@FreeBSD.org>	Make dst_sa initialization only when it is actually needed. MFC after: 1 week
# 61d88f34	19-Dec-2012	Andrey V. Elsukov <ae@FreeBSD.org>	The selectroute functions does own account of EHOSTUNREACH errors, no need to do it twice. MFC after: 1 week
# eb1b1807	05-Dec-2012	Gleb Smirnoff <glebius@FreeBSD.org>	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually
# ffdbf9da	01-Nov-2012	Andrey V. Elsukov <ae@FreeBSD.org>	Remove the recently added sysctl variable net.pfil.forward. Instead, add protocol specific mbuf flags M_IP_NEXTHOP and M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup only when this flag is set. Suggested by: andre
# c1de64a4	25-Oct-2012	Andrey V. Elsukov <ae@FreeBSD.org>	Remove the IPFIREWALL_FORWARD kernel option and make possible to turn on the related functionality in the runtime via the sysctl variable net.pfil.forward. It is turned off by default. Sponsored by: Yandex LLC Discussed with: net@ MFC after: 2 weeks
# 6f56329a	22-Oct-2012	Xin LI <delphij@FreeBSD.org>	Remove __P. Submitted by: kevlo Reviewed by: md5(1) MFC after: 2 months
# ab16a5bd	19-Aug-2012	Mikolaj Golub <trociny@FreeBSD.org>	In ip6_ctloutput() guard inp_flags modifications with INP_WLOCK. MFC after: 2 weeks
# 3b43b783	31-Jul-2012	Bjoern A. Zeeb <bz@FreeBSD.org>	In case of IPsec he have to do delayed checksum calculations before adding any extension header, or rather before calling into IPsec processing as we may send the packet and not return to IPv6 output processing here. PR: kern/170116 MFC After: 3 days
# 68c99a60	30-Jul-2012	Bjoern A. Zeeb <bz@FreeBSD.org>	Improve the should-never-hit printf to ease debugging in case we'd ever hit it again when doing the delayed IPv6 checksum calculations. MFC after: 3 days
# 5dbbe4fd	28-Jul-2012	Bjoern A. Zeeb <bz@FreeBSD.org>	For consistency put the IPsec comment iside the #fidef section. MFC after: 3 days
# bf984051	04-Jul-2012	Gleb Smirnoff <glebius@FreeBSD.org>	When ip_output()/ip6_output() is supplied a struct route *ro argument, it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning here: it may be supplied to provide route, and it may be supplied to store and return to caller the route that ip_output()/ip6_output() finds. In the latter case skipping FLOWTABLE lookup is pessimisation. The difference between struct route filled by FLOWTABLE and filled by rtalloc() family is that the former doesn't hold a reference on its rtentry. Reference is hold by flow entry, and it is about to be released in future. Thus, route filled by FLOWTABLE shouldn't be passed to RTFREE() macro. - Introduce new flag for struct route/route_in6, that marks route not holding a reference on rtentry. - Introduce new macro RO_RTFREE() that cleans up a struct route depending on its kind. - All callers to ip_output()/ip6_output() that do supply non-NULL but empty route should use RO_RTFREE() to free results of lookup. - ip_output()/ip6_output() now do FLOWTABLE lookup always when ro->ro_rt == NULL. Tested by: tuexen (SCTP part)
# a6cff10f	30-May-2012	Michael Tuexen <tuexen@FreeBSD.org>	Seperate SCTP checksum offloading for IPv4 and IPv6. While there: remove some trainling whitespaces. MFC after: 3 days X-MFC with: 236170
# 356ab07e	28-May-2012	Bjoern A. Zeeb <bz@FreeBSD.org>	It turns out that too many drivers are not only parsing the L2/3/4 headers for TSO but also for generic checksum offloading. Ideally we would only have one common function shared amongst all drivers, and perhaps when updating them for IPv6 we should introduce that. Eventually we should provide the meta information along with mbufs to avoid (re-)parsing entirely. To not break IPv6 (checksums and offload) and to be able to MFC the changes without risking to hurt 3rd party drivers, duplicate the v4 framework, as other OSes have done as well. Introduce interface capability flags for TX/RX checksum offload with IPv6, to allow independent toggling (where possible). Add CSUM_*_IPV6 flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6 fragmentation. Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and add an alias for CSUM_DATA_VALID_IPV6. This pretty much brings IPv6 handling in line with IPv4. TSO is still handled in a different way and not via if_hwassist. Update ifconfig to allow (un)setting of the new capability flags. Update loopback to announce the new capabilities and if_hwassist flags. Individual driver updates will have to follow, as will SCTP. Reported by: gallatin, dim, .. Reviewed by: gallatin (glanced at?) MFC after: 3 days X-MFC with: r235961,235959,235958
# c69baa7e	26-May-2012	Bjoern A. Zeeb <bz@FreeBSD.org>	Correctly get the payload length in host byte order. While we already plan to support >64k payload here, the IPv6 header payload length obviously is only 16 bit and the calculations need to be right. Reported by: dim Tested by: dim MFC after: 1 day X-MFC: with r235958
# e7b92e27	24-May-2012	Bjoern A. Zeeb <bz@FreeBSD.org>	MFp4 bz_ipv6_fast: Add support for delayed checksum calculations in the IPv6 output path. We currently cannot offload to the card if we add extension headers (which incl. fragmentation). Fix two SCTP offload support copy&paste bugs: calculate checksums if fragmenting and no need to flag IPv4 header checksums in the IPv6 forwarding path. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days
# 5aa7e8ed	24-Feb-2012	Bjoern A. Zeeb <bz@FreeBSD.org>	In selectroute() add a missing fibnum argument to an in6_rtalloc() call in an #if 0 section. In in6_selecthlim() optimize a case where in6p cannot be NULL due to an earlier check. More consistently use u_int instead of int for fibnum function arguments. Sponsored by: Cisco Systems, Inc. MFC after: 3 days
# 81d5d46b	03-Feb-2012	Bjoern A. Zeeb <bz@FreeBSD.org>	Add multi-FIB IPv6 support to the core network stack supplementing the original IPv4 implementation from r178888: - Use RT_DEFAULT_FIB in the IPv4 implementation where noticed. - Use rtfib() KPI with explicit RT_DEFAULT_FIB where applicable in the NFS code. - Use the new in6_rt KPI in TCP, gif(4), and the IPv6 network stack where applicable. - Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally prevent multiple initializations of callouts in in6_inithead(). - Use wrapper functions where needed to preserve the current KPI to ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup. - Fix (related) comments (both technical or style). - Convert to rtinit() where applicable and only use custom loops where currently not possible otherwise. - Multicast group, most neighbor discovery address actions and faith(4) are locked to the default FIB. Individual IPv6 addresses will only appear in the default FIB, however redirect information and prefixes of connected subnets are automatically propagated to all FIBs by default (mimicking IPv4 behavior as closely as possible). Sponsored by: Cisco Systems, Inc.
# ee799639	03-Feb-2012	Bjoern A. Zeeb <bz@FreeBSD.org>	Add SO_SETFIB option support on PF_INET6 sockets and allow inheriting the FIB number from the process, as set by setfib(2), on socket creation. Sponsored by: Cisco Systems, Inc.
# fc06cd42	06-Nov-2011	Mikolaj Golub <trociny@FreeBSD.org>	Cache SO_REUSEPORT socket option in inpcb-layer in order to avoid inp_socket->so_options dereference when we may not acquire the lock on the inpcb. This fixes the crash due to NULL pointer dereference in in_pcbbind_setup() when inp_socket->so_options in a pcb returned by in_pcblookup_local() was checked. Reported by: dave jones <s.dave.jones@gmail.com>, Arnaud Lacombe <lacombar@gmail.com> Suggested by: rwatson Glanced by: rwatson Tested by: dave jones <s.dave.jones@gmail.com>
# 6090ab8b	19-Sep-2011	Hiroki Sato <hrs@FreeBSD.org>	Copy ip6po_minmtu and ip6po_prefer_tempaddr in ip6_copypktopts(). This fixes inconsistency when options are specified by both setsockopt() and ancillary data types. PR: kern/158307 Approved by: re (bz)
# 8a006adb	20-Aug-2011	Bjoern A. Zeeb <bz@FreeBSD.org>	Add support for IPv6 to ipfw fwd: Distinguish IPv4 and IPv6 addresses and optional port numbers in user space to set the option for the correct protocol family. Add support in the kernel for carrying the new IPv6 destination address and port. Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change the address in the IP header. Add support for IPv6 forwarding to a non-local destination. Add a regession test uitilizing VIMAGE to check all 20 possible combinations I could think of. Obtained from: David Dolson at Sandvine Incorporated (original version for ipfw fwd IPv6 support) Sponsored by: Sandvine Incorporated PR: bin/117214 MFC after: 4 weeks Approved by: re (kib)
# 6d79f3f6	27-Nov-2010	Rebecca Cran <brucec@FreeBSD.org>	Fix more continuous/contiguous typos (cf. r215955)
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 5f6bf451	24-Sep-2010	Attilio Rao <attilio@FreeBSD.org>	IP_BINDANY is not correctly handled in getsockopt() case. Fix it by specifying the correct bits. Sponsored by: Sandvine Incorporated Reviewed by: bz, emaste, rstone Obtained from: Sandvine Incorporated MFC after: 10 days
# 1f93b772	11-May-2010	Kip Macy <kmacy@FreeBSD.org>	try working around panic by validating rt and lle MFC after: 3 days
# 77931dd5	09-May-2010	Kip Macy <kmacy@FreeBSD.org>	Add flowtable support to IPv6 Tested by: qingli@ Reviewed by: qingli@ MFC after: 3 days
# 54bb4167	05-Apr-2010	Randall Stewart <rrs@FreeBSD.org>	MFC of 2 items to fix the csum for v6 issue: Revision 205075 and 205104: ---------205075---------- With the recent change of the sctp checksum to support offload, no delayed checksum was added to the ip6 output code. This causes cards that do not support SCTP checksum offload to have SCTP packets that are IPv6 NOT have the sctp checksum performed. Thus you could not communicate with a peer. This adds the missing bits to make the checksum happen for these cards. ------------------------- ---------205104---------- The proper fix for the delayed SCTP checksum is to have the delayed function take an argument as to the offset to the SCTP header. This allows it to work for V4 and V6. This of course means changing all callers of the function to either pass the header len, if they have it, or create it (ip_hl << 2 or sizeof(ip6_hdr)). ------------------------- PR: 144529
# 1966e5b5	12-Mar-2010	Randall Stewart <rrs@FreeBSD.org>	The proper fix for the delayed SCTP checksum is to have the delayed function take an argument as to the offset to the SCTP header. This allows it to work for V4 and V6. This of course means changing all callers of the function to either pass the header len, if they have it, or create it (ip_hl << 2 or sizeof(ip6_hdr)). PR: 144529 MFC after: 2 weeks
# 9b03990a	12-Mar-2010	Randall Stewart <rrs@FreeBSD.org>	With the recent change of the sctp checksum to support offload, no delayed checksum was added to the ip6 output code. This causes cards that do not support SCTP checksum offload to have SCTP packets that are IPv6 NOT have the sctp checksum performed. Thus you could not communicate with a peer. This adds the missing bits to make the checksum happen for these cards. PR: 144529 MFC after: 2 weeks
# 2ae7ec29	07-Feb-2010	Julian Elischer <julian@FreeBSD.org>	MFC of 197952 and 198075 Virtualize the pfil hooks so that different jails may chose different packet filters. ALso allows ipfw to be enabled on on ejail and disabled on another. In 8.0 it's a global setting. and Unbreak the VIMAGE build with IPSEC, broken with r197952 by virtualizing the pfil hooks. For consistency add the V_ to virtualize the pfil hooks in here as well.
# 0b4b0b0f	10-Oct-2009	Julian Elischer <julian@FreeBSD.org>	Virtualize the pfil hooks so that different jails may chose different packet filters. ALso allows ipfw to be enabled on on ejail and disabled on another. In 8.0 it's a global setting. Sitting aroung in tree waiting to commit for: 2 months MFC after: 2 months
# 3d2a8d36	05-Sep-2009	Qing Li <qingli@FreeBSD.org>	MFC r196864 This patch fixes the following issues: - Interface link-local address is not reachable within the node that owns the interface, this is due to the mismatch in address scope as the result of the installed interface address loopback route. Therefore for each interface address loopback route, the rt_gateway field (of AF_LINK type) will be used to track which interface a given address belongs to. This will aid the address source to use the proper interface for address scope/zone validation. - The loopback address is not reachable. The root cause is the same as the above. - Empty nd6 entries are created for the IPv6 loopback addresses only for validation reason. Doing so will eliminate as much of the special case (loopback addresses) handling code as possible, however, these empty nd6 entries should not be returned to the userland applications such as the "ndp" command. Since both of the above issues contain common files, these files are committed together. Reviewed by: bz Approved by: re
# 9452b0d2	05-Sep-2009	Qing Li <qingli@FreeBSD.org>	This patch fixes the following issues: - Interface link-local address is not reachable within the node that owns the interface, this is due to the mismatch in address scope as the result of the installed interface address loopback route. Therefore for each interface address loopback route, the rt_gateway field (of AF_LINK type) will be used to track which interface a given address belongs to. This will aid the address source to use the proper interface for address scope/zone validation. - The loopback address is not reachable. The root cause is the same as the above. - Empty nd6 entries are created for the IPv6 loopback addresses only for validation reason. Doing so will eliminate as much of the special case (loopback addresses) handling code as possible, however, these empty nd6 entries should not be returned to the userland applications such as the "ndp" command. Since both of the above issues contain common files, these files are committed together. Reviewed by: bz MFC after: immediately
# 530c0060	01-Aug-2009	Robert Watson <rwatson@FreeBSD.org>	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
# eddfbb76	14-Jul-2009	Robert Watson <rwatson@FreeBSD.org>	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
# 8c0fec80	23-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Modify most routines returning 'struct ifaddr *' to return references rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)
# 8d8bc018	08-Jun-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.
# f44270e7	01-Jun-2009	Pawel Jakub Dawidek <pjd@FreeBSD.org>	- Rename IP_NONLOCALOK IP socket option to IP_BINDANY, to be more consistent with OpenBSD (and BSD/OS originally). We can't easly do it SOL_SOCKET option as there is no more space for more SOL_SOCKET options, but this option also fits better as an IP socket option, it seems. - Implement this functionality also for IPv6 and RAW IP sockets. - Always compile it in (don't use additional kernel options). - Remove sysctl to turn this functionality on and off. - Introduce new privilege - PRIV_NETINET_BINDANY, which allows to use this functionality (currently only unjail root can use it). Discussed with: julian, adrian, jhb, rwatson, kmacy
# 71ce264c	09-May-2009	Warner Losh <imp@FreeBSD.org>	Implement RFC 5095 more fully. Rather than marking this no-op code as BURN_BRIDGES, just remove it. Adjust comments. Reviewed by: dwhite, emaste, battlez
# 33cde130	29-Apr-2009	Bruce M Simpson <bms@FreeBSD.org>	Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit: import from p4 bms_netdev. Summary of changes: * Connect netinet6/in6_mcast.c to build. The legacy KAME KPIs are mostly preserved. * Eliminate now dead code from ip6_output.c. Don't do mbuf bingo, we are not going to do RFC 2292 style CMSG tricks for multicast options as they are not required by any current IPv6 normative reference. * Refactor transports (UDP, raw_ip6) to do own mcast filtering. SCTP, TCP unaffected by this change. * Add ip6_msource, in6_msource structs to in6_var.h. * Hookup mld_ifinfo state to in6_ifextra, allocate from domifattach path. * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced. Kernel consumers which need this should use in6m_lookup(). * Refactor IPv6 socket group memberships to use a vector (like IPv4). * Update ifmcstat(8) for IPv6 SSM. * Add witness lock order for IN6_MULTI_LOCK. * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths. * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup. * Update carp(4) for new IPv6 SSM KPIs. * Virtualize ip6_mrouter socket. Changes mostly localized to IPv6 MROUTING. * Don't do a local group lookup in MROUTING. * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge(). * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode. * Bump __FreeBSD_version to 800084. * Update UPDATING. NOTE WELL: * This code hasn't been tested against real MLDv2 queriers (yet), although the on-wire protocol has been verified in Wireshark. * There are a few unresolved issues in the socket layer APIs to do with scope ID propagation. * There is a LOR present in ip6_output()'s use of in6_setscope() which needs to be resolved. See comments in mld6.c. This is believed to be benign and can't be avoided for the moment without re-introducing an indirect netisr. This work was mostly derived from the IGMPv3 implementation, and has been sponsored by a third party.
# 1263305f	03-Mar-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	Start removing IPv6 Type 0 Routing header code. RH0 was deprecated by RFC 5095. While most of the code had been disabled by #if 0 already, leave a bit of infrastructure for possible RH2 code and a log message under BURN_BRIDGES in case a user still tries to send RH0 packets. Reviewed by: gnn (a bit back, earlier version)
# 33553d6e	27-Feb-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	For all files including net/vnet.h directly include opt_route.h and net/route.h. Remove the hidden include of opt_route.h and net/route.h from net/vnet.h. We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong. This does not change the list of files which depend on opt_route.h but we can identify them now more easily.
# 97aa4a51	08-Feb-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	Try to remove/assimilate as much of formerly IPv4/6 specific (duplicate) code in sys/netipsec/ipsec.c and fold it into common, INET/6 independent functions. The file local functions ipsec4_setspidx_inpcb() and ipsec6_setspidx_inpcb() were 1:1 identical after the change in r186528. Rename to ipsec_setspidx_inpcb() and remove the duplicate. Public functions ipsec[46]_get_policy() were 1:1 identical. Remove one copy and merge in the factored out code from ipsec_get_policy() into the other. The public function left is now called ipsec_get_policy() and callers were adapted. Public functions ipsec[46]_set_policy() were 1:1 identical. Rename file local ipsec_set_policy() function to ipsec_set_policy_internal(). Remove one copy of the public functions, rename the other to ipsec_set_policy() and adapt callers. Public functions ipsec[46]_hdrsiz() were logically identical (ignoring one questionable assert in the v6 version). Rename the file local ipsec_hdrsiz() to ipsec_hdrsiz_internal(), the public function to ipsec_hdrsiz(), remove the duplicate copy and adapt the callers. The v6 version had been unused anyway. Cleanup comments. Public functions ipsec[46]_in_reject() were logically identical apart from statistics. Move the common code into a file local ipsec46_in_reject() leaving vimage+statistics in small AF specific wrapper functions. Note: unfortunately we already have a public ipsec_in_reject(). Reviewed by: sam Discussed with: rwatson (renaming to *_internal) MFC after: 26 days X-MFC: keep wrapper functions for public symbols?
# 97590249	17-Dec-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Another step assimilating IPv[46] PCB code: normalize IN6P_* compat flags usage to their equialent INP_* counterpart. Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks
# dcdb4371	16-Dec-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Use inc_flags instead of the inc_isipv6 alias which so far had been the only flag with random usage patterns. Switch inc_flags to be used as a real bit field by using INC_ISIPV6 with bitops to check for the 'isipv6' condition. While here fix a place or two where in case of v4 inc_flags were not properly initialized before.[1] Found by: rwatson during review [1] Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks
# fc384fa5	15-Dec-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Another step assimilating IPv[46] PCB code - directly use the inpcb names rather than the following IPv6 compat macros: in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag, in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and sotoin6pcb(). Apart from removing duplicate code in netipsec, this is a pure whitespace, not a functional change. Discussed with: rwatson Reviewed by: rwatson (version before review requested changes) MFC after: 4 weeks (set the timer and see then)
# 6e6b3f7c	14-Dec-2008	Qing Li <qingli@FreeBSD.org>	This main goals of this project are: 1. separating L2 tables (ARP, NDP) from the L3 routing tables 2. removing as much locking dependencies among these layers as possible to allow for some parallelism in the search operations 3. simplify the logic in the routing code, The most notable end result is the obsolescent of the route cloning (RTF_CLONING) concept, which translated into code reduction in both IPv4 ARP and IPv6 NDP related modules, and size reduction in struct rtentry{}. The change in design obsoletes the semantics of RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland applications such as "arp" and "ndp" have been modified to reflect those changes. The output from "netstat -r" shows only the routing entries. Quite a few developers have contributed to this project in the past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and Andre Oppermann. And most recently: - Kip Macy revised the locking code completely, thus completing the last piece of the puzzle, Kip has also been conducting active functional testing - Sam Leffler has helped me improving/refactoring the code, and provided valuable reviews - Julian Elischer setup the perforce tree for me and has helped me maintaining that branch before the svn conversion
# 4b79449e	02-Dec-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 6f4da201	15-Oct-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Check that the mbuf len is positive (like we do in the v4 case). Read the other way round this means that even with the checks the m_len turned negative in some cases which led to panics. The reason to my understanding seems to be that the checks are wrong (also for v4) ignoring possible padding when checking cmsg_len or padding after data when adjusting the mbuf. Doing proper cheks seems to break applications like named so further investigation and regression tests are needed. PR: kern/119123 Tested by: Ashish Shukla wahjava gmail.com MFC after: 3 days
# 8b615593	02-Oct-2008	Marko Zec <zec@FreeBSD.org>	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(). () netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
# 603724d3	17-Aug-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
# cc29ac7d	29-Jul-2008	Robert Watson <rwatson@FreeBSD.org>	Marginally decomplicate set/getsockopt code in ip6_output.c by simply using the passed arguments explicitly and unconditionally rather than testing them and calling panic(). The result is the same but easier to read. MFC after: 3 days
# ea26d587	25-Mar-2008	Ruslan Ermilov <ru@FreeBSD.org>	Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA. Reviewed by: arch There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
# 9e3bdede	14-Mar-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Correct IPsec behaviour with a 'use' level in SP but no SA available. In that case return an continue processing the packet without IPsec. PR: 121384 MFC after: 5 days Reported by: Cyrus Rahman (crahman gmail.com) Tested by: Cyrus Rahman (crahman gmail.com) [slightly older version]
# 8cfbd299	14-Mar-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Correct reference counting on the SP for outgoing IPv6 IPsec connections. PR: 121374 Reported by: Cyrus Rahman (crahman gmail.com) Tested by: Cyrus Rahman (crahman gmail.com) MFC after: 5 days
# 41aa71dd	14-Mar-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Replace the function name in two identical printfs by __func__, __LINE__ so we can distinguish them when people report a problem. PR: 121373 MFC after: 5 days
# c26fe973	02-Feb-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Rather than passing around a cached 'priv', pass in an ucred to ipsec_set_policy and do the privilege check only if needed. Try to assimilate both ip_ctloutput code blocks calling ipsec*_set_policy. Reviewed by: rwatson
# 79ba3952	24-Jan-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Replace the last susers calls in netinet6/ with privilege checks. Introduce a new privilege allowing to set certain IP header options (hop-by-hop, routing headers). Leave a few comments to be addressed later. Reviewed by: rwatson (older version, before addressing his comments)
# 9233d8f3	08-Jan-2008	David E. O'Brien <obrien@FreeBSD.org>	un-__P()
# b48287a3	10-Dec-2007	David E. O'Brien <obrien@FreeBSD.org>	Clean up VCS Ids.
# 016fb9d9	21-Nov-2007	Mike Makonnen <mtm@FreeBSD.org>	Instead of manually freeing the packet options structure (and not even doing a good job of it) in the copypktopts() function, just call ip6_clearpktopts() directly. Otherwise, the callers of this function would end up freeing the memory twice. Reviewed by: jinmei PR: kern/116360
# 2a463222	05-Jul-2007	Xin LI <delphij@FreeBSD.org>	Space cleanup Approved by: re (rwatson)
# 1272577e	05-Jul-2007	Xin LI <delphij@FreeBSD.org>	ANSIfy[1] plus some style cleanup nearby. Discussed with: gnn, rwatson Submitted by: Karl Sj?dahl - dunceor <dunceor gmail com> [1] Approved by: re (rwatson)
# b2630c29	02-Jul-2007	George V. Neville-Neil <gnn@FreeBSD.org>	Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC. Approved by: re Sponsored by: Secure Computing
# 2cb64cb2	01-Jul-2007	George V. Neville-Neil <gnn@FreeBSD.org>	Commit IPv6 support for FAST_IPSEC to the tree. This commit includes only the kernel files, the rest of the files will follow in a second commit. Reviewed by: bz Approved by: re Supported by: Secure Computing
# c2259ba4	13-Jun-2007	Robert Watson <rwatson@FreeBSD.org>	Include priv.h to pick up suser(9) definitions, missed in an earlier commit. Warnings spotted by: kris
# 43bc7a9c	04-Aug-2006	Brooks Davis <brooks@FreeBSD.org>	With exception of the if_name() macro, all definitions in net_osdep.h were unused or already in if_var.h so add if_name() to if_var.h and remove net_osdep.h along with all references to it. Longer term we may want to kill off if_name() entierly since all modern BSDs have if_xname variables rendering it unnecessicary.
# 656faadc	12-May-2006	Max Laier <mlaier@FreeBSD.org>	Remove ip6fw. Since ipfw has full functional IPv6 support now and - in contrast to ip6fw - is properly lockes, it is time to retire ip6fw.
# 604afec4	01-Feb-2006	Christian S.J. Peron <csjp@FreeBSD.org>	Somewhat re-factor the read/write locking mechanism associated with the packet filtering mechanisms to use the new rwlock(9) locking API: - Drop the variables stored in the phil_head structure which were specific to conditions and the home rolled read/write locking mechanism. - Drop some includes which were used for condition variables - Drop the inline functions, and convert them to macros. Also, move these macros into pfil.h - Move pfil list locking macros intp phil.h as well - Rename ph_busy_count to ph_nhooks. This variable will represent the number of IN/OUT hooks registered with the pfil head structure - Define PFIL_HOOKED macro which evaluates to true if there are any hooks to be ran by pfil_run_hooks - In the IP/IP6 stacks, change the ph_busy_count comparison to use the new PFIL_HOOKED macro. - Drop optimization in pfil_run_hooks which checks to see if there are any hooks to be ran, and returns if not. This check is already performed by the IP stacks when they call: if (!PFIL_HOOKED(ph)) goto skip_hooks; - Drop in assertion which makes sure that the number of hooks never drops below 0 for good measure. This in theory should never happen, and if it does than there are problems somewhere - Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep - Drop variables which support home rolled read/write locking mechanism from the IPFW firewall chain structure. - Swap out the read/write firewall chain lock internal to use the rwlock(9) API instead of our home rolled version - Convert the inlined functions to macros Reviewed by: mlaier, andre, glebius Thanks to: jhb for the new locking API
# fc4c8258	13-Jan-2006	Robert Watson <rwatson@FreeBSD.org>	When storing the results of malloc() in a pointer to a pointer, check the pointer to a pointer for NULL, not the pointer for NULL. Noticed by: Coverity Prevent analysis tool MFC after: 3 days
# 743eee66	21-Oct-2005	SUZUKI Shinsuke <suz@FreeBSD.org>	sync with KAME regarding NDP - introduced fine-grain-timer to manage ND-caches and IPv6 Multicast-Listeners - supports Router-Preference <draft-ietf-ipv6-router-selection-07.txt> - better prefix lifetime management - more spec-comformant DAD advertisement - updated RFC/internet-draft revisions Obtained from: KAME Reviewed by: ume, gnn MFC after: 2 month
# 4ecbe331	21-Oct-2005	SUZUKI Shinsuke <suz@FreeBSD.org>	sync with KAME (renamed a macro IPV6_DADOUTPUT to IPV6_UNSPECSRC) Obtained from: KAME
# 7ba26d99	07-Sep-2005	David E. O'Brien <obrien@FreeBSD.org>	IPv6 was improperly defining its malloc type the same as IPv4 (M_IPMADDR, M_IPMOPTS, M_MRTABLE). Thus we had conflicting instantiations. Create an IPv6-specific type to overcome this.
# e0aec682	30-Aug-2005	Andre Oppermann <andre@FreeBSD.org>	Use the correct mbuf type for MGET().
# e770771a	28-Jul-2005	Hajimu UMEMOTO <ume@FreeBSD.org>	simplied the fix to FreeBSD-SA-04:06.ipv6. The previous one worried too much even though we actually validate the parameters. This code also is more compatible with other *BSDs, which do copyin within setsockopt(). Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Reviewed by: security-officer (nectar) Obtained from: KAME
# a1f7e5f8	24-Jul-2005	Hajimu UMEMOTO <ume@FreeBSD.org>	scope cleanup. with this change - most of the kernel code will not care about the actual encoding of scope zone IDs and won't touch "s6_addr16[1]" directly. - similarly, most of the kernel code will not care about link-local scoped addresses as a special case. - scope boundary check will be stricter. For example, the current BSD code allows a packet with src=::1 and dst=(some global IPv6 address) to be sent outside of the node, if the application do: s = socket(AF_INET6); bind(s, "::1"); sendto(s, some_global_IPv6_addr); This is clearly wrong, since ::1 is only meaningful within a single node, but the current implementation of the BSD kernel cannot reject this attempt. Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp> Obtained from: KAME
# 885adbfa	21-Jul-2005	Hajimu UMEMOTO <ume@FreeBSD.org>	always copy ip6_pktopt. remove needcopy and needfree argument/structure member accordingly. Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Obtained from: KAME
# d5e3406d	21-Jul-2005	Hajimu UMEMOTO <ume@FreeBSD.org>	be consistent on naming advanced API functions; use ip6_XXXpktopt(s). Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Obtained from: KAME
# 8507acb1	21-Jul-2005	Hajimu UMEMOTO <ume@FreeBSD.org>	NULL is not zero. Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Obtained from: KAME
# 18b35df8	20-Jul-2005	Hajimu UMEMOTO <ume@FreeBSD.org>	update comments: - RFC2292bis -> RFC3542 - typo fixes Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Obtained from: KAME
# fc74a9f9	10-Jun-2005	Brooks Davis <brooks@FreeBSD.org>	Stop embedding struct ifnet at the top of driver softcs. Instead the struct ifnet or the layer 2 common structure it was embedded in have been replaced with a struct ifnet pointer to be filled by a call to the new function, if_alloc(). The layer 2 common structure is also allocated via if_alloc() based on the interface type. It is hung off the new struct ifnet member, if_l2com. This change removes the size of these structures from the kernel ABI and will allow us to better manage them as interfaces come and go. Other changes of note: - Struct arpcom is no longer referenced in normal interface code. Instead the Ethernet address is accessed via the IFP2ENADDR() macro. To enforce this ac_enaddr has been renamed to _ac_enaddr. - The second argument to ether_ifattach is now always the mac address from driver private storage rather than sometimes being ac_enaddr. Reviewed by: sobomax, sam
# 403cbcf5	14-May-2005	George V. Neville-Neil <gnn@FreeBSD.org>	Fixes for various nits found by the Coverity tool. In particular 2 missed return values and an inappropriate bcopy from a possibly NULL pointer. Reviewed by: jake Approved by: rwatson MFC after: 1 week
# 8195404b	18-Apr-2005	Brooks Davis <brooks@FreeBSD.org>	Add IPv6 support to IPFW and Dummynet. Submitted by: Mariano Tortoriello and Raffaele De Lorenzo (via luigi)
# 283f9f8a	27-Feb-2005	Hajimu UMEMOTO <ume@FreeBSD.org>	initialized the last arg to ip6_process_hopopts(), because the recent code requires it to be 0 when a jumbo payload option is contained. PR: kern/77934 Submitted by: Gerd Rausch <gerd@juniper.net> Obtained from: KAME MFC after: 2 days
# caf43b02	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for license, minor formatting changes, separate for KAME
# 763f534e	02-Oct-2004	Doug White <dwhite@FreeBSD.org>	Disable MTU feedback in IPv6 if the sender writes data that must be fragmented. Discussed extensively with KAME. The API author's intent isn't clear at this point, so rather than remove the code entirely, #if 0 out and put a big comment in for now. The IPV6_RECVPATHMTU sockopt is available if the application wants to be notified of the path MTU to optimize packet sizes. Thanks to JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp> for putting up with my incessant badgering on this issue, and fenner for pointing out the API issue and suggesting solutions.
# d6a8d588	28-Sep-2004	Max Laier <mlaier@FreeBSD.org>	Add an additional struct inpcb * argument to pfil(9) in order to enable passing along socket information. This is required to work around a LOR with the socket code which results in an easy reproducible hard lockup with debug.mpsafenet=1. This commit does not fix the LOR, but enables us to do so later. The missing piece is to turn the filter locking into a leaf lock and will follow in a seperate (later) commit. This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in forseeable future. Suggested by: rwatson A lot of work by: csjp (he'd be even more helpful w/o mentor-reviews ;) Reviewed by: rwatson, csjp Tested by: -pf, -ipfw, LINT, csjp and myself MFC after: 3 days LOR IDs: 14 - 17 (not fixed yet)
# c21fd232	27-Aug-2004	Andre Oppermann <andre@FreeBSD.org>	Always compile PFIL_HOOKS into the kernel and remove the associated kernel compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and thus it becomes a standard part of the network stack. If no hooks are connected the entire packet filter hooks section and related activities are jumped over. This removes any performance impact if no hooks are active. Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.
# 1f44b0a1	14-Aug-2004	David Malone <dwmalone@FreeBSD.org>	Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSD have already done this, so I have styled the patch on their work: 1) introduce a ip_newid() static inline function that checks the sysctl and then decides if it should return a sequential or random IP ID. 2) named the sysctl net.inet.ip.random_id 3) IPv6 flow IDs and fragment IDs are now always random. Flow IDs and frag IDs are significantly less common in the IPv6 world (ie. rarely generated per-packet), so there should be smaller performance concerns. The sysctl defaults to 0 (sequential IP IDs). Reviewed by: andre, silby, mlaier, ume Based on: NetBSD MFC after: 2 months
# 6f8aee22	13-May-2004	Bill Paul <wpaul@FreeBSD.org>	Fix a bug which I discovered recently while doing IPv6 testing at Wind River. In the IPv4 output path, one of the tests in ip_output() checks how many slots are actually available in the interface output queue before attempting to send a packet. If, for example, we need to transmit a packet of 32K bytes over an interface with an MTU of 1500, we know it's going to take about 21 fragments to do it. If there's less than 21 slots left in the output queue, there's no point in transmitting anything at all: IP does not do retransmission, so sending only some of the fragments would just be a waste of bandwidth. (In an extreme case, if you're sending a heavy stream of fragmented packets, you might find yourself sending nothing by the first fragment of all your packets.) So if ip_output() notices there's not enough room in the output queue to send the frame, it just dumps the packet and returns ENOBUFS to the app. It turns out ip6_output() lacks this code. Consequently, this caused the netperf UDPIPV6_STREAM test to produce very poor results with large write sizes. This commit adds code to check the remaining space in the output queue and junk fragmented packets if they're too big to be sent, just like with IPv4. (I can't imagine anyone's running an NFS server using UDP over IPv6, but if they are, this will likely make them a lot happier. :)
# f36cfd49	07-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
# a5d1aae3	26-Mar-2004	Hajimu UMEMOTO <ume@FreeBSD.org>	Validate IPv6 socket options more carefully to avoid a panic. PR: kern/61513 Reviewed by: cperciva, nectar
# da0f4099	17-Feb-2004	Hajimu UMEMOTO <ume@FreeBSD.org>	IPSEC and FAST_IPSEC have the same internal API now; so merge these (IPSEC has an extra ipsecstat) Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
# 8b00e59d	08-Feb-2004	Hajimu UMEMOTO <ume@FreeBSD.org>	- obey ip6po_minmtu. - notify a proper path MTU to applications. Obtained from: KAME
# f073c60f	03-Feb-2004	Hajimu UMEMOTO <ume@FreeBSD.org>	pass pcb rather than so. it is expected that per socket policy works again.
# a46f7e7c	23-Dec-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	Catch a few places where NULL (pointer) was used where 0 (integer) was expected (fix build).
# a89ec05e	22-Dec-2003	Peter Wemm <peter@FreeBSD.org>	Catch a few places where NULL (pointer) was used where 0 (integer) was expected.
# aef03e95	21-Dec-2003	SUZUKI Shinsuke <suz@FreeBSD.org>	fixed a bug that IPv6 routing header does not work properly if specified from userland application reviewed by: ume
# 03a1bc3e	16-Dec-2003	SUZUKI Shinsuke <suz@FreeBSD.org>	fixed an IPv6 path MTU discovery failure owing to a lack of initialization Reviewed by: ume Approved by: re (scottl) MFC after: 1 day
# 289b28bd	23-Nov-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	pktopt may be null. Approved by: re (rwatson)
# 97d8d152	20-Nov-2003	Andre Oppermann <andre@FreeBSD.org>	Introduce tcp_hostcache and remove the tcp specific metrics from the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
# e5f467a2	17-Nov-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	correct to look right interface.
# 7138d65c	08-Nov-2003	Sam Leffler <sam@FreeBSD.org>	replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF macros that expand to include assertions when the system is built with INVARIANTS Supported by: FreeBSD Foundation
# 07027f9d	06-Nov-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	correct behavior when ipv6mr_interface is 0. Matthias Drochner Notified by: itojun Obtained from: NetBSD
# 0f9ade71	04-Nov-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	- cleanup SP refcnt issue. - share policy-on-socket for listening socket. - don't copy policy-on-socket at all. secpolicy no longer contain spidx, which saves a lot of memory. - deep-copy pcb policy if it is an ipsec policy. assign ID field to all SPD entries. make it possible for racoon to grab SPD entry on pcb. - fixed the order of searching SA table for packets. - fixed to get a security association header. a mode is always needed to compare them. - fixed that the incorrect time was set to sadb_comb_{hard\|soft}_usetime. - disallow port spec for tunnel mode policy (as we don't reassemble). - an user can define a policy-id. - clear enc/auth key before freeing. - fixed that the kernel crashed when key_spdacquire() was called because key_spdacquire() had been implemented imcopletely. - preparation for 64bit sequence number. - maintain ordered list of SA, based on SA id. - cleanup secasvar management; refcnt is key.c responsibility; alloc/free is keydb.c responsibility. - cleanup, avoid double-loop. - use hash for spi-based lookup. - mark persistent SP "persistent". XXX in theory refcnt should do the right thing, however, we have "spdflush" which would touch all SPs. another solution would be to de-register persistent SPs from sptree. - u_short -> u_int16_t - reduce kernel stack usage by auto variable secasindex. - clarify function name confusion. ipsec__policy -> ipsec__pcbpolicy. - avoid variable name confusion. (struct inpcbpolicy )pcb_sp, spp (struct secpolicy ), sp (struct secpolicy ) - count number of ipsec encapsulations on ipsec4_output, so that we can tell ip_output() how to handle the packet further. - When the value of the ul_proto is ICMP or ICMPV6, the port field in "src" of the spidx specifies ICMP type, and the port field in "dst" of the spidx specifies ICMP code. - avoid from applying IPsec transport mode to the packets when the kernel forwards the packets. Tested by: nork Obtained from: KAME
# 29bc2c48	31-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	do not insert a dest option header (even specified by a user) that should be placed before a routing header, unless a routing header really exists. Obtained from: KAME
# 02b9a206	26-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	re-add wrongly disappered IPV6_CHECKSUM stuff by introducing ip6_raw_ctloutput(). Obtained from: KAME
# c302f5bc	24-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	remove the ip6r0_addr and ip6r0_slmap members from ip6_rthdr0{} according to rfc2292bis. Obtained from: KAME
# f95d4633	24-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	Switch Advanced Sockets API for IPv6 from RFC2292 to RFC3542 (aka RFC2292bis). Though I believe this commit doesn't break backward compatibility againt existing binaries, it breaks backward compatibility of API. Now, the applications which use Advanced Sockets API such as telnet, ping6, mld6query and traceroute6 use RFC3542 API. Obtained from: KAME
# 9a4f9608	21-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	- change scope to zone. - change node-local to interface-local. - better error handling of address-to-scope mapping. - use in6_clearscope(). Obtained from: KAME
# 31b3783c	20-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	correct linkmtu handling. Obtained from: KAME
# 31b1bfe1	17-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	- add dom_if{attach,detach} framework. - transition to use ifp->if_afdata. Obtained from: KAME
# 953ad2fb	10-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	nuke SCOPEDROUTING. Though it was there for a long time, it was never enabled.
# 7efe5d92	08-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	- fix typo in comments. - style. - NULL is not 0. - some variables were renamed. - nuke unused logic. (there is no functional change.) Obtained from: KAME
# 68974f29	07-Oct-2003	Sam Leffler <sam@FreeBSD.org>	must lock route when the caller provided a route but not an interface; otherwise the subsequent unlock blows up Suffered by: Marcel Moolenaar <marcel@xcllnt.net> Supported by: FreeBSD Foundation
# 40e39bbb	06-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	return(code) -> return (code) (reduce diffs against KAME)
# d1dd20be	03-Oct-2003	Sam Leffler <sam@FreeBSD.org>	Locking for updates to routing table entries. Each rtentry gets a mutex that covers updates to the contents. Note this is separate from holding a reference and/or locking the routing table itself. Other/related changes: o rtredirect loses the final parameter by which an rtentry reference may be returned; this was never used and added unwarranted complexity for locking. o minor style cleanups to routing code (e.g. ansi-fy function decls) o remove the logic to bump the refcnt on the parent of cloned routes, we assume the parent will remain as long as the clone; doing this avoids a circularity in locking during delete o convert some timeouts to MPSAFE callouts Notes: 1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level applications cannot/do-no know about mutex's. Doing this requires that the mutex be the last element in the structure. A better solution is to introduce an externalized version of struct rtentry but this is a major task because of the intertwining of rtentry and other data structures that are visible to user applications. 2. There are known LOR's that are expected to go away with forthcoming work to eliminate many held references. If not these will be resolved prior to release. 3. ATM changes are untested. Sponsored by: FreeBSD Foundation Obtained from: BSD/OS (partly)
# 29234943	01-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	Obey RANDOM_IP_ID. Requested by: sam
# 8373d51d	01-Oct-2003	Hajimu UMEMOTO <ume@FreeBSD.org>	randomize IPv6 fragment ID. Obtained from: KAME
# b140bc1f	29-Sep-2003	Sam Leffler <sam@FreeBSD.org>	Correct pfil_run_hooks return handling: if the return value is non-zero then the mbuf has been consumed by a hook; otherwise beware of a null mbuf return (gack). In particular the bridge was doing the wrong thing. While in the ipv6 code make it's handling of pfil_run_hooks identical to netbsd. Pointed out by: Pyun YongHyeon <yongari@kt-is.co.kr>
# 134ea224	23-Sep-2003	Sam Leffler <sam@FreeBSD.org>	o update PFIL_HOOKS support to current API used by netbsd o revamp IPv4+IPv6+bridge usage to match API changes o remove pfil_head instances from protosw entries (no longer used) o add locking o bump FreeBSD version for 3rd party modules Heavy lifting by: "Max Laier" <max@love2party.net> Supported by: FreeBSD Foundation Obtained from: NetBSD (bits of pfil.h and pfil.c)
# 8608c4c1	20-Feb-2003	Jonathan Lemon <jlemon@FreeBSD.org>	Remove unused variables in the IPSEC case. Submitted by: Lars Eggert <larse@ISI.EDU>
# 340c35de	19-Feb-2003	Jonathan Lemon <jlemon@FreeBSD.org>	Add a TCP TIMEWAIT state which uses less space than a fullblown TCP control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket. Reviewed by: hsu, silby, jayanth Sponsored by: DARPA, NAI Labs
# a163d034	18-Feb-2003	Warner Losh <imp@FreeBSD.org>	Back out M_* changes, per decision of the TRB. Approved by: trb
# 44956c98	21-Jan-2003	Alfred Perlstein <alfred@FreeBSD.org>	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# 0e7dea83	06-Jan-2003	Sam Leffler <sam@FreeBSD.org>	purge extraneous clears of M_PKTHDR since M_MOVE_PKTHDR does this already
# 9967cafc	30-Dec-2002	Sam Leffler <sam@FreeBSD.org>	Correct mbuf packet header propagation. Previously, packet headers were sometimes propagated using M_COPY_PKTHDR which actually did something between a "move" and a "copy" operation. This is replaced by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it from the source mbuf) and m_dup_pkthdr which copies the packet header contents including any m_tag chain. This corrects numerous problems whereby mbuf tags could be lost during packet manipulations. These changes also introduce arguments to m_tag_copy and m_tag_copy_chain to specify if the tag copy work should potentially block. This introduces an incompatibility with openbsd which we may want to revisit. Note that move/dup of packet headers does not handle target mbufs that have a cluster bound to them. We may want to support this; for now we watch for it with an assert. Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG\|M_LASTFRAG. Supported by: Vernier Networks Reviewed by: Robert Watson <rwatson@FreeBSD.org>
# 35f6695b	31-Oct-2002	Hajimu UMEMOTO <ume@FreeBSD.org>	plugged memory leakage in some erroneous cases Obtained from: KAME MFC after: 1 week
# b9234faf	15-Oct-2002	Sam Leffler <sam@FreeBSD.org>	Tie new "Fast IPsec" code into the build. This involves the usual configuration stuff as well as conditional code in the IPv4 and IPv6 areas. Everything is conditional on FAST_IPSEC which is mutually exclusive with IPSEC (KAME IPsec implmentation). As noted previously, don't use FAST_IPSEC with INET6 at the moment. Reviewed by: KAME, rwatson Approved by: silence Supported by: Vernier Networks
# 5d846453	15-Oct-2002	Sam Leffler <sam@FreeBSD.org>	Replace aux mbufs with packet tags: o instead of a list of mbufs use a list of m_tag structures a la openbsd o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit ABI/module number cookie o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and use this in defining openbsd-compatible m_tag_find and m_tag_get routines o rewrite KAME use of aux mbufs in terms of packet tags o eliminate the most heavily used aux mbufs by adding an additional struct inpcb parameter to ip_output and ip6_output to allow the IPsec code to locate the security policy to apply to outbound packets o bump __FreeBSD_version so code can be conditionalized o fixup ipfilter's call to ip_output based on __FreeBSD_version Reviewed by: julian, luigi (silent), -arch, -net, darren Approved by: julian, silence from everyone else Obtained from: openbsd (mostly) MFC after: 1 month
# 4a6a94d8	22-Aug-2002	Archie Cobbs <archie@FreeBSD.org>	Replace (ab)uses of "NULL" where "0" is really meant.
# 12253795	24-Jul-2002	Hajimu UMEMOTO <ume@FreeBSD.org>	make sure to set/unset INP_IPV4 according to a value of IN6P_IPV6_V6ONLY Reviewed by: Keiichi SHIMA <keiichi@iij.ad.jp>
# 854d3b19	22-Jul-2002	Hajimu UMEMOTO <ume@FreeBSD.org>	do not refer to IN6P_BINDV6ONLY anymore. Obtained from: KAME MFC after: 1 week
# 88ff5695	18-Apr-2002	SUZUKI Shinsuke <suz@FreeBSD.org>	just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD. (based on freebsd4-snap-20020128) Reviewed by: ume MFC after: 1 week
# 44731cab	01-Apr-2002	John Baldwin <jhb@FreeBSD.org>	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
# 1183d014	29-Mar-2002	Hajimu UMEMOTO <ume@FreeBSD.org>	Fix cached route problem. Submitted by: Keiichi SHIMA <keiichi@iij.ad.jp> (KAME) Reviewed by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp> (KAME) MFC after: 1 week
# 72b1d826	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove duplicate extern declarations to silence warnings.
# d49d0ca7	09-Dec-2001	Andrew R. Reiter <arr@FreeBSD.org>	- Replace M_WAIT with M_TRYWAIT since the M_WAIT flag is deprecated. Spotted by: bde
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# f9132ceb	05-Sep-2001	Jonathan Lemon <jlemon@FreeBSD.org>	Wrap array accesses in macros, which also happen to be lvalues: ifnet_addrs[i - 1] -> ifaddr_byindex(i) ifindex2ifnet[i] -> ifnet_byindex(i) This is intended to ease the conversion to SMPng.
# 89349143	08-Jul-2001	Hajimu UMEMOTO <ume@FreeBSD.org>	soopt_mcopyout() frees mbuf if error occurs, and DOES NOT free it if it is successful. This part was lacked during merge. Obtained from: KAME MFC after: 1 week
# 3efe99eb	07-Jul-2001	Hajimu UMEMOTO <ume@FreeBSD.org>	The m_free call in the ip6_fw_ctl_ptr == NULL case apparently tries to free uninitialized mbuf. This was my mistake during recent KAME merge. This part is for *BSD other than FreeBSD. Submitted by: Alexander N. Kabaev <ak03@gte.com>
# 0554093b	24-Jun-2001	Hajimu UMEMOTO <ume@FreeBSD.org>	disallow setsockopt(IPV6_V6ONLY) for already bound sockets. Obtained from: KAME MFC after: 10 days
# 3e617560	24-Jun-2001	Hajimu UMEMOTO <ume@FreeBSD.org>	decrease warning Obtained from: KAME MFC after: 10 days
# 99fe1b37	24-Jun-2001	Hajimu UMEMOTO <ume@FreeBSD.org>	Nuke the comment about MIP6. We don't have MIP6 code, yet. MFC after: 10 days
# 33841545	10-Jun-2001	Hajimu UMEMOTO <ume@FreeBSD.org>	Sync with recent KAME. This work was based on kame-20010528-freebsd43-snap.tgz and some critical problem after the snap was out were fixed. There are many many changes since last KAME merge. TODO: - The definitions of SADB_* in sys/net/pfkeyv2.h are still different from RFC2407/IANA assignment because of binary compatibility issue. It should be fixed under 5-CURRENT. - ip6po_m member of struct ip6_pktopts is no longer used. But, it is still there because of binary compatibility issue. It should be removed under 5-CURRENT. Reviewed by: itojun Obtained from: KAME MFC after: 3 weeks
# 12ae55c6	23-May-2001	Hajimu UMEMOTO <ume@FreeBSD.org>	Fix memory leak. Submitted by: itojun
# 6c0bea35	20-Jan-2001	Hajimu UMEMOTO <ume@FreeBSD.org>	When ip6_fw_ctl() or soopt_mcopyout() return without success, don't free mbuf. It is already freed by these routins. PR: kern/24248
# 2a0c503e	21-Dec-2000	Bosko Milekic <bmilekic@FreeBSD.org>	* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT. This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while. * Fix a typo in a comment in mbuf.h * Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have became a big problem.
# cf9fa8e7	29-Oct-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Move suser() and suser_xxx() prototypes and a related #define from <sys/proc.h> to <sys/systm.h>. Correctly document the #includes needed in the manpage. Add one now needed #include of <sys/systm.h>. Remove the consequent 48 unused #includes of <sys/proc.h>.
# fe937674	28-Oct-2000	Josef Karthauser <joe@FreeBSD.org>	Count per-address statistics for IP fragments. Requested by: ru Obtained from: BSD/OS
# 5da9f8fa	19-Oct-2000	Josef Karthauser <joe@FreeBSD.org>	Augment the 'ifaddr' structure with a 'struct if_data' to keep statistics on a per network address basis. Teach the IPv4 and IPv6 input/output routines to log packets/bytes against the network address connected to the flow. Teach netstat to display the per-address stats for IP protocols when 'netstat -i' is evoked, instead of displaying the per-interface stats.
# 20cb9f9e	23-Sep-2000	Hajimu UMEMOTO <ume@FreeBSD.org>	Make ip6fw as loadable module.
# 1469c434	12-Aug-2000	Hajimu UMEMOTO <ume@FreeBSD.org>	Make compilable with -DIPFILTER. Because I don't use ipfilter at all, this is not tested. I don't know if ipfilter is work for IPv6. Submitted by: yoshiaki@kt.rim.or.jp
# c4ac87ea	31-Jul-2000	Darren Reed <darrenr@FreeBSD.org>	activate pfil_hooks and covert ipfilter to use it
# 686cdd19	04-Jul-2000	Jun-ichiro itojun Hagino <itojun@FreeBSD.org>	sync with kame tree as of july00. tons of bug fixes/improvements. API changes: - additional IPv6 ioctls - IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8). (also syntax change)
# 06a429a3	24-May-2000	Archie Cobbs <archie@FreeBSD.org>	Just need to pass the address family to if_simloop(), not the whole sockaddr.
# f63e7634	09-Mar-2000	Yoshinobu Inoue <shin@FreeBSD.org>	Initialize mbuf pointer at getting ipsec policy. Without this, kernel will panic at getsockopt() of IPSEC_POLICY. Also make compilable libipsec/test-policy.c which tries getsockopt() of IPSEC_POLICY. Approved by: jkh Submitted by: sakane@kame.net
# 7d0d8dc3	03-Mar-2000	Yoshinobu Inoue <shin@FreeBSD.org>	CMSG_XXX macros alignment fixes to follow RFC2292. Approved by: jkh Submitted by: Partly from tech@openbsd Reviewed by: itojun
# 210d0432	29-Jan-2000	Yoshinobu Inoue <shin@FreeBSD.org>	Add ip6fw. Yes it is almost code freeze, but as the result of many thought, now I think this should be added before 4.0... make world check, kernel build check is done. Reviewed by: green Obtained from: KAME project
# 67e394e0	28-Jan-2000	Yoshinobu Inoue <shin@FreeBSD.org>	Backout diffs which should not be included.
# 577a30ee	27-Jan-2000	Yoshinobu Inoue <shin@FreeBSD.org>	#This is a null commit to give correct description for the previous change. #Please forget the strange log message of the previous commit . IPv6 multicast routing. kernel IPv6 multicast routing support. pim6 dense mode daemon pim6 sparse mode daemon netstat support of IPv6 multicast routing statistics Merging to the current and testing with other existing multicast routers is done by Tatsuya Jinmei <jinmei@kame.net>, who writes and maintainances the base code in KAME distribution. Make world check and kernel build check was also successful. Obtained from: KAME project
# 91ec0a1e	27-Jan-2000	Yoshinobu Inoue <shin@FreeBSD.org>	Sorry I didn't commit these files at the commit just a few minutes before. (IPv6 multicast routing) I think I mistakenly touched TAB and the last arg sys/netinet6 to the cvs commit changed to sys/netinet6/in6_proto.c.
# fa310a7e	12-Jan-2000	Yoshinobu Inoue <shin@FreeBSD.org>	added missing IPV6_PORTRANGE case.
# 6a800098	22-Dec-1999	Yoshinobu Inoue <shin@FreeBSD.org>	IPSEC support in the kernel. pr_input() routines prototype is also changed to support IPSEC and IPV6 chained protocol headers. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
# 369dc8ce	21-Dec-1999	Eivind Eklund <eivind@FreeBSD.org>	Change incorrect NULLs to 0s
# ae5bcbff	09-Dec-1999	Yoshinobu Inoue <shin@FreeBSD.org>	rtcalloc() is removed because it turned out not to be necessary for FreeBSD. (It was added as a part of KAME patch) Specified by: jdp@polstra.com
# cfa1ca9d	07-Dec-1999	Yoshinobu Inoue <shin@FreeBSD.org>	udp IPv6 support, IPv6/IPv4 tunneling support in kernel, packet divert at kernel for IPv6/IPv4 translater daemon This includes queue related patch submitted by jburkhol@home.com. Submitted by: queue related patch from jburkhol@home.com Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
# e1da8747	22-Nov-1999	Yoshinobu Inoue <shin@FreeBSD.org>	Removed IPSEC and IPV6FIREWALL because they are not ready yet.
# 82cd038d	21-Nov-1999	Yoshinobu Inoue <shin@FreeBSD.org>	KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP for IPv6 yet) With this patch, you can assigne IPv6 addr automatically, and can reply to IPv6 ping. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project