#
ce69e373 |
|
03-Feb-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Revert "sockets: retire sorflush()" Provide a comment in sorflush() why the socket I/O sx(9) lock is actually important. This reverts commit 507f87a799cf0811ce30f0ae7f10ba19b2fd3db3.
|
#
80044c78 |
|
16-Jan-2024 |
Xavier Beaudouin <xavier.beaudouin@klarasystems.com> |
Add UDP encapsulation of ESP in IPv6 This patch provides UDP encapsulation of ESP packets over IPv6. Ports the IPv4 code to IPv6 and adds support for IPv6 in udpencap.c As required by the RFC and unlike in IPv4 encapsulation, UDP checksums are calculated. Co-authored-by: Aurelien Cazuc <aurelien.cazuc.external@stormshield.eu> Sponsored-by: Stormshield Sponsored-by: Wiktel Sponsored-by: Klara, Inc.
|
#
507f87a7 |
|
16-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: retire sorflush() With removal of dom_dispose method the function boils down to two meaningful function calls: socantrcvmore() and sbrelease(). The latter is only relevant for protocols that use generic socket buffers. The socket I/O sx(9) lock acquisition in sorflush() is not relevant for shutdown(2) operation as it doesn't do any I/O that may interleave with read(2) or write(2). The socket buffer mutex acquisition inside sbrelease() is what guarantees thread safety. This sx(9) acquisition in soshutdown() can be tracked down to 4.4BSD times, where it used to be sblock(), and it was carried over through the years evolving together with sockets with no reconsideration of why do we carry it over. I can't tell if that sblock() made sense back then, but it doesn't make any today. Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D43415
|
#
5bba2728 |
|
16-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: make pr_shutdown fully protocol specific method Disassemble a one-for-all soshutdown() into protocol specific methods. This creates a small amount of copy & paste, but makes code a lot more self documented, as protocol specific method would execute only the code that is relevant to that protocol and nothing else. This also fixes a couple recent regressions and reduces risk of future regressions. The extended KPI for the new pr_shutdown removes need for the extra pr_flush which was added for the sake of SCTP which could not perform its shutdown properly with the old one. Particularly for SCTP this change streamlines a lot of code. Some notes on why certain parts of code were copied or were not to certain protocols: * The (SS_ISCONNECTED | SS_ISCONNECTING | SS_ISDISCONNECTING) check is needed only for those protocols that may be connected or disconnected. * The above reduces into only SS_ISCONNECTED for those protocols that always connect instantly. * The ENOTCONN and continue processing hack is left only for datagram protocols. * The SOLISTENING(so) block is copied to those protocols that listen(2). * sorflush() on SHUT_RD is copied almost to every protocol, but that will be refactored later. * wakeup(&so->so_timeo) is copied to protocols that can make a non-instant connect(2), can SO_LINGER or can accept(2). There are three protocols (netgraph(4), Bluetooth, SDP) that did not have pr_shutdown, but old soshutdown() would still perform sorflush() on SHUT_RD for them and also wakeup(9). Those protocols partially supported shutdown(2) returning EOPNOTSUP for SHUT_WR/SHUT_RDWR, now they fully lost shutdown(2) support. I'm pretty sure netgraph(4) and Bluetooth are okay about that and SDP is almost abandoned anyway. Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D43413
|
#
7df9da47 |
|
14-Dec-2023 |
Richard Kümmel <R.Kuemmel@beckhoff.com> |
Fix udp IPv4-mapped address Do not use the cached route if the destination isn't the same. This fix a problem where an UDP packet will be sent via the wrong route and interface if a previous one was sent via them. PR: 275774 Reviewed by: glebius, tuexen Sponsored by: Beckhoff Automation GmbH & Co. KG
|
#
a13039e2 |
|
27-Dec-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: reoder inpcb destruction First, merge in_pcbdetach() with in_pcbfree(). The comment for in_pcbdetach() was no longer correct. Then, make sure we remove the inpcb from the hash before we commit any destructive actions on it. There are couple functions that rely on the hash lock skipping SMR + inpcb lock to lookup an inpcb. Although there are no known functions that similarly rely on the global inpcb list lock, also do list removal before destructive actions. PR: 273890 Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D43122
|
#
29363fb4 |
|
23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
|
#
03c3a70a |
|
05-Nov-2023 |
Michael Tuexen <tuexen@FreeBSD.org> |
udplite: make socketoption available on IPv6 sockets This patch allows the IPPROTO_UDPLITE-level socket options UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV to be used on AF_INET6 sockets in addition to AF_INET sockets. Reviewed by: ae, rscheff MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D42430
|
#
aa64a8f5 |
|
01-Nov-2023 |
Michael Tuexen <tuexen@FreeBSD.org> |
udplite: fix checksum computation on the sender side Don't fill the fields of the UDP/IP header not used for the checksum computation before performing the checksum computation. Reviewed by: glebius MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D42275
|
#
abca3ae7 |
|
07-Oct-2023 |
Michael Tuexen <tuexen@FreeBSD.org> |
udp: fix sending of IPv4-mapped addresses The inp_vflags field must be adjusted during the call of in_pcbbind_setup(). This is consistent with the other places in the code, but not elegant at all. PR: 274009 Reported by: syzbot+81ccc423a2737ed031ac@syzkaller.appspotmail.com Reported by: syzbot+c8e3dac881bba85bc029@syzkaller.appspotmail.com Reviewed by: markj, rrs, rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D42031
|
#
685dc743 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
96871af0 |
|
15-Feb-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: use family specific sockaddr argument for bind functions Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the protocol's pr_bind method and from there on go down the call stack with family specific argument. Reviewed by: zlei, melifaro, markj Differential Revision: https://reviews.freebsd.org/D38601
|
#
9e46ff4d |
|
03-Feb-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netinet: don't return conflicting inpcb in in_pcbconnect_setup() Last time this inpcb was actually used was in tcp_connect() before c94c54e4df9a.
|
#
a9d22cce |
|
03-Feb-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: use family specific sockaddr argument for connect functions Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the protocol's pr_connect method and from there on go down the call stack with family specific argument. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D38356
|
#
2589ec0f |
|
03-Feb-2023 |
Mark Johnston <markj@FreeBSD.org> |
pcb: Move an assignment into in_pcbdisconnect() All callers of in_pcbdisconnect() clear the local address, so let's just do that in the function itself. Note that the inp's local address is not a parameter to the inp hash functions. No functional change intended. Reviewed by: glebius MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D38362
|
#
1aed3b34 |
|
07-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
udp: add protocol method declarations to udp_var.h They are shared between UDP over IPv4 and over IPv6. To prevent all possible kernel build failures wrap them in #ifdef _SYS_PROTOSW_H_. Prompted by feedback from jhb@ and jrtc27@ on c93db4abf454.
|
#
32920f03 |
|
07-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
udp: inline udp_output() into udp_send()
|
#
483fe965 |
|
07-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
udp: embed inpcb into udpcb See similar change to TCP e68b3792440 for more context. For UDP the change is much simplier, though.
|
#
294a609f |
|
07-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
udp: destroy UDP and UDP-Lite inpcbinfos in single SYSUNINIT They are created in a single SYSINIT, there is no reason to destroy them in separate functions.
|
#
0aa120d5 |
|
02-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: allow to provide protocol specific pcb size The protocol specific structure shall start with inpcb. Differential revision: https://reviews.freebsd.org/D37126
|
#
d00c2088 |
|
30-Nov-2022 |
John Baldwin <jhb@FreeBSD.org> |
udp[6]_multi_input: Don't unlock freed inp. If udp[6]_append() returns non-zero, it is because the inp has gone away (inpcbrele_rlocked returned 1 after running the tunnel function). Reviewed by: ae Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D37511
|
#
fcb3f813 |
|
03-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netinet*: remove PRC_ constants and streamline ICMP processing In the original design of the network stack from the protocol control input method pr_ctlinput was used notify the protocols about two very different kinds of events: internal system events and receival of an ICMP messages from outside. These events were coded with PRC_ codes. Today these methods are removed from the protosw(9) and are isolated to IPv4 and IPv6 stacks and are called only from icmp*_input(). The PRC_ codes now just create a shim layer between ICMP codes and errors or actions taken by protocols. - Change ipproto_ctlinput_t to pass just pointer to ICMP header. This allows protocols to not deduct it from the internal IP header. - Change ip6proto_ctlinput_t to pass just struct ip6ctlparam pointer. It has all the information needed to the protocols. In the structure, change ip6c_finaldst fields to sockaddr_in6. The reason is that icmp6_input() already has this address wrapped in sockaddr, and the protocols want this address as sockaddr. - For UDP tunneling control input, as well as for IPSEC control input, change the prototypes to accept a transparent union of either ICMP header pointer or struct ip6ctlparam pointer. - In icmp_input() and icmp6_input() do only validation of ICMP header and count bad packets. The translation of ICMP codes to errors/actions is done by protocols. - Provide icmp_errmap() and icmp6_errmap() as substitute to inetctlerrmap, inet6ctlerrmap arrays. - In protocol ctlinput methods either trust what icmp_errmap() recommend, or do our own logic based on the ICMP header. Differential revision: https://reviews.freebsd.org/D36731
|
#
c0fc81e9 |
|
03-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netinet*: remove dead code from TCP, UDP, SCTP control input Now these functions are called only from icmp*_input(). The pointer to the ICMP data is never NULL and cmd has a limited set of values. In the past the functions were demultiplexing control messages from ICMP layer, as well as internally generated events. In the latter case the the pointer to IP would be NULL. Differential revision: https://reviews.freebsd.org/D36729
|
#
7f3b00a8 |
|
03-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netinet: filter out invalid ICMP responses in ip_icmp() instead of doing that in every ipproto_ctlinput_t method. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36728
|
#
43d39ca7 |
|
03-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netinet*: de-void control input IP protocol methods After decoupling of protosw(9) and IP wire protocols in 78b1fc05b205 for IPv4 we got vector ip_ctlprotox[] that is executed only and only from icmp_input() and respectively for IPv6 we got ip6_ctlprotox[] executed only and only from icmp6_input(). This allows to use protocol specific argument types in these methods instead of struct sockaddr and void. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36727
|
#
bb77f0c2 |
|
03-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
udp: typedef udp tunneling functions to functions, not pointers With this change one can make a forward declaration of a function that is of UDP tunneling type. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36724
|
#
e7d02be1 |
|
17-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: refactor protosw and domain static declaration and load o Assert that every protosw has pr_attach. Now this structure is only for socket protocols declarations and nothing else. o Merge struct pr_usrreqs into struct protosw. This was suggested in 1996 by wollman@ (see 7b187005d18ef), and later reiterated in 2006 by rwatson@ (see 6fbb9cf860dcd). o Make struct domain hold a variable sized array of protosw pointers. For most protocols these pointers are initialized statically. Those domains that may have loadable protocols have spacers. IPv4 and IPv6 have 8 spacers each (andre@ dff3237ee54ea). o For inetsw and inet6sw leave a comment noting that many protosw entries very likely are dead code. o Refactor pf_proto_[un]register() into protosw_[un]register(). o Isolate pr_*_notsupp() methods into uipc_domain.c Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36232
|
#
78b1fc05 |
|
17-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: separate pr_input and pr_ctlinput out of protosw The protosw KPI historically has implemented two quite orthogonal things: protocols that implement a certain kind of socket, and protocols that are IPv4/IPv6 protocol. These two things do not make one-to-one correspondence. The pr_input and pr_ctlinput methods were utilized only in IP protocols. This strange duality required IP protocols that doesn't have a socket to declare protosw, e.g. carp(4). On the other hand developers of socket protocols thought that they need to define pr_input/pr_ctlinput always, which lead to strange dead code, e.g. div_input() or sdp_ctlinput(). With this change pr_input and pr_ctlinput as part of protosw disappear and IPv4/IPv6 get their private single level protocol switch table ip_protox[] and ip6_protox[] respectively, pointing at array of ipproto_input_t functions. The pr_ctlinput that was used for control input coming from the network (ICMP, ICMPv6) is now represented by ip_ctlprotox[] and ip6_ctlprotox[]. ipproto_register() becomes the only official way to register in the table. Those protocols that were always static and unlikely anybody is interested in making them loadable, are now registered by ip_init(), ip6_init(). An IP protocol that considers itself unloadable shall register itself within its own private SYSINIT(). Reviewed by: tuexen, melifaro Differential revision: https://reviews.freebsd.org/D36157
|
#
c93db4ab |
|
16-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
udp: call UDP methods from UDP over IPv6 directly Both UDP and UDP Lite use same methods on sockets. Both UDP over IPv4 and over IPv6 use same methods. Don't pretend that methods can switch and remove this unneeded complexity. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36154
|
#
c7a62c92 |
|
10-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: gather v4/v6 handling code into in_pcballoc() from protocols Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D36062
|
#
b46667c6 |
|
17-May-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockbuf: merge two versions of sbcreatecontrol() into one No functional change.
|
#
742e7210 |
|
11-Apr-2022 |
Kristof Provost <kp@FreeBSD.org> |
udp: allow udp_tun_func_t() to indicate it did not eat the packet Allow udp tunnel functions to indicate they have not taken ownership of the packet, and that normal UDP processing should continue. This is especially useful for scenarios where the kernel has taken ownership of a socket that was originally created by userspace. It allows the tunnel function to pass through certain packets for userspace processing. The primary user of this is if_ovpn, when it receives messages from unknown peers (which might be a new client). Reviewed by: tuexen Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34883
|
#
995cba5a |
|
15-Feb-2022 |
Kristof Provost <kp@FreeBSD.org> |
netinet: allow UDP tunnels to be removed udp_set_kernel_tunneling() rejects new callbacks if one is already set. Allow callbacks to be cleared. The use case for this is OpenVPN DCO, where the socket is opened by userspace and then adopted by the kernel to run the tunnel. If the DCO interface is removed but userspace does not close the socket (something the kernel cannot prevent) the installed callbacks could be called with an invalidated context. Allow new functions to be set, but only if they're NULL (i.e. allow the callback functions to be cleared). Reviewed by: tuexen MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34288
|
#
fec8a8c7 |
|
03-Jan-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: use global UMA zones for protocols Provide structure inpcbstorage, that holds zones and lock names for a protocol. Initialize it with global protocol init using macro INPCBSTORAGE_DEFINE(). Then, at VNET protocol init supply it as the main argument to the in_pcbinfo_init(). Each VNET pcbinfo uses its private hash, but they all use same zone to allocate and SMR section to synchronize. Note: there is kern.ipc.maxsockets sysctl, which controls UMA limit on the socket zone, which was always global. Historically same maxsockets value is applied also to every PCB zone. Important fact: you can't create a pcb without a socket! A pcb may outlive its socket, however. Given that there are multiple protocols, and only one socket zone, the per pcb zone limits seem to have little value. Under very special conditions it may trigger a little bit earlier than socket zone limit, but in most setups the socket zone limit will be triggered earlier. When VIMAGE was added to the kernel PCB zones became per-VNET. This magnified existing disbalance further: now we have multiple pcb zones in multiple vnets limited to maxsockets, but every pcb requires a socket allocated from the global zone also limited by maxsockets. IMHO, this per pcb zone limit doesn't bring any value, so this patch drops it. If anybody explains value of this limit, it can be restored very easy - just 2 lines change to in_pcbstorage_init(). Differential revision: https://reviews.freebsd.org/D33542
|
#
89128ff3 |
|
03-Jan-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protocols: init with standard SYSINIT(9) or VNET_SYSINIT The historical BSD network stack loop that rolls over domains and over protocols has no advantages over more modern SYSINIT(9). While doing the sweep, split global and per-VNET initializers. Getting rid of pr_init allows to achieve several things: o Get rid of ifdef's that protect against double foo_init() when both INET and INET6 are compiled in. o Isolate initializers statically to the module they init. o Makes code easier to understand and maintain. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33537
|
#
4760956e |
|
01-Jan-2022 |
Michael Tuexen <tuexen@FreeBSD.org> |
udp: use appropriate pcbinfo when signalling EHOSTDOWN MFC after: 3 days Sponsored by: Netflix, Inc.
|
#
014f98b1 |
|
16-Dec-2021 |
Mark Johnston <markj@FreeBSD.org> |
udp: Fix a use-after-free in udp_multi_input() "ip" is a pointer into the input mbuf chain, so we shouldn't access it after the chain is freed. Fix style at the call site while here. Reported by: syzbot+7c8258509722af1b6145@syzkaller.appspotmail.com Reviewed by: tuexen, glebius Fixes: de2d47842e88 ("SMR protection for inpcbs") Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33473
|
#
d74b7bae |
|
04-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
ifnet_byindex() actually requires network epoch Sweep over potentially unsafe calls to ifnet_byindex() and wrap them in epoch. Most of the code touched remains unsafe, as the returned pointer is being used after epoch exit. Mark that with a comment. Validate the index argument inside the function, reducing argument validation requirement from the callers and making V_if_index private to if.c. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33263
|
#
651a5451 |
|
02-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
udp_detach(): fix set but not used warning
|
#
bd1d0850 |
|
02-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
udp_multi_input(): the UDP header is only needed for probes Reported by: kib Fixes: de2d47842e88
|
#
db0ac6de |
|
02-Dec-2021 |
Cy Schubert <cy@FreeBSD.org> |
Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816" This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b. A mismerge of a merge to catch up to main resulted in files being committed which should not have been.
|
#
de2d4784 |
|
02-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
SMR protection for inpcbs With introduction of epoch(9) synchronization to network stack the inpcb database became protected by the network epoch together with static network data (interfaces, addresses, etc). However, inpcb aren't static in nature, they are created and destroyed all the time, which creates some traffic on the epoch(9) garbage collector. Fairly new feature of uma(9) - Safe Memory Reclamation allows to safely free memory in page-sized batches, with virtually zero overhead compared to uma_zfree(). However, unlike epoch(9), it puts stricter requirement on the access to the protected memory, needing the critical(9) section to access it. Details: - The database is already build on CK lists, thanks to epoch(9). - For write access nothing is changed. - For a lookup in the database SMR section is now required. Once the desired inpcb is found we need to transition from SMR section to r/w lock on the inpcb itself, with a check that inpcb isn't yet freed. This requires some compexity, since SMR section itself is a critical(9) section. The complexity is hidden from KPI users in inp_smr_lock(). - For a inpcb list traversal (a pcblist sysctl, or broadcast notification) also a new KPI is provided, that hides internals of the database - inp_next(struct inp_iterator *). Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33022
|
#
565655f4 |
|
02-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: reduce some aliased functions after removal of PCBGROUP. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33021
|
#
3ea9a7cf |
|
28-Oct-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
blackhole(4): disable for locally originated TCP/UDP packets In most cases blackholing for locally originated packets is undesired, leads to different kind of lags and delays. Provide sysctls to enforce it, e.g. for debugging purposes. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D32718
|
#
3358df29 |
|
28-Oct-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
udp_input: remove a BSD stack relict I should had removed it 9 years ago in 8ad458a471ca. That commit left save_ip as a write-only variable. With save_ip removed we got one case when IP header can be modified: the calculation of IP checksum with zeroed out header. This place already has had a header saver char b[9]. However, the b[9] saver didn't cover the ip_sum field, which we explicitly overwrite aliased as (struct ipovly *)->ih_len. This was fine in cb34210012d4e, since checksum doesn't need to be restored if packet is consumed. Now we need to extend up to ip_sum field. In collaboration with: ae Differential revision: https://reviews.freebsd.org/D32719
|
#
3f1f6b6e |
|
05-Aug-2021 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp, udp: improve input validation in handling bind() Reported by: syzbot+24fcfd8057e9bc339295@syzkaller.appspotmail.com Reported by: syzbot+6e90ceb5c89285b2655b@syzkaller.appspotmail.com Reviewed by: markj, rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D31422
|
#
a61c24dd |
|
01-Aug-2021 |
Konstantin Kukushkin <darkdestructor@rambler.ru> |
udp: Fix soroverflow SOCKBUF unlocking We hold the SOCKBUF_LOCK so use soroverflow_locked here. This bug may manifest as a non-killable process stuck in [*so_rcv]. Approved by: scottl Reviewed by: Roy Marples <roy@marples.name> Fixes: 7045b1603bdf MFC after: 10 days Differential Revision: https://reviews.freebsd.org/D31374
|
#
7045b160 |
|
28-Jul-2021 |
Roy Marples <roy@marples.name> |
socket: Implement SO_RERROR SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default. This is really really important for programs that use route(4) to keep in sync with the system. If we loose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports. Reviewed by: philip (network), kbowling (transport), gbe (manpages) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26652
|
#
f96603b5 |
|
31-May-2021 |
Mark Johnston <markj@FreeBSD.org> |
tcp, udp: Permit binding with AF_UNSPEC if the address is INADDR_ANY Prior to commit f161d294b we only checked the sockaddr length, but now we verify the address family as well. This breaks at least ttcp. Relax the check to avoid breaking compatibility too much: permit AF_UNSPEC if the address is INADDR_ANY. Fixes: f161d294b Reported by: Bakul Shah <bakul@iitbombay.org> Reviewed by: tuexen MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30539
|
#
d8acd268 |
|
12-May-2021 |
Mark Johnston <markj@FreeBSD.org> |
Fix mbuf leaks in various pru_send implementations The various protocol implementations are not very consistent about freeing mbufs in error paths. In general, all protocols must free both "m" and "control" upon an error, except if PRUS_NOTREADY is specified (this is only implemented by TCP and unix(4) and requires further work not handled in this diff), in which case "control" still must be freed. This diff plugs various leaks in the pru_send implementations. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30151
|
#
f161d294 |
|
02-May-2021 |
Mark Johnston <markj@FreeBSD.org> |
Add missing sockaddr length and family validation to various protocols Several protocol methods take a sockaddr as input. In some cases the sockaddr lengths were not being validated, or were validated after some out-of-bounds accesses could occur. Add requisite checking to various protocol entry points, and convert some existing checks to assertions where appropriate. Reported by: syzkaller+KASAN Reviewed by: tuexen, melifaro MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29519
|
#
c6ded47d |
|
11-Feb-2021 |
Andrey V. Elsukov <ae@FreeBSD.org> |
[udp] fix possible mbuf and lock leak in udp_input(). In error case we can leave `inp' locked, also we need to free mbuf chain `m' in the same case. Release the lock and use `badunlocked' label to exit with freed mbuf. Also modify UDP error statistic to match the IPv6 code. Remove redundant INP_RUNLOCK() from the `if (last == NULL)' block, there are no ways to reach this point with locked `inp'. Obtained from: Yandex LLC MFC after: 3 days Sponsored by: Yandex LLC
|
#
924d1c9a |
|
08-Feb-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors." Wrong version of the change was pushed inadvertenly. This reverts commit 4a01b854ca5c2e5124958363b3326708b913af71.
|
#
4a01b854 |
|
07-Feb-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default. This is really really important for programs that use route(4) to keep in sync with the system. If we loose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports.
|
#
0c325f53 |
|
18-Oct-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Implement flowid calculation for outbound connections to balance connections over multiple paths. Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code fills mbuf flowid from inp_flowid for connection-oriented sockets. However, inp_flowid is currently not calculated for outbound connections. This change creates simple hashing functions and starts calculating hashes for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the system. Reviewed by: glebius (previous version),ae Differential Revision: https://reviews.freebsd.org/D26523
|
#
374ce248 |
|
18-Sep-2020 |
Mitchell Horne <mhorne@FreeBSD.org> |
Initialize some local variables earlier Move the initialization of these variables to the beginning of their respective functions. On our end this creates a small amount of unneeded churn, as these variables are properly initialized before their first use in all cases. However, changing this benefits at least one downstream consumer (NetApp) by allowing local and future modifications to these functions to be made without worrying about where the initialization occurs. Reviewed by: melifaro, rscheff Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26454
|
#
662c1305 |
|
01-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
net: clean up empty lines in .c and .h files
|
#
a9839c4a |
|
07-Aug-2020 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
IPV6_PKTINFO support for v4-mapped IPv6 sockets When using v4-mapped IPv6 sockets with IPV6_PKTINFO we do not respect the given v4-mapped src address on the IPv4 socket. Implement the needed functionality. This allows single-socket UDP applications (such as OpenVPN) to work better on FreeBSD. Requested by: Gert Doering (gert greenie.net), pfsense Tested by: Gert Doering (gert greenie.net) Reviewed by: melifaro MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24135
|
#
983066f0 |
|
25-Apr-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Convert route caching to nexthop caching. This change is build on top of nexthop objects introduced in r359823. Nexthops are separate datastructures, containing all necessary information to perform packet forwarding such as gateway interface and mtu. Nexthops are shared among the routes, providing more pre-computed cache-efficient data while requiring less memory. Splitting the LPM code and the attached data solves multiple long-standing problems in the routing layer, drastically reduces the coupling with outher parts of the stack and allows to transparently introduce faster lookup algorithms. Route caching was (re)introduced to minimise (slow) routing lookups, allowing for notably better performance for large TCP senders. Caching works by acquiring rtentry reference, which is protected by per-rtentry mutex. If the routing table is changed (checked by comparing the rtable generation id) or link goes down, cache record gets withdrawn. Nexthops have the same reference counting interface, backed by refcount(9). This change merely replaces rtentry with the actual forwarding nextop as a cached object, which is mostly mechanical. Other moving parts like cache cleanup on rtable change remains the same. Differential Revision: https://reviews.freebsd.org/D24340
|
#
7029da5c |
|
26-Feb-2020 |
Pawel Biernacki <kaktus@FreeBSD.org> |
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
|
#
481be5de |
|
12-Feb-2020 |
Randall Stewart <rrs@FreeBSD.org> |
White space cleanup -- remove trailing tab's or spaces from any line. Sponsored by: Netflix Inc.
|
#
c1604fe4 |
|
21-Jan-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Make in_pcbladdr() require network epoch entered by its callers. Together with this widen network epoch coverage up to tcp_connect() and udp_connect(). Revisions from r356974 and up to this revision cover D23187. Differential Revision: https://reviews.freebsd.org/D23187
|
#
334fc582 |
|
08-Jan-2020 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
vnet: virtualise more network stack sysctls. Virtualise tcp_always_keepalive, TCP and UDP log_in_vain. All three are set in the netoptions startup script, which we would love to run for VNETs as well [1]. While virtualising the log_in_vain sysctls seems pointles at first for as long as the kernel message buffer is not virtualised, it at least allows an administrator to debug the base system or an individual jail if needed without turning the logging on for all jails running on a system. PR: 243193 [1] MFC after: 2 weeks
|
#
032677ce |
|
07-Nov-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Now that there is no R/W lock on PCB list the pcblist sysctls handlers can be greatly simplified. All the previous double cycling and complex locking was added to avoid these functions holding global PCB locks for extended period of time, preventing addition of new entries.
|
#
80577e55 |
|
07-Nov-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove unnecessary recursive epoch enter via INP_INFO_RLOCK macro in udp_input(). It shall always run in the network epoch.
|
#
2435e507 |
|
07-Nov-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Now with epoch synchronized PCB lookup tables we can greatly simplify locking in udp_output() and udp6_output(). First, we select if we need read or write lock in PCB itself, we take the lock and enter network epoch. Then, we proceed for the rest of the function. In case if we need to modify PCB hash, we would take write lock on it for a short piece of code. We could exit the epoch before allocating an mbuf, but with this patch we are keeping it all the way into ip_output()/ip6_output(). Today this creates an epoch recursion, since ip_output() enters epoch itself. However, once all protocols are reviewed, ip_output() and ip6_output() would require epoch instead of entering it. Note: I'm not 100% sure that in udp6_output() the epoch is required. We don't do PCB hash lookup for a bound socket. And all branches of in6_select_src() don't require epoch, at least they lack assertions. Today inet6 address list is protected by rmlock, although it is CKLIST. AFAIU, the future plan is to protect it by network epoch. That would require epoch in in6_select_src(). Anyway, in future ip6_output() would require epoch, udp6_output() would need to enter it.
|
#
d797164a |
|
07-Nov-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Since r353292 on input path we are always in network epoch, when we lookup PCBs. Thus, do not enter epoch recursively in in_pcblookup_hash() and in6_pcblookup_hash(). Same applies to tcp_ctlinput() and tcp6_ctlinput(). This leaves several sysctl(9) handlers that return PCB credentials unprotected. Add epoch enter/exit to all of them. Differential Revision: https://reviews.freebsd.org/D22197
|
#
503f4e47 |
|
07-Nov-2019 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
netinet*: variable cleanup In preparation for another change factor out various variable cleanups. These mainly include: (1) do not assign values to variables during declaration: this makes the code more readable and does allow for better grouping of variable declarations, (2) do not assign values to variables before need; e.g., if a variable is only used in the 2nd half of a function and we have multiple return paths before that, then do not set it before it is needed, and (3) try to avoid assigning the same value multiple times. MFC after: 3 weeks Sponsored by: Netflix
|
#
eafaa1bc |
|
01-Jun-2019 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
After parts of the locking fixes in r346595, syzkaller found another one in udp_output(). This one is a race condition. We do check on the laddr and lport without holding a lock in order to determine whether we want a read or a write lock (this is in the "sendto/sendmsg" cases where addr (sin) is given). Instrumenting the kernel showed that after taking the lock, we had bound to a local port from a parallel thread on the same socket. If we find that case, unlock, and retry again. Taking the write lock would not be a problem in first place (apart from killing some parallelism). However the retry is needed as later on based on similar condition checks we do acquire the pcbinfo lock and if the conditions have changed, we might find ourselves with a lock inconsistency, hence at the end of the function when trying to unlock, hitting the KASSERT. Reported by: syzbot+bdf4caa36f3ceeac198f@syzkaller.appspotmail.com Reviewed by: markj MFC after: 6 weeks Event: Waterloo Hackathon 2019
|
#
bd4549f4 |
|
21-May-2019 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Massively blow up the locking-related KASSERTs used to make sure that we end up in a consistent locking state at the end of udp_output() in order to be able to see what the values are based on which we once took a decision (note: some values may have changed). This helped to debug a syzkaller report. MFC after: 2 months Event: Waterloo Hackathon 2019
|
#
f9a6e8d7 |
|
21-May-2019 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Similarly to r338257,338306 try to fold the two consecutive #ifdef RSS section in udp_output() into one by moving a '}' outside of the conditional block. MFC after: 2 months Event: Waterloo Hackathon 2019
|
#
d86ecbe9 |
|
23-Apr-2019 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
iFix udp_output() lock inconsistency. In r297225 the initial INP_RLOCK() was replaced by an early acquisition of an r- or w-lock depending on input variables possibly extending the write locked area for reasons not entirely clear but possibly to avoid a later case of unlock and relock leading to a possible race condition and possibly in order to allow the route cache to work for connected sockets. Unfortunately the conditions were not 1:1 replicated (probably because of the route cache needs). While this would not be a problem the legacy IP code compared to IPv6 has an extra case when dealing with IP_SENDSRCADDR. In a particular case we were holding an exclusive inp lock and acquired the shared udbinfo lock (now epoch). When then running into an error case, the locking assertions on release fired as the udpinfo and inp lock levels did not match. Break up the special case and in that particular case acquire and udpinfo lock depending on the exclusitivity of the inp lock. MFC After: 9 days Reported-by: syzbot+1f5c6800e4f99bdb1a48@syzkaller.appspotmail.com Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D19594
|
#
ade1258d |
|
22-Apr-2019 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
r297225 move the assignment of sin from add to the top of the function. sin is not changed after the initial assignment, so no need to set it again. MFC after: 10 days
|
#
e9322998 |
|
22-Apr-2019 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Remove some excessive brackets. No functional change. MFC after: 10 days
|
#
79db6fe7 |
|
22-Nov-2018 |
Mark Johnston <markj@FreeBSD.org> |
Plug some networking sysctl leaks. Various network protocol sysctl handlers were not zero-filling their output buffers and thus would export uninitialized stack memory to userland. Fix a number of such handlers. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: tuexen MFC after: 3 days Security: kernel memory disclosure Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18301
|
#
4ba16a92 |
|
12-Oct-2018 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
In udp_input() when walking the pcblist we can come across an inp marked FREED after the epoch(9) changes. Check once we hold the lock and skip the inp if it is the case. Contrary to IPv6 the locking of the inp is outside the multicast section and hence a single check seems to suffice. PR: 232192 Reviewed by: mmacy, markj Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17540
|
#
3afdfcaf |
|
12-Oct-2018 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
r217592 moved the check for imo in udp_input() into the conditional block but leaving the variable assignment outside the block, where it is no longer used. Move both the variable and the assignment one block further in. This should result in no functional changes. It will however make upcoming changes slightly easier to apply. Reviewed by: markj, jtl, tuexen Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17525
|
#
083a010c |
|
04-Oct-2018 |
Ryan Stone <rstone@FreeBSD.org> |
Hold a write lock across udp_notify() With the new route cache feature udp_notify() will modify the inp when it needs to invalidate the route cache. Ensure that we hold a write lock on the inp before calling the function to ensure that multiple threads don't race while trying to invalidate the cache (which previously lead to a page fault). Differential Revision: https://reviews.freebsd.org/D17246 Reviewed by: sbruno, bz, karels Sponsored by: Dell EMC Isilon Approved by: re (gjb)
|
#
7bda9663 |
|
31-Jul-2018 |
Michael Tuexen <tuexen@FreeBSD.org> |
Add a dtrace provider for UDP-Lite. The dtrace provider for UDP-Lite is modeled after the UDP provider. This fixes the bug that UDP-Lite packets were triggering the UDP provider. Thanks to dteske@ for providing the dwatch module. Reviewed by: dteske@, markj@, rrs@ Relnotes: yes Differential Revision: https://reviews.freebsd.org/D16377
|
#
5f901c92 |
|
24-Jul-2018 |
Andrew Turner <andrew@FreeBSD.org> |
Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147
|
#
34fc9072 |
|
20-Jul-2018 |
Michael Tuexen <tuexen@FreeBSD.org> |
Set the IPv4 version in the IP header for UDP and UDPLite.
|
#
e1526d5a |
|
20-Jul-2018 |
Michael Tuexen <tuexen@FreeBSD.org> |
Add missing dtrace probes for received UDP packets. Fire UDP receive probes when a packet is received and there is no endpoint consuming it. Fire the probe also if the TTL of the received packet is smaller than the minimum required by the endpoint. Clarify also in the man page, when the probe fires. Reviewed by: dteske@, markj@, rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16046
|
#
6573d758 |
|
03-Jul-2018 |
Matt Macy <mmacy@FreeBSD.org> |
epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066
|
#
99208b82 |
|
01-Jul-2018 |
Matt Macy <mmacy@FreeBSD.org> |
inpcb: don't gratuitously defer frees Don't defer frees in sysctl handlers. It isn't necessary and it just confuses things. revert: r333911, r334104, and r334125 Requested by: jtl
|
#
46374cbf |
|
21-Jun-2018 |
Matt Macy <mmacy@FreeBSD.org> |
udp_ctlinput: don't refer to unpcb after we drop the lock Reported by: pho@
|
#
b872626d |
|
12-Jun-2018 |
Matt Macy <mmacy@FreeBSD.org> |
mechanical CK macro conversion of inpcbinfo lists This is a dependency for converting the inpcbinfo hash and info rlocks to epoch.
|
#
1a43cff9 |
|
06-Jun-2018 |
Sean Bruno <sbruno@FreeBSD.org> |
Load balance sockets with new SO_REUSEPORT_LB option. This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple programs or threads to bind to the same port and incoming connections will be load balanced using a hash function. Most of the code was copied from a similar patch for DragonflyBSD. However, in DragonflyBSD, load balancing is a global on/off setting and can not be set per socket. This patch allows for simultaneous use of both the current SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system. Required changes to structures: Globally change so_options from 16 to 32 bit value to allow for more options. Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets. Limitations: As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or threads sharing the same socket). This is a substantially different contribution as compared to its original incarnation at svn r332894 and reverted at svn r332967. Thanks to rwatson@ for the substantive feedback that is included in this commit. Submitted by: Johannes Lundberg <johalun0@gmail.com> Obtained from: DragonflyBSD Relnotes: Yes Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11003
|
#
fe524329 |
|
23-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
convert allocations to INVARIANTS M_ZERO
|
#
630ba2c5 |
|
23-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
udp: assign flowid to udp sockets round-robin On a 2x8x2 SKL this increases measured throughput with 64 netperf -H $DUT -t UDP_STREAM -- -m 1 from 590kpps to 1.1Mpps before: https://people.freebsd.org/~mmacy/2018.05.11/udpsender.svg after: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg
|
#
246a6199 |
|
23-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
epoch: allow for conditionally asserting that the epoch context fields are unused by zeroing on INVARIANTS builds
|
#
056b40e2 |
|
19-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
inpcb: consolidate possible deletion in pcblist functions in to epoch deferred context.
|
#
7875017c |
|
24-Apr-2018 |
Sean Bruno <sbruno@FreeBSD.org> |
Revert r332894 at the request of the submitter. Submitted by: Johannes Lundberg <johalun0_gmail.com> Sponsored by: Limelight Networks
|
#
7b7796ee |
|
23-Apr-2018 |
Sean Bruno <sbruno@FreeBSD.org> |
Load balance sockets with new SO_REUSEPORT_LB option This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple programs or threads to bind to the same port and incoming connections will be load balanced using a hash function. Most of the code was copied from a similar patch for DragonflyBSD. However, in DragonflyBSD, load balancing is a global on/off setting and can not be set per socket. This patch allows for simultaneous use of both the current SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system. Required changes to structures Globally change so_options from 16 to 32 bit value to allow for more options. Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets. Limitations As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or threads sharing the same socket). Submitted by: Johannes Lundberg <johanlun0@gmail.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11003
|
#
51369649 |
|
20-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
|
#
cc487c16 |
|
15-May-2017 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Reduce in_pcbinfo_init() by two params. No users supply any flags to this function (they used to say UMA_ZONE_NOFREE), so flag parameter goes away. The zone_fini parameter also goes away. Previously no protocols (except divert) supplied zone_fini function, so inpcb locks were leaked with slabs. This was okay while zones were allocated with UMA_ZONE_NOFREE flag, but now this is a leak. Fix that by suppling inpcb_fini() function as fini method for all inpcb zones.
|
#
c33a2313 |
|
14-Apr-2017 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Rework r316770 to make it protocol independent and general, like we do for streaming sockets. And do more cleanup in the sbappendaddr_locked_internal() to prevent leak information from existing mbuf to the one, that will be possible created later by netgraph. Suggested by: glebius Tested by: Irina Liakh <spell at itl ua> MFC after: 1 week
|
#
84289149 |
|
13-Apr-2017 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Clear h/w csum flags on mbuf handled by UDP. When checksums of received IP and UDP header already checked, UDP uses sbappendaddr_locked() to pass received data to the socket. sbappendaddr_locked() uses given mbuf as is, and if NIC supports checksum offloading, mbuf contains csum_data and csum_flags that were calculated for already stripped headers. Some NICs support only limited checksums offloading and do not use CSUM_PSEUDO_HDR flag, and csum_data contains some value that UDP/TCP should use for pseudo header checksum calculation. When L2TP is used for tunneling with mpd5, ng_ksocket receives mbuf with filled csum_flags and csum_data, that were calculated for outer headers. When L2TP header is stripped, a packet that was tunneled goes to the IP layer and due to presence of csum_flags (without CSUM_PSEUDO_HDR) and csum_data, the UDP/TCP checksum check fails for this packet. Reported by: Irina Liakh <spell at itl ua> Tested by: Irina Liakh <spell at itl ua> MFC after: 1 week
|
#
4af540d1 |
|
05-Apr-2017 |
Ryan Stone <rstone@FreeBSD.org> |
Revert the optimization from r304436 r304436 attempted to optimize the handling of incoming UDP packet by only making an expensive call to in_broadcast() if the mbuf was marked as an broadcast packet. Unfortunately, this cannot work in the case of point-to- point L2 protocols like PPP, which have no notion of "broadcast". The optimization has been disabled for several months now with no progress towards fixing it, so it needs to go.
|
#
cc65eb4e |
|
21-Mar-2017 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Hide struct inpcb, struct tcpcb from the userland. This is a painful change, but it is needed. On the one hand, we avoid modifying them, and this slows down some ideas, on the other hand we still eventually modify them and tools like netstat(1) never work on next version of FreeBSD. We maintain a ton of spares in them, and we already got some ifdef hell at the end of tcpcb. Details: - Hide struct inpcb, struct tcpcb under _KERNEL || _WANT_FOO. - Make struct xinpcb, struct xtcpcb pure API structures, not including kernel structures inpcb and tcpcb inside. Export into these structures the fields from inpcb and tcpcb that are known to be used, and put there a ton of spare space. - Make kernel and userland utilities compilable after these changes. - Bump __FreeBSD_version. Reviewed by: rrs, gnn Differential Revision: D10018
|
#
dce33a45 |
|
05-Mar-2017 |
Ermal Luçi <eri@FreeBSD.org> |
The patch provides the same socket option as Linux IP_ORIGDSTADDR. Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD. The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application. This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets. Reviewed by: adrian, aw Approved by: ae (mentor) Sponsored by: rsync.net Differential Revision: D9235
|
#
fbbd9655 |
|
28-Feb-2017 |
Warner Losh <imp@FreeBSD.org> |
Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
|
#
c10c5b1e |
|
11-Feb-2017 |
Ermal Luçi <eri@FreeBSD.org> |
Committed without approval from mentor. Reported by: gnn
|
#
e97b6026 |
|
09-Feb-2017 |
Ermal Luçi <eri@FreeBSD.org> |
Fix build after r313524 Reported-by: ohartmann@walstatt.org
|
#
4616026f |
|
09-Feb-2017 |
Ermal Luçi <eri@FreeBSD.org> |
Revert r313527 Heh svn is not git
|
#
c0fadfdb |
|
09-Feb-2017 |
Ermal Luçi <eri@FreeBSD.org> |
Correct missed variable name. Reported-by: ohartmann@walstatt.org
|
#
ed55edce |
|
09-Feb-2017 |
Ermal Luçi <eri@FreeBSD.org> |
The patch provides the same socket option as Linux IP_ORIGDSTADDR. Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD. The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application. This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets. Sponsored-by: rsync.net Differential Revision: D9235 Reviewed-by: adrian
|
#
edf0313b |
|
07-Feb-2017 |
Eric van Gyzen <vangyzen@FreeBSD.org> |
Fix garbage IP addresses in UDP log_in_vain messages If multiple threads emit a UDP log_in_vain message concurrently, the IP addresses could be garbage due to concurrent usage of a single string buffer inside inet_ntoa(). Use inet_ntoa_r() with two stack buffers instead. Reported by: Mark Martinec <Mark.Martinec+freebsd@ijs.si> MFC after: 3 days Relnotes: yes Sponsored by: Dell EMC
|
#
fcf59617 |
|
06-Feb-2017 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Merge projects/ipsec into head/. Small summary ------------- o Almost all IPsec releated code was moved into sys/netipsec. o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of ipsec.ko and tcpmd5.ko kernel modules. o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA. o New network pseudo interface if_ipsec(4) added. For now it is build as part of ipsec.ko module (or with IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs. o The network stack now invokes IPsec functions using special methods. The only one header file <netipsec/ipsec_support.h> should be included to declare all the needed things to work with IPsec. o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods. o TCP_SIGNATURE support was reworked to be more close to RFC. o PF_KEY SADB was reworked: - now all security associations stored in the single SPI namespace, and all SAs MUST have unique SPI. - several hash tables added to speed up lookups in SADB. - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups in the same time. - many PF_KEY message handlers were reworked to reflect changes in SADB. - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by IKE daemon to change SA addresses. o ipsecrequest and secpolicy structures were cardinally changed to avoid locking protection for ipsecrequest. Now we support only limited number (4) of bundled SAs, but they are supported for both INET and INET6. o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid SP lookup for each packet. o For inbound security policies added the mode, when the kernel does check for full history of applied IPsec transforms. o References counting rules for security policies and security associations were changed. The proper SA locking added into xform code. o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting. Reviewed by: gnn, wblock Obtained from: Yandex LLC Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9352
|
#
00b460ff |
|
01-Oct-2016 |
Rick Macklem <rmacklem@FreeBSD.org> |
r297225 broke udp_output() for the case where the "addr" argument is NULL and the function jumps to the "release:" label. For this case, the "inp" was write locked, but the code attempted to read unlock it. This patch fixes the problem. This case could occur for NFS over UDP mounts, where the server was down for a few minutes under certain circumstances. Reported by: bde Tested by: bde Reviewed by: gnn MFC after: 2 weeks
|
#
c3bef61e |
|
15-Sep-2016 |
Kevin Lo <kevlo@FreeBSD.org> |
Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead. Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D7878
|
#
23424a20 |
|
22-Aug-2016 |
Ryan Stone <rstone@FreeBSD.org> |
Temporarily disable the optimization from r304436 r304436 attempted to optimize the handling of incoming UDP packet by only making an expensive call to in_broadcast() if the mbuf was marked as an broadcast packet. Unfortunately, this cannot work in the case of point-to- point L2 protocols like PPP, which have no notion of "broadcast". Discussions on how to properly fix r304436 are ongoing, but in the meantime disable the optimization to ensure that no existing network setups are broken. Reported by: bms
|
#
9da85a91 |
|
20-Aug-2016 |
Marko Zec <zec@FreeBSD.org> |
Permit disabling net.inet.udp.require_l2_bcast in VIMAGE kernels. The default value of the tunable introduced in r304436 couldn't be effectively overrided on VIMAGE kernels, because instead of being accessed via the appropriate VNET() accessor macro, it was accessed via the VNET_NAME() macro, which resolves to the (should-be) read-only master template of initial values of per-VNET data. Hence, while the value of udp_require_l2_bcast could be altered on per-VNET basis, the code in udp_input() would ignore it as it would always read the default value (one) from the VNET master template. Silence from: rstone
|
#
41029db1 |
|
18-Aug-2016 |
Ryan Stone <rstone@FreeBSD.org> |
Don't check for broadcast IPs on non-bcast pkts in_broadcast() can be quite expensive, so skip calling it if the incoming mbuf wasn't sent to a broadcast L2 address in the first place. Reviewed by: gnn MFC after: 2 months Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7309
|
#
4c105402 |
|
08-Jun-2016 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Cleanup unneded include "opt_ipfw.h". It was used for conditional build IPFIREWALL_FORWARD support. But IPFIREWALL_FORWARD option was removed a long time ago.
|
#
3f58662d |
|
01-Jun-2016 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
The pr_destroy field does not allow us to run the teardown code in a specific order. VNET_SYSUNINITs however are doing exactly that. Thus remove the VIMAGE conditional field from the domain(9) protosw structure and replace it with VNET_SYSUNINITs. This also allows us to change some order and to make the teardown functions file local static. Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use internally. Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g., for pfil consumers (firewalls), partially for this commit and for others to come. Reviewed by: gnn, tuexen (sctp), jhb (kernel.h) Obtained from: projects/vnet MFC after: 2 weeks X-MFC: do not remove pr_destroy Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6652
|
#
abb901c5 |
|
28-Apr-2016 |
Randall Stewart <rrs@FreeBSD.org> |
Complete the UDP tunneling of ICMP msgs to those protocols interested in having tunneled UDP and finding out about the ICMP (tested by Michael Tuexen with SCTP.. soon to be using this feature). Differential Revision: http://reviews.freebsd.org/D5875
|
#
99d628d5 |
|
15-Apr-2016 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
netinet: for pointers replace 0 with NULL. These are mostly cosmetical, no functional change. Found with devel/coccinelle. Reviewed by: ae. tuexen
|
#
4f321dbd |
|
24-Mar-2016 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Fix compile errors after r297225: - properly V_irtualise variable access unbreaking VIMAGE kernels. - remove the volatile from the function return type to make architecture using gcc happy [-Wreturn-type] "type qualifiers ignored on function return type" I am not entirely happy with this solution putting the u_int there but it will do for now.
|
#
84cc0778 |
|
24-Mar-2016 |
George V. Neville-Neil <gnn@FreeBSD.org> |
FreeBSD previously provided route caching for TCP (and UDP). Re-add route caching for TCP, with some improvements. In particular, invalidate the route cache if a new route is added, which might be a better match. The cache is automatically invalidated if the old route is deleted. Submitted by: Mike Karels Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4306
|
#
9ff1c463 |
|
22-Jan-2016 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Correct function arguments for SYSUNINITs. Obtained from: p4 @180886 Sponsored by: The FreeBSD Foundation
|
#
1f12da0e |
|
22-Jan-2016 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Just checkpoint the WIP in order to be able to make the tree update easier. Note: this is currently not in a usable state as certain teardown parts are not called and the DOMAIN rework is missing. More to come soon and find its way to head. Obtained from: P4 //depot/user/bz/vimage/... Sponsored by: The FreeBSD Foundation
|
#
e62b9bca |
|
24-Dec-2015 |
Sergey Kandaurov <pluknet@FreeBSD.org> |
Fixed comment placement. Before r12296, this comment described the udp_recvspace default value. Spotted by: ru Sponsored by: Nginx, Inc.
|
#
a86e5c96 |
|
27-Aug-2015 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
get_inpcbinfo() and get_pcblist() are UDP local functions and do not do what one would expect by name. Prefix them with "udp_" to at least obviously limit the scope. This is a non-functional change. Reviewed by: gnn, rwatson MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3505
|
#
705f4d9c |
|
21-Jul-2015 |
Ermal Luçi <eri@FreeBSD.org> |
IPSEC, remove variable argument function its already due. Differential Revision: https://reviews.freebsd.org/D3080 Reviewed by: gnn, ae Approved by: gnn(mentor)
|
#
c0d1be08 |
|
21-Jul-2015 |
Randall Stewart <rrs@FreeBSD.org> |
When a tunneling protocol is being used with UDP we must release the lock on the INP before calling the tunnel protocol, else a LOR may occur (it does with SCTP for sure). Instead we must acquire a ref count and release the lock, taking care to allow for the case where the UDP socket has gone away and *not* unlocking since the refcnt decrement on the inp will do the unlock in that case. Reviewed by: tuexen MFC after: 3 weeks
|
#
b2bdc62a |
|
18-Jan-2015 |
Adrian Chadd <adrian@FreeBSD.org> |
Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific bits. The motivation here is to eventually teach netisr and potentially other networking subsystems a bit more about how RSS work queues / buckets are configured so things have a hope of auto-configuring in the future. * net/rss_config.[ch] takes care of the generic bits for doing configuration, hash function selection, etc; * topelitz.[ch] is now in net/ rather than netinet/; * (and would be in libkern if it didn't directly include RSS_KEYSIZE; that's a later thing to fix up.) * netinet/in_rss.[ch] now just contains the IPv4 specific methods; * and netinet/in6_rss.[ch] now just contains the IPv6 specific methods. This should have no functional impact on anyone currently using the RSS support. Differential Revision: D1383 Reviewed by: gnn, jfv (intel driver bits)
|
#
44eb8bbe |
|
11-Dec-2014 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Do not count security policy violation twice. ipsec*_in_reject() do this by their own. Obtained from: Yandex LLC Sponsored by: Yandex LLC
|
#
a8da5dd6 |
|
05-Dec-2014 |
Craig Rodrigues <rodrigc@FreeBSD.org> |
MFp4: @181627 Allow UMA allocated memory to be freed when VNET jails are torn down. Differential Revision: D1201 Submitted by: bz Reviewed by: rwatson, gnn
|
#
c2529042 |
|
01-Dec-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but does no longer have any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file. This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows. "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before. Additional notes: - The SCTP code changes will be committed as a separate patch. - Removal of the "M_FLOWID" flag will also be done separately. - The FreeBSD version has been bumped. MFC after: 1 month Sponsored by: Mellanox Technologies
|
#
6df8a710 |
|
07-Nov-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.
|
#
81d3ec17 |
|
10-Oct-2014 |
Bryan Venteicher <bryanv@FreeBSD.org> |
Add context pointer and source address to the UDP tunnel callback These are needed for the forthcoming vxlan implementation. The context pointer means we do not have to use a spare pointer field in the inpcb, and the source address is required to populate vxlan's forwarding table. While I highly doubt there is an out of tree consumer of the UDP tunneling callback, this change may be a difficult to eventually MFC. Phabricator: https://reviews.freebsd.org/D383 Reviewed by: gnn
|
#
a0a9e1b5 |
|
09-Oct-2014 |
Bryan Venteicher <bryanv@FreeBSD.org> |
Add missing UDP multicast receive dtrace probes Phabricator: https://reviews.freebsd.org/D924 Reviewed by: rpaulo markj MFC after: 1 month
|
#
c19f98eb |
|
08-Oct-2014 |
Bryan Venteicher <bryanv@FreeBSD.org> |
Check for mbuf copy failure when there are multiple multicast sockets This partitular case is the only path where the mbuf could be NULL. udp_append() checked for a NULL mbuf only after invoking the tunneling callback. Our only in tree tunneling callback - SCTP - assumed a non NULL mbuf, and it is a bit odd to make the callbacks responsible for checking this condition. This also reduces the differences between the IPv4 and IPv6 code. MFC after: 1 month
|
#
83e95fb3 |
|
30-Sep-2014 |
Michael Tuexen <tuexen@FreeBSD.org> |
The default for UDPLITE_RECV_CSCOV is zero. RFC 3828 recommend that this means full checksum coverage for received packets. If an application is willing to accept packets with partial coverage, it is expected to use the socekt option and provice the minimum coverage it accepts. Reviewed by: kevlo MFC after: 3 days
|
#
c6d81a34 |
|
30-Sep-2014 |
Michael Tuexen <tuexen@FreeBSD.org> |
UDPLite requires a checksum. Therefore, discard a received packet if the checksum is 0. MFC after: 3 days
|
#
0f4a0366 |
|
30-Sep-2014 |
Michael Tuexen <tuexen@FreeBSD.org> |
If the checksum coverage field in the UDPLITE header is the length of the complete UDPLITE packet, the packet has full checksum coverage. SO fix the condition. Reviewed by: kevlo MFC after: 3 days
|
#
03f90784 |
|
28-Sep-2014 |
Michael Tuexen <tuexen@FreeBSD.org> |
Checksum coverage values larger than 65535 for UDPLite are invalid. Check for this when the user calls setsockopt using UDPLITE_{SEND,RECV}CSCOV. Reviewed by: kevlo MFC after: 3 days
|
#
8ad1a83b |
|
08-Sep-2014 |
Adrian Chadd <adrian@FreeBSD.org> |
Calculate the RSS hash for outbound UDPv4 frames. Differential Revision: https://reviews.freebsd.org/D527 Reviewed by: grehan
|
#
9d3ddf43 |
|
08-Sep-2014 |
Adrian Chadd <adrian@FreeBSD.org> |
Add support for receiving and setting flowtype, flowid and RSS bucket information as part of recvmsg(). This is primarily used for debugging/verification of the various processing paths in the IP, PCB and driver layers. Unfortunately the current implementation of the control message path results in a ~10% or so drop in UDP frame throughput when it's used. Differential Revision: https://reviews.freebsd.org/D527 Reviewed by: grehan
|
#
8f5a8818 |
|
07-Aug-2014 |
Kevin Lo <kevlo@FreeBSD.org> |
Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb
|
#
a485f139 |
|
12-May-2014 |
Michael Tuexen <tuexen@FreeBSD.org> |
Disable TX checksum offload for UDP-Lite completely. It wasn't used for partial checksum coverage, but even for full checksum coverage it doesn't work. This was discussed with Kevin Lo (kevlo@).
|
#
6c192602 |
|
10-May-2014 |
Michael Tuexen <tuexen@FreeBSD.org> |
Whitespace change.
|
#
d58c1533 |
|
09-May-2014 |
Michael Tuexen <tuexen@FreeBSD.org> |
Fix a logic bug which prevented the sending of UDP packet with 0 checksum. This bug was introduced in r264212 and should be X-MFCed with that revision, if UDP-Lite support if MFCed.
|
#
cfac59ec |
|
07-Apr-2014 |
Kevin Lo <kevlo@FreeBSD.org> |
Remove a bogus re-assignment.
|
#
d1b18731 |
|
06-Apr-2014 |
Kevin Lo <kevlo@FreeBSD.org> |
Minor style cleanups.
|
#
e06e816f |
|
06-Apr-2014 |
Kevin Lo <kevlo@FreeBSD.org> |
Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks. Tested with vlc and a test suite [1]. [1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz Reviewed by: jhb, glebius, adrian
|
#
54366c0b |
|
25-Nov-2013 |
Attilio Rao <attilio@FreeBSD.org> |
- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip
|
#
76039bc8 |
|
26-Oct-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
1ad19fb6 |
|
25-Aug-2013 |
Mark Johnston <markj@FreeBSD.org> |
The second last argument of udp:::receive is supposed to contain the connection state, not the IP header. X-MFC with: r254889
|
#
57f60867 |
|
25-Aug-2013 |
Mark Johnston <markj@FreeBSD.org> |
Implement the ip, tcp, and udp DTrace providers. The probe definitions use dynamic translation so that their arguments match the definitions for these providers in Solaris and illumos. Thus, existing scripts for these providers should work unmodified on FreeBSD. Tested by: gnn, hiren MFC after: 1 month
|
#
6794f460 |
|
23-Jul-2013 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Remove the large part of struct ipsecstat. Only few fields of this structure is used, but they already have equal fields in the struct newipsecstat, that was introduced with FAST_IPSEC and then was merged together with old ipsecstat structure. This fixes kernel stack overflow on some architectures after migration ipsecstat to PCPU counters. Reported by: Taku YAMAMOTO, Maciej Milewski
|
#
5b7cb97c |
|
09-Jul-2013 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU counters.
|
#
6659296c |
|
20-Jun-2013 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics accounting. MFC after: 2 weeks
|
#
6acd596e |
|
07-Dec-2012 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
More warnings for zones that depend on the kern.ipc.maxsockets limit. Obtained from: WHEEL Systems
|
#
eb1b1807 |
|
05-Dec-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually
|
#
ffdbf9da |
|
01-Nov-2012 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Remove the recently added sysctl variable net.pfil.forward. Instead, add protocol specific mbuf flags M_IP_NEXTHOP and M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup only when this flag is set. Suggested by: andre
|
#
c1de64a4 |
|
25-Oct-2012 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Remove the IPFIREWALL_FORWARD kernel option and make possible to turn on the related functionality in the runtime via the sysctl variable net.pfil.forward. It is turned off by default. Sponsored by: Yandex LLC Discussed with: net@ MFC after: 2 weeks
|
#
8ad458a4 |
|
23-Oct-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Do not reduce ip_len by size of IP header in the ip_input() before passing a packet to protocol input routines. For several protocols this mean that now protocol needs to do subtraction itself, and for another half this means that we do not need to add header length back to the packet. Make ip_stripoptions() to adjust ip_len, since now we enter this function with a packet header whose ip_len does represent length of entire packet, not payload only.
|
#
8f134647 |
|
22-Oct-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet. After this change a packet processed by the stack isn't modified at all[2] except for TTL. After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack. [1] One exception still remains. The raw sockets convert host byte order before pass a packet to an application. Probably this would remain for ages for compatibility. [2] The ip_input() still subtructs header len from ip->ip_len, but this is planned to be fixed soon. Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
|
#
105bd211 |
|
12-Oct-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
In ip_stripoptions(): - Remove unused argument and incorrect comment. - Fixup ip_len after stripping.
|
#
f584d74b |
|
12-Jun-2012 |
Michael Tuexen <tuexen@FreeBSD.org> |
Add a cmsg of type IP_TOS for UDP/IPv4 sockets to specify the TOS byte. MFC after: 3 days
|
#
0cfdff24 |
|
25-May-2012 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
MFp4 bz_ipv6_fast: Properly protect the inp read access when handling the control code. In the past this was expensive but given the rlock it's not so much anymore. Spotted while: optimizing udp6 Discussed with: rwatson (a few months ago) Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days
|
#
40b676be |
|
27-Mar-2012 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Export the udp_cksum sysctl for upcoming SCTP work. Rather than always, SCTP will only do IPv4 UDP checksum calculation as defined by the host policy. When tunneling SCTP always calculates the inner checksum already so not doing the outer UDP can save cycles. While here virtualize the variable. Requested by: tuexen MFC after: 2 weeks
|
#
8a006adb |
|
20-Aug-2011 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Add support for IPv6 to ipfw fwd: Distinguish IPv4 and IPv6 addresses and optional port numbers in user space to set the option for the correct protocol family. Add support in the kernel for carrying the new IPv6 destination address and port. Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change the address in the IP header. Add support for IPv6 forwarding to a non-local destination. Add a regession test uitilizing VIMAGE to check all 20 possible combinations I could think of. Obtained from: David Dolson at Sandvine Incorporated (original version for ipfw fwd IPv6 support) Sponsored by: Sandvine Incorporated PR: bin/117214 MFC after: 4 weeks Approved by: re (kib)
|
#
52cd27cb |
|
05-Jun-2011 |
Robert Watson <rwatson@FreeBSD.org> |
Implement a CPU-affine TCP and UDP connection lookup data structure, struct inpcbgroup. pcbgroups, or "connection groups", supplement the existing inpcbinfo connection hash table, which when pcbgroups are enabled, might now be thought of more usefully as a per-protocol 4-tuple reservation table. Connections are assigned to connection groups base on a hash of their 4-tuple; wildcard sockets require special handling, and are members of all connection groups. During a connection lookup, a per-connection group lock is employed rather than the global pcbinfo lock. By aligning connection groups with input path processing, connection groups take on an effective CPU affinity, especially when aligned with RSS work placement (see a forthcoming commit for details). This eliminates cache line migration associated with global, protocol-layer data structures in steady state TCP and UDP processing (with the exception of protocol-layer statistics; further commit to follow). Elements of this approach were inspired by Willman, Rixner, and Cox's 2006 USENIX paper, "An Evaluation of Network Stack Parallelization Strategies in Modern Operating Systems". However, there are also significant differences: we maintain the inpcb lock, rather than using the connection group lock for per-connection state. Likewise, the focus of this implementation is alignment with NIC packet distribution strategies such as RSS, rather than pure software strategies. Despite that focus, software distribution is supported through the parallel netisr implementation, and works well in configurations where the number of hardware threads is greater than the number of NIC input queues, such as in the RMI XLR threaded MIPS architecture. Another important difference is the continued maintenance of existing hash tables as "reservation tables" -- these are useful both to distinguish the resource allocation aspect of protocol name management and the more common-case lookup aspect. In configurations where connection tables are aligned with hardware hashes, it is desirable to use the traditional lookup tables for loopback or encapsulated traffic rather than take the expense of hardware hashes that are hard to implement efficiently in software (such as RSS Toeplitz). Connection group support is enabled by compiling "options PCBGROUP" into your kernel configuration; for the time being, this is an experimental feature, and hence is not enabled by default. Subject to the limited MFCability of change dependencies in inpcb, and its change to the inpcbinfo init function signature, this change in principle could be merged to FreeBSD 8.x. Reviewed by: bz Sponsored by: Juniper Networks, Inc.
|
#
d3c1f003 |
|
04-Jun-2011 |
Robert Watson <rwatson@FreeBSD.org> |
Add _mbuf() variants of various inpcb-related interfaces, including lookup, hash install, etc. For now, these are arguments are unused, but as we add RSS support, we will want to use hashes extracted from mbufs, rather than manually calculated hashes of header fields, due to the expensive of the software version of Toeplitz (and similar hashes). Add notes that it would be nice to be able to pass mbufs into lookup routines in pf(4), optimising firewall lookup in the same way, but the code structure there doesn't facilitate that currently. (In principle there is no reason this couldn't be MFCed -- the change extends rather than modifies the KBI. However, it won't be useful without other previous possibly less MFCable changes.) Reviewed by: bz Sponsored by: Juniper Networks, Inc.
|
#
fa046d87 |
|
30-May-2011 |
Robert Watson <rwatson@FreeBSD.org> |
Decompose the current single inpcbinfo lock into two locks: - The existing ipi_lock continues to protect the global inpcb list and inpcb counter. This lock is now relegated to a small number of allocation and free operations, and occasional operations that walk all connections (including, awkwardly, certain UDP multicast receive operations -- something to revisit). - A new ipi_hash_lock protects the two inpcbinfo hash tables for looking up connections and bound sockets, manipulated using new INP_HASH_*() macros. This lock, combined with inpcb locks, protects the 4-tuple address space. Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb connection locks, so may be acquired while manipulating a connection on which a lock is already held, avoiding the need to acquire the inpcbinfo lock preemptively when a binding change might later be required. As a result, however, lookup operations necessarily go through a reference acquire while holding the lookup lock, later acquiring an inpcb lock -- if required. A new function in_pcblookup() looks up connections, and accepts flags indicating how to return the inpcb. Due to lock order changes, callers no longer need acquire locks before performing a lookup: the lookup routine will acquire the ipi_hash_lock as needed. In the future, it will also be able to use alternative lookup and locking strategies transparently to callers, such as pcbgroup lookup. New lookup flags are, supplementing the existing INPLOOKUP_WILDCARD flag: INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb Callers must pass exactly one of these flags (for the time being). Some notes: - All protocols are updated to work within the new regime; especially, TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely eliminated, and global hash lock hold times are dramatically reduced compared to previous locking. - The TCP syncache still relies on the pcbinfo lock, something that we may want to revisit. - Support for reverting to the FreeBSD 7.x locking strategy in TCP input is no longer available -- hash lookup locks are now held only very briefly during inpcb lookup, rather than for potentially extended periods. However, the pcbinfo ipi_lock will still be acquired if a connection state might change such that a connection is added or removed. - Raw IP sockets continue to use the pcbinfo ipi_lock for protection, due to maintaining their own hash tables. - The interface in6_pcblookup_hash_locked() is maintained, which allows callers to acquire hash locks and perform one or more lookups atomically with 4-tuple allocation: this is required only for TCPv6, as there is no in6_pcbconnect_setup(), which there should be. - UDPv6 locking remains significantly more conservative than UDPv4 locking, which relates to source address selection. This needs attention, as it likely significantly reduces parallelism in this code for multithreaded socket use (such as in BIND). - In the UDPv4 and UDPv6 multicast cases, we need to revisit locking somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which is no longer sufficient. A second check once the inpcb lock is held should do the trick, keeping the general case from requiring the inpcb lock for every inpcb visited. - This work reminds us that we need to revisit locking of the v4/v6 flags, which may be accessed lock-free both before and after this change. - Right now, a single lock name is used for the pcbhash lock -- this is undesirable, and probably another argument is required to take care of this (or a char array name field in the pcbinfo?). This is not an MFC candidate for 8.x due to its impact on lookup and locking semantics. It's possible some of these issues could be worked around with compatibility wrappers, if necessary. Reviewed by: bz Sponsored by: Juniper Networks, Inc.
|
#
79288c11 |
|
30-Apr-2011 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Make the UDP code compile without INET. Expose udp_usrreq.c to IPv6 only as well compiling out most functions adding or extending #ifdef INET coverage. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days
|
#
79bb84fb |
|
14-Apr-2011 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Refactor udp_input(), moving calls to u_tun_func() into udp_append(). Obtained from: Wheel Systems Sp. z o.o. Reviewed by: bz@
|
#
9d22191d |
|
12-Feb-2011 |
Daniel Eischen <deischen@FreeBSD.org> |
Oops, revert an accidental local change that got added in my last commit (r218627). No damage was done in the last commit, just some duplicated code was added (which is now removed).
|
#
f7e6ce6d |
|
12-Feb-2011 |
Daniel Eischen <deischen@FreeBSD.org> |
Allow the SO_SETFIB socket option to select the default (0) routing table. Reviewed by: julian
|
#
c3f9cbb0 |
|
19-Jan-2011 |
Randall Stewart <rrs@FreeBSD.org> |
Fix style 9 nit that snuck in when I grabbed the wrong patch ;-0 (thanks Daniel) MFC after: 1 week
|
#
a38b1c8c |
|
19-Jan-2011 |
Randall Stewart <rrs@FreeBSD.org> |
Fix a bug where Multicast packets sent from a udp endpoint may end up echoing back to the sender even with OUT joining the multi-cast group. Reviewed by: gnn, bms, bz? Obtained from: deischen (with help from)
|
#
79c3d51b |
|
18-Jan-2011 |
Matthew D Fleming <mdf@FreeBSD.org> |
Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need to rely on the format string. For SYSCTL_PROC instances that I noticed a discrepancy between the CTLTYPE and the format specifier, fix the CTLTYPE.
|
#
3e288e62 |
|
22-Nov-2010 |
Dimitry Andric <dim@FreeBSD.org> |
After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
|
#
31c6a003 |
|
14-Nov-2010 |
Dimitry Andric <dim@FreeBSD.org> |
Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.
|
#
a7d5f7eb |
|
19-Oct-2010 |
Jamie Gritton <jamie@FreeBSD.org> |
A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
|
#
c007b96a |
|
17-Aug-2010 |
John Baldwin <jhb@FreeBSD.org> |
Ensure a minimum "slop" of 10 extra pcb structures when providing a memory size estimate to userland for pcb list sysctls. The previous behavior of a "slop" of n/8 does not work well for small values of n (e.g. no slop at all if you have less than 8 open UDP connections). Reviewed by: bz MFC after: 1 week
|
#
4b332286 |
|
03-Jun-2010 |
Robert Watson <rwatson@FreeBSD.org> |
Merge r204826 from head to stable/8: Make udp_set_kernel_tunneling() less forgiving when its invariants are violated: so_pcb can never be NULL for a valid UDP socket, and it is always SOCK_DGRAM. Use sotoinpcb() as the rest of the UDP code does. Reviewed by: bz Sponsored by: Juniper Networks Approved by: re (kib)
|
#
480d7c6c |
|
06-May-2010 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
MFC r207369: MFP4: @176978-176982, 176984, 176990-176994, 177441 "Whitspace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOABLS (before r185088) and try to reduce the diff to stable/7 and earlier as good as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH
|
#
82cea7e6 |
|
29-Apr-2010 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
MFP4: @176978-176982, 176984, 176990-176994, 177441 "Whitspace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOABLS (before r185088) and try to reduce the diff to stable/7 and earlier as good as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days
|
#
397069f2 |
|
27-Mar-2010 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
MFC r205251: Add pcb reference counting to the pcblist sysctl handler functions to ensure type stability while caching the pcb pointers for the copyout. Reviewed by: rwatson
|
#
3662f299 |
|
27-Mar-2010 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
MFC r204807: Destroy UDP UMA zones (empty or not) upon network stack teardown to not leak them making UMA/vmstat -z unhappy with every stoped vnet. We will still leak pages (especially as zones are marked NOFREE).
|
#
d0e157f6 |
|
17-Mar-2010 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Add pcb reference counting to the pcblist sysctl handler functions to ensure type stability while caching the pcb pointers for the copyout. Reviewed by: rwatson MFC after: 7 days
|
#
9bcd427b |
|
14-Mar-2010 |
Robert Watson <rwatson@FreeBSD.org> |
Abstract out initialization of most aspects of struct inpcbinfo from their calling contexts in {IP divert, raw IP sockets, TCP, UDP} and create new helper functions: in_pcbinfo_init() and in_pcbinfo_destroy() to do this work in a central spot. As inpcbinfo becomes more complex due to ongoing work to add connection groups, this will reduce code duplication. MFC after: 1 month Reviewed by: bz Sponsored by: Juniper Networks
|
#
68b5629b |
|
07-Mar-2010 |
Robert Watson <rwatson@FreeBSD.org> |
Make udp_set_kernel_tunneling() less forgiving when its invariants are violated: so_pcb can never be NULL for a valid UDP socket, and it is always SOCK_DGRAM. Use sotoinpcb() as the rest of the UDP code does. MFC after: 1 week Reviewed by: bz Sponsored by: Juniper Networks
|
#
391dab1c |
|
06-Mar-2010 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Destroy UDP UMA zones (empty or not) upon network stack teardown to not leak them making the VM subsystem unhappy with every stoped vnet(*). We will still leak pages (especially as zones are marked NOFREE). (*) This will also keep vmstat -z more usable. Sponsored by: ISPsystem MFC after: 5 days
|
#
315e3e38 |
|
02-Aug-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Many network stack subsystems use a single global data structure to hold all pertinent statatistics for the subsystem. These structures are sometimes "borrowed" by kernel modules that require a place to store statistics for similar events. Add KPI accessor functions for statistics structures referenced by kernel modules so that they no longer encode certain specifics of how the data structures are named and stored. This change is intended to make it easier to move to per-CPU network stats following 8.0-RELEASE. The following modules are affected by this change: if_bridge if_cxgb if_gif ip_mroute ipdivert pf In practice, most of these statistics consumers should, in fact, maintain their own statistics data structures rather than borrowing structures from the base network stack. However, that change is too agressive for this point in the release cycle. Reviewed by: bz Approved by: re (kib)
|
#
530c0060 |
|
01-Aug-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
|
#
1e77c105 |
|
16-Jul-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)
|
#
eddfbb76 |
|
14-Jul-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
|
#
7b495c44 |
|
12-Jun-2009 |
VANHULLEBUS Yvan <vanhu@FreeBSD.org> |
Added support for NAT-Traversal (RFC 3948) in IPsec stack. Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele (julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense team, and all people who used / tried the NAT-T patch for years and reported bugs, patches, etc... X-MFC: never Reviewed by: bz Approved by: gnn(mentor) Obtained from: NETASQ
|
#
bc29160d |
|
08-Jun-2009 |
Marko Zec <zec@FreeBSD.org> |
Introduce an infrastructure for dismantling vnet instances. Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many of such issues are already known and will be incrementaly fixed over the next weeks in smaller incremental commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should be in general compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)
|
#
bcf11e8d |
|
05-Jun-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
|
#
0304c731 |
|
27-May-2009 |
Jamie Gritton <jamie@FreeBSD.org> |
Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)
|
#
6a9148fe |
|
23-May-2009 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Implement UDP control block support. So far the udp_tun_func_t had been (ab)using inp_ppcb for udp in kernel tunneling callbacks. Move that into the udpcb and add a field for flags there to be used by upcoming changes instead of sticking udp only flags into in_pcb flags2. Bump __FreeBSD_version for ports to detect it and because of vnet* struct size changes. Submitted by: jhb (7.x version) Reviewed by: rwatson
|
#
f6dfe47a |
|
30-Apr-2009 |
Marko Zec <zec@FreeBSD.org> |
Permit buiding kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_* macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_* family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)
|
#
093f25f8 |
|
26-Apr-2009 |
Marko Zec <zec@FreeBSD.org> |
In preparation for turning on options VIMAGE in next commits, rearrange / replace / adjust several INIT_VNET_* initializer macros, all of which currently resolve to whitespace. Reviewed by: bz (an older version of the patch) Approved by: julian (mentor)
|
#
026decb8 |
|
12-Apr-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Update stats in struct udpstat using two new macros, UDPSTAT_ADD() and UDPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days
|
#
86425c62 |
|
11-Apr-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Update stats in struct ipstat using four new macros, IPSTAT_ADD(), IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days
|
#
d10910e6 |
|
09-Mar-2009 |
Bruce M Simpson <bms@FreeBSD.org> |
Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD IPv4 stack. Diffs are minimized against p4. PCS has been used for some protocol verification, more widespread testing of recorded sources in Group-and-Source queries is needed. sizeof(struct igmpstat) has changed. __FreeBSD_version is bumped to 800070.
|
#
b89e82dd |
|
05-Feb-2009 |
Jamie Gritton <jamie@FreeBSD.org> |
Standardize the various prison_foo_ip[46] functions and prison_if to return zero on success and an error code otherwise. The possible errors are EADDRNOTAVAIL if an address being checked for doesn't match the prison, and EAFNOSUPPORT if the prison doesn't have any addresses in that address family. For most callers of these functions, use the returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or EINVAL. Always include a jailed() check in these functions, where a non-jailed cred always returns success (and makes no changes). Remove the explicit jailed() checks that preceded many of the function calls. Approved by: bz (mentor)
|
#
bbb0e3d9 |
|
06-Jan-2009 |
Randall Stewart <rrs@FreeBSD.org> |
Addresses Roberts comments on comments. Also adds the KASSERT and checks suggested. Reviewed by: The udp tunneling was discussed on net@ under the thread entitled "Heads up -- Thinking about UDP and tunneling"
|
#
c7c7ea4b |
|
05-Jan-2009 |
Randall Stewart <rrs@FreeBSD.org> |
Add the ability of an alternate transport protocol to easily tunnel over udp by providing a hook function that will be called instead of appending to the socket buffer.
|
#
385195c0 |
|
10-Dec-2008 |
Marko Zec <zec@FreeBSD.org> |
Conditionally compile out V_ globals while instantiating the appropriate container structures, depending on VIMAGE_GLOBALS compile time option. Make VIMAGE_GLOBALS a new compile-time option, which by default will not be defined, resulting in instatiations of global variables selected for V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be effectively compiled out. Instantiate new global container structures to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0, vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0. Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_ macros resolve either to the original globals, or to fields inside container structures, i.e. effectively #ifdef VIMAGE_GLOBALS #define V_rt_tables rt_tables #else #define V_rt_tables vnet_net_0._rt_tables #endif Update SYSCTL_V_*() macros to operate either on globals or on fields inside container structs. Extend the internal kldsym() lookups with the ability to resolve selected fields inside the virtualization container structs. This applies only to the fields which are explicitly registered for kldsym() visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently this is done only in sys/net/if.c. Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code, and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in turn result in proper code being generated depending on VIMAGE_GLOBALS. De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c which were prematurely V_irtualized by automated V_ prepending scripts during earlier merging steps. PF virtualization will be done separately, most probably after next PF import. Convert a few variable initializations at instantiation to initialization in init functions, most notably in ipfw. Also convert TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in initializer functions. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
#
4b79449e |
|
02-Dec-2008 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation
|
#
413628a7 |
|
29-Nov-2008 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addtion to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the afore mentioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible
|
#
97021c24 |
|
26-Nov-2008 |
Marko Zec <zec@FreeBSD.org> |
Merge more of currently non-functional (i.e. resolving to whitespace) macros from p4/vimage branch. Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks. De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
#
44e33a07 |
|
19-Nov-2008 |
Marko Zec <zec@FreeBSD.org> |
Change the initialization methodology for global variables scheduled for virtualization. Instead of initializing the affected global variables at instatiation, assign initial values to them in initializer functions. As a rule, initialization at instatiation for such variables should never be introduced again from now on. Furthermore, enclose all instantiations of such global variables in #ifdef VIMAGE_GLOBALS blocks. Essentialy, this change should have zero functional impact. In the next phase of merging network stack virtualization infrastructure from p4/vimage branch, the new initialization methology will allow us to switch between using global variables and their counterparts residing in virtualization containers with minimum code churn, and in the long run allow us to intialize multiple instances of such container structures. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
#
d7f03759 |
|
19-Oct-2008 |
Ulf Lilleengen <lulf@FreeBSD.org> |
- Import the HEAD csup code which is the basis for the cvsmode work.
|
#
f08ef6c5 |
|
17-Oct-2008 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Add cr_canseeinpcb() doing checks using the cached socket credentials from inp_cred which is also available after the socket is gone. Switch cr_canseesocket consumers to cr_canseeinpcb. This removes an extra acquisition of the socket lock. Reviewed by: rwatson MFC after: 3 months (set timer; decide then)
|
#
86d02c5c |
|
04-Oct-2008 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Cache so_cred as inp_cred in the inpcb. This means that inp_cred is always there, even after the socket has gone away. It also means that it is constant for the lifetime of the inp. Both facts lead to simpler code and possibly less locking. Suggested by: rwatson Reviewed by: rwatson MFC after: 6 weeks X-MFC Note: use a inp_pspare for inp_cred
|
#
8b615593 |
|
02-Oct-2008 |
Marko Zec <zec@FreeBSD.org> |
Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(*). (*) netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
#
2c0d658f |
|
24-Aug-2008 |
Julian Elischer <julian@FreeBSD.org> |
Another missed V_ instance
|
#
603724d3 |
|
17-Aug-2008 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
|
#
48d48eb9 |
|
16-Aug-2008 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Fix a regression introduced in r179289 splitting up ip6_savecontrol() into v4-only vs. v6-only inp_flags processing. When ip6_savecontrol_v4() is called from ip6_savecontrol() we were not passing back the **mp thus the information will be missing in userland. Istead of going with a *** as suggested in the PR we are returning **mp now and passing in the v4only flag as a pointer argument. PR: kern/126349 Reviewed by: rwatson, dwmalone
|
#
e2ed8f35 |
|
26-Jul-2008 |
Alexander Motin <mav@FreeBSD.org> |
Increase UDBHASHSIZE from 16 to 128 items. Previous value was chosen 10 years ago and not very effective now. This change gives several percents speedup on 1000 L2TP mpd links.
|
#
41698ebf |
|
20-Jul-2008 |
Tom Rhodes <trhodes@FreeBSD.org> |
Document a few sysctls. Reviewed by: rwatson
|
#
ca528788 |
|
16-Jul-2008 |
Robert Watson <rwatson@FreeBSD.org> |
Fix error in comment. MFC after: 3 weeks
|
#
43cc0bc1 |
|
15-Jul-2008 |
Robert Watson <rwatson@FreeBSD.org> |
Merge last of a series of rwlock conversion changes to UDP, which completes the move to a fully parallel UDP transmit path by using global read, rather than write, locking of inpcbinfo in further semi-connected cases: - Add macros to allow try-locking of inpcb and inpcbinfo. - Always acquire an incpcb read lock in udp_output(), which stablizes the local inpcb address and port bindings in order to determine what further locking is required: - If the inpcb is currently not bound (at all) and are implicitly connecting, we require inpcbinfo and inpcb write locks, so drop the read lock and re-acquire. - If the inpcb is bound for at least one of the port or address, but an explicit source or destination is requested, trylock the inpcbinfo lock, and if that fails, drop the inpcb lock, lock the global lock, and relock the inpcb lock. - Otherwise, no further locking is required (common case). - Update comments. In practice, this means that the vast majority of consumers of UDP sockets will not acquire any exclusive locks at the socket or UDP levels of the network stack. This leads to a marked performance improvement in several important workloads, including BIND, nsd, and memcached over UDP, as well as significant improvements in pps microbenchmarks. The plan is to MFC all of the rwlock changes to RELENG_7 once they have settled for a weeks in the tree. Tested by: ps, kris (older revision), bde MFC after: 3 weeks
|
#
3144b7d3 |
|
10-Jul-2008 |
Robert Watson <rwatson@FreeBSD.org> |
Slightly rearrange validation of UDP arguments and jail processing in udp_output() so that argument validation occurs before jail processing. Add additional comments explaining what's going on when we process addresses and binding during udp_output(). MFC after: 3 weeks
|
#
1175d9d5 |
|
10-Jul-2008 |
Robert Watson <rwatson@FreeBSD.org> |
Apply the MAC label to an outgoing UDP packet when other inpcb properties are processed, meaning that we avoid the cost of MAC label assignment if we're going to drop the packet due to mbuf exhaustion, etc. MFC after: 3 weeks
|
#
ac9ae279 |
|
06-Jul-2008 |
Robert Watson <rwatson@FreeBSD.org> |
Allow udp_notify() to accept read, as well as write, locks on the passed inpcb. When directly invoking udp_notify() from udp_ctlinput(), acquire only a read lock; we may still see write locks in udp_notify() as the in_pcbnotifyall() routine is shared with TCP and always uses a write lock on the inpcb being notified. MFC after: 1 month
|
#
c4d585ae |
|
06-Jul-2008 |
Robert Watson <rwatson@FreeBSD.org> |
Add additional udbinfo and inpcb locking assertions to udp_output(); for some code paths, global or inpcb write locks are required, but for other code paths, read locks or no locking at all are sufficient for the data structures. MFC after: 1 month
|
#
948d0fc9 |
|
07-Jul-2008 |
Robert Watson <rwatson@FreeBSD.org> |
First step towards parallel transmit in UDP: if neither a specific source or a specific destination address is requested as part of a send on a UDP socket, read lock the inpcb rather than write lock it. This will allow fully parallel transmit down to the IP layer when sending simultaneously from multiple threads on a connected UDP socket. Parallel transmit for more complex cases, such as when sendto(2) is invoked with an address and there's already a local binding, will follow. MFC after: 1 month
|
#
10cc62b7 |
|
07-Jul-2008 |
Robert Watson <rwatson@FreeBSD.org> |
Drop read lock on udbinfo earlier during delivery to the last matching UDP socket for a datagram; the inpcb read lock is sufficient to provide inpcb stability during udp_append(). MFC after: 1 month
|
#
5df3e839 |
|
02-Jul-2008 |
Robert Watson <rwatson@FreeBSD.org> |
Add soreceive_dgram(9), an optimized socket receive function for use by datagram-only protocols, such as UDP. This version removes use of sblock(), which is not required due to an inability to interlace data improperly with datagrams, as well as avoiding some of the larger loops and state management that don't apply on datagram sockets. This is experimental code, so hook it up only for UDPv4 for testing; if there are problems we may need to revise it or turn it off by default, but it offers *significant* performance improvements for threaded UDP applications such as BIND9, nsd, and memcached using UDP. Tested by: kris, ps
|
#
119d85f6 |
|
30-Jun-2008 |
Robert Watson <rwatson@FreeBSD.org> |
In udp_append() and udp_input(), make use of read locking on incpbs rather than write locking: while we need to maintain a valid reference to the inpcb and fix its state, no protocol layer state is modified during an IPv4 UDP receive -- there are only changes at the socket layer, which is separately protected by socket locking. While parallel concurrent receive on a single UDP socket is currently relatively unusual, introducing read locking in the transmit path, allowing concurrent receive and transmit, will significantly improve performance for loads such as BIND, memcached, etc. MFC after: 2 months Tested by: gnn, kris, ps
|
#
9622e84f |
|
29-May-2008 |
Robert Watson <rwatson@FreeBSD.org> |
Employ read locks on UDP inpcbs, rather than write locks, when monitoring UDP connections using sysctls. In some cases, add previously missing locking of inpcbs, as inp_socket is followed, which also allows us to drop global locks more quickly. MFC after: 1 week
|
#
9a38ba81 |
|
24-May-2008 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Factor out the v4-only vs. the v6-only inp_flags processing in ip6_savecontrol in preparation for udp_append() to no longer need an WLOCK as we will no longer be modifying socket options. Requested by: rwatson Reviewed by: gnn MFC after: 10 days
|
#
8501a69c |
|
17-Apr-2008 |
Robert Watson <rwatson@FreeBSD.org> |
Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock main exclusive, and all instances of inpcb lock acquisition are exclusive. This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code. MFC after: 3 months Tested by: kris (superset of committered patch)
|
#
30d239bc |
|
24-Oct-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
|
#
4b421e2d |
|
07-Oct-2007 |
Mike Silbersack <silby@FreeBSD.org> |
Add FBSDID to all files in netinet so that people can more easily include file version information in bug reports. Approved by: re (kensmith)
|
#
f5514f08 |
|
10-Sep-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Further UDPv4 cleanup: - Resort includes a bit. - Correct typos and wording problems in comments. - Rename udpcksum to udp_cksum to be consistent with other UDP-related configuration variables. - Remove indirection of udp_notify through local notify variable in udp_ctlinput(), which is presumably due to copying and pasting from TCP, where multiple notify routines exist. Approved by: re (kensmith)
|
#
43bbb6aa |
|
10-Jul-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Further cleanup of UDPv4: - Move udp_sendspace and udp_recvspace global variables and associated sysctls to the top of the file where most other such things are present. - Rename static variable 'blackhole' to 'udp_blackhole' and unstaticize so that we can add blackhole support for UDPv6 using the same MIB variable. - Move udp_append() above udp_input() to match the function order in udp6_usrreq.c. Approved by: re (kensmith)
|
#
bd84d204 |
|
07-Jul-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Minor UDPv4 cleanup: capitalize comment, move statistics update after mbuf free to be consistent with other error handling, and release socket buffer lock before freeing mbufs and statistics updates rather than after. Approved by: re (kensmith)
|
#
b2630c29 |
|
02-Jul-2007 |
George V. Neville-Neil <gnn@FreeBSD.org> |
Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC. Approved by: re Sponsored by: Secure Computing
|
#
2cb64cb2 |
|
01-Jul-2007 |
George V. Neville-Neil <gnn@FreeBSD.org> |
Commit IPv6 support for FAST_IPSEC to the tree. This commit includes only the kernel files, the rest of the files will follow in a second commit. Reviewed by: bz Approved by: re Supported by: Secure Computing
|
#
cce418d3 |
|
16-Jun-2007 |
Matt Jacob <mjacob@FreeBSD.org> |
Make gcc4.2 happy and zero save_ip for the unlikely (blackhole != 0) codepath.
|
#
71498f30 |
|
12-Jun-2007 |
Bruce M Simpson <bms@FreeBSD.org> |
Import rewrite of IPv4 socket multicast layer to support source-specific and protocol-independent host mode multicast. The code is written to accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work. This change only pertains to FreeBSD's use as a multicast end-station and does not concern multicast routing; for an IGMPv3/MLDv2 router implementation, consider the XORP project. The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6, which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html Summary * IPv4 multicast socket processing is now moved out of ip_output.c into a new module, in_mcast.c. * The in_mcast.c module implements the IPv4 legacy any-source API in terms of the protocol-independent source-specific API. * Source filters are lazy allocated as the common case does not use them. They are part of per inpcb state and are covered by the inpcb lock. * struct ip_mreqn is now supported to allow applications to specify multicast joins by interface index in the legacy IPv4 any-source API. * In UDP, an incoming multicast datagram only requires that the source port matches the 4-tuple if the socket was already bound by source port. An unbound socket SHOULD be able to receive multicasts sent from an ephemeral source port. * The UDP socket multicast filter mode defaults to exclusive, that is, sources present in the per-socket list will be blocked from delivery. * The RFC 3678 userland functions have been added to libc: setsourcefilter, getsourcefilter, setipv4sourcefilter, getipv4sourcefilter. * Definitions for IGMPv3 are merged but not yet used. * struct sockaddr_storage is now referenced from <netinet/in.h>. It is therefore defined there if not already declared in the same way as for the C99 types. * The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF which are then interpreted as interface indexes) is now deprecated. * A patch for the Rhyolite.com routed in the FreeBSD base system is available in the -net archives. This only affects individuals running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces. * Make IPv6 detach path similar to IPv4's in code flow; functionally same. * Bump __FreeBSD_version to 700048; see UPDATING. This work was financially supported by another FreeBSD committer. Obtained from: p4://bms_netdev Submitted by: Wilbert de Graaf (original work) Reviewed by: rwatson (locking), silence from fenner, net@ (but with encouragement)
|
#
32f9753c |
|
11-Jun-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project
|
#
39629c92 |
|
16-May-2007 |
David Malone <dwmalone@FreeBSD.org> |
When verifying the IPv4 UDP checksum, don't overwrite the checksum value in the mbuf with the result of the calculation. Previously, if we chose to return an ICMP message, the quoted UDP checksum bytes would be different to what was sent. PR: 112471 Submitted by: Matthew Luckie <mluckie@cs.waikato.ac.nz> MFC after: 3 weeks
|
#
54d642bb |
|
11-May-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddr protocol entry points using functions named proto_getsockaddr and proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr. While it's true that sockaddrs are allocated and set, the net effect is to retrieve (get) the socket address or peer address from a socket, not set it, so align names to that intent.
|
#
6db851a2 |
|
07-May-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Since udp_peeraddr() and udp_sockaddr() directly wrap in_setpeeraddr() and in_setsockaddr(), containing only stale comments on why they exist, remove them and initialize the protosw for UDP to directly reference in_setpeeraddr() and in_setsockaddr().
|
#
af1ee11d |
|
07-May-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Minor style tweaks.
|
#
84ca8aa6 |
|
01-May-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Remove unused pcbinfo arguments to in_setsockaddr() and in_setpeeraddr().
|
#
712fc218 |
|
30-Apr-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Rename some fields of struct inpcbinfo to have the ipi_ prefix, consistent with the naming of other structure field members, and reducing improper grep matches. Clean up and comment structure fields in structure definition.
|
#
1b7f0384 |
|
08-Mar-2007 |
Bruce M Simpson <bms@FreeBSD.org> |
Fix IP_SENDSRCADDR semantics. * To use this option with a UDP socket, it must be bound to a local port, and INADDR_ANY, to disallow possible collisions with existing udp inpcbs bound to the same port on other interfaces at send time. * If the socket is bound to INADDR_ANY, specifying IP_SENDSRCADDR with INADDR_ANY will be rejected as it is ambiguous. * If the socket is bound to an address other than INADDR_ANY, specifying IP_SENDSRCADDR with INADDR_ANY will be disallowed by in_pcbbind_setup(). Reviewed by: silence on -net Tested with: src/tools/regression/netinet/ipbroadcast MFC after: 4 days
|
#
afdb4274 |
|
20-Feb-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Rename two identically named log_in_vain variables: tcp_input.c's static log_in_vain to tcp_log_in_vain, and udp_usrreq's global log_in_vain to udp_log_in_vain. MFC after: 1 week
|
#
3329b236 |
|
20-Feb-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Gratuitous UDP restyling toward style(9) in 7.x.
|
#
8b5b8850 |
|
06-Jan-2007 |
Maxim Konovalov <maxim@FreeBSD.org> |
o One more typo in the comment. PR: kern/107609 Submitted by: Dr. Markus Waldeck
|
#
6796a2d4 |
|
31-Dec-2006 |
Warner Losh <imp@FreeBSD.org> |
Fix typo in comment. Submitted by: remko
|
#
74eb3236 |
|
31-Dec-2006 |
Warner Losh <imp@FreeBSD.org> |
Add comment about udp checksums being off in BSD 4.2 compatibility mode. Submitted by: Dr. Markus Waldeck PR: kern/106657
|
#
08651e1f |
|
29-Dec-2006 |
John Baldwin <jhb@FreeBSD.org> |
Some whitespace nits and remove a few casts.
|
#
acd3428b |
|
06-Nov-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
|
#
aed55708 |
|
22-Oct-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
|
#
6fbfd582 |
|
06-Sep-2006 |
Andre Oppermann <andre@FreeBSD.org> |
Check inp_flags instead of inp_vflag for INP_ONESBCAST flag. PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days
|
#
d438d815 |
|
04-Sep-2006 |
Thomas Quinot <thomas@FreeBSD.org> |
Fix typo in comment.
|
#
a152f8a3 |
|
21-Jul-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Change semantics of socket close and detach. Add a new protocol switch function, pru_close, to notify protocols that the file descriptor or other consumer of a socket is closing the socket. pru_abort is now a notification of close also, and no longer detaches. pru_detach is no longer used to notify of close, and will be called during socket tear-down by sofree() when all references to a socket evaporate after an earlier call to abort or close the socket. This means detach is now an unconditional teardown of a socket, whereas previously sockets could persist after detach of the protocol retained a reference. This faciliates sharing mutexes between layers of the network stack as the mutex is required during the checking and removal of references at the head of sofree(). With this change, pru_detach can now assume that the mutex will no longer be required by the socket layer after completion, whereas before this was not necessarily true. Reviewed by: gnn
|
#
d915b280 |
|
18-Jul-2006 |
Stephan Uphoff <ups@FreeBSD.org> |
Fix race conditions on enumerating pcb lists by moving the initialization ( and where appropriate the destruction) of the pcb mutex to the init/finit functions of the pcb zones. This allows locking of the pcb entries and race condition free comparison of the generation count. Rearrange locking a bit to avoid extra locking operation to update the generation count in in_pcballoc(). (in_pcballoc now returns the pcb locked) I am planning to convert pcb list handling from a type safe to a reference count model soon. ( As this allows really freeing the PCBs) Reviewed by: rwatson@, mohans@ MFC after: 1 week
|
#
f24618aa |
|
03-Jun-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Acquire udbinfo lock after call to soreserve() rather than before, as it is not required. This simplifies error-handling, and reduces the time that this lock is held. MFC after: 1 month
|
#
d45e4f99 |
|
21-May-2006 |
Maxim Konovalov <maxim@FreeBSD.org> |
o In udp|rip_disconnect() acquire a socket lock before the socket state modification. To prevent races do that while holding inpcb lock. Reviewed by: rwatson
|
#
59b8854e |
|
06-May-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Modify UDP to use sosend_dgram() instead of sosend(). This allows for signicantly optimized UDP socket I/O when using a single UDP socket from many threads or processes that share it, by avoiding significant locking and other overhead in the general sosend() path that isn't necessary for simple datagram sockets. Specifically, this change results in a significant performance improvement for threaded name service in BIND9 under load. Suggested by: Jinmei_Tatsuya at isc dot org
|
#
8e3f3b16 |
|
25-Apr-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Rename 'last' to 'inp' in udp_append(): the name 'last' is due to the fact that the loop through inpcb's in udp_input() tracks the last inpcb while looping. We keep that name in the calling loop but not in the delivery routine itself. MFC after: 3 months
|
#
4f590175 |
|
21-Apr-2006 |
Paul Saab <ps@FreeBSD.org> |
Allow for nmbclusters and maxsockets to be increased via sysctl. An eventhandler is used to update all the various zones that depend on these values.
|
#
14ba8add |
|
01-Apr-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Update in_pcb-derived basic socket types following changes to pru_abort(), pru_detach(), and in_pcbdetach(): - Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code. - In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, in protocol shutdown methods, and in raw IP send. - Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this. - Invoke in_pcbfree() after in_pcbdetach() in order to free the detached in_pcb structure for a socket. MFC after: 3 months
|
#
bc725eaf |
|
01-Apr-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Chance protocol switch method pru_detach() so that it returns void rather than an error. Detaches do not "fail", they other occur or the protocol flags SS_PROTOREF to take ownership of the socket. soclose() no longer looks at so_pcb to see if it's NULL, relying entirely on the protocol to decide whether it's time to free the socket or not using SS_PROTOREF. so_pcb is now entirely owned and managed by the protocol code. Likewise, no longer test so_pcb in other socket functions, such as soreceive(), which have no business digging into protocol internals. Protocol detach routines no longer try to free the socket on detach, this is performed in the socket code if the protocol permits it. In rts_detach(), no longer test for rp != NULL in detach, and likewise in other protocols that don't permit a NULL so_pcb, reduce the incidence of testing for it during detach. netinet and netinet6 are not fully updated to this change, which will be in an upcoming commit. In their current state they may leak memory or panic. MFC after: 3 months
|
#
ac45e92f |
|
01-Apr-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Change protocol switch pru_abort() API so that it returns void rather than an int, as an error here is not meaningful. Modify soabort() to unconditionally free the socket on the return of pru_abort(), and modify most protocols to no longer conditionally free the socket, since the caller will do this. This commit likely leaves parts of netinet and netinet6 in a situation where they may panic or leak memory, as they have not are not fully updated by this commit. This will be corrected shortly in followup commits to these components. MFC after: 3 months
|
#
0b4ae859 |
|
24-Jan-2006 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Implement 'ipfw fwd laddr,port' feature for UDP. According to ipfw(8) it should work, however it never did. People expect it to work. PR: kern/90834
|
#
e5bc0aa3 |
|
14-Jan-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Remove dead code: 'opts' is not used in udp_append(), only in udp_input(), so no need to assign it to NULL or conditionally free it. Found with: Coverity Prevent(tm) MFC after: 3 days
|
#
e59898ff |
|
14-Dec-2005 |
Maxime Henrion <mux@FreeBSD.org> |
Fix a bunch of SYSCTL_INT() that should have been SYSCTL_ULONG() to match the type of the variable they are exporting. Spotted by: Thomas Hurst <tom@hur.st> MFC after: 3 days
|
#
ef39adf0 |
|
18-Nov-2005 |
Andre Oppermann <andre@FreeBSD.org> |
Consolidate all IP Options handling functions into ip_options.[ch] and include ip_options.h into all files making use of IP Options functions. From ip_input.c rev 1.306: ip_dooptions(struct mbuf *m, int pass) save_rte(m, option, dst) ip_srcroute(m0) ip_stripoptions(m, mopt) From ip_output.c rev 1.249: ip_insertoptions(m, opt, phlen) ip_optcopy(ip, jp) ip_pcbopts(struct inpcb *inp, int optname, struct mbuf *m) No functional changes in this commit. Discussed with: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005
|
#
d46ff6bd |
|
12-Oct-2005 |
Maxim Konovalov <maxim@FreeBSD.org> |
o INP_ONESBCAST is inpcb.inp_vflag flag not inp_flags. The confusion with IP_PORTRANGE_HIGH leads to the incorrect checksum calculation. PR: kern/87306 Submitted by: Rickard Lind Reviewed by: bms MFC after: 2 weeks
|
#
b2828ad2 |
|
26-Sep-2005 |
Andre Oppermann <andre@FreeBSD.org> |
Implement IP_DONTFRAG IP socket option enabling the Don't Fragment flag on IP packets. Currently this option is only repected on udp and raw ip sockets. On tcp sockets the DF flag is controlled by the path MTU discovery option. Sending a packet larger than the MTU size of the egress interface returns an EMSGSIZE error. Discussed with: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005
|
#
936cd18d |
|
22-Aug-2005 |
Andre Oppermann <andre@FreeBSD.org> |
Add socketoption IP_MINTTL. May be used to set the minimum acceptable TTL a packet must have when received on a socket. All packets with a lower TTL are silently dropped. Works on already connected/connecting and listening sockets for RAW/UDP/TCP. This option is only really useful when set to 255 preventing packets from outside the directly connected networks reaching local listeners on sockets. Allows userland implementation of 'The Generalized TTL Security Mechanism (GTSM)' according to RFC3682. Examples of such use include the Cisco IOS BGP implementation command "neighbor ttl-security". MFC after: 2 weeks Sponsored by: TCP/IP Optimization Fundraise 2005
|
#
277afaff |
|
01-Jun-2005 |
Robert Watson <rwatson@FreeBSD.org> |
De-spl UDP. MFC after: 3 days
|
#
fd94099e |
|
05-May-2005 |
Colin Percival <cperciva@FreeBSD.org> |
If we are going to 1. Copy a NULL-terminated string into a fixed-length buffer, and 2. copyout that buffer to userland, we really ought to 0. Zero the entire buffer first. Security: FreeBSD-SA-05:08.kmem
|
#
812d8653 |
|
28-Mar-2005 |
Sam Leffler <sam@FreeBSD.org> |
eliminate extraneous null ptr checks Noticed by: Coverity Prevent analysis tool
|
#
3a1757b9 |
|
22-Feb-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
In in_pcbconnect_setup() jailed sockets are treated specially: if local address is not supplied, then jail IP is choosed and in_pcbbind() is called. Since udp_output() does not save local addr after call to in_pcbconnect_setup(), in_pcbbind() is called for each packet, and this is incorrect. So, we shall treat jailed sockets specially in udp_output(), we will save their local address. This fixes a long standing bug with broken sendto() system call in jails. PR: kern/26506 Reviewed by: rwatson MFC after: 2 weeks
|
#
c398230b |
|
06-Jan-2005 |
Warner Losh <imp@FreeBSD.org> |
/* -> /*- for license, minor formatting changes
|
#
756d52a1 |
|
08-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Initialize struct pr_userreqs in new/sparse style and fill in common default elements in net_init_domain(). This makes it possible to grep these structures and see any bogosities.
|
#
c83c1318 |
|
04-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Hide udp_in6 behind #ifdef INET6
|
#
d4b509bd |
|
03-Nov-2004 |
Robert Watson <rwatson@FreeBSD.org> |
Until this change, the UDP input code used global variables udp_in, udp_in6, and udp_ip6 to pass socket address state between udp_input(), udp_append(), and soappendaddr_locked(). While file in the default configuration, when running with multiple netisrs or direct ithread dispatch, this can result in races wherein user processes using recvmsg() get back the wrong source IP/port. To correct this and related races: - Eliminate udp_ip6, which is believed to be generated but then never used. Eliminate ip_2_ip6_hdr() as it is now unneeded. - Eliminate setting, testing, and existence of 'init' status fields for the IPv6 structures. While with multiple UDP delivery this could lead to amortization of IPv4 -> IPv6 conversion when delivering an IPv4 UDP packet to an IPv6 socket, it added substantial complexity and side effects. - Move global structures into the stack, declaring udp_in in udp_input(), and udp_in6 in udp_append() to be used if a conversion is required. Pass &udp_in into udp_append(). - Re-annotate comments to reflect updates. With this change, UDP appears to operate correctly in the presence of substantial inbound processing parallelism. This solution avoids introducing additional synchronization, but does increase the potential stack depth. Discovered by: kris (Bug Magnet) MFC after: 3 weeks
|
#
6b8e5a98 |
|
12-Oct-2004 |
Robert Watson <rwatson@FreeBSD.org> |
Don't release the udbinfo lock until after the last use of UDP inpcb in udp_input(), since the udbinfo lock is used to prevent removal of the inpcb while in use (i.e., as a form of reference count) in the in-bound path. RELENG_5 candidate.
|
#
b5d47ff5 |
|
04-Sep-2004 |
John-Mark Gurney <jmg@FreeBSD.org> |
fix up socket/ip layer violation... don't assume/know that SO_DONTROUTE == IP_ROUTETOIF and SO_BROADCAST == IP_ALLOWBROADCAST...
|
#
392e8407 |
|
21-Aug-2004 |
Robert Watson <rwatson@FreeBSD.org> |
When sliding the m_data pointer forward, update m_pktrhdr.len as well as m_len, or the pkthdr length will be inconsistent with the actual length of data in the mbuf chain. The symptom of this occuring was "out of data" warnings from in_cksum_skip() on large UDP packets sent via the loopback interface. Foot shot: green
|
#
e6ccd709 |
|
21-Aug-2004 |
Robert Watson <rwatson@FreeBSD.org> |
When prepending space onto outgoing UDP datagram payloads to hold the UDP/IP header, make sure that space is also allocated for the link layer header. If an mbuf must be allocated to hold the UDP/IP header (very likely), then this will avoid an additional mbuf allocation at the link layer. This trick is also used by TCP and other protocols to avoid extra calls to the mbuf allocator in the ethernet (and related) output routines.
|
#
5c32ea65 |
|
18-Aug-2004 |
Robert Watson <rwatson@FreeBSD.org> |
Push down pcbinfo and inpcb locking from udp_send() into udp_output(). This provides greater context for the locking and allows us to avoid locking the pcbinfo structure if not binding operations will take place (i.e., already bound, connected, and no expliti sendto() address).
|
#
a4f757cd |
|
16-Aug-2004 |
Robert Watson <rwatson@FreeBSD.org> |
White space cleanup for netinet before branch: - Trailing tab/space cleanup - Remove spurious spaces between or before tabs This change avoids touching files that Andre likely has in his working set for PFIL hooks changes for IPFW/DUMMYNET. Approved by: re (scottl) Submitted by: Xin LI <delphij@frontfree.net>
|
#
c19c5239 |
|
11-Aug-2004 |
Robert Watson <rwatson@FreeBSD.org> |
When udp_send() fails, make sure to free the control mbufs as well as the data mbuf. This was done in most error cases, but not the case where the inpcb pointer is surprisingly NULL.
|
#
420a2811 |
|
11-Aug-2004 |
Andre Oppermann <andre@FreeBSD.org> |
Backout removal of UMA_ZONE_NOFREE flag for all zones which are established for structures with timers in them. It might be that a timer might fire even when the associated structure has already been free'd. Having type- stable storage in this case is beneficial for graceful failure handling and debugging. Discussed with: bosko, tegge, rwatson
|
#
4efb805c |
|
11-Aug-2004 |
Andre Oppermann <andre@FreeBSD.org> |
Remove the UMA_ZONE_NOFREE flag to all uma_zcreate() calls in the IP and TCP code. This flag would have prevented giving back excessive free slabs to the global pool after a transient peak usage.
|
#
9c1df695 |
|
05-Aug-2004 |
Robert Watson <rwatson@FreeBSD.org> |
When iterating the UDP inpcb list processing an inbound broadcast or multicast packet, we don't need to acquire the inpcb mutex unless we are actually using inpcb fields other than the bound port and address. Since we hold the pcbinfo lock already, these can't change. Defer acquiring the inpcb mutex until we have a high chance of a match. This avoids about 120 mutex operations per UDP broadcast packet received on one of my work systems. Reviewed by: sam
|
#
56f21b9d |
|
26-Jul-2004 |
Colin Percival <cperciva@FreeBSD.org> |
Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb
|
#
1e4d7da7 |
|
26-Jun-2004 |
Robert Watson <rwatson@FreeBSD.org> |
Reduce the number of unnecessary unlock-relocks on socket buffer mutexes associated with performing a wakeup on the socket buffer: - When performing an sbappend*() followed by a so[rw]wakeup(), explicitly acquire the socket buffer lock and use the _locked() variants of both calls. Note that the _locked() sowakeup() versions unlock the mutex on return. This is done in uipc_send(), divert_packet(), mroute socket_send(), raw_append(), tcp_reass(), tcp_input(), and udp_append(). - When the socket buffer lock is dropped before a sowakeup(), remove the explicit unlock and use the _locked() sowakeup() variant. This is done in soisdisconnecting(), soisdisconnected() when setting the can't send/ receive flags and dropping data, and in uipc_rcvd() which adjusting back-pressure on the sockets. For UNIX domain sockets running mpsafe with a contention-intensive SMP mysql benchmark, this results in a 1.6% query rate improvement due to reduce mutex costs.
|
#
49b19bfc |
|
16-Jun-2004 |
Bruce M Simpson <bms@FreeBSD.org> |
Reverse a patch which has no effect on -CURRENT and should probably be applied directly to -STABLE. Noticed by: iedowse Pointy hat to: bms
|
#
34e3ccb3 |
|
15-Jun-2004 |
Bruce M Simpson <bms@FreeBSD.org> |
Disconnect a temporarily-connected UDP socket in out-of-mbufs case. This fixes the problem of UDP sockets getting wedged in a connected state (and bound to their destination) under heavy load. Temporary bind/connect should probably be deleted in future as an optimization, as described in "A Faster UDP" [Partridge/Pink 1993]. Notes: - INP_LOCK() is already held in udp_output(). The connection is in effect happening at a layer lower than the socket layer, therefore in theory socket locking should not be needed. - Inlining the in_pcbdisconnect() operation buys us nothing (in the case of the current state of the code), as laddr is not part of the inpcb hash or the udbinfo hash. Therefore there should be no need to rehash after restoring laddr in the error case (this was a concern of the original author of the patch). PR: kern/41765 Requested by: gnn Submitted by: Jinmei Tatuya (with cleanups) Tested by: spray(8)
|
#
c18b97c6 |
|
03-May-2004 |
Robert Watson <rwatson@FreeBSD.org> |
Switch to using the inpcb MAC label instead of socket MAC label when labeling new mbufs created from sockets/inpcbs in IPv4. This helps avoid the need for socket layer locking in the lower level network paths where inpcb locks are already frequently held where needed. In particular: - Use the inpcb for label instead of socket in raw_append(). - Use the inpcb for label instead of socket in tcp_output(). - Use the inpcb for label instead of socket in tcp_respond(). - Use the inpcb for label instead of socket in tcp_twrespond(). - Use the inpcb for label instead of socket in syncache_respond(). While here, modify tcp_respond() to avoid assigning NULL to a stack variable and centralize assertions about the inpcb when inp is assigned. Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research
|
#
87f2bb8c |
|
03-May-2004 |
Robert Watson <rwatson@FreeBSD.org> |
Assert inpcb lock in udp_append(). Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research
|
#
f36cfd49 |
|
07-Apr-2004 |
Warner Losh <imp@FreeBSD.org> |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
|
#
b0330ed9 |
|
27-Mar-2004 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Reduce 'td' argument to 'cred' (struct ucred) argument in those functions: - in_pcbbind(), - in_pcbbind_setup(), - in_pcbconnect(), - in_pcbconnect_setup(), - in6_pcbbind(), - in6_pcbconnect(), - in6_pcbsetport(). "It should simplify/clarify things a great deal." --rwatson Requested by: rwatson Reviewed by: rwatson, ume
|
#
6823b823 |
|
27-Mar-2004 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Remove unused argument. Reviewed by: ume
|
#
47934cef |
|
25-Feb-2004 |
Don Lewis <truckman@FreeBSD.org> |
Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms
|
#
da0f4099 |
|
17-Feb-2004 |
Hajimu UMEMOTO <ume@FreeBSD.org> |
IPSEC and FAST_IPSEC have the same internal API now; so merge these (IPSEC has an extra ipsecstat) Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
|
#
f073c60f |
|
03-Feb-2004 |
Hajimu UMEMOTO <ume@FreeBSD.org> |
pass pcb rather than so. it is expected that per socket policy works again.
|
#
be8a62e8 |
|
31-Jan-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Introduce the SO_BINTIME option which takes a high-resolution timestamp at packet arrival. For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL since it has higher resolution and lower overhead. Simultaneous use of the two options is possible and they will return consistent timestamps. This introduces an extra test and a function call for SO_TIMEVAL, but I have not been able to measure that.
|
#
0ca2861f |
|
27-Jan-2004 |
Ruslan Ermilov <ru@FreeBSD.org> |
Correct the descriptions of the net.inet.{udp,raw}.recvspace sysctls.
|
#
5bd311a5 |
|
25-Nov-2003 |
Sam Leffler <sam@FreeBSD.org> |
Split the "inp" mutex class into separate classes for each of divert, raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness complaints. Reviewed by: rwatson Approved by: re (rwatson)
|
#
97d8d152 |
|
20-Nov-2003 |
Andre Oppermann <andre@FreeBSD.org> |
Introduce tcp_hostcache and remove the tcp specific metrics from the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
|
#
a557af22 |
|
17-Nov-2003 |
Robert Watson <rwatson@FreeBSD.org> |
Introduce a MAC label reference in 'struct inpcb', which caches the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer. This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check. For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update. Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy. Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
83453a06 |
|
12-Nov-2003 |
Bruce M Simpson <bms@FreeBSD.org> |
Add a new sysctl knob, net.inet.udp.strict_mcast_mship, to the udp_input path. This switch toggles between strict multicast delivery, and traditional multicast delivery. The traditional (default) behaviour is to deliver multicast datagrams to all sockets which are members of that group, regardless of the network interface where the datagrams were received. The strict behaviour is to deliver multicast datagrams received on a particular interface only to sockets whose membership is bound to that interface. Note that as a matter of course, multicast consumers specifying INADDR_ANY for their interface get joined on the interface where the default route happens to be bound. This switch has no effect if the interface which the consumer specifies for IP_ADD_MEMBERSHIP is not UP and RUNNING. The original patch has been cleaned up somewhat from that submitted. It has been tested on a multihomed machine with multiple QuickTime RTP streams running over the local switch, which doesn't do IGMP snooping. PR: kern/58359 Submitted by: William A. Carrel Reviewed by: rwatson MFC after: 1 week
|
#
3c47a187 |
|
08-Nov-2003 |
Sam Leffler <sam@FreeBSD.org> |
assert inpcb is locked in udp_output Supported by: FreeBSD Foundation
|
#
11de19f4 |
|
28-Oct-2003 |
Hajimu UMEMOTO <ume@FreeBSD.org> |
ip6_savecontrol() argument is redundant
|
#
8a538743 |
|
02-Sep-2003 |
Bruce M Simpson <bms@FreeBSD.org> |
PR: kern/56343 Reviewed by: tjr Approved by: jake (mentor)
|
#
8afa2304 |
|
20-Aug-2003 |
Bruce M Simpson <bms@FreeBSD.org> |
Add the IP_ONESBCAST option, to enable undirected IP broadcasts to be sent on specific interfaces. This is required by aodvd, and may in future help us in getting rid of the requirement for BPF from our import of isc-dhcp. Suggested by: fenestro Obtained from: BSD/OS Reviewed by: mini, sam Approved by: jake (mentor)
|
#
53b57cd1 |
|
19-Aug-2003 |
Sam Leffler <sam@FreeBSD.org> |
add missing unlock when in_pcballoc returns an error
|
#
a163d034 |
|
18-Feb-2003 |
Warner Losh <imp@FreeBSD.org> |
Back out M_* changes, per decision of the TRB. Approved by: trb
|
#
4b40c56c |
|
14-Feb-2003 |
Jeffrey Hsu <hsu@FreeBSD.org> |
Take advantage of pre-existing lock-free synchronization and type stable memory to avoid acquiring SMP locks during expensive copyout process.
|
#
44956c98 |
|
21-Jan-2003 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
#
032dcc76 |
|
20-Nov-2002 |
Luigi Rizzo <luigi@FreeBSD.org> |
Back out some style changes. They are not urgent, I will put them back in after 5.0 is out. Requested by: sam Approved by: re
|
#
20fab863 |
|
17-Nov-2002 |
Luigi Rizzo <luigi@FreeBSD.org> |
Minor documentation changes and indentation fix. Replace m_copy() with m_copypacket() where applicable. While at it, fix some function headers and remove 'register' from variable declarations.
|
#
c557ae16 |
|
21-Oct-2002 |
Ian Dowse <iedowse@FreeBSD.org> |
Implement a new IP_SENDSRCADDR ancillary message type that permits a server process bound to a wildcard UDP socket to select the IP address from which outgoing packets are sent on a per-datagram basis. When combined with IP_RECVDSTADDR, such a server process can guarantee to reply to an incoming request using the same source IP address as the destination IP address of the request, without having to open one socket per server IP address. Discussed on: -net Approved by: re
|
#
90162a4e |
|
21-Oct-2002 |
Ian Dowse <iedowse@FreeBSD.org> |
Remove the "temporary connection" hack in udp_output(). In order to send datagrams from an unconnected socket, we used to first block input, then connect the socket to the sendmsg/sendto destination, send the datagram, and finally disconnect the socket and unblock input. We now use in_pcbconnect_setup() to check if a connect() would have succeeded, but we never record the connection in the PCB (local anonymous port allocation is still recorded, though). The result from in_pcbconnect_setup() authorises the sending of the datagram and selects the local address and port to use, so we just construct the header and call ip_output(). Discussed on: -net Approved by: re
|
#
9b657230 |
|
15-Oct-2002 |
Sam Leffler <sam@FreeBSD.org> |
correct PCB locking in broadcast/multicast case that was exposed by change to use udp_append Reviewed by: hsu
|
#
b9234faf |
|
15-Oct-2002 |
Sam Leffler <sam@FreeBSD.org> |
Tie new "Fast IPsec" code into the build. This involves the usual configuration stuff as well as conditional code in the IPv4 and IPv6 areas. Everything is conditional on FAST_IPSEC which is mutually exclusive with IPSEC (KAME IPsec implmentation). As noted previously, don't use FAST_IPSEC with INET6 at the moment. Reviewed by: KAME, rwatson Approved by: silence Supported by: Vernier Networks
|
#
5d846453 |
|
15-Oct-2002 |
Sam Leffler <sam@FreeBSD.org> |
Replace aux mbufs with packet tags: o instead of a list of mbufs use a list of m_tag structures a la openbsd o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit ABI/module number cookie o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and use this in defining openbsd-compatible m_tag_find and m_tag_get routines o rewrite KAME use of aux mbufs in terms of packet tags o eliminate the most heavily used aux mbufs by adding an additional struct inpcb parameter to ip_output and ip6_output to allow the IPsec code to locate the security policy to apply to outbound packets o bump __FreeBSD_version so code can be conditionalized o fixup ipfilter's call to ip_output based on __FreeBSD_version Reviewed by: julian, luigi (silent), -arch, -net, darren Approved by: julian, silence from everyone else Obtained from: openbsd (mostly) MFC after: 1 month
|
#
365433d9 |
|
15-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Code formatting sync to trustedbsd_mac: don't perform an assignment in an if clause. PR: Submitted by: Reviewed by: Approved by: Obtained from: MFC after:
|
#
fb95b5d3 |
|
15-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Rename mac_check_socket_receive() to mac_check_socket_deliver() so that we can use the names _receive() and _send() for the receive() and send() checks. Rename related constants, policy implementations, etc. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
1cbd978e |
|
04-Aug-2002 |
Luigi Rizzo <luigi@FreeBSD.org> |
bugfix: move check for udp_blackhole before the one for icmp_bandlim. MFC after: 3 days
|
#
bdb3fa18 |
|
01-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Introduce support for Mandatory Access Control and extensible kernel access control. Add MAC support for the UDP protocol. Invoke appropriate MAC entry points to label packets that are generated by local UDP sockets, and to authorize delivery of mbufs to local sockets both in the multicast/broadcast case and the unicast case. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
5c38b6db |
|
28-Jul-2002 |
Don Lewis <truckman@FreeBSD.org> |
Wire the sysctl output buffer before grabbing any locks to prevent SYSCTL_OUT() from blocking while locks are held. This should only be done when it would be inconvenient to make a temporary copy of the data and defer calling SYSCTL_OUT() until after the locks are released.
|
#
2d20c83f |
|
12-Jul-2002 |
Don Lewis <truckman@FreeBSD.org> |
Back out the previous change, since it looks like locking udbinfo provides sufficient protection.
|
#
bb1dd7a4 |
|
12-Jul-2002 |
Don Lewis <truckman@FreeBSD.org> |
Lock inp while we're accessing it.
|
#
0e1eebb8 |
|
11-Jul-2002 |
Don Lewis <truckman@FreeBSD.org> |
Defer calling SYSCTL_OUT() until after the locks have been released.
|
#
2ded288c |
|
21-Jun-2002 |
Jeffrey Hsu <hsu@FreeBSD.org> |
Fix logic which resulted in missing a call to INP_UNLOCK(). Submitted by: jlemon, mux
|
#
3ce144ea |
|
14-Jun-2002 |
Jeffrey Hsu <hsu@FreeBSD.org> |
Notify functions can destroy the pcb, so they have to return an indication of whether this happenned so the calling function knows whether or not to unlock the pcb. Submitted by: Jennifer Yang (yangjihui@yahoo.com) Bug reported by: Sid Carter (sidcarter@symonds.net)
|
#
61ffc0b1 |
|
12-Jun-2002 |
Jeffrey Hsu <hsu@FreeBSD.org> |
The UDP head was unlocked too early in one unicast case. Submitted by: bug reported by arr
|
#
f76fcf6d |
|
10-Jun-2002 |
Jeffrey Hsu <hsu@FreeBSD.org> |
Lock up inpcb. Submitted by: Jennifer Yang <yangjihui@yahoo.com>
|
#
4cc20ab1 |
|
31-May-2002 |
Seigo Tanimura <tanimura@FreeBSD.org> |
Back out my lats commit of locking down a socket, it conflicts with hsu's work. Requested by: hsu
|
#
243917fe |
|
19-May-2002 |
Seigo Tanimura <tanimura@FreeBSD.org> |
Lock down a socket, milestone 1. o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket. o Determine the lock strategy for each members in struct socket. o Lock down the following members: - so_count - so_options - so_linger - so_state o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket: - sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup() Reviewed by: alfred
|
#
960ed29c |
|
29-Apr-2002 |
Seigo Tanimura <tanimura@FreeBSD.org> |
Revert the change of #includes in sys/filedesc.h and sys/socketvar.h. Requested by: bde Since locking sigio_lock is usually followed by calling pgsigio(), move the declaration of sigio_lock and the definitions of SIGIO_*() to sys/signalvar.h. While I am here, sort include files alphabetically, where possible.
|
#
44731cab |
|
01-Apr-2002 |
John Baldwin <jhb@FreeBSD.org> |
Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
|
#
c1cd65ba |
|
24-Mar-2002 |
Bruce Evans <bde@FreeBSD.org> |
Fixed some style bugs in the removal of __P(()). Continuation lines were not outdented to preserve non-KNF lining up of code with parentheses. Switch to KNF formatting.
|
#
29dc1288 |
|
22-Mar-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Merge from TrustedBSD MAC branch: Move the network code from using cr_cansee() to check whether a socket is visible to a requesting credential to using a new function, cr_canseesocket(), which accepts a subject credential and object socket. Implement cr_canseesocket() so that it does a prison check, a uid check, and add a comment where shortly a MAC hook will go. This will allow MAC policies to seperately instrument the visibility of sockets from the visibility of processes. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
69c2d429 |
|
19-Mar-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
Switch vm_zone.h with uma.h. Change over to uma interfaces.
|
#
4d77a549 |
|
19-Mar-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove __P.
|
#
a854ed98 |
|
27-Feb-2002 |
John Baldwin <jhb@FreeBSD.org> |
Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
|
#
76183f34 |
|
26-Feb-2002 |
Dima Dorfman <dd@FreeBSD.org> |
Introduce a version field to `struct xucred' in place of one of the spares (the size of the field was changed from u_short to u_int to reflect what it really ends up being). Accordingly, change users of xucred to set and check this field as appropriate. In the kernel, this is being done inside the new cru2x() routine which takes a `struct ucred' and fills out a `struct xucred' according to the former. This also has the pleasant sideaffect of removing some duplicate code. Reviewed by: rwatson
|
#
ce178806 |
|
07-Nov-2001 |
Robert Watson <rwatson@FreeBSD.org> |
o Replace reference to 'struct proc' with 'struct thread' in 'struct sysctl_req', which describes in-progress sysctl requests. This permits sysctl handlers to have access to the current thread, permitting work on implementing td->td_ucred, migration of suser() to using struct thread to derive the appropriate ucred, and allowing struct thread to be passed down to other code, such as network code where td is not currently available (and curproc is used). o Note: netncp and netsmb are not updated to reflect this change, as they are not currently KSE-adapted. Reviewed by: julian Obtained from: TrustedBSD Project
|
#
cb342100 |
|
21-Oct-2001 |
Hajimu UMEMOTO <ume@FreeBSD.org> |
restore the data of the ip header when extended udp header and data checksum is calculated. this caused some trouble in the code which the ip header is not modified. for example, inbound policy lookup failed. Obtained from: KAME MFC after: 1 week
|
#
8a7d8cc6 |
|
09-Oct-2001 |
Robert Watson <rwatson@FreeBSD.org> |
- Combine kern.ps_showallprocs and kern.ipc.showallsockets into a single kern.security.seeotheruids_permitted, describes as: "Unprivileged processes may see subjects/objects with different real uid" NOTE: kern.ps_showallprocs exists in -STABLE, and therefore there is an API change. kern.ipc.showallsockets does not. - Check kern.security.seeotheruids_permitted in cr_cansee(). - Replace visibility calls to socheckuid() with cr_cansee() (retain the change to socheckuid() in ipfw, where it is used for rule-matching). - Remove prison_unpcb() and make use of cr_cansee() against the UNIX domain socket credential instead of comparing root vnodes for the UDS and the process. This allows multiple jails to share the same chroot() and not see each others UNIX domain sockets. - Remove unused socheckproc(). Now that cr_cansee() is used universally for socket visibility, a variety of policies are more consistently enforced, including uid-based restrictions and jail-based restrictions. This also better-supports the introduction of additional MAC models. Reviewed by: ps, billf Obtained from: TrustedBSD Project
|
#
4787fd37 |
|
05-Oct-2001 |
Paul Saab <ps@FreeBSD.org> |
Only allow users to see their own socket connections if kern.ipc.showallsockets is set to 0. Submitted by: billf (with modifications by me) Inspired by: Dave McKay (aka pm aka Packet Magnet) Reviewed by: peter MFC after: 2 weeks
|
#
94088977 |
|
20-Sep-2001 |
Robert Watson <rwatson@FreeBSD.org> |
o Rename u_cansee() to cr_cansee(), making the name more comprehensible in the face of a rename of ucred to cred, and possibly generally. Obtained from: TrustedBSD Project
|
#
b40ce416 |
|
12-Sep-2001 |
Julian Elischer <julian@FreeBSD.org> |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
|
#
f0ffb944 |
|
03-Sep-2001 |
Julian Elischer <julian@FreeBSD.org> |
Patches from Keiichi SHIMA <keiichi@iij.ad.jp> to make ip use the standard protosw structure again. Obtained from: Well, KAME I guess.
|
#
13cf67f3 |
|
26-Jul-2001 |
Hajimu UMEMOTO <ume@FreeBSD.org> |
move ipsec security policy allocation into in_pcballoc, before making pcbs available to the outside world. otherwise, we will see inpcb without ipsec security policy attached (-> panic() in ipsec.c). Obtained from: KAME MFC after: 3 days
|
#
7ce87f12 |
|
23-Jun-2001 |
David Malone <dwmalone@FreeBSD.org> |
Allow getcred sysctl to work in jailed root processes. Processes can only do getcred calls for sockets which were created in the same jail. This should allow the ident to work in a reasonable way within jails. PR: 28107 Approved by: des, rwatson
|
#
c73d99b5 |
|
23-Jun-2001 |
Ruslan Ermilov <ru@FreeBSD.org> |
Add netstat(1) knob to reset net.inet.{ip|icmp|tcp|udp|igmp}.stats. For example, ``netstat -s -p ip -z'' will show and reset IP stats. PR: bin/17338
|
#
33841545 |
|
10-Jun-2001 |
Hajimu UMEMOTO <ume@FreeBSD.org> |
Sync with recent KAME. This work was based on kame-20010528-freebsd43-snap.tgz and some critical problem after the snap was out were fixed. There are many many changes since last KAME merge. TODO: - The definitions of SADB_* in sys/net/pfkeyv2.h are still different from RFC2407/IANA assignment because of binary compatibility issue. It should be fixed under 5-CURRENT. - ip6po_m member of struct ip6_pktopts is no longer used. But, it is still there because of binary compatibility issue. It should be removed under 5-CURRENT. Reviewed by: itojun Obtained from: KAME MFC after: 3 weeks
|
#
fb9aaba0 |
|
13-Mar-2001 |
Ruslan Ermilov <ru@FreeBSD.org> |
Count and show incoming UDP datagrams with no checksum.
|
#
c693a045 |
|
26-Feb-2001 |
Jonathan Lemon <jlemon@FreeBSD.org> |
Remove in_pcbnotify and use in_pcblookup_hash to find the cb directly. For TCP, verify that the sequence number in the ICMP packet falls within the tcp receive window before performing any actions indicated by the icmp packet. Clean up some layering violations (access to tcp internals from in_pcb)
|
#
d1c54148 |
|
22-Feb-2001 |
Jesper Skriver <jesper@FreeBSD.org> |
Redo the security update done in rev 1.54 of src/sys/netinet/tcp_subr.c and 1.84 of src/sys/netinet/udp_usrreq.c The changes broken down: - remove 0 as a wildcard for addresses and port numbers in src/sys/netinet/in_pcb.c:in_pcbnotify() - add src/sys/netinet/in_pcb.c:in_pcbnotifyall() used to notify all sessions with the specific remote address. - change - src/sys/netinet/udp_usrreq.c:udp_ctlinput() - src/sys/netinet/tcp_subr.c:tcp_ctlinput() to use in_pcbnotifyall() to notify multiple sessions, instead of using in_pcbnotify() with 0 as src address and as port numbers. - remove check for src port == 0 in - src/sys/netinet/tcp_subr.c:tcp_ctlinput() - src/sys/netinet/udp_usrreq.c:udp_ctlinput() as they are no longer needed. - move handling of redirects and host dead from in_pcbnotify() to udp_ctlinput() and tcp_ctlinput(), so they will call in_pcbnotifyall() to notify all sessions with the specific remote address. Approved by: jlemon Inspired by: NetBSD
|
#
91421ba2 |
|
20-Feb-2001 |
Robert Watson <rwatson@FreeBSD.org> |
o Move per-process jail pointer (p->pr_prison) to inside of the subject credential structure, ucred (cr->cr_prison). o Allow jail inheritence to be a function of credential inheritence. o Abstract prison structure reference counting behind pr_hold() and pr_free(), invoked by the similarly named credential reference management functions, removing this code from per-ABI fork/exit code. o Modify various jail() functions to use struct ucred arguments instead of struct proc arguments. o Introduce jailed() function to determine if a credential is jailed, rather than directly checking pointers all over the place. o Convert PRISON_CHECK() macro to prison_check() function. o Move jail() function prototypes to jail.h. o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the flag in the process flags field itself. o Eliminate that "const" qualifier from suser/p_can/etc to reflect mutex use. Notes: o Some further cleanup of the linux/jail code is still required. o It's now possible to consider resolving some of the process vs credential based permission checking confusion in the socket code. o Mutex protection of struct prison is still not present, and is required to protect the reference count plus some fields in the structure. Reviewed by: freebsd-arch Obtained from: TrustedBSD Project
|
#
58e9b417 |
|
20-Feb-2001 |
Jesper Skriver <jesper@FreeBSD.org> |
Only call in_pcbnotify if the src port number != 0, as we treat 0 as a wildcard in src/sys/in_pbc.c:in_pcbnotify() It's sufficient to check for src|local port, as we'll have no sessions with src|local port == 0 Without this a attacker sending ICMP messages, where the attached IP header (+ 8 bytes) has the address and port numbers == 0, would have the ICMP message applied to all sessions. PR: kern/25195 Submitted by: originally by jesper, reimplimented by jlemon's advice Reviewed by: jlemon Approved by: jlemon
|
#
c0511d3b |
|
18-Feb-2001 |
Brian Feldman <green@FreeBSD.org> |
Switch to using a struct xucred instead of a struct xucred when not actually in the kernel. This structure is a different size than what is currently in -CURRENT, but should hopefully be the last time any application breakage is caused there. As soon as any major inconveniences are removed, the definition of the in-kernel struct ucred should be conditionalized upon defined(_KERNEL). This also changes struct export_args to remove dependency on the constantly-changing struct ucred, as well as limiting the bounds of the size fields to the correct size. This means: a) mountd and friends won't break all the time, b) mountd and friends won't crash the kernel all the time if they don't know what they're doing wrt actual struct export_args layout. Reviewed by: bde
|
#
a57815ef |
|
11-Feb-2001 |
Bosko Milekic <bmilekic@FreeBSD.org> |
Clean up RST ratelimiting. Previously, ratelimiting occured before tests were performed to determine if the received packet should be reset. This created erroneous ratelimiting and false alarms in some cases. The code has now been reorganized so that the checks for validity come before the call to badport_bandlim. Additionally, a few changes in the symbolic names of the bandlim types have been made, as well as a clarification of exactly which type each RST case falls under. Submitted by: Mike Silbersack <silby@silby.com>
|
#
fc2ffbe6 |
|
04-Feb-2001 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details. Created with: sed(1) Reviewed by: md5(1)
|
#
442fad67 |
|
24-Dec-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Update the "icmp_admin_prohib_like_rst" code to check the tcp-window and to be configurable with respect to acting only in SYN or in all TCP states. PR: 23665 Submitted by: Jesper Skriver <jesper@skriver.dk>
|
#
09f81a46 |
|
15-Dec-2000 |
Bosko Milekic <bmilekic@FreeBSD.org> |
Change the following: 1. ICMP ECHO and TSTAMP replies are now rate limited. 2. RSTs generated due to packets sent to open and unopen ports are now limited by seperate counters. 3. Each rate limiting queue now has its own description, as follows: Limiting icmp unreach response from 439 to 200 packets per second Limiting closed port RST response from 283 to 200 packets per second Limiting open port RST response from 18724 to 200 packets per second Limiting icmp ping response from 211 to 200 packets per second Limiting icmp tstamp response from 394 to 200 packets per second Submitted by: Mike Silbersack <silby@silby.com>
|
#
506f4949 |
|
01-Nov-2000 |
Ruslan Ermilov <ru@FreeBSD.org> |
Wrong checksum may have been computed for certain UDP packets. Reviewed by: jlemon
|
#
48cb400f |
|
31-Oct-2000 |
Ruslan Ermilov <ru@FreeBSD.org> |
Do not waste a time saving a copy of IP header if we are certainly not going to send an ICMP error message (net.inet.udp.blackhole=1).
|
#
46aa3347 |
|
27-Oct-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Convert all users of fldoff() to offsetof(). fldoff() is bad because it only takes a struct tag which makes it impossible to use unions, typedefs etc. Define __offsetof() in <machine/ansi.h> Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h> Remove myriad of local offsetof() definitions. Remove includes of <stddef.h> in kernel code. NB: Kernelcode should *never* include from /usr/include ! Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API. Deprecate <struct.h> with a warning. The warning turns into an error on 01-12-2000 and the file gets removed entirely on 01-01-2001. Paritials reviews by: various. Significant brucifications by: bde
|
#
24b261c7 |
|
17-Sep-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Reverse last commit, a better fix has been found.
|
#
e2cabba9 |
|
17-Sep-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Make sure UDP sockets are explicitly bind(2)'ed [sic] before we connect(2) them. PR: 20946 Isolated by: Aaron Gifford <agifford@infowest.com>
|
#
04287599 |
|
31-Aug-2000 |
Ruslan Ermilov <ru@FreeBSD.org> |
Fixed broken ICMP error generation, unified conversion of IP header fields between host and network byte order. The details: o icmp_error() now does not add IP header length. This fixes the problem when icmp_error() is called from ip_forward(). In this case the ip_len of the original IP datagram returned with ICMP error was wrong. o icmp_error() expects all three fields, ip_len, ip_id and ip_off in host byte order, so DTRT and convert these fields back to network byte order before sending a message. This fixes the problem described in PR 16240 and PR 20877 (ip_id field was returned in host byte order). o ip_ttl decrement operation in ip_forward() was moved down to make sure that it does not corrupt the copy of original IP datagram passed later to icmp_error(). o A copy of original IP datagram in ip_forward() was made a read-write, independent copy. This fixes the problem I first reported to Garrett Wollman and Bill Fenner and later put in audit trail of PR 16240: ip_output() (not always) converts fields of original datagram to network byte order, but because copy (mcopy) and its original (m) most likely share the same mbuf cluster, ip_output()'s manipulations on original also corrupted the copy. o ip_output() now expects all three fields, ip_len, ip_off and (what is significant) ip_id in host byte order. It was a headache for years that ip_id was handled differently. The only compatibility issue here is the raw IP socket interface with IP_HDRINCL socket option set and a non-zero ip_id field, but ip.4 manual page was unclear on whether in this case ip_id field should be in host or network byte order.
|
#
2160daba |
|
30-Aug-2000 |
Ruslan Ermilov <ru@FreeBSD.org> |
Backout the hack in rev 1.71, I am working on a better patch that should cover almost all inconsistencies in ICMP error generation.
|
#
47399871 |
|
29-Aug-2000 |
Darren Reed <darrenr@FreeBSD.org> |
Apply appropriate patch. PR: 20877 Submitted by: Frank Volf (volf@oasis.IAEhv.nl)
|
#
686cdd19 |
|
04-Jul-2000 |
Jun-ichiro itojun Hagino <itojun@FreeBSD.org> |
sync with kame tree as of july00. tons of bug fixes/improvements. API changes: - additional IPv6 ioctls - IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8). (also syntax change)
|
#
77978ab8 |
|
04-Jul-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Previous commit changing SYSCTL_HANDLER_ARGS violated KNF. Pointed out by: bde
|
#
82d9ae4e |
|
03-Jul-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Style police catches up with rev 1.26 of src/sys/sys/sysctl.h: Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our sources: -sysctl_vm_zone SYSCTL_HANDLER_ARGS +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
|
#
582a7760 |
|
23-May-2000 |
Bruce Evans <bde@FreeBSD.org> |
Fixed some style bugs (mainly convoluted logic for blackhole processing).
|
#
4f14ee00 |
|
22-May-2000 |
Dan Moschuk <dan@FreeBSD.org> |
sysctl'ize ICMP_BANDLIM and ICMP_BANDLIM_SUPPRESS_OUTPUT. Suggested by: des/nbm
|
#
db4f9cc7 |
|
27-Mar-2000 |
Jonathan Lemon <jlemon@FreeBSD.org> |
Add support for offloading IP/TCP/UDP checksums to NIC hardware which supports them.
|
#
6a800098 |
|
22-Dec-1999 |
Yoshinobu Inoue <shin@FreeBSD.org> |
IPSEC support in the kernel. pr_input() routines prototype is also changed to support IPSEC and IPV6 chained protocol headers. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
|
#
369dc8ce |
|
21-Dec-1999 |
Eivind Eklund <eivind@FreeBSD.org> |
Change incorrect NULLs to 0s
|
#
d25f3712 |
|
18-Dec-1999 |
Brian Feldman <green@FreeBSD.org> |
M_PREPEND-related cleanups (unregisterifying struct mbuf *s).
|
#
79ea3cf1 |
|
12-Dec-1999 |
Yoshinobu Inoue <shin@FreeBSD.org> |
Always set INP_IPV4 flag for IPv4 pcb entries, because netstat needs it to print out protocol specific pcb info. A patch submitted by guido@gvr.org, and asmodai@wxs.nl also reported the problem. Thanks and sorry for your troubles. Submitted by: guido@gvr.org Reviewed by: shin
|
#
cfa1ca9d |
|
07-Dec-1999 |
Yoshinobu Inoue <shin@FreeBSD.org> |
udp IPv6 support, IPv6/IPv4 tunneling support in kernel, packet divert at kernel for IPv6/IPv4 translater daemon This includes queue related patch submitted by jburkhol@home.com. Submitted by: queue related patch from jburkhol@home.com Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
|
#
12b4fd06 |
|
17-Nov-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
The logic for blackhole processing does not free mbufs if the blackhole flag is set. PR: 14958 Submitted by: Larry Baird <lab@gta.com> Reviewed by: phk
|
#
76429de4 |
|
05-Nov-1999 |
Yoshinobu Inoue <shin@FreeBSD.org> |
KAME related header files additions and merges. (only those which don't affect c source files so much) Reviewed by: cvs-committers Obtained from: KAME project
|
#
2f9a2132 |
|
18-Sep-1999 |
Brian Feldman <green@FreeBSD.org> |
Change so_cred's type to a ucred, not a pcred. THis makes more sense, actually. Make a sonewconn3() which takes an extra argument (proc) so new sockets created with sonewconn() from a user's system call get the correct credentials, not just the parent's credentials.
|
#
c3aac50f |
|
27-Aug-1999 |
Peter Wemm <peter@FreeBSD.org> |
$Id$ -> $FreeBSD$
|
#
828b7f40 |
|
18-Aug-1999 |
Geoff Rehmet <csgr@FreeBSD.org> |
Fix breakage if blackhole=1 and tiflags & TH_SYN, plus style(9) fixes Submitted by: Jonathon Lemon
|
#
16f7f31f |
|
16-Aug-1999 |
Geoff Rehmet <csgr@FreeBSD.org> |
Add net.inet.tcp.blackhole and net.inet.udp.blackhole sysctl knobs. With these knobs on, refused connection attempts are dropped without sending a RST, or Port unreachable in the UDP case. In the TCP case, sending of RST is inhibited iff the incoming segment was a SYN. Docs and rc.conf settings to follow.
|
#
490d50b6 |
|
11-Jul-1999 |
Brian Feldman <green@FreeBSD.org> |
Two new sysctls: net.inet.tcp.getcred and net.inet.udp.getcred. These take a sockaddr_in[2] (local, then remote) and return a struct ucred. Example code for these is at: http://www.FreeBSD.org/~green/inetd_ident.patch http://www.FreeBSD.org/~green/freebsd4.c (for pidentd) Reviewed by: bde
|
#
7a2aab80 |
|
19-Jun-1999 |
Brian Feldman <green@FreeBSD.org> |
This is the much-awaited cleaned up version of IPFW [ug]id support. All relevant changes have been made (including ipfw.8).
|
#
3d177f46 |
|
03-May-1999 |
Bill Fumerola <billf@FreeBSD.org> |
Add sysctl descriptions to many SYSCTL_XXXs PR: kern/11197 Submitted by: Adrian Chadd <adrian@FreeBSD.org> Reviewed by: billf(spelling/style/minor nits) Looked at by: bde(style)
|
#
75c13541 |
|
28-Apr-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
This Implements the mumbled about "Jail" feature. This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no privisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/
|
#
51508de1 |
|
03-Dec-1998 |
Matthew Dillon <dillon@FreeBSD.org> |
Reviewed by: freebsd-current Add ICMP_BANDLIM option and 'net.inet.icmp.icmplim' sysctl. If option is specified in kernel config, icmplim defaults to 100 pps. Setting it to 0 will disable the feature. This feature limits ICMP error responses for packets sent to bad tcp or udp ports, which does a lot to help the machine handle network D.O.S. attacks. The kernel will report packet rates that exceed the limit at a rate of one kernel printf per second. There is one issue in regards to the 'tail end' of an attack... the kernel will not output the last report until some unrelated and valid icmp error packet is return at some point after the attack is over. This is a minor reporting issue only.
|
#
6effc713 |
|
24-Aug-1998 |
Doug Rabson <dfr@FreeBSD.org> |
Re-implement tcp and ip fragment reassembly to not store pointers in the ip header which can't work on alpha since pointers are too big. Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
|
#
98271db4 |
|
15-May-1998 |
Garrett Wollman <wollman@FreeBSD.org> |
Convert socket structures to be type-stable and add a version number. Define a parameter which indicates the maximum number of sockets in a system, and use this to size the zone allocators used for sockets and for certain PCBs. Convert PF_LOCAL PCB structures to be type-stable and add a version number. Define an external format for infomation about socket structures and use it in several places. Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through sysctl(3) without blocking network interrupts for an unreasonable length of time. This probably still has some bugs and/or race conditions, but it seems to work well enough on my machines. It is now possible for `netstat' to get almost all of its information via the sysctl(3) interface rather than reading kmem (changes to follow).
|
#
8781d8e9 |
|
28-Mar-1998 |
Bruce Evans <bde@FreeBSD.org> |
Fixed style bugs (mostly) in previous commit.
|
#
3d4d47f3 |
|
24-Mar-1998 |
Garrett Wollman <wollman@FreeBSD.org> |
Use the zone allocator to allocate inpcbs and tcpcbs. Each protocol creates its own zone; this is used particularly by TCP which allocates both inpcb and tcpcb in a single allocation. (Some hackery ensures that the tcpcb is reasonably aligned.) Also keep track of the number of pcbs of each type allocated, and keep a generation count (instance version number) for future use.
|
#
c3229e05 |
|
27-Jan-1998 |
David Greenman <dg@FreeBSD.org> |
Improved connection establishment performance by doing local port lookups via a hashed port list. In the new scheme, in_pcblookup() goes away and is replaced by a new routine, in_pcblookup_local() for doing the local port check. Note that this implementation is space inefficient in that the PCB struct is now too large to fit into 128 bytes. I might deal with this in the future by using the new zone allocator, but I wanted these changes to be extensively tested in their current form first. Also: 1) Fixed off-by-one errors in the port lookup loops in in_pcbbind(). 2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash() to do the initialial hash insertion. 3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability. 4) Added a new routine, in_pcbremlists() to remove the PCB from the various hash lists. 5) Added/deleted comments where appropriate. 6) Removed unnecessary splnet() locking. In general, the PCB functions should be called at splnet()...there are unfortunately a few exceptions, however. 7) Reorganized a few structs for better cache line behavior. 8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in the future, however. These changes have been tested on wcarchive for more than a month. In tests done here, connection establishment overhead is reduced by more than 50 times, thus getting rid of one of the major networking scalability problems. Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult. WARNING: Anything that knows about inpcb and tcpcb structs will have to be recompiled; at the very least, this includes netstat(1).
|
#
694ad0a9 |
|
25-Jan-1998 |
Steve Price <steve@FreeBSD.org> |
Fix a couple of operator precedence bugs. PR: 5450 Submitted by: Sakari Jalovaara <sja@tekla.fi>
|
#
592071e8 |
|
19-Dec-1997 |
Bruce Evans <bde@FreeBSD.org> |
Don't use ANSI string concatenation to misformat a string.
|
#
55b211e3 |
|
28-Oct-1997 |
Bruce Evans <bde@FreeBSD.org> |
Removed unused #includes.
|
#
f8f6cbba |
|
13-Sep-1997 |
Peter Wemm <peter@FreeBSD.org> |
Update network code to use poll support.
|
#
57bf258e |
|
16-Aug-1997 |
Garrett Wollman <wollman@FreeBSD.org> |
Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to not malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.
|
#
a29f300e |
|
27-Apr-1997 |
Garrett Wollman <wollman@FreeBSD.org> |
The long-awaited mega-massive-network-code- cleanup. Part I. This commit includes the following changes: 1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility glue for them is deleted, and the kernel will panic on boot if any are compiled in. 2) Certain protocol entry points are modified to take a process structure, so they they can easily tell whether or not it is possible to sleep, and also to access credentials. 3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt() call. Protocols should use the process pointer they are now passed. 4) The PF_LOCAL and PF_ROUTE families have been updated to use the new style, as has the `raw' skeleton family. 5) PF_LOCAL sockets now obey the process's umask when creating a socket in the filesystem. As a result, LINT is now broken. I'm hoping that some enterprising hacker with a bit more time will either make the broken bits work (should be easy for netipx) or dike them out.
|
#
ca98b82c |
|
02-Apr-1997 |
David Greenman <dg@FreeBSD.org> |
Reorganize elements of the inpcb struct to take better advantage of cache lines. Removed the struct ip proto since only a couple of chars were actually being used in it. Changed the order of compares in the PCB hash lookup to take advantage of partial cache line fills (on PPro). Discussed-with: wollman
|
#
ddd79a97 |
|
03-Mar-1997 |
David Greenman <dg@FreeBSD.org> |
Improved performance of hash algorithm while (hopefully) not reducing the quality of the hash distribution. This does not fix a problem dealing with poor distribution when using lots of IP aliases and listening on the same port on every one of them...some other day perhaps; fixing that requires significant code changes. The use of xor was inspired by David S. Miller <davem@jenolan.rutgers.edu>
|
#
b110a8a2 |
|
24-Feb-1997 |
Garrett Wollman <wollman@FreeBSD.org> |
Fix #include order.
|
#
117bcae7 |
|
18-Feb-1997 |
Garrett Wollman <wollman@FreeBSD.org> |
Convert raw IP from mondo-switch-statement-from-Hell to pr_usrreqs. Collapse duplicates with udp_usrreq.c and tcp_usrreq.c (calling the generic routines in uipc_socket2.c and in_pcb.c). Calling sockaddr()_ or peeraddr() on a detached socket now traps, rather than harmlessly returning an error; this should never happen. Allow the raw IP buffer sizes to be controlled via sysctl.
|
#
d0390e05 |
|
14-Feb-1997 |
Garrett Wollman <wollman@FreeBSD.org> |
Fix the mechanism for choosing wehether to save the slow-start threshold in the route. This allows us to remove the unconditional setting of the pipesize in the route, which should mean that SO_SNDBUF and SO_RCVBUF should actually work again. While we're at it: - Convert udp_usrreq from `mondo switch statement from Hell' to new-style. - Delete old TCP mondo switch statement from Hell, which had previously been diked out.
|
#
1130b656 |
|
14-Jan-1997 |
Jordan K. Hubbard <jkh@FreeBSD.org> |
Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
82c23eba |
|
10-Nov-1996 |
Bill Fenner <fenner@FreeBSD.org> |
Add the IP_RECVIF socket option, which supplies a packet's incoming interface using a sockaddr_dl. Fix the other packet-information socket options (SO_TIMESTAMP, IP_RECVDSTADDR) to work for multicast UDP and raw sockets as well. (They previously only worked for unicast UDP).
|
#
430d30d8 |
|
25-Oct-1996 |
Bill Fenner <fenner@FreeBSD.org> |
Don't allow reassembly to create packets bigger than IP_MAXPACKET, and count attempts to do so. Don't allow users to source packets bigger than IP_MAXPACKET. Make UDP length and ipovly's protocol length unsigned short. Reviewed by: wollman Submitted by: (partly by) kml@nas.nasa.gov (Kevin Lahey)
|
#
6d6a026b |
|
07-Oct-1996 |
David Greenman <dg@FreeBSD.org> |
Improved in_pcblookuphash() to support wildcarding, and changed relavent callers of it to take advantage of this. This reduces new connection request overhead in the face of a large number of PCBs in the system. Thanks to David Filo <filo@yahoo.com> for suggesting this and providing a sample implementation (which wasn't used, but showed that it could be done). Reviewed by: wollman
|
#
0453d3cb |
|
08-Jun-1996 |
Bruce Evans <bde@FreeBSD.org> |
Changed some memcpy()'s back to bcopy()'s. gcc only inlines memcpy()'s whose count is constant and didn't inline these. I want memcpy() in the kernel go away so that it's obvious that it doesn't need to be optimized. Now it is only used for one struct copy in si.c.
|
#
c611d82e |
|
05-Jun-1996 |
Garrett Wollman <wollman@FreeBSD.org> |
Instrument UDP PCB hashing to see how often the hash lookup is effective for incoming packets.
|
#
82dab6ce |
|
09-May-1996 |
Garrett Wollman <wollman@FreeBSD.org> |
Make it possible to return more than one piece of control information (PR #1178). Define a new SO_TIMESTAMP socket option for datagram sockets to return packet-arrival timestamps as control information (PR #1179). Submitted by: Louis Mamakos <loiue@TransSys.com>
|
#
df5c0b8a |
|
01-May-1996 |
Bill Fenner <fenner@FreeBSD.org> |
Back out my stupid braino; I was thinking strlen and not sizeof.
|
#
af00f800 |
|
01-May-1996 |
Bill Fenner <fenner@FreeBSD.org> |
Size temp var correctly; buf[4*sizeof "123"] is not long enough to store "192.252.119.189\0".
|
#
75cfc95f |
|
27-Apr-1996 |
Andrey A. Chernov <ache@FreeBSD.org> |
inet_ntoa buffer was evaluated twice in log_in_vain, fix it. Thanx to: jdp
|
#
d78a37ad |
|
09-Apr-1996 |
Paul Traina <pst@FreeBSD.org> |
Logging UDP and TCP connection attempts should not be enabled by default. It's trivial to create a denial of service attack on a box so enabled. These messages, if enabled at all, must be rate-limited. (!)
|
#
816a3d83 |
|
04-Apr-1996 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Log TCP syn packets for ports we don't listen on. Controlled by: sysctl net.inet.tcp.log_in_vain: 1 Log UDP syn packets for ports we don't listen on. Controlled by: sysctl net.inet.udp.log_in_vain: 1 Suggested by: Warren Toomey <wkt@cs.adfa.oz.au>
|
#
2ee45d7d |
|
11-Mar-1996 |
David Greenman <dg@FreeBSD.org> |
Move or add #include <queue.h> in preparation for upcoming struct socket changes.
|
#
b62d102c |
|
15-Dec-1995 |
Bruce Evans <bde@FreeBSD.org> |
Uniformized pr_ctlinput protosw functions. The third arg is now `void *' instead of caddr_t and it isn't optional (it never was). Most of the netipx (and netns) pr_ctlinput functions abuse the second arg instead of using the third arg but fixing this is beyond the scope of this round of changes.
|
#
f708ef1b |
|
14-Dec-1995 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Another mega commit to staticize things.
|
#
2baeef32 |
|
06-Dec-1995 |
Bruce Evans <bde@FreeBSD.org> |
Removed unnecessary #includes of vm stuff. Most of them were once prerequisites for <sys/sysctl.h>. subr_prof.c: Also replaced #include of <sys/user.h> by #include of <sys/resourcevar.h>.
|
#
ff98689d |
|
16-Nov-1995 |
Bruce Evans <bde@FreeBSD.org> |
Fixed recent staticizations. Some protypes for static functions were left in headers and not staticized.
|
#
0312fbe9 |
|
14-Nov-1995 |
Poul-Henning Kamp <phk@FreeBSD.org> |
New style sysctl & staticize alot of stuff.
|
#
6dfab5b1 |
|
22-Sep-1995 |
Garrett Wollman <wollman@FreeBSD.org> |
Merge 4.4-Lite-2: always check the UDP checksum if it is present, even if we are not generating checksums. (Save a test in the input path.)
|
#
efe4b0eb |
|
21-Sep-1995 |
Garrett Wollman <wollman@FreeBSD.org> |
Second try: get 4.4-Lite-2 into the source tree. The conflicts don't matter because none of our working source files are on the CSRG branch any more. Obtained from: 4.4BSD-Lite-2
|
#
7eb7a449 |
|
17-Aug-1995 |
Andras Olah <olah@FreeBSD.org> |
Add a sanity check for the UDP length field in order to prevent malformed UDP packets to panic the kernel. Reviewed by: davidg, wollman Obtained from: dab@berserkly.cray.com (David A. Borman) via end2end list
|
#
9b2e5354 |
|
30-May-1995 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
Remove trailing whitespace.
|
#
94a5d9b6 |
|
09-May-1995 |
David Greenman <dg@FreeBSD.org> |
Replaced some bcopy()'s with memcpy()'s so that gcc while inline/optimize.
|
#
0d7b7d3e |
|
03-May-1995 |
David Greenman <dg@FreeBSD.org> |
Changed in_pcblookuphash() to not automatically call in_pcblookup() if the lookup fails. Updated callers to deal with this. Call in_pcblookuphash instead of in_pcblookup() in in_pcbconnect; this improves performance of UDP output by about 17% in the standard case.
|
#
15bd2b43 |
|
08-Apr-1995 |
David Greenman <dg@FreeBSD.org> |
Implemented PCB hashing. Includes new functions in_pcbinshash, in_pcbrehash, and in_pcblookuphash.
|
#
b5e8ce9f |
|
16-Mar-1995 |
Bruce Evans <bde@FreeBSD.org> |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
|
#
1c09f774 |
|
15-Feb-1995 |
Garrett Wollman <wollman@FreeBSD.org> |
spl back down in unusual out-of-memory condition in udp_output(). Obtained from: Stevens, vol. 2, exercise 23.4 (solution p. 1083)
|
#
9bb8795d |
|
15-Feb-1995 |
Garrett Wollman <wollman@FreeBSD.org> |
Don't add back in the IP header length to ip_len; icmp_error will do it for us. Obtained from: Stevens, vol. 2, p. 774
|
#
f2ea20e6 |
|
15-Feb-1995 |
Garrett Wollman <wollman@FreeBSD.org> |
Add lots of useful MIB variables and a few not-so-useful ones for completeness.
|
#
623ae52e |
|
02-Oct-1994 |
Poul-Henning Kamp <phk@FreeBSD.org> |
GCC cleanup. Reviewed by: Submitted by: Obtained from:
|
#
3c4dd356 |
|
02-Aug-1994 |
David Greenman <dg@FreeBSD.org> |
Added $Id$
|
#
26f9a767 |
|
25-May-1994 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
df8bae1d |
|
24-May-1994 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
BSD 4.4 Lite Kernel Sources
|