#
d9b1f6fb |
|
10-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: fix bug with socket buffer character counter underflow

Cover the case when an nb that we are now reading in full had been partially read by a previous read(2) and now has a positive offset. Add a couple of assertions that help catch this earlier.
|
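The accounting problem described in d9b1f6fb can be illustrated with a small user-space sketch. It is not the kernel code: `struct buf_sketch`, `struct sb_sketch`, and the helper names are hypothetical stand-ins. The point it shows is that when a buffer with a positive offset is consumed in full, only the unread remainder may be subtracted from the character counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Netlink buffer; not the kernel's nl_buf. */
struct buf_sketch {
	size_t datalen;	/* total bytes stored in the buffer */
	size_t offset;	/* bytes already consumed by earlier reads */
};

/* Hypothetical stand-in for the socket buffer's character counter. */
struct sb_sketch {
	size_t ccc;	/* bytes currently available to read */
};

/* Bytes that are still unread in the buffer. */
static size_t
buf_remaining(const struct buf_sketch *b)
{
	assert(b->offset <= b->datalen);
	return (b->datalen - b->offset);
}

/* Consume n bytes from the front of the unread part of the buffer. */
static void
sb_read_partial(struct sb_sketch *sb, struct buf_sketch *b, size_t n)
{
	assert(n <= buf_remaining(b));
	assert(sb->ccc >= n);	/* would catch a counter underflow */
	b->offset += n;
	sb->ccc -= n;
}

/*
 * Consume the rest of the buffer. Subtracting b->datalen here instead of
 * the remainder would underflow the counter after a partial read.
 */
static size_t
sb_read_rest(struct sb_sketch *sb, struct buf_sketch *b)
{
	size_t n = buf_remaining(b);

	assert(sb->ccc >= n);
	sb->ccc -= n;
	b->offset = b->datalen;
	return (n);
}
```

The assertions play the same role as the ones mentioned in the commit message: they fail at the point where the counter would go wrong rather than later.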
#
f75d7fac |
|
10-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: avoid putting empty mbufs on the socket queue

When processing incoming Netlink messages in nl_process_nbuf(), the kernel always allocates a writer with a buffer to put the generated reply into. However, certain messages get no reply. That makes nlmsg_flush() put an empty buffer on the socket. Avoid doing that, because avoiding it is much easier than dealing with empty buffers on the receiver side.
|
#
e6f4c314 |
|
10-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: improve edge case when reading out truncated last nlmsg in nb

When there is not enough space for one full message, we return it truncated. This enters a special block of code that previously could leave an empty buffer with offset == datalen in the queue. Avoid that, as dealing with empty buffers later causes more pain than just avoiding them. While here, add the missing msgrcv increment.
|
#
09fa78d4 |
|
09-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: fix regression with group writers

Refactoring of the argument list to nl_send_one() led to dereferencing the wrong union member. Rename nl_send_one() to a more generic name, isolate a new nl_send_one() as the callback only for the normal writer, and provide the correct argument to nl_send() from nl_send_group().

Fixes: ff5ad900d2a0793659241eee96be53e6053b5081
|
#
af9f4ac5 |
|
08-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: just return EOPNOTSUPP on shutdown(2)

This matches what Linux does.

Reviewed by: melifaro, tuexen
Differential Revision: https://reviews.freebsd.org/D43366
|
#
025007f3 |
|
02-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: remove stale comment

Fixes: ff5ad900d2a0793659241eee96be53e6053b5081
|
#
ff5ad900 |
|
02-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: refactor control data generation for recvmsg(2)

Netlink should return very simple control data on every recvmsg(2) syscall. This data is associated with a syscall, not with an nlmsg, nor with our internal representation (nl_bufs). There is no need to pre-allocate it in a non-sleepable context and attach it to an nl_buf. Allocate it right in the syscall with M_WAITOK. This also shaves off a lot of code and simplifies things.

Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42989
|
#
7e19c018 |
|
02-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: improve nl_soreceive()

The previous commit conservatively mimicked the operation of soreceive_generic(). The new code does two things:
- parses Netlink message headers and always returns at least one full nlmsg
- hides nl_buf boundaries from userland, copying out several at once

More details can be found in the large comment block added.

Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42785
|
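The "return whole messages" behaviour described in 7e19c018 can be approximated in user space as follows. This is a sketch under stated assumptions, not the logic of nl_soreceive(): `struct hdr_sketch` mirrors the usual 16-byte Netlink message header layout, `whole_msgs_len()` is a hypothetical helper, and 4-byte message alignment is assumed. It walks the headers in a linear buffer and reports how many bytes can be copied without splitting a message, falling back to a truncated copy only when even the first message does not fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the Netlink message header layout. */
struct hdr_sketch {
	uint32_t len;	/* message length including this header */
	uint16_t type;
	uint16_t flags;
	uint32_t seq;
	uint32_t pid;
};

/* Round a length up to the assumed 4-byte message alignment. */
#define SK_ALIGN4(n)	(((size_t)(n) + 3) & ~(size_t)3)

/*
 * Return how many bytes of data[0..datalen) to copy into a user buffer
 * with 'space' bytes free, so that only whole messages are copied.
 * If the first message alone does not fit, return 'space' and set
 * *truncated so that the caller can report the truncation.
 */
static size_t
whole_msgs_len(const unsigned char *data, size_t datalen, size_t space,
    bool *truncated)
{
	size_t off = 0;

	assert(truncated != NULL);
	*truncated = false;
	while (datalen - off >= sizeof(struct hdr_sketch)) {
		struct hdr_sketch h;
		size_t mlen;

		memcpy(&h, data + off, sizeof(h));
		if (h.len < sizeof(h) || h.len > datalen - off)
			break;		/* malformed header: stop here */
		mlen = SK_ALIGN4(h.len);
		if (mlen > datalen - off)
			mlen = datalen - off;	/* last message, no padding */
		if (off + mlen > space) {
			if (off == 0) {
				*truncated = true;
				return (space);
			}
			break;
		}
		off += mlen;
	}
	return (off);
}
```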
#
17083b94 |
|
02-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: use protocol specific receive buffer

Implement the Netlink socket receive buffer as a simple TAILQ of nl_bufs, using the same part of struct sockbuf that is already used for the send buffer. This shaves off a lot of code and a lot of extra processing. The pcb gets rid of the I/O queues, as the socket buffer is exactly the queue. The message writer is simplified a lot, as we now always deal with a linear buf. The notion of different buffer types goes away, as do the different kinds of writers. The only things remaining are a socket writer and a group writer. The impact on the network stack is that we no longer use mbufs, so a workaround from d18715475071 disappears.

Note on message throttling. Now the taskqueue throttling mechanism needs to look at both socket buffers, protected by their respective locks, and at flags in the pcb that are protected by the pcb lock. There is definitely some room for optimization, but this change tries to preserve as much as possible.

Note on new nl_soreceive(). It emulates soreceive_generic(). It must undergo further optimization; see the large comment put in there.

Note on tests/sys/netlink/test_netlink_message_writer.py. This test boiled down to almost nothing with mbufs removed. However, I left it with minimal functionality (it basically checks that when allocating N bytes we get N bytes), as it is one of the few examples of the ktest framework, which allows testing KPIs with Python.

Note on Linux support. It got much simpler: the Netlink message writer loses the notion of Linux support lifetime; it is the same regardless of process ABI. On a socket write from a Linux process we perform the conversion immediately in nl_receive_message(), and on output the conversion to Linux happens in nl_send_one(). XXX: both conversions use M_NOWAIT allocation, which used to be the case before this change, too.

Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42524
|
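A receive buffer kept as a TAILQ of linear buffers, as 17083b94 describes, might look roughly like the user-space sketch below. It relies only on the `TAILQ_*` macros from `<sys/queue.h>`; `struct qbuf`, `struct rcvq`, the `hiwat` limit, and the helper names are hypothetical and do not reflect the layout of struct sockbuf or nl_buf. Following the log's remark that avoiding empty buffers is easier than handling them on the receiver side, zero-length writes are never queued.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical linear buffer; not the kernel's nl_buf. */
struct qbuf {
	TAILQ_ENTRY(qbuf) link;
	size_t datalen;
	char data[];
};
TAILQ_HEAD(qbuf_head, qbuf);

/* Hypothetical receive queue with a byte counter and a size limit. */
struct rcvq {
	struct qbuf_head head;
	size_t ccc;	/* bytes currently queued */
	size_t hiwat;	/* maximum bytes allowed in the queue */
};

static void
rcvq_init(struct rcvq *q, size_t hiwat)
{
	TAILQ_INIT(&q->head);
	q->ccc = 0;
	q->hiwat = hiwat;
}

/* Queue a copy of data; returns 0 on success, -1 if full or out of memory. */
static int
rcvq_append(struct rcvq *q, const void *data, size_t len)
{
	struct qbuf *b;

	if (len == 0)
		return (0);	/* never queue an empty buffer */
	if (len > q->hiwat - q->ccc)
		return (-1);
	b = malloc(sizeof(*b) + len);
	if (b == NULL)
		return (-1);
	b->datalen = len;
	memcpy(b->data, data, len);
	TAILQ_INSERT_TAIL(&q->head, b, link);
	q->ccc += len;
	return (0);
}

/* Remove and return the oldest buffer, or NULL; the caller frees it. */
static struct qbuf *
rcvq_pop(struct rcvq *q)
{
	struct qbuf *b = TAILQ_FIRST(&q->head);

	if (b == NULL)
		return (NULL);
	assert(q->ccc >= b->datalen);
	TAILQ_REMOVE(&q->head, b, link);
	q->ccc -= b->datalen;
	return (b);
}
```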
#
660bd40a |
|
02-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: use domain specific send buffer

Instead of using generic socket code, create a Netlink-specific socket buffer. It is a simple TAILQ of writes that came from userland. This saves us one memory allocation that could fail and one memory copy.

Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42522
|
#
97958f5d |
|
26-Dec-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: simplify socket destruction

Destroy the socket at the file descriptor close(2). There is no reason to linger any longer, as there are no external references. Remove the pr_detach method, as there is nothing left to do after pr_close. Remove the pr_abort method, as it shall never be executed for this type of socket.

Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42521
|
#
0fac350c |
|
30-Nov-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: don't malloc/free sockaddr memory on getpeername/getsockname

Just like it was done for accept(2) in cfb1e92912b4, use the same approach for two simpler syscalls that return socket addresses. Although these two syscalls aren't performance critical, this change generalizes some code between the 3 syscalls, trimming code size. Following the example of accept(2), provide VNET-aware and INVARIANT-checking wrappers sopeeraddr() and sosockaddr() around protosw methods.

Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D42694
|
#
ab393e95 |
|
12-Oct-2023 |
Kristof Provost <kp@FreeBSD.org> |
netlink: move NETLINK define to opt_global.h

Move the NETLINK define into opt_global.h so we can rely on it being set correctly, without having to remember to include opt_netlink.h. This ensures that the NETLINK define is correctly set. If not, we may end up with unloadable modules, due to missing symbols (such as nlmsg_get_group_writer).

PR: 274306
Reviewed by: imp, markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42179
|
#
4d846d26 |
|
10-May-2023 |
Warner Losh <imp@FreeBSD.org> |
spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
|
#
fa554de7 |
|
11-May-2023 |
Kristof Provost <kp@FreeBSD.org> |
netlink: reduce default log levels

Reduce the default log level for netlink to LOG_INFO. This removes a number of messages such as
> [nl_iface] dump_sa: unsupported family: 0, skipping
or
> [nl_iface] get_operstate_ether: error calling SIOCGIFMEDIA on vlan0: 22
that are useful for debugging, but not for most users.

Reviewed by: melifaro
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40062
|
#
bc8dc484 |
|
28-Apr-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: add forgotten opt_netlink header
|
#
30d7e724 |
|
27-Apr-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
route: show originator PID in netlink monitor

Replacing rtsock with netlink also means providing similar tracing facilities. rtsock provides the `route -n monitor` interface, where each message can be traced to the originating PID. This diff closes the feature gap between rtsock and netlink in that regard.

Netlink works slightly differently from rtsock, as it is a generic message "broker". It calls some kernel KPIs and returns the result to the caller. Other Netlink consumers get notified of the changed kernel state via the relevant subsystem callbacks. Typically, it is close to impossible to pass some data through these KPIs to enhance the notification.

This diff approaches the problem by using osd(9) to assign the relevant socket pointer (`nlp`) to the per-socket taskqueue execution thread. This change allows recovering the pointer in the aforementioned notification callbacks and extracting some additional data. Using `osd(9)` (and adding additional metadata) to the notification receiver comes with some additional cost attached, so this interface needs to be enabled explicitly by using a newly-created `NETLINK_MSG_INFO` `SOL_NETLINK` socket option.

The actual metadata (which includes the originator PID) is provided via control messages. To enable extensibility, the control message data is encoded in the standard netlink (TLV-based) fashion. The list of the currently-provided properties can be found in `nlmsginfo_attrs`. snl(3) is extended to enable decoding of netlink messages with metadata (`snl_read_message_dbg()` stores the parsed structure in the provided buffer).

Differential Revision: https://reviews.freebsd.org/D39391
|
#
19e43c16 |
|
27-Mar-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: add netlink KPI to the kernel by default

This change does the following: base Netlink KPIs (ability to register the family, parse and/or write a Netlink message) are always present in the kernel. Specifically:
* Implementation of genetlink family/group registration/removal and some base accessors (netlink_generic_kpi.c, 260 LoC) are compiled in unconditionally.
* Basic TLV parser functions (netlink_message_parser.c, 507 LoC) are compiled in unconditionally.
* Glue functions (netlink<>rtsock) and malloc/core sysctl definitions (netlink_glue.c, 259 LoC) are compiled in unconditionally.
* The rest of the KPI _functions_ are defined in netlink_glue.c, but their implementation calls a pointer to either the stub function or the actual function, depending on whether the module is loaded or not.

This approach allows having only 1k LoC out of ~3.7k LoC (current sys/netlink implementation) in the kernel, which will not grow further. It also allows generic netlink kernel consumers to load successfully without requiring the Netlink module and to operate correctly once the Netlink module is loaded.

Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D39269
|
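The stub-or-actual dispatch that 19e43c16 describes for the remaining KPI functions can be sketched in plain C as below. Everything here is hypothetical: the provider table, the function names, and the choice of `ENOTSUP` as the stub's return value are illustration only, and are not taken from netlink_glue.c. The entry point is always linked in and forwards through a pointer that a module would swap on load and restore on unload.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical table of functions supplied by the loadable module. */
struct nl_provider_sketch {
	int (*send_msg)(const void *msg, size_t len);
};

/* Stub used while the module is absent; the error value is an assumption. */
static int
stub_send_msg(const void *msg, size_t len)
{
	(void)msg;
	(void)len;
	return (ENOTSUP);
}

static const struct nl_provider_sketch stub_provider = {
	.send_msg = stub_send_msg,
};

/* Points at the stub table until a real provider registers. */
static const struct nl_provider_sketch *active_provider = &stub_provider;

/* Always-present entry point: callers never see whether a module is loaded. */
static int
nl_send_msg_sketch(const void *msg, size_t len)
{
	assert(active_provider != NULL);
	return (active_provider->send_msg(msg, len));
}

/* Called on module load (p != NULL) and unload (p == NULL). */
static void
nl_set_provider_sketch(const struct nl_provider_sketch *p)
{
	active_provider = (p != NULL) ? p : &stub_provider;
}

/* Stand-in for the module's real implementation. */
static int
real_send_msg(const void *msg, size_t len)
{
	(void)msg;
	(void)len;
	return (0);
}

static const struct nl_provider_sketch real_provider = {
	.send_msg = real_send_msg,
};
```

With this shape, a consumer linked against the always-present entry point can load before the module and starts getting real results once the provider is registered.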
#
04f75b98 |
|
26-Mar-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: allow netlink sockets in non-vnet jails.

This change allows opening Netlink sockets in non-vnet jails, even for unprivileged processes. The security model largely follows the existing one. To be more specific:
* By default, every `NETLINK_ROUTE` command is **NOT** allowed in a non-VNET jail UNLESS the `RTNL_F_ALLOW_NONVNET_JAIL` flag is specified in the command handler.
* All notifications are **disabled** for non-vnet jails (requests to subscribe to notifications are ignored). This will change to a more fine-grained model once the first netlink provider requiring this gets committed.
* Listing interfaces (RTM_GETLINK) is **allowed** w/o limits (**including** interfaces w/o any addresses attached to the jail). The value of this is questionable, but it follows the existing approach.
* Listing ARP/NDP neighbours is **forbidden**. This is a **change** from the current approach: currently we list static ARP/ND entries belonging to the addresses attached to the jail.
* Listing interface addresses is **allowed**, but the addresses are filtered to match only the ones attached to the jail.
* Listing routes is **allowed**, but the routes are filtered to provide only host routes matching the addresses attached to the jail.
* By default, every `NETLINK_GENERIC` command is **allowed** in a non-VNET jail (as sub-families may be unrelated to networking at all). It is up to the family author to implement the restriction if necessary.

Differential Revision: https://reviews.freebsd.org/D39206
MFC after: 1 month
|
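The default-deny rule for `NETLINK_ROUTE` commands in 04f75b98 amounts to a per-handler flag check. The sketch below shows one way such a gate could be written; it is an assumption-laden illustration, not the committed code. `SK_F_ALLOW_NONVNET_JAIL` stands in for `RTNL_F_ALLOW_NONVNET_JAIL` with an invented value, the structures are hypothetical, and returning `EPERM` is a guess at a plausible error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for RTNL_F_ALLOW_NONVNET_JAIL; the value is illustrative. */
#define SK_F_ALLOW_NONVNET_JAIL	0x01

/* Hypothetical command handler descriptor. */
struct cmd_handler_sketch {
	int cmd;	/* command number */
	int flags;	/* SK_F_* flags */
};

/* Hypothetical view of the calling process. */
struct caller_sketch {
	bool jailed;	/* process runs inside a jail */
	bool has_vnet;	/* the jail has its own network stack */
};

/*
 * Default deny for non-VNET jails: a command is allowed there only if
 * its handler opted in. Returns 0 if allowed, EPERM otherwise (the
 * error code is an assumption).
 */
static int
check_cmd_allowed(const struct cmd_handler_sketch *h,
    const struct caller_sketch *c)
{
	assert(h != NULL && c != NULL);
	if (c->jailed && !c->has_vnet &&
	    (h->flags & SK_F_ALLOW_NONVNET_JAIL) == 0)
		return (EPERM);
	return (0);
}
```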
#
046acc2b |
|
18-Mar-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: add public ucred accessor for nlp.

MFC after: 2 weeks
|
#
28a5d88f |
|
27-Feb-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: make the maximum allowed netlink socket buffer runtime tunable.

Dumping large routing tables (>1M paths with multipath) requires a socket buffer that is larger than the currently defined limit. Allow the limit to be set at runtime, similar to kern.ipc.maxsockbuf.

Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after: 1 day
|
#
4404e840 |
|
18-Feb-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: initialise error in nl_autobind_port().

CID: 1498877
MFC after: 2 weeks
|
#
3f70fca9 |
|
18-Feb-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: check result of sooptcopyin().

CID: 1498866
MFC after: 2 weeks
|
#
0079d177 |
|
21-Jan-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: allow creating sockets with SOCK_DGRAM.

Some existing applications set up Netlink sockets with SOCK_DGRAM instead of SOCK_RAW. Update the manpage to clarify that the default way of creating the socket should be with SOCK_RAW. Update the code to support both SOCK_RAW and SOCK_DGRAM.

Reviewed By: pauamma
Differential Revision: https://reviews.freebsd.org/D38075
|
#
ab591c87 |
|
20-Dec-2022 |
Zhenlei Huang <zlei@FreeBSD.org> |
netlink: Use NET_EPOCH_[CALL|WAIT] macros

Reviewed by: melifaro, kp
Approved by: kp (mentor)
Differential Revision: https://reviews.freebsd.org/D37730
|
#
4dfd380e |
|
03-Nov-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: allow more than 64 groups per netlink socket.
|
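Going past 64 groups per socket, as 4dfd380e does, means a single 64-bit mask no longer suffices. One common way to lift such a limit is an array of words indexed by group ID; the sketch below shows that technique only and makes no claim about how the commit implements it. The limit `SK_MAX_GROUPS`, the structure, and the helpers are hypothetical, and group IDs are assumed to be 1-based.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_MAX_GROUPS	256	/* illustrative limit, not the kernel's */
#define SK_WORD_BITS	64

/* Hypothetical per-socket membership bitmap. */
struct groups_sketch {
	uint64_t bits[SK_MAX_GROUPS / SK_WORD_BITS];
};

/* Group IDs are assumed to be 1-based; 0 means "no group". */
static bool
group_valid(unsigned int id)
{
	return (id >= 1 && id <= SK_MAX_GROUPS);
}

static void
group_join(struct groups_sketch *g, unsigned int id)
{
	assert(group_valid(id));
	g->bits[(id - 1) / SK_WORD_BITS] |=
	    UINT64_C(1) << ((id - 1) % SK_WORD_BITS);
}

static void
group_leave(struct groups_sketch *g, unsigned int id)
{
	assert(group_valid(id));
	g->bits[(id - 1) / SK_WORD_BITS] &=
	    ~(UINT64_C(1) << ((id - 1) % SK_WORD_BITS));
}

static bool
group_is_member(const struct groups_sketch *g, unsigned int id)
{
	if (!group_valid(id))
		return (false);
	return ((g->bits[(id - 1) / SK_WORD_BITS] >>
	    ((id - 1) % SK_WORD_BITS)) & 1);
}
```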
#
43d0c2dd |
|
27-Oct-2022 |
Ed Maste <emaste@FreeBSD.org> |
netlink: use (void) for function definitions with no arguments

For some of these, Clang produced a warning that "a function declaration without a prototype is deprecated in all versions of C". In other cases the function definition used (), which did not match the header declaration, which used (void).

Sponsored by: The FreeBSD Foundation
|
#
fc083c3e |
|
01-Oct-2022 |
Jung-uk Kim <jkim@FreeBSD.org> |
netlink: Fix build without VIMAGE
|
#
8d9f3e05 |
|
01-Oct-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: fix format strings on 32-bit platforms
|
#
c90bff3f |
|
01-Oct-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: fix debugging on 32-bit platforms
|
#
7e5bf684 |
|
20-Jan-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: add netlink support

Netlink is a communication protocol currently used in the Linux kernel to modify, read and subscribe to nearly all networking state. Interfaces, addresses, routes, firewall, fibs, vnets, etc. are controlled via netlink. It is an async, TLV-based protocol, providing 1-1 and 1-many communications.

The current implementation supports a subset of the NETLINK_ROUTE family. To be more specific, the following is supported:
* Dumps:
  - routes
  - nexthops / nexthop groups
  - interfaces
  - interface addresses
  - neighbors (arp/ndp)
* Notifications:
  - interface arrival/departure
  - interface address arrival/departure
  - route addition/deletion
* Modifications:
  - adding/deleting routes
  - adding/deleting nexthops/nexthop groups
  - adding/deleting neighbors
  - adding/deleting interfaces (basic support only)
* Rtsock interaction
  - route events are bridged both ways

The implementation also supports the NETLINK_GENERIC family framework.

Implementation notes: Netlink is implemented via a loadable/unloadable kernel module, not touching many kernel parts. Each netlink socket uses a dedicated taskqueue to support async operations that can sleep, such as interface creation. All message processing is performed within these taskqueues.

Compatibility: Most of the Netlink data models specified above map to FreeBSD concepts nicely. An unmodified ip(8) binary works correctly with interfaces, addresses, routes, nexthops and nexthop groups. Some software, such as net/bird, requires header-only modifications to compile and work with FreeBSD netlink.

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D36002
MFC after: 2 months
|
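Since 7e5bf684 characterizes Netlink as a TLV-based protocol, a small sketch of attribute encoding and lookup may help. It assumes the conventional Netlink attribute layout (a 16-bit length that includes the 4-byte header, a 16-bit type, payload padded to 4 bytes); `struct attr_hdr_sketch`, `put_attr()`, and `find_attr()` are hypothetical helpers, not the kernel parser or snl(3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the Netlink attribute header. */
struct attr_hdr_sketch {
	uint16_t len;	/* header plus payload, without padding */
	uint16_t type;
};

#define SK_ATTR_ALIGN(n)	(((size_t)(n) + 3) & ~(size_t)3)

/*
 * Append one attribute at offset 'off'. Returns the new end offset,
 * or 0 if the attribute does not fit.
 */
static size_t
put_attr(unsigned char *buf, size_t cap, size_t off, uint16_t type,
    const void *data, uint16_t dlen)
{
	struct attr_hdr_sketch h;
	size_t total;

	if (dlen > UINT16_MAX - sizeof(h))
		return (0);
	h.len = (uint16_t)(sizeof(h) + dlen);
	h.type = type;
	total = SK_ATTR_ALIGN(h.len);
	if (off > cap || total > cap - off)
		return (0);
	memset(buf + off, 0, total);	/* zero the padding bytes too */
	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + sizeof(h), data, dlen);
	return (off + total);
}

/*
 * Find the first attribute of the given type. Returns a pointer to its
 * payload and stores the payload length, or returns NULL if absent or
 * if a malformed header is encountered.
 */
static const unsigned char *
find_attr(const unsigned char *buf, size_t buflen, uint16_t type,
    size_t *payload_len)
{
	size_t off = 0;

	assert(payload_len != NULL);
	while (buflen - off >= sizeof(struct attr_hdr_sketch)) {
		struct attr_hdr_sketch h;
		size_t step;

		memcpy(&h, buf + off, sizeof(h));
		if (h.len < sizeof(h) || h.len > buflen - off)
			return (NULL);
		if (h.type == type) {
			*payload_len = h.len - sizeof(h);
			return (buf + off + sizeof(h));
		}
		step = SK_ATTR_ALIGN(h.len);
		if (step > buflen - off)
			break;
		off += step;
	}
	return (NULL);
}
```

Unknown attribute types are simply skipped by length, which is what makes a TLV encoding extensible: a reader can ignore attributes it does not understand.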