Cross Reference: /freebsd-current/sys/net/route.h

History log of /freebsd-current/sys/net/route.h
Revision	Date	Author	Comments
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# 2ff63af9	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .h pattern Remove /^\s\+\s\$FreeBSD\$.$\n/
# 90d62512	09-Mar-2023	Alexander V. Chernikov <melifaro@FreeBSD.org>	netlink: add rtsock-compatible header to use with netlink snl(3). Some routing socket defines (`RTM_` and `RTA_` ones) clash with the ones used by the the Netlink. As some rtsock definitions like interface flags or route flags are used in both netlink and rtsock, provide a convenient way to include those without running into the define collision. Differential Revision: https://reviews.freebsd.org/D38982 MFC after: 2 weeks
# 1bcd230f	03-Dec-2022	Alexander V. Chernikov <melifaro@FreeBSD.org>	netlink: add interface notification on link status / flags change. * Add link-state change notifications by subscribing to ifnet_link_event. In the Linux netlink model, link state is reported in 2 places: first is the IFLA_OPERSTATE, which stores state per RFC2863. The second is an IFF_LOWER_UP interface flag. As many applications rely on the latter, reserve 1 bit from if_flags, named as IFF_NETLINK_1. This flag is mapped to IFF_LOWER_UP in the netlink headers. This is done to avoid making applications think this flag is actually supported / presented in non-netlink outputs. * Add flag change notifications, by hooking into rt_ifmsg(). In the netlink model, notification should include the bitmask for the change flags. Update rt_ifmsg() to include such bitmask. Differential Revision: https://reviews.freebsd.org/D37597
# 1922eb3e	17-Aug-2022	Gleb Smirnoff <glebius@FreeBSD.org>	protosw: retire pr_slowtimo and pr_fasttimo They were useful many years ago, when the callwheel was not efficient, and the kernel tried to have as little callout entries scheduled as possible. Reviewed by: tuexen, melifaro Differential revision: https://reviews.freebsd.org/D36163
# 036f1bc6	14-Aug-2022	Alexander V. Chernikov <melifaro@FreeBSD.org>	routing: retire rib_lookup_info() This function was added in pre-epoch era ( 9a1b64d5a0224 ) to provide public rtentry access interface & hide rtentry internals. The implementation is based on the large on-stack copying and refcounting of the referenced objects (ifa/ifp). It has become obsolete after epoch & nexthop introduction. Convert the last remaining user and remove the function itself. Differential Revision: https://reviews.freebsd.org/D36197
# d8b42ddc	11-Aug-2022	Alexander V. Chernikov <melifaro@FreeBSD.org>	rtsock: subscribe to ifnet eventhandlers instead of direct calls. Stop treating rtsock as a "special" consumer and use already-provided ifaddr arrival/departure notifications. MFC after: 2 weeks Test Plan: ``` 21:05 [0] m@devel0 route -n monitor -> ifconfig vtnet0.2 create got message of size 24 on Tue Aug 9 21:05:44 2022 RTM_IFANNOUNCE: interface arrival/departure: len 24, if# 3, what: arrival got message of size 168 on Tue Aug 9 21:05:54 2022 RTM_IFINFO: iface status change: len 168, if# 3, link: up, flags:<BROADCAST,RUNNING,SIMPLEX,MULTICAST> -> ifconfig vtnet0.2 destroy got message of size 24 on Tue Aug 9 21:05:54 2022 RTM_IFANNOUNCE: interface arrival/departure: len 24, if# 3, what: departure ``` Reviewed By: glebius Differential Revision: https://reviews.freebsd.org/D36095 MFC after: 2 weeks
# 66230639	03-Aug-2022	Alexander V. Chernikov <melifaro@FreeBSD.org>	routing: split nexthop creation and rtentry creation. This change is required for the upcoming introduction of the next nexhop-based operations KPI, as it will create rtentry and nexthops at different stages of route table modification. Differential Revision: https://reviews.freebsd.org/D36072 MFC after: 2 weeks
# 23677398	02-Apr-2022	Gordon Bergling <gbe@FreeBSD.org>	net(3): Fix a typo in a source code comment - s/paramenters/parameters/ MFC after: 3 days
# 63f7f392	26-Dec-2021	Alexander V. Chernikov <melifaro@FreeBSD.org>	routing: Add unified level-based logging support for the routing subsystem. Summary: MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33664
# 62e1a437	22-Aug-2021	Zhenlei Huang <zlei.huang@gmail.com>	routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549). Implement kernel support for RFC 5549/8950. * Relax control plane restrictions and allow specifying IPv6 gateways for IPv4 routes. This behavior is controlled by the net.route.rib_route_ipv6_nexthop sysctl (on by default). * Always pass final destination in ro->ro_dst in ip_forward(). * Use ro->ro_dst to exract packet family inside if_output() routines. Consistently use RO_GET_FAMILY() macro to handle ro=NULL case. * Pass extracted family to nd6_resolve() to get the LLE with proper encap. It leverages recent lltable changes committed in c541bd368f86. Presence of the functionality can be checked using ipv4_rfc5549_support feature(3). Example usage: route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0 Differential Revision: https://reviews.freebsd.org/D30398 MFC after: 2 weeks
# 8e8f1cc9	23-Apr-2021	Mark Johnston <markj@FreeBSD.org>	Re-enable network ioctls in capability mode This reverts a portion of 274579831b61 ("capsicum: Limit socket operations in capability mode") as at least rtsol and dhcpcd rely on being able to configure network interfaces while in capability mode. Reported by: bapt, Greg V Sponsored by: The FreeBSD Foundation
# 27457983	07-Apr-2021	Mark Johnston <markj@FreeBSD.org>	capsicum: Limit socket operations in capability mode Capsicum did not prevent certain privileged networking operations, specifically creation of raw sockets and network configuration ioctls. However, these facilities can be used to circumvent some of the restrictions that capability mode is supposed to enforce. Add capability mode checks to disallow network configuration ioctls and creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET internet sockets. Reviewed by: oshogbo Discussed with: emaste Reported by: manu Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29423
# b1d63265	08-Mar-2021	Alexander V. Chernikov <melifaro@FreeBSD.org>	Flush remaining routes from the routing table during VNET shutdown. Summary: This fixes rtentry leak for the cloned interfaces created inside the VNET. PR: 253998 Reported by: rashey at superbox.pl MFC after: 3 days Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown). Thus, any route table operations are too late to schedule. As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`. It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish. Test Plan: ``` set_skip:set_skip_group_lo -> passed [0.053s] tail -n 200 /var/log/messages \| grep rtentry ``` Reviewers: #network, kp, bz Reviewed By: kp Subscribers: imp, ae Differential Revision: https://reviews.freebsd.org/D29116
# 64d5c277	14-Feb-2021	Alexander V. Chernikov <melifaro@FreeBSD.org>	Remove now-unused RTF_RNH_LOCKED route flag. MFC after: 1 week
# 81728a53	08-Jan-2021	Alexander V. Chernikov <melifaro@FreeBSD.org>	Split rtinit() into multiple functions. rtinit[1]() is a function used to add or remove interface address prefix routes, similar to ifa_maintain_loopback_route(). It was intended to be family-agnostic. There is a problem with this approach in reality. 1) IPv6 code does not use it for the ifa routes. There is a separate layer, nd6_prelist_(), providing interface for maintaining interface routes. Its part, responsible for the actual route table interaction, mimics rtenty() code. 2) rtinit tries to combine multiple actions in the same function: constructing proper route attributes and handling iterations over multiple fibs, for the non-zero net.add_addr_allfibs use case. It notably increases the code complexity. 3) dstaddr handling. flags parameter re-uses RTF_ flags. As there is no special flag for p2p connections, host routes and p2p routes are handled in the same way. Additionally, mapping IFA flags to RTF flags makes the interface pretty messy. It make rtinit() to clash with ifa_mainain_loopback_route() for IPV4 interface aliases. 4) rtinit() is the last customer passing non-masked prefixes to rib_action(), complicating rib_action() implementation. 5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive" ifa messages in certain corner cases. To address all these points, the following has been done: * rtinit() has been split into multiple functions: - Route attribute construction were moved to the per-address-family functions, dealing with (2), (3) and (4). - funnction providing net.add_addr_allfibs handling and route rtsock notificaions is the new routing table inteface. - rtsock ifa notificaion has been moved out as well. resulting set of funcion are only responsible for the actual route notifications. Side effects: * /32 alias does not result in interface routes (/32 route and "host" route) * RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses Differential revision: https://reviews.freebsd.org/D28186
# d68cf57b	07-Jan-2021	Alexander V. Chernikov <melifaro@FreeBSD.org>	Refactor rt_addrmsg() and rt_routemsg(). Summary: * Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally. * Fix rt_routemsg() and multipath by accepting nexthop instead of interface pointer. * Refactor rtsock_routemsg(): avoid accessing rtentry fields directly. * Simplify in_addprefix() by moving prefix search to a separate function. Reviewers: #network Subscribers: imp, ae, bz Differential Revision: https://reviews.freebsd.org/D28011
# 77df2c21	30-Nov-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Renumber NHR_* flags after NHR_IFAIF removal in r368127. Suggested by: rpokala
# b712e3e3	29-Nov-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Refactor fib4/fib6 functions. No functional changes. * Make lookup path of fib<4\|6>_lookup_debugnet() separate functions (fib<46>_lookup_rt()). These will be used in the control plane code requiring unlocked radix operations and actual prefix pointer. * Make lookup part of fib<4\|6>_check_urpf() separate functions. This change simplifies the switch to alternative lookup implementations, which helps algorithmic lookups introduction. * While here, use static initializers for IPv4/IPv6 keys Differential Revision: https://reviews.freebsd.org/D27405
# 7a6dc73c	28-Nov-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Cleanup nexthops request flags: * remove NHR_IFAIF as it was used by previous version of nexthop KPI * update NHR_REF description
# 7511a638	22-Nov-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Refactor rib iterator functions. * Make rib_walk() order of arguments consistent with the rest of RIB api * Add rib_walk_ext() allowing to exec callback before/after iteration. * Rename rt_foreach_fib_walk_del -> rib_foreach_table_walk_del * Rename rt_forach_fib_walk -> rib_foreach_table_walk * Move rib_foreach_table_walk{_del} to route/route_helpers.c * Slightly refactor rib_foreach_table_walk{_del} to make the implementation consistent and prepare for upcoming iterator optimizations. Differential Revision: https://reviews.freebsd.org/D27219
# 0c325f53	18-Oct-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Implement flowid calculation for outbound connections to balance connections over multiple paths. Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code fills mbuf flowid from inp_flowid for connection-oriented sockets. However, inp_flowid is currently not calculated for outbound connections. This change creates simple hashing functions and starts calculating hashes for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the system. Reviewed by: glebius (previous version),ae Differential Revision: https://reviews.freebsd.org/D26523
# fedeb08b	03-Oct-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Introduce scalable route multipath. This change is based on the nexthop objects landed in D24232. The change introduces the concept of nexthop groups. Each group contains the collection of nexthops with their relative weights and a dataplane-optimized structure to enable efficient nexthop selection. Simular to the nexthops, nexthop groups are immutable. Dataplane part gets compiled during group creation and is basically an array of nexthop pointers, compiled w.r.t their weights. With this change, `rt_nhop` field of `struct rtentry` contains either nexthop or nexthop group. They are distinguished by the presense of NHF_MULTIPATH flag. All dataplane lookup functions returns pointer to the nexthop object, leaving nexhop groups details inside routing subsystem. User-visible changes: The change is intended to be backward-compatible: all non-mpath operations should work as before with ROUTE_MPATH and net.route.multipath=1. All routes now comes with weight, default weight is 1, maximum is 2^24-1. Current maximum multipath group width is statically set to 64. This will become sysctl-tunable in the followup changes. Using functionality: * Recompile kernel with ROUTE_MPATH * set net.route.multipath to 1 route add -6 2001:db8::/32 2001:db8::2 -weight 10 route add -6 2001:db8::/32 2001:db8::3 -weight 20 netstat -6On Nexthop groups data Internet6: GrpIdx NhIdx Weight Slots Gateway Netif Refcnt 1 ------- ------- ------- --------------------------------------- --------- 1 13 10 1 2001:db8::2 vlan2 14 20 2 2001:db8::3 vlan2 Next steps: * Land outbound hashing for locally-originated routes ( D26523 ). * Fix net/bird multipath (net/frr seems to work fine) * Add ROUTE_MPATH to GENERIC * Set net.route.multipath=1 by default Tested by: olivier Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26449
# 2259a030	21-Sep-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Rework part of routing code to reduce difference to D26449. * Split rt_setmetrics into get_info_weight() and rt_set_expire_info(), as these two can be applied at different entities and at different times. * Start filling route weight in route change notifications * Pass flowid to UDP/raw IP route lookups * Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact that rtentry can contain multiple nexthops. Differential Revision: https://reviews.freebsd.org/D26497
# f5247a23	21-Aug-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Make net.fibs growable. Allow to dynamically grow the amount of fibs in each vnet. This change alters current behavior. Currently, if one defines ROUTETABLES > 1 in the kernel config, each vnet will be created with the number of fibs defined in the kernel config. After this commit vnets will be created with fibs=1. Dynamic net.fibs is not compatible with net.add_addr_allfibs. The plan is to deprecate the latter and make net.add_addr_allfibs=0 default behaviour. Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26062
# 6cbadc42	13-Aug-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Move rtzone handling code to net/route_ctl.c After moving the route control plane code from net/route.c, all rtzone users ended up being in net/route_ctl.c. Move uma(9) rtzone setup/teardown code to net/route_ctl.c as well to have everything in a single place. While here, remove custom initializers from the zone. It was added originally to avoid setup/teardown of costy per-cpu couters. With these counters removed, the only remaining job was avoiding rte mutex setup/teardown. Mutex setup is relatively cheap. Additionally, this mutex will soon be removed. With that in mind, there is no sense in keeping custom zone callbacks. Differential Revision: https://reviews.freebsd.org/D26051
# e1c05fd2	21-Jul-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546
# 72587123	19-Jul-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Temporarly revert r363319 to unbreak the build. Reported by: CI Pointy hat to: melifaro
# 8cee15d9	19-Jul-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546
# da187ddb	01-Jun-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067
# e7403d02	01-Jun-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Revert r361704, it accidentally committed merged D25067 and D25070.
# 79674562	01-Jun-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067
# 2bbab0af	23-May-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Use epoch(9) for rtentries to simplify control plane operations. Currently the only reason of refcounting rtentries is the need to report the rtable operation details immediately after the execution. Delaying rtentry reclamation allows to stop refcounting and simplify the code. Additionally, this change allows to reimplement rib_lookup_info(), which is used by some of the customers to get the matching prefix along with nexthops, in more efficient way. The change keeps per-vnet rtzone uma zone. It adds nh_vnet field to nhop_priv to be able to reliably set curvnet even during vnet teardown. Rest of the reference counting code will be removed in the D24867 . Differential Revision: https://reviews.freebsd.org/D24866
# d2233725	10-May-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Remove rtalloc1(_fib) KPI. Last user of rtalloc1() KPI has been eliminated in rS360631. As kernel is now fully switched to use new routing KPI defined in rS359823, remove old lookup functions. Differential Revision: https://reviews.freebsd.org/D24776
# 656442a7	08-May-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Embed dst sockaddr into rtentry and remove rte packet counter Currently each rtentry has dst&gateway allocated separately from another zone, bloating cache accesses. Current 'struct rtentry' has 12 "mandatory" radix pointers in the beginning, leaving 4 usable pointers/32 bytes in the first 2 cache lines (amd64). Fields needed for the datapath are destination sockaddr and rt_nhop. So far it doesn't look like there is other routable addressing protocol other than IPv4/IPv6/MPLS, which uses keys longer than 20 bytes. With that in mind, embed dst into struct rtentry, making the first 24 bytes of rtentry within 128 bytes. That is enough to make IPv6 address within first 128 bytes. It is still pretty easy to add code for supporting separately-allocated dst, however it doesn't make a lot of sense in having such code without a use case. As rS359823 moved the gateway to the nexthop structure, the dst embedding change removes the need for any additional allocations done by rt_setgate(). Lastly, as a part of cleanup, remove counter(9) allocation code, as this field is not used in packet processing anymore. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24669
# 682b902d	07-May-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Add rib_lookup() sockaddr lookup wrapper and make ifa_ifwithroute use it. Create rib_lookup() wrapper around per-af dataplane lookup functions. This will help in the cases of having control plane af-agnostic code. Switch ifa_ifwithroute() to use this function instead of rtalloc1(). Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24731
# fc88ecd3	28-Apr-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Move route-specific ddb commands to route/route_ddb.c Currently functionality resides in rtsock.c, which is a controlling interface, partially external to the routing subsystem. Additionally, DDB-supporting functionality is > 100SLOC, which deserves a separate file. Given that, move this functionality to a newly-created net/route/ subdir. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D24561
# fe6da727	28-Apr-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Move struct rtentry definition to nhop_var.h. One of the goals of the new routing KPI defined in r359823 is to entirely hide`struct rtentry` from the consumers. It will allow to improve routing subsystem internals and deliver features much faster. This is one of the last changes, effectively moving struct rtentry definition to a net/route_var.h header, internal to the routing subsystem. Differential Revision: https://reviews.freebsd.org/D24580
# 1b0051ba	28-Apr-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Eliminate now-unused parts of old routing KPI. r360292 switched most of the remaining routing customers to a new KPI, leaving a bunch of wrappers for old routing lookup functions unused. Remove them from the tree as a part of routing cleanup. Differential Revision: https://reviews.freebsd.org/D24569
# 57e70e44	25-Apr-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Fix userland build broken by r360292.
# 983066f0	25-Apr-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Convert route caching to nexthop caching. This change is build on top of nexthop objects introduced in r359823. Nexthops are separate datastructures, containing all necessary information to perform packet forwarding such as gateway interface and mtu. Nexthops are shared among the routes, providing more pre-computed cache-efficient data while requiring less memory. Splitting the LPM code and the attached data solves multiple long-standing problems in the routing layer, drastically reduces the coupling with outher parts of the stack and allows to transparently introduce faster lookup algorithms. Route caching was (re)introduced to minimise (slow) routing lookups, allowing for notably better performance for large TCP senders. Caching works by acquiring rtentry reference, which is protected by per-rtentry mutex. If the routing table is changed (checked by comparing the rtable generation id) or link goes down, cache record gets withdrawn. Nexthops have the same reference counting interface, backed by refcount(9). This change merely replaces rtentry with the actual forwarding nextop as a cached object, which is mostly mechanical. Other moving parts like cache cleanup on rtable change remains the same. Differential Revision: https://reviews.freebsd.org/D24340
# 5cdc4844	16-Apr-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Fix userland build broken by r360014.
# 539642a2	16-Apr-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Add nhop parameter to rti_filter callback. One of the goals of the new routing KPI defined in r359823 is to entirely hide`struct rtentry` from the consumers. It will allow to improve routing subsystem internals and deliver more features much faster. This change is one of the ongoing changes to eliminate direct struct rtentry field accesses. Additionally, with the followup multipath changes, single rtentry can point to multiple nexthops. With that in mind, convert rti_filter callback used when traversing the routing table to accept pair (rt, nhop) instead of nexthop. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24440
# dd4776f0	14-Apr-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Reorganise nd6 notification code to avoid direct rtentry field access. One of the goals of the new routing KPI defined in r359823 is to entirely hide `struct rtentry` from the consumers. Doing so will allow to improve routing subsystem internals and deliver features more easily. This change is one of the ongoing changes to eliminate direct struct rtentry field accesses. It introduces rtfree_func() wrapper around RTFREE() and reorganises nd6 notification code to avoid accessing most of the rtentry fields. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24404
# a6663252	12-Apr-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Introduce nexthop objects and new routing KPI. This is the foundational change for the routing subsytem rearchitecture. More details and goals are available in https://reviews.freebsd.org/D24141 . This patch introduces concept of nexthop objects and new nexthop-based routing KPI. Nexthops are objects, containing all necessary information for performing the packet output decision. Output interface, mtu, flags, gw address goes there. For most of the cases, these objects will serve the same role as the struct rtentry is currently serving. Typically there will be low tens of such objects for the router even with multiple BGP full-views, as these objects will be shared between routing entries. This allows to store more information in the nexthop. New KPI: struct nhop_object fib4_lookup(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, uint32_t flowid); struct nhop_object fib6_lookup(uint32_t fibnum, const struct in6_addr dst6, uint32_t scopeid, uint32_t flags, uint32_t flowid); These 2 function are intended to replace all all flavours of <in_\|in6_>rtalloc[1]<_ign><_fib>, mpath functions and the previous fib[46]-generation functions. Upon successful lookup, they return nexthop object which is guaranteed to exist within current NET_EPOCH. If longer lifetime is desired, one can specify NHR_REF as a flag and get a referenced version of the nexthop. Reference semantic closely resembles rtentry one, allowing sed-style conversion. Additionally, another 2 functions are introduced to support uRPF functionality inside variety of our firewalls. Their primary goal is to hide the multipath implementation details inside the routing subsystem, greatly simplifying firewalls implementation: int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, const struct ifnet src_if); int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr dst6, uint32_t scopeid, uint32_t flags, const struct ifnet src_if); All functions have a separate scopeid argument, paving way to eliminating IPv6 scope embedding and allowing to support IPv4 link-locals in the future. Structure changes: * rtentry gets new 'rt_nhop' pointer, slightly growing the overall size. * rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz. Old KPI: During the transition state old and new KPI will coexists. As there are another 4-5 decent-sized conversion patches, it will probably take a couple of weeks. To support both KPIs, fields not required by the new KPI (most of rtentry) has to be kept, resulting in the temporary size increase. Once conversion is finished, rtentry will notably shrink. More details: * architectural overview: https://reviews.freebsd.org/D24141 * list of the next changes: https://reviews.freebsd.org/D24232 Reviewed by: ae,glebius(initial version) Differential Revision: https://reviews.freebsd.org/D24232
# 34a5582c	22-Jan-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Bring back redirect route expiration. Redirect (and temporal) route expiration was broken a while ago. This change brings route expiration back, with unified IPv4/IPv6 handling code. It introduces net.inet.icmp.redirtimeout sysctl, allowing to set an expiration time for redirected routes. It defaults to 10 minutes, analogues with net.inet6.icmp6.redirtimeout. Implementation uses separate file, route_temporal.c, as route.c is already bloated with tons of different functions. Internally, expiration is implemented as an per-rnh callout scheduled when route with non-zero rt_expire time is added or rt_expire is changed. It does not add any overhead when no temporal routes are present. Callout traverses entire routing tree under wlock, scheduling expired routes for deletion and calculating the next time it needs to be run. The rationale for such implemention is the following: typically workloads requiring large amount of routes have redirects turned off already, while the systems with small amount of routes will not inhibit large overhead during tree traversal. This changes also fixes netstat -rn display of route expiration time, which has been broken since the conversion from kread() to sysctl. Reviewed by: bz MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D23075
# ead85fe4	09-Jan-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Add fibnum, family and vnet pointer to each rib head. Having metadata such as fibnum or vnet in the struct rib_head is handy as it eases building functionality in the routing space. This change is required to properly bring back route redirect support. Reviewed by: bz MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D23047
# e02d3fe7	07-Jan-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Fix rtsock route message generation for interface addresses. Reviewed by: olivier MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D22974
# c83dda36	31-Dec-2019	Alexander V. Chernikov <melifaro@FreeBSD.org>	Split gigantic rtsock route_output() into smaller functions. Amount of changes to the original code has been intentionally minimised to ease diffing. The changes are mostly mechanical, with the following exceptions: * lltable handler is now called directly based of RTF_LLINFO flag presense. * "report" logic for updating rtm in RTM_GET/RTM_DELETE has been simplified, fixing several potential use-after-free cases in rt_addrinfo. * llable asserts has been replaced with error-returning, preventing kernel crashes when lltable gw af family is invalid (root required). MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22864
# 185c3d2b	16-Dec-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Convert routing statistics to VNET_PCPUSTAT. Submitted by: ocochard Reviewed by: melifaro, glebius Differential Revision: https://reviews.freebsd.org/D22834
# fda45409	18-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Make rt_getifa_fib() static.
# 6ca363eb	08-May-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Existense of PCB route caching doesn't allow us to use new fast route lookup KPI in ip_output() like it is already used in ip_forward(). However, when there is no PCB provided we can use fast KPI, gaining performance advantage. Typical case when ip_output() is called without a PCB pointer is a sendto(2) on a not connected UDP socket. In practice DNS servers do this. Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D19804
# d25f8522	26-Nov-2018	Mark Johnston <markj@FreeBSD.org>	Plug routing sysctl leaks. Various structures exported by sysctl_rtsock() contain padding fields which were not being zeroed. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: ae MFC after: 3 days Security: kernel memory disclosure Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18333
# d4672c61	03-Sep-2018	Bjoern A. Zeeb <bz@FreeBSD.org>	Rather than duplicating the functionality of a macro after r322866 use the already existing one. No functional changes. Reviewed by: karels, ae Approved by: re (rgrimes) Differential Revision: https://reviews.freebsd.org/D17004
# fc21c53f	22-Jan-2018	Ryan Stone <rstone@FreeBSD.org>	Reduce code duplication for inpcb route caching Add a new macro to clear both the L3 and L2 route caches, to hopefully prevent future instances where only the L3 cache was cleared when both should have been. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13989 Reviewed by: karels
# dbeab32f	22-Jan-2018	Ryan Stone <rstone@FreeBSD.org>	Invalidate inpcb LLE cache if cached route is invalidated When the inpcb route cache is invalidated after a change to the routing tables, we need to invalidate the LLE cache as well. Previous to this change packets for the connection would continue to use the old L2 information from the old L3 gateway, and the packets for the connection would likely be blackholed. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13988 Reviewed by: karels
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# dcfa556b0	24-Aug-2017	Gleb Smirnoff <glebius@FreeBSD.org>	Garbage collect RT_NORTREF, which is no longer in use after FLOWTABLE removal.
# b83aa367	13-Jun-2017	Andrey V. Elsukov <ae@FreeBSD.org>	Resurrect RTF_RNH_LOCKED flag and restore ability to call rtalloc1_fib() with acquired RIB lock. This fixes a possible panic due to trying to acquire RIB rlock when it is already exclusive locked. PR: 215963, 215122 MFC after: 1 week Sponsored by: Yandex LLC
# 2f8c6c0a	16-Apr-2017	Patrick Kelsey <pkelsey@FreeBSD.org>	Fix userland tools that don't check the format of routing socket messages before accessing message fields that may not be present, removing dead/duplicate/misleading code along the way. Document the message format for each routing socket message in route.h. Fix a bug in usr.bin/netstat introduced in r287351 that resulted in pointer computation with essentially random 16-bit offsets and dereferencing of the results. Reviewed by: ae MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D10330
# fbbd9655	28-Feb-2017	Warner Losh <imp@FreeBSD.org>	Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
# 10addc1e	10-Feb-2017	Gleb Smirnoff <glebius@FreeBSD.org>	Last consumer of _WANT_RTENTRY gone.
# 9809e9dc	01-Aug-2016	Conrad Meyer <cem@FreeBSD.org>	rtentry: Initialize rt_mtx with MTX_NEW The "rtentry" zone does not use UMA_ZONE_ZINIT, so it is invalid to assume the mutex's memory will be zero. Without MTX_NEW, garbage backing memory may trigger the "re-initializing a mutex" assertion. PR: 200991 Submitted by: Chang-Hsien Tsai <luke.tw AT gmail.com>
# 80ae8d60	05-Jun-2016	Bjoern A. Zeeb <bz@FreeBSD.org>	Provide a public interface to rt_flushifroutes which takes the address family as an argument as well. This will be used to cleanup individual protocols during VNET teardown. Obtained from: projects/vnet Sponsored by: The FreeBSD Foundation
# 6d768226	02-Jun-2016	George V. Neville-Neil <gnn@FreeBSD.org>	This change re-adds L2 caching for TCP and UDP, as originally added in D4306 but removed due to other changes in the system. Restore the llentry pointer to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as appropriate. Submitted by: Mike Karels Differential Revision: https://reviews.freebsd.org/D6262
# 4f321dbd	24-Mar-2016	Bjoern A. Zeeb <bz@FreeBSD.org>	Fix compile errors after r297225: - properly V_irtualise variable access unbreaking VIMAGE kernels. - remove the volatile from the function return type to make architecture using gcc happy [-Wreturn-type] "type qualifiers ignored on function return type" I am not entirely happy with this solution putting the u_int there but it will do for now.
# 84cc0778	24-Mar-2016	George V. Neville-Neil <gnn@FreeBSD.org>	FreeBSD previously provided route caching for TCP (and UDP). Re-add route caching for TCP, with some improvements. In particular, invalidate the route cache if a new route is added, which might be a better match. The cache is automatically invalidated if the old route is deleted. Submitted by: Mike Karels Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4306
# 61eee0e2	24-Jan-2016	Alexander V. Chernikov <melifaro@FreeBSD.org>	MFP r287070,r287073: split radix implementation and route table structure. There are number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, first 3 don't have _any_ requirements and first 2 does not use radix locking. On the other hand, routing structure do have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not known anything about its consumers internals. So, radix code now uses tiny 'struct radix_head' structure along with internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still uses the same 'struct radix_node_head' with slight modifications: they need to pass pointer to (embedded) 'struct radix_head' to all radix callbacks. Routing code now uses new 'struct rib_head' with different locking macro: RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base). New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.
# 10e0e235	14-Jan-2016	Alexander V. Chernikov <melifaro@FreeBSD.org>	Remove now-unused wrappers for various routing functions.
# 0eb64f4e	13-Jan-2016	Alexander V. Chernikov <melifaro@FreeBSD.org>	Remove RTF_RNH_LOCKED support from rtalloc1_fib(). Last caller using it was eliminated in r293471. Sponsored by: Yandex LLC
# e5f3746a	11-Jan-2016	Alexander V. Chernikov <melifaro@FreeBSD.org>	Do not rewrite all ro_flags.
# 64e94934	09-Jan-2016	Alexander V. Chernikov <melifaro@FreeBSD.org>	Fix userland build broken by r293470. Pointy hat to: melifaro
# 36402a68	09-Jan-2016	Alexander V. Chernikov <melifaro@FreeBSD.org>	Finish r275196: do not dereference rtentry in if_output() routines. The only piece of information that is required is rt_flags subset. In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags to check if this particular mbuf needs to be dropped (and what error should be returned). Note that if_loop() will always return EHOSTUNREACH for "reject" routes regardless of RTF_HOST flag existence. This is due to upcoming routing changes where RTF_HOST value won't be available as lookup result. All other functions require RTF_GATEWAY flag to check if they need to return EHOSTUNREACH instead of EHOSTDOWN error. There are 11 places where non-zero 'struct route' is passed to if_output(). For most of the callers (forwarding, bpf, arp) does not care about exact error value. In fact, the only place where this result is propagated is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()). Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary rte flags to ro_flags. Call this function in ip_output() after looking up/ verifying rte. Reviewed by: ae
# ea8d1492	09-Jan-2016	Alexander V. Chernikov <melifaro@FreeBSD.org>	Remove sys/eventhandler.h from net/route.h Reviewed by: ae
# f2b2e77a	08-Jan-2016	Alexander V. Chernikov <melifaro@FreeBSD.org>	(Temporarily) remove route_redirect_event eventhandler. Such handler should pass different set of variables, instead of directly providing 2 locked route entries. Given that it hasn't been really used since at least 2012, remove current code. Will re-add it after finishing most major routing-related changes. Discussed with: np
# 9a1b64d5	04-Jan-2016	Alexander V. Chernikov <melifaro@FreeBSD.org>	Add rib_lookup_info() to provide API for retrieving individual route entries data in unified format. There are control plane functions that require information other than just next-hop data (e.g. individual rtentry fields like flags or prefix/mask). Given that the goal is to avoid rte reference/refcounting, re-use rt_addrinfo structure to store most rte fields. If caller wants to retrieve key/mask or gateway (which are sockaddrs and are allocated separately), it needs to provide sufficient-sized sockaddrs structures w/ ther pointers saved in passed rt_addrinfo. Convert: * lltable new records checks (in_lltable_rtcheck(), nd6_is_new_addr_neighbor(). * rtsock pre-add/change route check. * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because 1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should not be multiple host routes for such hosts 2) if we have multiple routes we should inspect them (which is not done). 3) the entire idea of abusing KRT as storage for ND proxy seems odd. Userland programs should be used for that purpose).
# 0d4df029	03-Jan-2016	Alexander V. Chernikov <melifaro@FreeBSD.org>	Handle IPV6_PATHMTU option by spliting ip6_getpmtu_ctl() from ip6_getpmtu(). Add ro_mtu field to 'struct route' to be able to pass lookup MTU back to the caller. Currently, ip6_getpmtu() has 2 totally different use cases: 1) control plane (IPV6_PATHMTU req), where we just need to calculate MTU and return it, w/o any reusability. 2) Actual ip6_output() data path where we (nearly) always use the provided route lookup data. If this data is not 'valid' we need to perform another lookup and save the result (which cannot be re-used by ip6_output()). Given that, handle 1) by calling separate function doing rte lookup itself. Resulting MTU is calculated by (newly-added) ip6_calcmtu() used by both ip6_getpmtu_ctl() and ip6_getpmtu(). For 2) instead of storing ref'ed rte, store mtu (the only needed data from the lookup result) inside newly-added ro_mtu field. 'struct route' was shrinked by 8(or 4 bytes) in r292978. Grow it again by 4 bytes. New ro_mtu field will be used in other places like ip/tcp_output (EMSGSIZE handling from output routines). Reviewed by: ae
# 4fb3a820	30-Dec-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Implement interface link header precomputation API. Add if_requestencap() interface method which is capable of calculating various link headers for given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc..). Reshape 'struct route' to be able to pass additional data (with is length) to prepend to mbuf. These two changes permits routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers. ARP/ND changes: Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address change are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces thus simplifying if_output(). Move these lookups to separate ether_resolve_addr() function which ether returs error or fully-prepared link header. Add <arp\|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data. BPF changes: Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of there have ther header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional. Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65). Flowtable changes: Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle. Differential Revision: https://reviews.freebsd.org/D4102
# 427c2f4e	16-Dec-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Provide additional lle data in IPv6 lltable dump used by ndp(8). Before the change, things like lle state were queried via SIOCGNBRINFO_IN6 by ndp(8) for _each_ lle entry in dump. This ioctl was added in 1999, probably to avoid touching rtsock code. This change maps SIOCGNBRINFO_IN6 data to standard rtsock dump the following way: expire (already) maps to rtm_rmx.rmx_expire isrouter -> rtm_flags & RTF_GATEWAY asked -> rtm_rmx.rmx_pksent state -> rtm_rmx.rmx_state (maps to rmx_weight via define) Reviewed by: ae
# 65ff3638	08-Dec-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Merge helper fib* functions used for basic lookups. Vast majority of rtalloc(9) users require only basic info from route table (e.g. "does the rtentry interface match with the interface I have?". "what is the MTU?", "Give me the IPv4 source address to use", etc..). Instead of hand-rolling lookups, checking if rtentry is up, valid, dealing with IPv6 mtu, finding "address" ifp (almost never done right), provide easy-to-use API hiding all the complexity and returning the needed info into small on-stack structure. This change also helps hiding route subsystem internals (locking, direct rtentry accesses). Additionaly, using this API improves lookup performance since rtentry is not locked. (This is safe, since all the rtentry changes happens under both radix WLOCK and rtentry WLOCK). Sponsored by: Yandex LLC
# e8b0643e	29-Nov-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Add new rt_foreach_fib_walk_del() function for deleting route entries by filter function instead of picking into routing table details in each consumer. Remove now-unused rt_expunge() (eliminating last external RTF_RNH_LOCKED user). This simplifies future nexthops/mulitipath changes and rtrequest1_fib() locking refactoring. Actual changes: Add "rt_chain" field to permit rte grouping while doing batched delete from routing table (thus growing rte 200->208 on amd64). Add "rti_filter" / "rti_filterdata" / "rti_spare" fields to rt_addrinfo to pass filter function to various routing subsystems in standard way. Convert all rt_expunge() customers to new rt_addinfo-based api and eliminate rt_expunge().
# f221bcaa	17-Oct-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Remove several compat functions from pre-fib era.
# 441f9243	04-Sep-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Constantify lookup key in ifa_ifwith* functions. Some places in our network stack already have const arguments (like if_output() routines and LLE functions). Code using ifa_ifwith (and similar functins) along with LLE/_output functions is currently bound to use tricks like __DECONST(). Provide a cleaner way by making sockaddr lookup key really constant. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3464
# 2caee4be	10-Aug-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	Rename rt_foreach_fib() to rt_foreach_fib_walk(). Suggested by: julian
# 4bdf0b6a	08-Aug-2015	Alexander V. Chernikov <melifaro@FreeBSD.org>	MFP r274295: * Move interface route cleanup to route.c:rt_flushifroutes() * Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.
# 5d14e4cd	29-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Provide rte_<get\|set> methods to access rtentry for external consumers.
# 1be1588a	29-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	* Make ifa_add_loopback_route() prepare gw before insertion. * Temporarily move ifa_switch_loopback_route() implementation to route.c
# 7f948f12	16-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Finish r274175: do control plane MTU tracking. Update route MTU in case of ifnet MTU change. Add new RTF_FIXEDMTU to track explicitly specified MTU. Old behavior: ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU. User has to manually update all routes. ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU. However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu gets updated. New behavior: ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs gets updated with new MTU unless RTF_FIXEDMTU flag set on them. ifconfig em0 mtu 9000->1500 -> all routes in all fibs gets updated with new MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu. route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag. route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag. PR: 194238 MFC after: 1 month CR: D1125
# 206344ac	16-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Remove unused rt_endzero define. Remove rt_mtx from public rtentry version.
# 69d149ad	09-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Since we no longer return individual radix entries, it is not possible to do per-rte accounting. Remove rt_kpktsent.
# 033074c4	09-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Replace 'struct route ' if_output() argument with 'struct nhop_info '. Leave 'struct route' as is for legacy routing api users. Remove most of rtalloc_ign*-derived functions.
# 55e5eda6	08-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Separate radix and routing: use different structures for route and for other customers. Introduce new 'struct rib_head' for routing purposes and make all routing api use it.
# 1398ffe5	08-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.
# 8c3cfe0b	04-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Hide 'struct rtentry' and all its macro inside new header: net/route_internal.h The goal is to make its opaque for all code except route/rtsock and proto domain _rmx.
# b4e8f808	19-Oct-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Switch IPv4 output path to use new routing api. The goals of the new API is to provide consumers with minimal needed information, but as fast as possible. So we provide full nexthop info copied into alighed on-cache structure instead of rte/ia pointers, their refcounts and locks. This does not provide solution for protecting from egress ifp destruction, but does not make it any worse. Current changes: nhops: Add fib4_lookup_prepend() function which stores either full L2+L3 prepend info (e.g. MAC header in case of plain IPv4) or L3 info with NH_FLAGS_L2_INCOMPLETE flag indicating that no valid L2 info exists and we have to take "slow" path. ip_output: Currently ip[ 46]_output consumers use 'struct route' for the following purposes: 1) double lookup avoidance(route caching) 2) plain route caching 3) get path MTU to be able to notify source. The former pattern is mostly used by various tunnels (gif, gre, stf). (Actually, gre is the only remaining, others were already converted. Their locking model did not scale good enogh to benefit from such caching, so we have (temporarily) removed it without any performance loss). Plain route caching used by SCTP is simply wrong and should be removed. Temporary break it for now just to be able to compile. Optimize path mtu reporting by providing it in new 'route_info' stucture. Minimize games with @ia locking/refcounting for route lookup: add special nhop[46]_extended structure to store more route attributes. Pointer to given structure can be passed to fib4_lookup_prepend() to indicate we want this info (we actually needs it for UDP and raw IP). ether_output: Provide light-weight ether_output2() call to deal with transmitting L2 frame (e.g. properly handle broadcast/simloop/bridge/ other L2 hooks before actually transmitting frame by if_transmit()). Add a hack based on new RT_NHOP ro_flag to distinguish which version should we call. Better way is probably to add a new "if_output_frame" driver callbacks. Next steps: * Convert ip_fastfwd part * Implement auto-growing array for per-radix nexthops * Implement LLE tracking for nexthop calculations to be able to immediately provide all necessary info in single route lookup for gateway routes * Switch radix locking scheme to runtime/cfg lock * Implement multipath support for rtsock * Implement "tracked nexthops" for tunnels (e.g. _proper_ nexthop caching) * Add IPv6 support for remaining parts (postponed not to interfere with user/ae/inet6 branch) * Consider adding "if_output_frame" driver call to ease logical frame pushing.
# 9f21b0b8	21-Sep-2014	Hiroki Sato <hrs@FreeBSD.org>	Fix build.
# ee0bd4b9	20-Sep-2014	Hiroki Sato <hrs@FreeBSD.org>	Make net.add_addr_allfibs vnet-local.
# b980262e	03-May-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Pass radix head ptr along with rte to rtexpunge(). Rename rtexpunge to rt_expunge().
# 0fb9298d	29-Apr-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Move rt_setmetrics() from rtsock.c to route.c. All rtsock-initiated rte creation/modification are now performed in route.c holding radix tree write lock. This reduces the need for per-rte mutex. Sponsored by: Yandex LLC MFC after: 1 month
# 36d55f0f	26-Apr-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Unify sa_equal() macro usage. MFC after: 2 weeks
# de51fbd6	19-Apr-2014	John-Mark Gurney <jmg@FreeBSD.org>	garbage collect something that hasn't been triggered in almost 5 years... the last consumer was removed a couple years ago...
# 66dcee72	15-Mar-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Garbage collect long time obsoleted (or never used) stuff from routing API.
# 256ea2ab	05-Mar-2014	Gleb Smirnoff <glebius@FreeBSD.org>	The route code used to mtx_destroy() a locked mutex before rtentry free. Now, after r262763 it started to return locked mutexes to UMA. To fix that, conditionally unlock the mutex in the destructor. Tested by: "Sergey V. Dyatko" <sergey.dyatko@gmail.com>
# 5274e55e	04-Mar-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Hide struct rtentry from userland.
# e3a7aa6f	04-Mar-2014	Gleb Smirnoff <glebius@FreeBSD.org>	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.
# d375edc9	09-Jan-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Simplify inet alias handling code: if we're adding/removing alias which has the same prefix as some other alias on the same interface, use newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg(). This eliminates the following rtsock messages: Pinned RTM_ADD for prefix (for alias addition). Pinned RTM_DELETE for prefix (for alias withdrawal). Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24): before commit, addition: got message of size 116 on Fri Jan 10 14:13:15 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 got message of size 192 on Fri Jan 10 14:13:15 2014 RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff after commit, addition: got message of size 116 on Fri Jan 10 13:56:26 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255 before commit, wihdrawal: got message of size 192 on Fri Jan 10 13:58:59 2014 RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff got message of size 116 on Fri Jan 10 13:58:59 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 adter commit, withdrawal: got message of size 116 on Fri Jan 10 14:14:11 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong (and requires some hacks to keep prefix in route table on RTM_DELETE). I've tested this change with quagga (no change) and bird (). bird alias handling is already broken in BSD sysdep code, so nothing changes here, too. I'm going to MFC this change if there will be no complains about behavior change. While here, fix some style(9) bugs introduced by r260488 (pointed by glebius and bde). Sponsored by: Yandex LLC MFC after: 4 weeks
# 4cbac30b	09-Jan-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Split rt_newaddrmsg_fib() into two different functions. Adding/deleting interface addresses involves access to 3 different subsystems, int different parts of code. Each call can fail, so reporting successful operation by rtsock in the middle of the process error-prone. Further split routing notification API and actual rtsock calls via creating public-available rt_addrmsg() / rt_routemsg() functions with "private" rtsock_* backend. MFC after: 2 weeks
# 7d9b6df1	08-Jan-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Constanly use RT_ALL_FIBS everywhere instead of -1. MFC after: 2 weeks
# f672f56f	24-Jun-2013	Qing Li <qingli@FreeBSD.org>	Due to the routing related networking kernel redesign work in FBSD 8.0, interface routes have been returened to the applications without the RTF_GATEWAY bit. This incompatibility has caused some issues with Zebra, Qugga and the like. This patch provides the RTF_GATEWAY flag bit in returned interface routes so to behave similarly to pre 8.0 systems. Reviewed by: hrs Verified by: mackn at opendns dot com
# 3034f43f	08-Mar-2013	Alexander V. Chernikov <melifaro@FreeBSD.org>	Fix long-standing issue with interface routes being unprotected: Use RTM_PINNED flag to mark route as immutable. Forbid deleting immutable routes without special rtrequest1_fib() flag. Adding interface address with prefix already in route table is handled by atomically deleting old prefix and adding interface one. Discussed with: andre, eri MFC after: 3 weeks
# bf984051	04-Jul-2012	Gleb Smirnoff <glebius@FreeBSD.org>	When ip_output()/ip6_output() is supplied a struct route *ro argument, it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning here: it may be supplied to provide route, and it may be supplied to store and return to caller the route that ip_output()/ip6_output() finds. In the latter case skipping FLOWTABLE lookup is pessimisation. The difference between struct route filled by FLOWTABLE and filled by rtalloc() family is that the former doesn't hold a reference on its rtentry. Reference is hold by flow entry, and it is about to be released in future. Thus, route filled by FLOWTABLE shouldn't be passed to RTFREE() macro. - Introduce new flag for struct route/route_in6, that marks route not holding a reference on rtentry. - Introduce new macro RO_RTFREE() that cleans up a struct route depending on its kind. - All callers to ip_output()/ip6_output() that do supply non-NULL but empty route should use RO_RTFREE() to free results of lookup. - ip_output()/ip6_output() now do FLOWTABLE lookup always when ro->ro_rt == NULL. Tested by: tuexen (SCTP part)
# bfca216e	18-Mar-2012	Bjoern A. Zeeb <bz@FreeBSD.org>	Hide kernel option ROUTETABLES evaluations in the implementation rather than the header file. With this also move RT_MAXFIBS and RT_NUMFIBS into the implemantion to avoid further usage in other code. rt_numfibs is all that should be needed. This allows users to change the number of FIBs from 1..RT_MAXFIBS(16) dynamically using the tunable without the need to change the kernel config for the maximum anymore. This means that thet multi-FIB feature is now fully available with GENERIC kernels. The kernel option ROUTETABLES can still be used to set the default numbers of FIBs in absence of the tunable. Ok.ed by: julian, hrs, melifaro MFC after: 2 weeks
# a8498625	02-Feb-2012	Bjoern A. Zeeb <bz@FreeBSD.org>	Move a comment from rtinit1() to the top of the file where dealing with the (maximum) number of FIBs trying to clarify that evetually FIBs should probably attached to domain(9) specific storage. [1] Add a comment on a limitimation on the rt_add_addr_allfibs option. Use RT_DEFAULT_FIB instead of 0 where applicable. Add empty line to functions without local variables per style. Put public yet unused in-tree function rtinit_fib() under BURN_BRIDGES to indicate that it might go away in the future. No functional change. Discussed with: julian [1] (clarification on what the original one meant) Sponsored by: Cisco Systems, Inc.
# 556d81dd	03-Feb-2012	Bjoern A. Zeeb <bz@FreeBSD.org>	Rather than putting magic 0s as FIB argument into the rt* calls, provide a macro RT_DEFAULT_FIB defined to 0 to more easily identify the cases tied to the default FIB. Sponsored by: Cisco Systems, Inc.
# 528737fd	28-Sep-2011	Bjoern A. Zeeb <bz@FreeBSD.org>	Pass the fibnum where we need filtering of the message on the rtsock allowing routing daemons to filter routing updates on an rtsock per FIB. Adjust raw_input() and split it into wrapper and a new function taking an optional callback argument even though we only have one consumer [1] to keep the hackish flags local to rtsock.c. PR: kern/134931 Submitted by: multiple (see PR) Suggested by: rwatson [1] Reviewed by: rwatson MFC after: 3 days
# 1eeb6d97	20-Sep-2011	Kip Macy <kmacy@FreeBSD.org>	Make KBI changes required for future MFCing of inpcb rtentry / llentry caching. Reviewed by: rwatson, bz Approved by: re (kib)
# f5857e2d	21-Jun-2011	Bjoern A. Zeeb <bz@FreeBSD.org>	Garbage collect never used global, sysctl, externs. MFC after: 1 week
# 2093339e	20-Mar-2011	Dmitry Chagin <dchagin@FreeBSD.org>	Remove dead code. MFC after: 1 Week
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 94190b39	01-Apr-2010	Qing Li <qingli@FreeBSD.org>	MFC 205222 Verify interface up status using its link state only if the interface has such capability. The interface capability flag indicates whether such capability exists. This approach is much more backward compatible. Physical device driver changes will be part of another commit. Also updated the ifconfig utility to show the LINKSTATE capability if present. Reviewed by: rwatson, imp, juli
# 243785f9	01-Apr-2010	Qing Li <qingli@FreeBSD.org>	MFC 205024 The if_tap interface is of IFT_ETHERNET type, but it does not set or update the if_link_state variable. As such RT_LINK_IS_UP() fails for the if_tap interface. Also, the RT_LINK_IS_UP() needs to bypass all loopback interfaces because loopback interfaces are considered up logically as long as the system is running. This patch fixes the above issues by setting and updating the if_link_state variable when the tap interface is opened or closed respectively. Similary approach is already done in the if_tun device.
# c951da56	01-Apr-2010	Qing Li <qingli@FreeBSD.org>	MFC 204902 One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to allow for connection load balancing across interfaces. Currently the address alias handling method is colliding with the ECMP code. For example, when two interfaces are configured on the same prefix, only one prefix route is installed. So connection load balancing among the available interfaces is not possible. The other advantage of ECMP is for failover. The issue with the current code, is that the interface link-state is not reflected in the route entry. For example, if there are two interfaces on the same prefix, the cable on one interface is unplugged, new and existing connections should switch over to the other interface. This is not done today and packets go into a black hole. Also, there is a small bug in the kernel where deleting ECMP routes in the userland will always return an error even though the command is successfully executed.
# 6b533b5d	16-Mar-2010	Qing Li <qingli@FreeBSD.org>	Verify interface up status using its link state only if the interface has such capability. The interface capability flag indicates whether such capability exists. This approach is much more backward compatible. Physical device driver changes will be part of another commit. Also updated the ifconfig utility to show the LINKSTATE capability if present. Reviewed by: rwatson, imp, juli MFC after: 3 days
# 355ad3ea	11-Mar-2010	Qing Li <qingli@FreeBSD.org>	The if_tap interface is of IFT_ETHERNET type, but it does not set or update the if_link_state variable. As such RT_LINK_IS_UP() fails for the if_tap interface. Also, the RT_LINK_IS_UP() needs to bypass all loopback interfaces because loopback interfaces are considered up logically as long as the system is running. This patch fixes the above issues by setting and updating the if_link_state variable when the tap interface is opened or closed respectively. Similary approach is already done in the if_tun device. MFC after: 3 days
# c7ea0aa6	08-Mar-2010	Qing Li <qingli@FreeBSD.org>	One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to allow for connection load balancing across interfaces. Currently the address alias handling method is colliding with the ECMP code. For example, when two interfaces are configured on the same prefix, only one prefix route is installed. So connection load balancing among the available interfaces is not possible. The other advantage of ECMP is for failover. The issue with the current code, is that the interface link-state is not reflected in the route entry. For example, if there are two interfaces on the same prefix, the cable on one interface is unplugged, new and existing connections should switch over to the other interface. This is not done today and packets go into a black hole. Also, there is a small bug in the kernel where deleting ECMP routes in the userland will always return an error even though the command is successfully executed. MFC after: 5 days
# 32c53401	05-Jan-2010	Qing Li <qingli@FreeBSD.org>	MFC r201282, r201543 r201282 ------- The proxy arp entries could not be added into the system over the IFF_POINTOPOINT link types. The reason was due to the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations due to multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of loopback route to local end would be either incremented or decremented. The first instantiation would create the entry and the last removal would delete the route entry. r201543 ------- The IFA_RTSELF address flag marks a loopback route has been installed for the interface address. This marker is necessary to properly support PPP types of links where multiple links can have the same local end IP address. The IFA_RTSELF flag bit maps to the RTF_HOST value, which was combined into the route flag bits during prefix installation in IPv6. This inclusion causing the prefix route to be unusable. This patch fixes this bug by excluding the IFA_RTSELF flag during route installation. PR: ports/141342, kern/141134
# c7ab6602	30-Dec-2009	Qing Li <qingli@FreeBSD.org>	The proxy arp entries could not be added into the system over the IFF_POINTOPOINT link types. The reason was due to the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations due to multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of loopback route to local end would be either incremented or decremented. The first instantiation would create the entry and the last removal would delete the route entry. MFC after: 5 days
# 9a311445	08-Sep-2009	Navdeep Parhar <np@FreeBSD.org>	Add arp_update_event. This replaces route_arp_update_event, which has not worked since the arp-v2 rewrite. The event handler will be called with the llentry write-locked and can examine la_flags to determine whether the entry is being added or removed. Reviewed by: gnn, kmacy Approved by: gnn (mentor) MFC after: 1 month
# f987f193	22-Jun-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	Collect all VIMAGE_GLOBALS variables in one place. No longer export rt_tables as all lookups go through rt_tables_get_rnh(). We cannot make rt_tables (and rtstat, rttrash[1]) static as netstat -r (-rs[1]) would stop working on a stripped VIMAGE_GLOBALS kernel. Reviewed by: zec Presumably broken by: phk 13.5y ago in r12820 [1]
# c2c2a7c1	01-Jun-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	Convert the two dimensional array to be malloced and introduce an accessor function to get the correct rnh pointer back. Update netstat to get the correct pointer using kvm_read() as well. This not only fixes the ABI problem depending on the kernel option but also permits the tunable to overwrite the kernel option at boot time up to MAXFIBS, enlarging the number of FIBs without having to recompile. So people could just use GENERIC now. Reviewed by: julian, rwatson, zec X-MFC: not possible
# 279aa3d4	16-Apr-2009	Kip Macy <kmacy@FreeBSD.org>	Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Reviewed by: rwatson
# 8f22a721	15-Apr-2009	Kip Macy <kmacy@FreeBSD.org>	revert RTM_VERSION change - it doesn't do what I thought it does and changing breaks ifconfig needlessly
# de4ab55e	15-Apr-2009	Kip Macy <kmacy@FreeBSD.org>	add an llentry to struct route{_in6} to allow it to be passed around with the rtentry
# 427ac07f	14-Apr-2009	Kip Macy <kmacy@FreeBSD.org>	Extend route command: - add show as alias for get - add weights to allow mpath to do more than equal cost - add sticky / nostick to disable / re-enable per-connection load balancing This adds a field to rt_metrics_lite so network bits of world will need to be re-built. Reviewed by: jeli & qingli
# 14981d80	12-Jan-2009	Qing Li <qingli@FreeBSD.org>	Revive the RTF_LLINFO flag in route.h. The kernel code is guarded by the new kernel option COMPAT_ROUTE_FLAGS for binary backward compatibility. The RTF_LLDATA flag maps to the same value as RTF_LLINFO. RTF_LLDATA is used by the arp and ndp utilities. The RTF_LLDATA flag is always returned to the userland regardless whether the COMPAT_ROUTE_FLAGS is defined.
# 8eca593c	26-Dec-2008	Qing Li <qingli@FreeBSD.org>	This checkin addresses a couple of issues: 1. The "route" command allows route insertion through the interface-direct option "-iface". During if_attach(), an sockaddr_dl{} entry is created for the interface and is part of the interface address list. This sockaddr_dl{} entry describes the interface in detail. The "route" command selects this entry as the "gateway" object when the "-iface" option is present. The "arp" and "ndp" commands also interact with the kernel through the routing socket when adding and removing static L2 entries. The static L2 information is also provided through the "gateway" object with an AF_LINK family type, similar to what is provided by the "route" command. In order to differentiate between these two types of operations, a RTF_LLDATA flag is introduced. This flag is set by the "arp" and "ndp" commands when issuing the add and delete commands. This flag is also set in each L2 entry returned by the kernel. The "arp" and "ndp" command follows a convention where a RTM_GET is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills in the fields for a "rtm" object, which is reinjected into the kernel by a subsequent RTM_ADD/DELETE command. The entry returend from RTM_GET is a prefix route, so the RTF_LLDATA flag must be specified when issuing the RTM_ADD/DELETE messages. 2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the specification for retrieving L2 information. Also optimized the code logic. Reviewed by: julian
# 6e6b3f7c	14-Dec-2008	Qing Li <qingli@FreeBSD.org>	This main goals of this project are: 1. separating L2 tables (ARP, NDP) from the L3 routing tables 2. removing as much locking dependencies among these layers as possible to allow for some parallelism in the search operations 3. simplify the logic in the routing code, The most notable end result is the obsolescent of the route cloning (RTF_CLONING) concept, which translated into code reduction in both IPv4 ARP and IPv6 NDP related modules, and size reduction in struct rtentry{}. The change in design obsoletes the semantics of RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland applications such as "arp" and "ndp" have been modified to reflect those changes. The output from "netstat -r" shows only the routing entries. Quite a few developers have contributed to this project in the past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and Andre Oppermann. And most recently: - Kip Macy revised the locking code completely, thus completing the last piece of the puzzle, Kip has also been conducting active functional testing - Sam Leffler has helped me improving/refactoring the code, and provided valuable reviews - Julian Elischer setup the perforce tree for me and has helped me maintaining that branch before the svn conversion
# 3120b9d4	07-Dec-2008	Kip Macy <kmacy@FreeBSD.org>	- convert radix node head lock from mutex to rwlock - make radix node head lock not recursive - fix LOR in rtexpunge - fix LOR in rtredirect Reviewed by: sam
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 4e7840e2	20-Sep-2008	Marko Zec <zec@FreeBSD.org>	Move #defines for MRT-related constants from net/route.c to net/route.h, because the vnet code will need those constants as well. Reviewed by: bz Approved by: julian (mentor) MFC after: never
# 5e7b481a	14-Sep-2008	Julian Elischer <julian@FreeBSD.org>	come on Julian, make up if you're committing one change or the other. fix braino
# 93fcb5a2	14-Sep-2008	Julian Elischer <julian@FreeBSD.org>	Revert a part of the MRT commit that proved un-needed. rt_check() in its original form proved to be sufficient and rt_check_fib() can go away (as can its evil twin in_rt_check()). I believe this does NOT address the crashes people have been seeing in rt_check. MFC after: 1 week
# abeae30c	05-Sep-2008	Julian Elischer <julian@FreeBSD.org>	Be consistent about whether these multi-lined macros are separated by a blank line. Some were, some weren't. Decide in favour of the line as it matches what an inline would do, and it's easier to read.
# 6f95a5eb	09-May-2008	Julian Elischer <julian@FreeBSD.org>	move a #define from a place it shouldn't have been to a place it should have been. Basically my testign didn't ocver one case that this broke. thanks tinderbox!
# 8b07e49a	09-May-2008	Julian Elischer <julian@FreeBSD.org>	Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x) Currently the only protocol that can make use of the multiple tables is IPv4 Similar functionality exists in OpenBSD and Linux. From my notes: ----- One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address. Constraints: ------------ I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need. One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing". One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as a EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in timespan of the branch. This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs. To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family. The basic change in the ABI compatible version of the change is to extent that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X] when for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before. The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row. In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later. One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it. This brings us as to how the correct FIB is selected for an outgoing IPV4 packet. Firstly, all packets have a FIB associated with them. if nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways. Packets fall into one of a number of classes. 1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility call setfib that acts a bit like nice.. setfib -3 ping target.example.com # will use fib 3 for ping. It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands. 2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.) 3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2). 4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib. 5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being reponded to. 6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect withthe proces that set up the tunnel. thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1. Routing messages would be associated with their process, and thus select one FIB or another. messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented) In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB. In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process. Early testing experience: ------------------------- Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks. For example, It can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done. Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routes accordingly. ipfw has grown 2 new keywords: setfib N ip from anay to any count ip from any to any fib N In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. I am not sure if it is required. SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. it will be interesting to see how that handles it when it suddenly actually does something. Where to next: -------------------- After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. this will result in some roto-tilling in the routing code. Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code. My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it. When the ABI can be changed it raises the possibilty of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the desination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry. Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already. This work was sponsored by Ironport Systems/Cisco Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco
# e440aed9	12-Apr-2008	Qing Li <qingli@FreeBSD.org>	This patch provides the back end support for equal-cost multi-path (ECMP) for both IPv4 and IPv6. Previously, multipath route insertion is disallowed. For example, route add -net 192.103.54.0/24 10.9.44.1 route add -net 192.103.54.0/24 10.9.44.2 The second route insertion will trigger an error message of "add net 192.103.54.0/24: gateway 10.2.5.2: route already in table" Multiple default routes can also be inserted. Here is the netstat output: default 10.2.5.1 UGS 0 3074 bge0 => default 10.2.5.2 UGS 0 0 bge0 When multipath routes exist, the "route delete" command requires a specific gateway to be specified or else an error message would be displayed. For example, route delete default would fail and trigger the following error message: "route: writing to routing socket: No such process" "delete net default: not in table" On the other hand, route delete default 10.2.5.2 would be successful: "delete net default: gateway 10.2.5.2" One does not have to specify a gateway if there is only a single route for a particular destination. I need to perform more testings on address aliases and multiple interfaces that have the same IP prefixes. This patch as it stands today is not yet ready for prime time. Therefore, the ECMP code fragments are fully guarded by the RADIX_MPATH macro. Include the "options RADIX_MPATH" in the kernel configuration to enable this feature. Reviewed by: robert, sam, gnn, julian, kmacy
# f321ff15	27-Dec-2007	Maxime Henrion <mux@FreeBSD.org>	Add a workaround for a deadlock between the rt_setgate() and rt_check() functions. It is easily triggered by running routed, and, I expect, by running any other daemon that uses routing sockets. Reviewed by: net@ MFC after: 1 week
# 29910a5a	17-Dec-2007	Kip Macy <kmacy@FreeBSD.org>	widen the routing event interface (arp update, redirect, and eventually pmtu change) into separate functions revert previous commit's changes to arpresolve and add a new interface arpresolve2 which does arp resolution without an mbuf
# 8e7e854c	12-Dec-2007	Kip Macy <kmacy@FreeBSD.org>	add interface for allowing consumers to register for ARP updates, redirects, and path MTU changes Reviewed by: silby
# 22cafcf0	15-Mar-2006	Andre Oppermann <andre@FreeBSD.org>	- Fill in the correct rtm_index for RTM_ADD and RTM_CHANGE messages. - Allow RTM_CHANGE to change a number of route flags as specified by RTF_FMASK. - The unused rtm_use field in struct rt_msghdr is redesignated as rtm_fmask field to communicate route flag changes in RTM_CHANGE messages from userland. The use count of a route was moved to rtm_rmx a long time ago. For source code compatibility reasons a define of rtm_use to rtm_fmask is provided. These changes faciliate running of multiple cooperating routing daemons at the same time without causing undesired interference. Open[BGP\|OSPF]D make use of these features to have IGP routes override EGP ones. Obtained from: OpenBSD (claudio@) MFC after: 3 days
# 17a8471f	14-Sep-2005	Andre Oppermann <andre@FreeBSD.org>	Remove bogous semicolons at the end of the definitions of 'do { ... } while (0)' macros. PR: kern/83088 Sumbitted by: <antoine.brodin at laposte.net>
# c398230b	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for license, minor formatting changes
# b83a279f	05-Oct-2004	Sam Leffler <sam@FreeBSD.org>	Add 802.11-specific events that are dispatched through the routing socket. This really doesn't belong here but is preferred (for the moment) over adding yet another mechanism for sending msgs from the kernel to user apps. Reviewed by: imp
# 445e045b	28-Jul-2004	Alexander Kabaev <kan@FreeBSD.org>	Avoid casts as lvalues.
# 3916ebe8	24-Apr-2004	Luigi Rizzo <luigi@FreeBSD.org>	document the locking behaviour of the functions that access the routing table.
# f76d5670	20-Apr-2004	Luigi Rizzo <luigi@FreeBSD.org>	Document an assumption on the structure of 'struct rtentry'
# 2eb5613f	17-Apr-2004	Luigi Rizzo <luigi@FreeBSD.org>	make route_init() static
# e74642df	13-Apr-2004	Luigi Rizzo <luigi@FreeBSD.org>	route.h: introduce a macro, SA_SIZE(struct sockaddr *) which returns the space occupied by a struct sockaddr when passed through a routing socket. Use it to replace the macro ROUNDUP(int), that does the same but is redefined by every file which uses it, courtesy of the School of Cut'n'Paste Programming(TM). (partial) userland changes to follow.
# f36cfd49	07-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
# f7c5baa1	03-Apr-2004	Luigi Rizzo <luigi@FreeBSD.org>	+ arpresolve(): remove an unused argument + struct ifnet: remove unused fields, move ipv6-related field close to each other, add a pointer to l3<->l2 translation tables (arp,nd6, etc.) for future use. + struct route: remove an unused field, move close to each other some fields that might likely go away in the future
# 97d8d152	20-Nov-2003	Andre Oppermann <andre@FreeBSD.org>	Introduce tcp_hostcache and remove the tcp specific metrics from the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
# 26d02ca7	20-Nov-2003	Andre Oppermann <andre@FreeBSD.org>	Remove RTF_PRCLONING from routing table and adjust users of it accordingly. The define is left intact for ABI compatibility with userland. This is a pre-step for the introduction of tcp_hostcache. The network stack remains fully useable with this change. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
# 7138d65c	08-Nov-2003	Sam Leffler <sam@FreeBSD.org>	replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF macros that expand to include assertions when the system is built with INVARIANTS Supported by: FreeBSD Foundation
# 9c63e9db	30-Oct-2003	Sam Leffler <sam@FreeBSD.org>	Overhaul routing table entry cleanup by introducing a new rtexpunge routine that takes a locked routing table reference and removes all references to the entry in the various data structures. This eliminates instances of recursive locking and also closes races where the lock on the entry had to be dropped prior to calling rtrequest(RTM_DELETE). This also cleans up confusion where the caller held a reference to an entry that might have been reclaimed (and in some cases used that reference). Supported by: FreeBSD Foundation
# d1dd20be	03-Oct-2003	Sam Leffler <sam@FreeBSD.org>	Locking for updates to routing table entries. Each rtentry gets a mutex that covers updates to the contents. Note this is separate from holding a reference and/or locking the routing table itself. Other/related changes: o rtredirect loses the final parameter by which an rtentry reference may be returned; this was never used and added unwarranted complexity for locking. o minor style cleanups to routing code (e.g. ansi-fy function decls) o remove the logic to bump the refcnt on the parent of cloned routes, we assume the parent will remain as long as the clone; doing this avoids a circularity in locking during delete o convert some timeouts to MPSAFE callouts Notes: 1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level applications cannot/do-no know about mutex's. Doing this requires that the mutex be the last element in the structure. A better solution is to introduce an externalized version of struct rtentry but this is a major task because of the intertwining of rtentry and other data structures that are visible to user applications. 2. There are known LOR's that are expected to go away with forthcoming work to eliminate many held references. If not these will be resolved prior to release. 3. ATM changes are untested. Sponsored by: FreeBSD Foundation Obtained from: BSD/OS (partly)
# becc44d7	03-Oct-2003	Sam Leffler <sam@FreeBSD.org>	cleanups prior to adding locking (and in some cases to eliminate locking): o move route_cb to be private to rtsock.c o replace global static route_proto by locals o eliminate global #define shorthands for info references o remove some register decls o ansi-fy function decls o move items to be close in scope to their usage o add rt_dispatch function for dispatching the actual message o cleanup tangled logic for doing all-but-me msg send Support by: FreeBSD Foundation
# 1ebe9986	18-Jul-2003	Jeffrey Hsu <hsu@FreeBSD.org>	Add mutex for routing entries. Reviewed by: bmilekic, silby
# 3c6b084e	05-Mar-2003	Peter Wemm <peter@FreeBSD.org>	Finish driving a stake through the heart of netns and the associated ifdefs scattered around the place - its dead Jim! The SMB stuff had stolen AF_NS, make it official.
# 7f760c48	02-Mar-2003	Matthew N. Dodd <mdodd@FreeBSD.org>	Reduce code duplication. This adds the function rt_check() to route.c. Approved by: sam (in principle)
# 34fe62c7	24-Mar-2002	Bruce Evans <bde@FreeBSD.org>	Fixed some style bugs in the removal of __P(()). The main ones were not removing tabs before "__P((", and not outdenting continuation lines to preserve non-KNF lining up of code with parentheses. Switch to KNF formatting and/or rewrap the whole prototype in some cases.
# 929ddbbb	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove __P.
# 7b6edd04	18-Jan-2002	Ruslan Ermilov <ru@FreeBSD.org>	Introduce an interface announcement message for the routing socket so that routing daemons and other interested parties know when an interface is attached/detached. PR: kern/33747 Obtained from: NetBSD MFC after: 2 weeks
# be2ac88c	21-Nov-2001	Jonathan Lemon <jlemon@FreeBSD.org>	Introduce a syncache, which enables FreeBSD to withstand a SYN flood DoS in an improved fashion over the existing code. Reviewed by: silby (in a previous iteration) Sponsored by: DARPA, NAI Labs
# 8071913d	17-Oct-2001	Ruslan Ermilov <ru@FreeBSD.org>	Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2. Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo '' as the argument. Pass rt_addrinfo all the way down to rtrequest1 and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now ``rt_addrinfo '' instead of ``sockaddr '' (almost noone is using it anyways). Benefit: the following command now works. Previously we needed two route(8) invocations, "add" then "change". # route add -inet6 default ::1 -ifp gif0 Remove unsafe typecast in rtrequest(), from ``rtentry '' to ``sockaddr *''. It was introduced by 4.3BSD-Reno and never corrected. Obtained from: BSD/OS, NetBSD MFC after: 1 month PR: kern/28360
# 4862bf8c	17-Oct-2001	Ruslan Ermilov <ru@FreeBSD.org>	64-bit fixes from CSRG.
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# c3cb7e5d	25-Jul-2001	Bill Fenner <fenner@FreeBSD.org>	Don't bother passing p to rtioctl just so it can fail to pass it to mrt_ioctl
# e7f32693	21-Jul-2000	Jayanth Vijayaraghavan <jayanth@FreeBSD.org>	When a connection is being dropped due to a listen queue overflow, delete the cloned route that is associated with the connection. This does not exhaust the routing table memory when the system is under a SYN flood attack. The route entry is not deleted if there is any prior information cached in it. Reviewed by: Peter Wemm,asmodai
# 242c5536	12-Feb-2000	Peter Wemm <peter@FreeBSD.org>	Clean up some loose ends in the network code, including the X.25 and ISO #ifdefs. Clean out unused netisr's and leftover netisr linker set gunk. Tested on x86 and alpha, including world. Approved by: jkh
# 664a31e4	28-Dec-1999	Peter Wemm <peter@FreeBSD.org>	Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistant with the other BSD's who made this change quite some time ago. More commits to come.
# ae5bcbff	09-Dec-1999	Yoshinobu Inoue <shin@FreeBSD.org>	rtcalloc() is removed because it turned out not to be necessary for FreeBSD. (It was added as a part of KAME patch) Specified by: jdp@polstra.com
# 76429de4	05-Nov-1999	Yoshinobu Inoue <shin@FreeBSD.org>	KAME related header files additions and merges. (only those which don't affect c source files so much) Reviewed by: cvs-committers Obtained from: KAME project
# 97998e86	13-Sep-1999	Ruslan Ermilov <ru@FreeBSD.org>	Add comments, fix typos. Reviewed by: wollman
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# 64e41ba7	30-Jun-1999	Mike Smith <msmith@FreeBSD.org>	Increase the size of the route reference count from 15 bits to 31 bits. This doesn't change the size or alignment of the structure on either i386 or Alpha, and thus should be binary-compatible (modulo problems with old applications and routes with more than 2^15 references). Reviewed by: peter
# dfd5dee1	06-May-1999	Peter Wemm <peter@FreeBSD.org>	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.
# bd7ac1b2	23-Mar-1998	Bruce Evans <bde@FreeBSD.org>	Added a forward struct declaration so that this file is less self-insufficient.
# bea0f0be	06-Sep-1997	Bruce Evans <bde@FreeBSD.org>	Some staticized variables were still declared to be extern.
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 477180fb	13-Jan-1997	Garrett Wollman <wollman@FreeBSD.org>	Use the new if_multiaddrs list for multicast addresses rather than the previous hackery involving struct in_ifaddr and arpcom. Get rid of the abominable multi_kludge. Update all network interfaces to use the new machanism. Distressingly few Ethernet drivers program the multicast filter properly (assuming the hardware has one, which it usually does).
# 6303c69c	09-Oct-1996	Garrett Wollman <wollman@FreeBSD.org>	Get rid of obsolete RTF_MASK and RTF_CHAINDELETE flags.
# 8a15e7f4	26-Aug-1996	Julian Elischer <julian@FreeBSD.org>	change a comment to match what the BSD4.4 book says.
# 9f9b3dc4	06-May-1996	Garrett Wollman <wollman@FreeBSD.org>	Add three new route flags to help determine what sort of address the destination represents. For IP: - Iff it is a host route, RTF_LOCAL and RTF_BROADCAST indicate local (belongs to this host) and broadcast addresses, respectively. - For all routes, RTF_MULTICAST is set if the destination is multicast. The RTF_BROADCAST flag is used by ip_output() to eliminate a call to in_broadcast() in a common case; this gives about 1% in our packet-generation experiments. All three flags might be used (although they aren't now) to determine whether a packet can be forwarded; a given host route can represent a forwardable address if: (rt->rt_flags & (RTF_HOST \| RTF_LOCAL \| RTF_BROADCAST \| RTF_MULTICAST)) == RTF_HOST Obviously, one still has to do all the work if a host route is not present, but this code allows one to cache the results of such a lookup if rtalloc1() is called without masking RTF_PRCLONING.
# 6c5e9bbd	30-Jan-1996	Mike Pritchard <mpp@FreeBSD.org>	Fix a bunch of spelling errors in the comment fields of a bunch of system include files.
# f708ef1b	14-Dec-1995	Poul-Henning Kamp <phk@FreeBSD.org>	Another mega commit to staticize things.
# 52041295	16-Nov-1995	Poul-Henning Kamp <phk@FreeBSD.org>	All net.* sysctl converted now.
# cc6a66f2	26-Oct-1995	Julian Elischer <julian@FreeBSD.org>	Reviewed by: julian and jhay@mikom.csir.co.za Submitted by: Mike Mitchell, supervisor@alb.asctmd.com This is a bulk mport of Mike's IPX/SPX protocol stacks and all the related gunf that goes with it.. it is not guaranteed to work 100% correctly at this time but as we had several people trying to work on it I figured it would be better to get it checked in so they could all get teh same thing to work on.. Mikes been using it for a year or so but on 2.0 more changes and stuff will be merged in from other developers now that this is in. Mike Mitchell, Network Engineer AMTECH Systems Corporation, Technology and Manufacturing 8600 Jefferson Street, Albuquerque, New Mexico 87113 (505) 856-8000 supervisor@alb.asctmd.com
# 3ba6426d	16-Oct-1995	Garrett Wollman <wollman@FreeBSD.org>	Change signature of rt->rt_output() so that it is compatible with ifp->if_output() functions. This way, initial implementations of rt_output functionality can just lazily use if_output until customized versions are written.
# 9e52b982	03-Oct-1995	Garrett Wollman <wollman@FreeBSD.org>	Import of 4.4-Lite-2 sys/net to make merge and examination easier. Since we are not on the vendor branch for any of these files, the conflicts shown make no matter. Obtained from: 4.4BSD-Lite-2
# 28f8db14	29-Jul-1995	Bruce Evans <bde@FreeBSD.org>	Eliminate sloppy common-style declarations. There should be none left for the LINT configuation.
# 9b2e5354	30-May-1995	Rodney W. Grimes <rgrimes@FreeBSD.org>	Remove trailing whitespace.
# c2bed6a3	20-Mar-1995	Garrett Wollman <wollman@FreeBSD.org>	Better fix for the deletion of parents of cloned routes problem, superseding the `nextchild' hack. This also provides a way forward to fix RTM_CHANGE and RTM_ADD as well.
# 81e3e057	08-Feb-1995	Garrett Wollman <wollman@FreeBSD.org>	Define RTF_PINNED for future use.
# 54044407	07-Feb-1995	Garrett Wollman <wollman@FreeBSD.org>	Correct fix for merge conflicts: RTM_VERSION is always 5. Header files included by user code must never depend on kernel compile options.
# e128aa71	06-Feb-1995	David Greenman <dg@FreeBSD.org>	Fixed unresolved CVS conflict on RTM_VERSION.
# 24e16f2e	06-Feb-1995	Garrett Wollman <wollman@FreeBSD.org>	Merge in the socket-level support for Transaction TCP from the OLAH_TTCP branch. Submitted by: Andras Olah <olah@cs.utwente.nl>
# 6670942f	23-Jan-1995	Bruce Evans <bde@FreeBSD.org>	Declare `struct mbuf' with the correct scope to avoid lots of warnings for compiling routed... Previously a kernel function pointer that is bogusly visible to applications was incompletely declared to hide the problem.
# 18e1f1f1	22-Jan-1995	Garrett Wollman <wollman@FreeBSD.org>	route.c: keep track of where cloned routes come from, and make sure to delete them when the ``parent'' goes away route.h: add glue to track this to rtentry structure. WARNING WILL ROBINSON! This will be yet another incompatible change in your route-using binaries. I apologize, but this was the only way to do it. I took this opportunity to increase the size of the metrics to what I believe will be the final length for 2.1, so that when the T/TCP stuff is done, this won't happen again.
# 995add1a	13-Dec-1994	Garrett Wollman <wollman@FreeBSD.org>	Add support for two separate cloning flags, one set by the lower layers, and one set by the protocol family. Also add another parameter to rtalloc1() to allow for any interface flags to be ignored; currently this is only useful for RTF_PRCLONING. Get rid of rt_prflags and re-unite with rt_flags. Add T/TCP ``route metrics''. NB: YOU MUST RECOMPILE `route' AND OTHER RELATED PROGRAMS AS A RESULT OF THIS CHANGE. This also adds a new interface parameter, `ifi_physical', which will eventually replace IFF_ALTPHYS as the mechanism for specifying the particular physical connection desired on a multiple-connection card. NB: YOU MUST RECOMPILE `ifconfig' AND OTHER RELATED PROGRAMS AS A RESULT OF THIS CHANGE.
# f084e014	02-Nov-1994	Garrett Wollman <wollman@FreeBSD.org>	Collapse two fields so that we have space for another 32 flags. NB: You will have to recompile programs which use the `rt_use' member in order to get the correct values. This should not cause incorrect operation, but the statistics may look a little confusing.
# cea1da3b	20-Aug-1994	Paul Richards <paul@FreeBSD.org>	Make idempotent. Submitted by: Paul
# 3c4dd356	02-Aug-1994	David Greenman <dg@FreeBSD.org>	Added $Id$
# 26f9a767	25-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
# df8bae1d	24-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	BSD 4.4 Lite Kernel Sources