History log of /freebsd-current/sys/netinet/ip_var.h
Revision Date Author Comments
# 60d8dbbe 18-Jan-2024 Kristof Provost <kp@FreeBSD.org>

netinet: add a probe point for IP, IP6, ICMP, ICMP6, UDP and TCP stats counters

When debugging network issues one common clue is an unexpectedly
incrementing error counter. This is helpful, in that it gives us an
idea of what might be going wrong, but often these counters may be
incremented in different functions.

Add a static probe point for them so that we can use dtrace to get
futher information (e.g. a stack trace).

For example:
dtrace -n 'mib:ip:count: { printf("%d", arg0); stack(); }'

This can be disabled by setting the following kernel option:
options KDTRACE_NO_MIB_SDT

Reviewed by: gallatin, tuexen (previous version), gnn (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43504


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# b68d2789 01-Nov-2023 Igor Ostapenko <pm@igoro.pro>

ip_var.h: align comment style

MFC after: 2 weeks
Reviewed by: kp
Pull Request: https://github.com/freebsd/freebsd-src/pull/883


# c1146e6a 20-Oct-2023 Kristof Provost <kp@FreeBSD.org>

pf: use an enum for packet direction in divert tag

The benefit is that in the debugger you will see PF_DIVERT_MTAG_DIR_IN
instead of 1 when looking at a structure. And compilation time failure
if anybody sets it to a wrong value. Using "port" instead of "ndir" when
assigning a port improves readability of code.

Suggested by: glebius
MFC after: 3 weeks
X-MFC-With: fabf705f4b


# fabf705f 18-Oct-2023 Igor Ostapenko <pm@igoro.pro>

pf: fix pf divert-to loop

Resolved conflict between ipfw and pf if both are used and pf wants to
do divert(4) by having separate mtags for pf and ipfw.

Also fix the incorrect 'rulenum' check, which caused the reported loop.

While here add a few test cases to ensure that divert-to works as
expected, even if ipfw is loaded.

divert(4)
PR: 272770
MFC after: 3 weeks
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D42142


# 2ff63af9 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .h pattern

Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/


# 5ab15157 24-May-2023 Doug Rabson <dfr@FreeBSD.org>

netinet*: Fix redirects for connections from localhost

Redirect rules use PFIL_IN and PFIL_OUT events to allow packet filter
rules to change the destination address and port for a connection.
Typically, the rule triggers on an input event when a packet is received
by a router and the destination address and/or port is changed to
implement the redirect. When a reply packet on this connection is output
to the network, the rule triggers again, reversing the modification.

When the connection is initiated on the same host as the packet filter,
it is initially output via lo0 which queues it for input processing.
This causes an input event on the lo0 interface, allowing redirect
processing to rewrite the destination and create state for the
connection. However, when the reply is received, no corresponding output
event is generated; instead, the packet is delivered to the higher level
protocol (e.g. tcp or udp) without reversing the redirect, the reply is
not matched to the connection and the packet is dropped (for tcp, a
connection reset is also sent).

This commit fixes the problem by adding a second packet filter call in
the input path. The second call happens right before the handoff to
higher level processing and provides the missing output event to allow
the redirect's reply processing to perform its rewrite. This extra
processing is disabled by default and can be enabled using pfilctl:

pfilctl link -o pf:default-out inet-local
pfilctl link -o pf:default-out6 inet6-local

PR: 268717
Reviewed-by: kp, melifaro
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D40256


# fc727ad6 24-Apr-2023 Boris Lytochkin <lytboris@gmail.com>

ipfw: add [fw]mark implementation for ipfw

Packet Mark is an analogue to ipfw tags with O(1) lookup from mbuf while
regular tags require a single-linked list traversal.
Mark is a 32-bit number that can be looked up in a table
[with 'number' table-type], matched or compared with a number with optional
mask applied before comparison.
Having generic nature, Mark can be used in a variety of needs.
For example, it could be used as a security group: mark will hold a security
group id and represent a group of packet flows that shares same access
control policy.

Reviewed By: pauamma_gundo.com
Differential Revision: https://reviews.freebsd.org/D39555
MFC after: 1 month


# 7fc82fd1 03-Mar-2023 Gleb Smirnoff <glebius@FreeBSD.org>

ipfw: garbage collect ip_fw_chk_ptr

It is a relict left from the old times when ipfw(4) was hooked
into IP stack directly, without pfil(9).


# fcb3f813 03-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

netinet*: remove PRC_ constants and streamline ICMP processing

In the original design of the network stack from the protocol control
input method pr_ctlinput was used notify the protocols about two very
different kinds of events: internal system events and receival of an
ICMP messages from outside. These events were coded with PRC_ codes.
Today these methods are removed from the protosw(9) and are isolated
to IPv4 and IPv6 stacks and are called only from icmp*_input(). The
PRC_ codes now just create a shim layer between ICMP codes and errors
or actions taken by protocols.

- Change ipproto_ctlinput_t to pass just pointer to ICMP header. This
allows protocols to not deduct it from the internal IP header.
- Change ip6proto_ctlinput_t to pass just struct ip6ctlparam pointer.
It has all the information needed to the protocols. In the structure,
change ip6c_finaldst fields to sockaddr_in6. The reason is that
icmp6_input() already has this address wrapped in sockaddr, and the
protocols want this address as sockaddr.
- For UDP tunneling control input, as well as for IPSEC control input,
change the prototypes to accept a transparent union of either ICMP
header pointer or struct ip6ctlparam pointer.
- In icmp_input() and icmp6_input() do only validation of ICMP header and
count bad packets. The translation of ICMP codes to errors/actions is
done by protocols.
- Provide icmp_errmap() and icmp6_errmap() as substitute to inetctlerrmap,
inet6ctlerrmap arrays.
- In protocol ctlinput methods either trust what icmp_errmap() recommend,
or do our own logic based on the ICMP header.

Differential revision: https://reviews.freebsd.org/D36731


# 43d39ca7 03-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

netinet*: de-void control input IP protocol methods

After decoupling of protosw(9) and IP wire protocols in 78b1fc05b205 for
IPv4 we got vector ip_ctlprotox[] that is executed only and only from
icmp_input() and respectively for IPv6 we got ip6_ctlprotox[] executed
only and only from icmp6_input(). This allows to use protocol specific
argument types in these methods instead of struct sockaddr and void.

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36727


# 24b96f35 03-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

netinet*: move ipproto_register() and co to ip_var.h and ip6_var.h

This is a FreeBSD KPI and belongs to private header not netinet/in.h.

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36723


# a30cb315 08-Sep-2022 Gleb Smirnoff <glebius@FreeBSD.org>

ip_reass: retire ipreass_slowtimo() in favor of per-slot callout

o Retire global always running ipreass_slowtimo().
o Instead use one callout entry per hash slot. The per-slot callout
would be scheduled only if a slot has entries, and would be driven
by TTL of the very last entry.
o Make net.inet.ip.fragttl read/write and document it.
o Retire IPFRAGTTL, which used to be meaningful only with PR_SLOWTIMO.

Differential revision: https://reviews.freebsd.org/D36275


# e7d02be1 17-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

protosw: refactor protosw and domain static declaration and load

o Assert that every protosw has pr_attach. Now this structure is
only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw. This was suggested
in 1996 by wollman@ (see 7b187005d18ef), and later reiterated
in 2006 by rwatson@ (see 6fbb9cf860dcd).
o Make struct domain hold a variable sized array of protosw pointers.
For most protocols these pointers are initialized statically.
Those domains that may have loadable protocols have spacers. IPv4
and IPv6 have 8 spacers each (andre@ dff3237ee54ea).
o For inetsw and inet6sw leave a comment noting that many protosw
entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36232


# 81a34d37 17-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

protosw: retire pr_drain and use EVENTHANDLER(9) directly

The method was called for two different conditions: 1) the VM layer is
low on pages or 2) one of UMA zones of mbuf allocator exhausted.
This change 2) into a new event handler, but all affected network
subsystems modified to subscribe to both, so this change shall not
bring functional changes under different low memory situations.

There were three subsystems still using pr_drain: TCP, SCTP and frag6.
The latter had its protosw entry for the only reason to register its
pr_drain method.

Reviewed by: tuexen, melifaro
Differential revision: https://reviews.freebsd.org/D36164


# 160f01f0 17-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

ip_reass: use callout(9) directly instead of pr_slowtimo

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36236


# 78b1fc05 17-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

protosw: separate pr_input and pr_ctlinput out of protosw

The protosw KPI historically has implemented two quite orthogonal
things: protocols that implement a certain kind of socket, and
protocols that are IPv4/IPv6 protocol. These two things do not
make one-to-one correspondence. The pr_input and pr_ctlinput methods
were utilized only in IP protocols. This strange duality required
IP protocols that doesn't have a socket to declare protosw, e.g.
carp(4). On the other hand developers of socket protocols thought
that they need to define pr_input/pr_ctlinput always, which lead to
strange dead code, e.g. div_input() or sdp_ctlinput().

With this change pr_input and pr_ctlinput as part of protosw disappear
and IPv4/IPv6 get their private single level protocol switch table
ip_protox[] and ip6_protox[] respectively, pointing at array of
ipproto_input_t functions. The pr_ctlinput that was used for
control input coming from the network (ICMP, ICMPv6) is now represented
by ip_ctlprotox[] and ip6_ctlprotox[].

ipproto_register() becomes the only official way to register in the
table. Those protocols that were always static and unlikely anybody
is interested in making them loadable, are now registered by ip_init(),
ip6_init(). An IP protocol that considers itself unloadable shall
register itself within its own private SYSINIT().

Reviewed by: tuexen, melifaro
Differential revision: https://reviews.freebsd.org/D36157


# 3d2041c0 11-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

raw ip: merge rip_output() into rip_send()

While here, address the unlocked 'dst' read. Solve that by storing
a pointer either to the inpcb or to the sockaddr. If we end up
copying address out of the inpcb, that would be done under the read
lock section.

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36127


# 89128ff3 03-Jan-2022 Gleb Smirnoff <glebius@FreeBSD.org>

protocols: init with standard SYSINIT(9) or VNET_SYSINIT

The historical BSD network stack loop that rolls over domains and
over protocols has no advantages over more modern SYSINIT(9).
While doing the sweep, split global and per-VNET initializers.

Getting rid of pr_init allows to achieve several things:
o Get rid of ifdef's that protect against double foo_init() when
both INET and INET6 are compiled in.
o Isolate initializers statically to the module they init.
o Makes code easier to understand and maintain.

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D33537


# aa70361d 24-Dec-2021 Kristof Provost <kp@FreeBSD.org>

headers: make a few more headers self-contained

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 8ad114c0 12-Nov-2020 George V. Neville-Neil <gnn@FreeBSD.org>

An earlier commit effectively turned out the fast forwading path
due to its lack of support for ICMP redirects. The following commit
adds redirects to the fastforward path, again allowing for decent
forwarding performance in the kernel.

Reviewed by: ae, melifaro
Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")


# 35c7bb34 24-Sep-2019 Randall Stewart <rrs@FreeBSD.org>

This commit adds BBR (Bottleneck Bandwidth and RTT) congestion control. This
is a completely separate TCP stack (tcp_bbr.ko) that will be built only if
you add the make options WITH_EXTRA_TCP_STACKS=1 and also include the option
TCPHPTS. You can also include the RATELIMIT option if you have a NIC interface that
supports hardware pacing, BBR understands how to use such a feature.

Note that this commit also adds in a general purpose time-filter which
allows you to have a min-filter or max-filter. A filter allows you to
have a low (or high) value for some period of time and degrade slowly
to another value has time passes. You can find out the details of
BBR by looking at the original paper at:

https://queue.acm.org/detail.cfm?id=3022184

or consult many other web resources you can find on the web
referenced by "BBR congestion control". It should be noted that
BBRv1 (which this is) does tend to unfairness in cases of small
buffered paths, and it will usually get less bandwidth in the case
of large BDP paths(when competing with new-reno or cubic flows). BBR
is still an active research area and we do plan on implementing V2
of BBR to see if it is an improvement over V1.

Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D21582


# 59854ecf 25-Jun-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Convert all IPv4 and IPv6 multicast memberships into using a STAILQ
instead of a linear array.

The multicast memberships for the inpcb structure are protected by a
non-sleepable lock, INP_WLOCK(), which needs to be dropped when
calling the underlying possibly sleeping if_ioctl() method. When using
a linear array to keep track of multicast memberships, the computed
memory location of the multicast filter may suddenly change, due to
concurrent insertion or removal of elements in the linear array. This
in turn leads to various invalid memory access issues and kernel
panics.

To avoid this problem, put all multicast memberships on a STAILQ based
list. Then the memory location of the IPv4 and IPv6 multicast filters
become fixed during their lifetime and use after free and memory leak
issues are easier to track, for example by: vmstat -m | grep multi

All list manipulation has been factored into inline functions
including some macros, to easily allow for a future hash-list
implementation, if needed.

This patch has been tested by pho@ .

Differential Revision: https://reviews.freebsd.org/D20080
Reviewed by: markj @
MFC after: 1 week
Sponsored by: Mellanox Technologies


# dc0fa4f7 14-Mar-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Remove 'dir' argument from dummynet_io(). This makes it possible to make
dn_dir flags private to dummynet. There is still some room for improvement.


# cef9f220 14-Mar-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Remove 'dir' argument in ng_ipfw_input, since ip_fw_args now has this info.
While here make 'tee' boolean.


# 1830dae3 14-Mar-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Make second argument of ip_divert(), that specifies packet direction a bool.
This allows pf(4) to avoid including ipfw(4) private files.


# b252313f 31-Jan-2019 Gleb Smirnoff <glebius@FreeBSD.org>

New pfil(9) KPI together with newborn pfil API and control utility.

The KPI have been reviewed and cleansed of features that were planned
back 20 years ago and never implemented. The pfil(9) internals have
been made opaque to protocols with only returned types and function
declarations exposed. The KPI is made more strict, but at the same time
more extensible, as kernel uses same command structures that userland
ioctl uses.

In nutshell [KA]PI is about declaring filtering points, declaring
filters and linking and unlinking them together.

New [KA]PI makes it possible to reconfigure pfil(9) configuration:
change order of hooks, rehook filter from one filtering point to a
different one, disconnect a hook on output leaving it on input only,
prepend/append a filter to existing list of filters.

Now it possible for a single packet filter to provide multiple rulesets
that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of existing packet filters yet support that,
however limited usage is already possible, e.g. default ruleset can
be moved to single interface, as soon as interface would pride their
filtering points.

Another future feature is possiblity to create pfil heads, that provide
not an mbuf pointer but just a memory pointer with length. That would
allow filtering at very early stages of a packet lifecycle, e.g. when
packet has just been received by a NIC and no mbuf was yet allocated.

Differential Revision: https://reviews.freebsd.org/D18951


# 2157f3c3 16-Nov-2018 Jonathan T. Looney <jtl@FreeBSD.org>

Add some additional length checks to the IPv4 fragmentation code.

Specifically, block 0-length fragments, even when the MF bit is clear.
Also, ensure that every fragment with the MF bit clear ends at the same
offset and that no subsequently-received fragments exceed that offset.

Reviewed by: glebius, markj
MFC after: 3 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D17922


# cb6bb230 19-May-2018 Matt Macy <mmacy@FreeBSD.org>

ip(6)_freemoptions: defer imo destruction to epoch callback task

Avoid the ugly unlock / lock of the inpcbinfo where we need to
figure out what kind of lock we hold by simply deferring the
operation to another context. (Also a small dependency for
converting the pcbinfo read lock to epoch)


# 28c00100 05-May-2018 Matt Macy <mmacy@FreeBSD.org>

Currently in_pcbfree will unconditionally wunlock the pcbinfo lock
to avoid a LOR on the multicast list lock in the freemoptions routines.
As it turns out, tcp_usr_detach can acquire the tcbinfo lock readonly.
Trying to wunlock the pcbinfo lock in that context has caused a number
of reported crashes.

This change unclutters in_pcbfree and moves the handling of wunlock vs
runlock of pcbinfo to the freemoptions routine.

Reported by: mjg@, bde@, o.hartmann at walstatt.org
Approved by: sbruno


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# 3f58662d 01-Jun-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

The pr_destroy field does not allow us to run the teardown code in a
specific order. VNET_SYSUNINITs however are doing exactly that.
Thus remove the VIMAGE conditional field from the domain(9) protosw
structure and replace it with VNET_SYSUNINITs.
This also allows us to change some order and to make the teardown functions
file local static.
Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use
internally.

Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g.,
for pfil consumers (firewalls), partially for this commit and for others
to come.

Reviewed by: gnn, tuexen (sctp), jhb (kernel.h)
Obtained from: projects/vnet
MFC after: 2 weeks
X-MFC: do not remove pr_destroy
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6652


# f2cf0121 22-Jan-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

MFp4 @180887:

With pr_destroy being gone, call ip_destroy from an ordered
VNET_SYSUNINT. Make ip_destroy() static.

Sponsored by: The FreeBSD Foundation


# 1f12da0e 22-Jan-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

Just checkpoint the WIP in order to be able to make the tree update
easier. Note: this is currently not in a usable state as certain
teardown parts are not called and the DOMAIN rework is missing.
More to come soon and find its way to head.

Obtained from: P4 //depot/user/bz/vimage/...
Sponsored by: The FreeBSD Foundation


# 9977be4a 09-Dec-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Make in_arpinput(), inp_lookup_mcast_ifp(), icmp_reflect(),
ip_dooptions(), icmp6_redirect_input(), in6_lltable_rtcheck(),
in6p_lookup_mcast_ifp() and in6_selecthlim() use new routing api.

Eliminate now-unused ip_rtaddr().
Fix lookup key fib6_lookup_nh_basic() which was lost diring merge.
Make fib6_lookup_nh_basic() and fib6_lookup_nh_extended() always
return IPv6 destination address with embedded scope. Currently
rw_gateway has it scope embedded, do the same for non-gatewayed
destinations.

Sponsored by: Yandex LLC


# a6e8e924 18-Jul-2015 Luigi Rizzo <luigi@FreeBSD.org>

fix a typo in a comment


# 6d947416 01-Apr-2015 Gleb Smirnoff <glebius@FreeBSD.org>

o Use new function ip_fillid() in all places throughout the kernel,
where we want to create a new IP datagram.
o Add support for RFC6864, which allows to set IP ID for atomic IP
datagrams to any value, to improve performance. The behaviour is
controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by
default.
o In case if we generate IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.

Differential Revision: https://reviews.freebsd.org/D2177
Reviewed by: adrian, cy, rpaulo
Tested by: Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Relnotes: yes


# d612b95e 27-Mar-2015 Fabien Thomas <fabient@FreeBSD.org>

On multi CPU systems, we may emit successive packets with the same id.
Fix the race by using an atomic operation.

Differential Revision: https://reviews.freebsd.org/D2141
Obtained from: emeric.poupon@stormshield.eu
MFC after: 1 week
Sponsored by: Stormshield


# 033074c4 09-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Replace 'struct route *' if_output() argument with 'struct nhop_info *'.
Leave 'struct route' as is for legacy routing api users.
Remove most of rtalloc_ign*-derived functions.


# 5bc9d581 24-Oct-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert all ip_rtaddr() users to fib4_lookup_nh_extended().
Remove ip_rtaddr().


# b4e8f808 19-Oct-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Switch IPv4 output path to use new routing api.

The goals of the new API is to provide consumers with minimal
needed information, but as fast as possible. So we provide
full nexthop info copied into alighed on-cache structure
instead of rte/ia pointers, their refcounts and locks.
This does not provide solution for protecting from egress
ifp destruction, but does not make it any worse.

Current changes:

nhops:
Add fib4_lookup_prepend() function which stores either full
L2+L3 prepend info (e.g. MAC header in case of plain IPv4) or
L3 info with NH_FLAGS_L2_INCOMPLETE flag indicating that no valid L2
info exists and we have to take "slow" path.

ip_output:
Currently ip[ 46]_output consumers use 'struct route' for
the following purposes:
1) double lookup avoidance(route caching)
2) plain route caching
3) get path MTU to be able to notify source.
The former pattern is mostly used by various tunnels
(gif, gre, stf). (Actually, gre is the only remaining,
others were already converted. Their locking model did
not scale good enogh to benefit from such caching, so
we have (temporarily) removed it without any performance
loss).
Plain route caching used by SCTP is simply wrong and should be removed.
Temporary break it for now just to be able to compile.
Optimize path mtu reporting by providing it in new 'route_info' stucture.

Minimize games with @ia locking/refcounting for route lookup:
add special nhop[46]_extended structure to store more route attributes.
Pointer to given structure can be passed to fib4_lookup_prepend() to indicate
we want this info (we actually needs it for UDP and raw IP).

ether_output:
Provide light-weight ether_output2() call to deal with
transmitting L2 frame (e.g. properly handle broadcast/simloop/bridge/
other L2 hooks before actually transmitting frame by if_transmit()).
Add a hack based on new RT_NHOP ro_flag to distinguish which version should
we call. Better way is probably to add a new "if_output_frame" driver
callbacks.

Next steps:
* Convert ip_fastfwd part
* Implement auto-growing array for per-radix nexthops
* Implement LLE tracking for nexthop calculations to be able to
immediately provide all necessary info in single route lookup
for gateway routes
* Switch radix locking scheme to runtime/cfg lock
* Implement multipath support for rtsock
* Implement "tracked nexthops" for tunnels (e.g. _proper_
nexthop caching)
* Add IPv6 support for remaining parts (postponed not to
interfere with user/ae/inet6 branch)
* Consider adding "if_output_frame" driver call to
ease logical frame pushing.


# 061a4b4c 08-Sep-2014 Adrian Chadd <adrian@FreeBSD.org>

Add a flag to ip_output() - IP_NODEFAULTFLOWID - which prevents it from
overriding an existing flowid/flowtype field in the outbound mbuf with
the inp_flowid/inp_flowtype details.

The upcoming RSS UDP support calculates a valid RSS value for outbound
mbufs and since it may change per send, it doesn't cache it in the inpcb.
So overriding it here would be wrong.

Differential Revision: https://reviews.freebsd.org/D527
Reviewed by: grehan


# 73d76e77 14-Aug-2014 Kevin Lo <kevlo@FreeBSD.org>

Change pr_output's prototype to avoid the need for explicit casts.
This is a follow up to r269699.

Phabric: D564
Reviewed by: jhb


# 8f5a8818 07-Aug-2014 Kevin Lo <kevlo@FreeBSD.org>

Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric: D476
Reviewed by: jhb


# aa69c612 12-Mar-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Since both netinet/ and netinet6/ call into netipsec/ and netpfil/,
the protocol specific mbuf flags are shared between them.

- Move all M_FOO definitions into a single place: netinet/in6.h, to
avoid future clashes.
- Resolve clash between M_DECRYPTED and M_SKIP_FIREWALL which resulted
in a failure of operation of IPSEC and packet filters.

Thanks to Nicolas and Georgios for all the hard work on bisecting,
testing and finally finding the root of the problem.

PR: kern/186755
PR: kern/185876
In collaboration with: Georgios Amanakis <gamanakis gmail.com>
In collaboration with: Nicolas DEFFAYET <nicolas-ml deffayet.com>
Sponsored by: Nginx, Inc.


# 88388bdc 19-Aug-2013 Andre Oppermann <andre@FreeBSD.org>

Move the global M_SKIP_FIREWALL mbuf flags to a protocol layer specific
flag instead. The flag is only used within the IP and IPv6 layer 3
protocols.

Because some firewall packages treat IPv4 and IPv6 packets the same the
flag should have the same value for both.

Discussed with: trociny, glebius


# b09dc7e3 19-Aug-2013 Andre Oppermann <andre@FreeBSD.org>

Move ip_reassemble()'s use of the global M_FRAG mbuf flag to a protocol layer
specific flag instead. The flag is only relevant while the packet stays in
the IP reassembly queue.

Discussed with: trociny, glebius


# 5da0521f 09-Jul-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Use new macros to implement ipstat and tcpstat using PCPU counters.
Change interface of kread_counters() similar ot kread() in the netstat(1).


# e3389419 12-Apr-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Reflect removing of the counter_u64_subtract() function in the macro.


# 5923c293 08-Apr-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Merge from projects/counters: TCP/IP stats.

Convert 'struct ipstat' and 'struct tcpstat' to counter(9).

This speeds up IP forwarding at extreme packet rates, and
makes accounting more precise.

Sponsored by: Nginx, Inc.


# ffdbf9da 01-Nov-2012 Andrey V. Elsukov <ae@FreeBSD.org>

Remove the recently added sysctl variable net.pfil.forward.
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.

Suggested by: andre


# 078468ed 26-Oct-2012 Gleb Smirnoff <glebius@FreeBSD.org>

o Remove last argument to ip_fragment(), and obtain all needed information
on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
hardware. Some driver may not announce CSUM_IP in theur if_hwassist,
although try to do checksums if CSUM_IP set on mbuf. Example is em(4).
o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
After this change CSUM_DELAY_IP vanishes from the stack.

Submitted by: Sebastian Kuzminsky <seb lineratesystems.com>


# 3c2824b9 10-Oct-2012 Alexander V. Chernikov <melifaro@FreeBSD.org>

Do not check if found IPv4 rte is dynamic if net.inet.icmp.drop_redirect is
enabled. This eliminates one mtx_lock() per each routing lookup thus improving
performance in several cases (routing to directly connected interface or routing
to default gateway).

Icmp redirects should not be used to provide routing direction nowadays, even
for end hosts. Routers should not use them too (and this is explicitly restricted
in IPv6, see RFC 4861, clause 8.2).

Current commit changes rnh_machaddr function to 'stock' rn_match (and back) for every
AF_INET routing table in given VNET instance on drop_redirect sysctl change.

This change is part of bigger patch eliminating rte locking.

Sponsored by: Yandex LLC
MFC after: 2 weeks


# 7d4317bd 04-Sep-2012 Alexander V. Chernikov <melifaro@FreeBSD.org>

Introduce new link-layer PFIL hook V_link_pfil_hook.
Merge ether_ipfw_chk() and part of bridge_pfil() into
unified ipfw_check_frame() function called by PFIL.
This change was suggested by rwatson? @ DevSummit.

Remove ipfw headers from ether/bridge code since they are unneeded now.

Note this thange introduce some (temporary) performance penalty since
PFIL read lock has to be acquired for every link-level packet.

MFC after: 3 weeks


# c23de1f4 29-Dec-2011 John Baldwin <jhb@FreeBSD.org>

Defer the work of freeing IPv4 multicast options from a socket to an
asychronous task. This avoids tearing down multicast state including
sending IGMP leave messages and reprogramming MAC filters while holding
the per-protocol global pcbinfo lock that is used in the receive path of
packet processing.

Reviewed by: rwatson
MFC after: 1 month


# 9527ec6e 29-Jun-2011 Andrey V. Elsukov <ae@FreeBSD.org>

Add new rule actions "call" and "return" to ipfw. They make
possible to organize subroutines with rules.

The "call" action saves the current rule number in the internal
stack and rules processing continues from the first rule with
specified number (similar to skipto action). If later a rule with
"return" action is encountered, the processing returns to the first
rule with number of "call" rule saved in the stack plus one or higher.

Submitted by: Vadim Goncharov
Discussed by: ipfw@, luigi@


# aae49dd3 20-Apr-2011 Bjoern A. Zeeb <bz@FreeBSD.org>

MFp4 CH=191470:

Move the ipport_tick_callout and related functions from ip_input.c
to in_pcb.c. The random source port allocation code has been merged
and is now local to in_pcb.c only.
Use a SYSINIT to get the callout started and no longer depend on
initialization from the inet code, which would not work in an IPv6
only setup.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 1b48d245 02-Sep-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFp4 CH=183052 183053 183258:

In protosw we define pr_protocol as short, while on the wire
it is an uint8_t. That way we can have "internal" protocols
like DIVERT, SEND or gaps for modules (PROTO_SPACER).
Switch ipproto_{un,}register to accept a short protocol number(*)
and do an upfront check for valid boundries. With this we
also consistently report EPROTONOSUPPORT for out of bounds
protocols, as we did for proto == 0. This allows a caller
to not error for this case, which is especially important
if we want to automatically call these from domain handling.

(*) the functions have been without any in-tree consumer
since the initial introducation, so this is considered save.

Implement ip6proto_{un,}register() similarly to their legacy IP
counter parts to allow modules to hook up dynamically.

Reviewed by: philip, will
MFC after: 1 week


# 480d7c6c 06-May-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r207369:
MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH


# 82cea7e6 29-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days


# e47658ce 27-Mar-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r204140:

Split up ip_drain() into an outer lock and iterator part and
a "locked" version that will only handle a single network stack
instance. The latter is called directly from ip_destroy().

Hook up an ip_destroy() function to release resources from the
legacy IP network layer upon virtual network stack teardown.

Reviewed by: rwatson


# 8018e843 23-Mar-2010 Luigi Rizzo <luigi@FreeBSD.org>

MFC of a large number of ipfw and dummynet fixes and enhancements
done in CURRENT over the last 4 months.
HEAD and RELENG_8 are almost in sync now for ipfw, dummynet
the pfil hooks and related components.

Among the most noticeable changes:
- r200855 more efficient lookup of skipto rules, and remove O(N)
blocks from critical sections in the kernel;
- r204591 large restructuring of the dummynet module, with support
for multiple scheduling algorithms (4 available so far)
See the original commit logs for details.

Changes in the kernel/userland ABI should be harmless because the
kernel is able to understand previous requests from RELENG_8 and
RELENG_7. For this reason, this changeset would be applicable
to RELENG_7 as well, but i am not sure if it is worthwhile.


# 9802380e 20-Feb-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

Split up ip_drain() into an outer lock and iterator part and
a "locked" version that will only handle a single network stack
instance. The latter is called directly from ip_destroy().

Hook up an ip_destroy() function to release resources from the
legacy IP network layer upon virtual network stack teardown.

Sponsored by: ISPsystem
Reviewed by: rwatson
MFC After: 5 days


# 2ae7ec29 07-Feb-2010 Julian Elischer <julian@FreeBSD.org>

MFC of 197952 and 198075

Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.
and
Unbreak the VIMAGE build with IPSEC, broken with r197952 by
virtualizing the pfil hooks.
For consistency add the V_ to virtualize the pfil hooks in here as well.


# b2019e17 07-Jan-2010 Luigi Rizzo <luigi@FreeBSD.org>

Following up on a request from Ermal Luci to make
ip_divert work as a client of pf(4),
make ip_divert not depend on ipfw.

This is achieved by moving to ip_var.h the struct ipfw_rule_ref
(which is part of the mtag for all reinjected packets) and other
declarations of global variables, and moving to raw_ip.c global
variables for filter and divert hooks.

Note that names and locations could be made more generic
(ipfw_rule_ref is really a generic reference robust to reconfigurations;
the packet filter is not necessarily ipfw; filters and their clients
are not necessarily limited to ipv4), but _right now_ most
of this stuff works on ipfw and ipv4, so i don't feel like
doing a gratuitous renaming, at least for the time being.


# 0b4b0b0f 10-Oct-2009 Julian Elischer <julian@FreeBSD.org>

Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.

Sitting aroung in tree waiting to commit for: 2 months
MFC after: 2 months


# 315e3e38 02-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Many network stack subsystems use a single global data structure to hold
all pertinent statatistics for the subsystem. These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.

Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored. This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.

The following modules are affected by this change:

if_bridge
if_cxgb
if_gif
ip_mroute
ipdivert
pf

In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack. However, that change is too agressive for
this point in the release cycle.

Reviewed by: bz
Approved by: re (kib)


# 1e77c105 16-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


# eddfbb76 14-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 7654a365 16-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Add the explicit include of vimage.h to another five .c files still
missing it.

Remove the "hidden" kernel only include of vimage.h from ip_var.h added
with the very first Vimage commit r181803 to avoid further kernel poisoning.


# bc29160d 08-Jun-2009 Marko Zec <zec@FreeBSD.org>

Introduce an infrastructure for dismantling vnet instances.

Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state. The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.

While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers. Many of such issues are
already known and will be incrementaly fixed over the next weeks in
smaller incremental commits.

Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels. Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.

Bump __FreeBSD_version to 800097.
Reviewed by: bz, julian
Approved by: rwatson, kib (re), julian (mentor)


# 115a40c7 05-Jun-2009 Luigi Rizzo <luigi@FreeBSD.org>

More cleanup in preparation of ipfw relocation (no actual code change):

+ move ipfw and dummynet hooks declarations to raw_ip.c (definitions
in ip_var.h) same as for most other global variables.
This removes some dependencies from ip_input.c;

+ remove the IPFW_LOADED macro, just test ip_fw_chk_ptr directly;

+ remove the DUMMYNET_LOADED macro, just test ip_dn_io_ptr directly;

+ move ip_dn_ruledel_ptr to ip_fw2.c which is the only file using it;

To be merged together with rev 193497

MFC after: 5 days


# 86425c62 11-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Update stats in struct ipstat using four new macros, IPSTAT_ADD(),
IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly
manipulating the fields across the kernel. This will make it easier
to change the implementation of these statistics, such as using
per-CPU versions of the data structures.

MFC after: 3 days


# d10910e6 09-Mar-2009 Bruce M Simpson <bms@FreeBSD.org>

Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD
IPv4 stack.

Diffs are minimized against p4.
PCS has been used for some protocol verification, more widespread
testing of recorded sources in Group-and-Source queries is needed.
sizeof(struct igmpstat) has changed.

__FreeBSD_version is bumped to 800070.


# 86413abf 11-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Put a global variables, which were virtualized but formerly
missed under VIMAGE_GLOBAL.

Start putting the extern declarations of the virtualized globals
under VIMAGE_GLOBAL as the globals themsevles are already.
This will help by the time when we are going to remove the globals
entirely.

While there garbage collect a few dead externs from ip6_var.h.

Sponsored by: The FreeBSD Foundation


# 385195c0 10-Dec-2008 Marko Zec <zec@FreeBSD.org>

Conditionally compile out V_ globals while instantiating the appropriate
container structures, depending on VIMAGE_GLOBALS compile time option.

Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instatiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out. Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively

#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif

Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.

Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs. This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.

Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.

De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps. PF virtualization will be done
separately, most probably after next PF import.

Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw. Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# f02493cb 28-Nov-2008 Marko Zec <zec@FreeBSD.org>

Unhide declarations of network stack virtualization structs from
underneath #ifdef VIMAGE blocks.

This change introduces some churn in #include ordering and nesting
throughout the network stack and drivers but is not expected to cause
any additional issues.

In the next step this will allow us to instantiate the virtualization
container structures and switch from using global variables to their
"containerized" counterparts.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# b53c8130 24-Aug-2008 Julian Elischer <julian@FreeBSD.org>

Another V_ forgotten


# 603724d3 17-Aug-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# 8b07e49a 09-May-2008 Julian Elischer <julian@FreeBSD.org>

Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from anay to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco


# 71498f30 12-Jun-2007 Bruce M Simpson <bms@FreeBSD.org>

Import rewrite of IPv4 socket multicast layer to support source-specific
and protocol-independent host mode multicast. The code is written to
accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.

This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.

The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html

Summary
* IPv4 multicast socket processing is now moved out of ip_output.c
into a new module, in_mcast.c.
* The in_mcast.c module implements the IPv4 legacy any-source API in
terms of the protocol-independent source-specific API.
* Source filters are lazy allocated as the common case does not use them.
They are part of per inpcb state and are covered by the inpcb lock.
* struct ip_mreqn is now supported to allow applications to specify
multicast joins by interface index in the legacy IPv4 any-source API.
* In UDP, an incoming multicast datagram only requires that the source
port matches the 4-tuple if the socket was already bound by source port.
An unbound socket SHOULD be able to receive multicasts sent from an
ephemeral source port.
* The UDP socket multicast filter mode defaults to exclusive, that is,
sources present in the per-socket list will be blocked from delivery.
* The RFC 3678 userland functions have been added to libc: setsourcefilter,
getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
* Definitions for IGMPv3 are merged but not yet used.
* struct sockaddr_storage is now referenced from <netinet/in.h>. It
is therefore defined there if not already declared in the same way
as for the C99 types.
* The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
which are then interpreted as interface indexes) is now deprecated.
* A patch for the Rhyolite.com routed in the FreeBSD base system
is available in the -net archives. This only affects individuals
running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
* Make IPv6 detach path similar to IPv4's in code flow; functionally same.
* Bump __FreeBSD_version to 700048; see UPDATING.

This work was financially supported by another FreeBSD committer.

Obtained from: p4://bms_netdev
Submitted by: Wilbert de Graaf (original work)
Reviewed by: rwatson (locking), silence from fenner,
net@ (but with encouragement)


# beaa515e 04-Apr-2007 Andre Oppermann <andre@FreeBSD.org>

Some local and style(9) cleanups.


# 3548bfc9 14-May-2006 Bruce M Simpson <bms@FreeBSD.org>

Fix a long-standing limitation in IPv4 multicast group membership.

By making the imo_membership array a dynamically allocated vector,
this minimizes disruption to existing IPv4 multicast code. This
change breaks the ABI for the kernel module ip_mroute.ko, and may
cause a small amount of churn for folks working on the IGMPv3 merge.

Previously, sockets were subject to a compile-time limitation on
the number of IPv4 group memberships, which was hard-coded to 20.
The imo_membership relationship, however, is 1:1 with regards to
a tuple of multicast group address and interface address. Users who
ran routing protocols such as OSPF ran into this limitation on machines
with a large system interface tree.


# c444cdde 19-Nov-2005 Andre Oppermann <andre@FreeBSD.org>

Move MAX_IPOPTLEN and struct ipoption back into ip_var.h as
userland programs depend on it.

Pointed out by: le
Sponsored by: TCP/IP Optimization Fundraise 2005


# ef39adf0 18-Nov-2005 Andre Oppermann <andre@FreeBSD.org>

Consolidate all IP Options handling functions into ip_options.[ch] and
include ip_options.h into all files making use of IP Options functions.

From ip_input.c rev 1.306:
ip_dooptions(struct mbuf *m, int pass)
save_rte(m, option, dst)
ip_srcroute(m0)
ip_stripoptions(m, mopt)

From ip_output.c rev 1.249:
ip_insertoptions(m, opt, phlen)
ip_optcopy(ip, jp)
ip_pcbopts(struct inpcb *inp, int optname, struct mbuf *m)

No functional changes in this commit.

Discussed with: rwatson
Sponsored by: TCP/IP Optimization Fundraise 2005


# 2fcb030a 02-Jul-2005 Andrew Thompson <thompsa@FreeBSD.org>

Check the alignment of the IP header before passing the packet up to the
packet filter. This would cause a panic on architectures that require strict
alignment such as sparc64 (tier1) and ia64/ppc (tier2).

This adds two new macros that check the alignment, these are compile time
dependent on __NO_STRICT_ALIGNMENT which is set for i386 and amd64 where
alignment isn't need so the cost is avoided.

IP_HDR_ALIGNED_P()
IP6_HDR_ALIGNED_P()

Move bridge_ip_checkbasic()/bridge_ip6_checkbasic() up so that the alignment
is checked for ipfw and dummynet too.

PR: ia64/81284
Obtained from: NetBSD
Approved by: re (dwhite), mlaier (mentor)


# c398230b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# 5f311da2 01-Jan-2005 Mike Silbersack <silby@FreeBSD.org>

Port randomization leads to extremely fast port reuse at high
connection rates, which is causing problems for some users.

To retain the security advantage of random ports and ensure
correct operation for high connection rate users, disable
port randomization during periods of high connection rates.

Whenever the connection rate exceeds randomcps (10 by default),
randomization will be disabled for randomtime (45 by default)
seconds. These thresholds may be tuned via sysctl.

Many thanks to Igor Sysoev, who proved the necessity of this
change and tested many preliminary versions of the patch.

MFC After: 20 seconds


# de38924d 19-Oct-2004 Andre Oppermann <andre@FreeBSD.org>

Support for dynamically loadable and unloadable IP protocols in the ipmux.

With pr_proto_register() it has become possible to dynamically load protocols
within the PF_INET domain. However the PF_INET domain has a second important
structure called ip_protox[] that is derived from the 'struct protosw inetsw[]'
and takes care of the de-multiplexing of the various protocols that ride on
top of IP packets.

The functions ipproto_[un]register() allow to dynamically adjust the ip_protox[]
array mux in a consistent and easy way. To register a protocol within
ip_protox[] the existence of a corresponding and matching protocol definition
in inetsw[] is required. The function does not allow to overwrite an already
registered protocol. The unregister function simply replaces the mux slot with
the default index pointer to IPPROTO_RAW as it was previously.


# e0982661 15-Sep-2004 Andre Oppermann <andre@FreeBSD.org>

Remove the last two global variables that are used to store packet state while
it travels through the IP stack. This wasn't much of a problem because IP
source routing is disabled by default but when enabled together with SMP and
preemption it would have very likely cross-corrupted the IP options in transit.

The IP source route options of a packet are now stored in a mtag instead of the
global variable.


# c21fd232 27-Aug-2004 Andre Oppermann <andre@FreeBSD.org>

Always compile PFIL_HOOKS into the kernel and remove the associated kernel
compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and
thus it becomes a standard part of the network stack.

If no hooks are connected the entire packet filter hooks section and related
activities are jumped over. This removes any performance impact if no hooks
are active.

Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.


# 9b932e9e 17-Aug-2004 Andre Oppermann <andre@FreeBSD.org>

Convert ipfw to use PFIL_HOOKS. This is change is transparent to userland
and preserves the ipfw ABI. The ipfw core packet inspection and filtering
functions have not been changed, only how ipfw is invoked is different.

However there are many changes how ipfw is and its add-on's are handled:

In general ipfw is now called through the PFIL_HOOKS and most associated
magic, that was in ip_input() or ip_output() previously, is now done in
ipfw_check_[in|out]() in the ipfw PFIL handler.

IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to
be diverted is checked if it is fragmented, if yes, ip_reass() gets in for
reassembly. If not, or all fragments arrived and the packet is complete,
divert_packet is called directly. For 'tee' no reassembly attempt is made
and a copy of the packet is sent to the divert socket unmodified. The
original packet continues its way through ip_input/output().

ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet
with the new destination sockaddr_in. A check if the new destination is a
local IP address is made and the m_flags are set appropriately. ip_input()
and ip_output() have some more work to do here. For ip_input() the m_flags
are checked and a packet for us is directly sent to the 'ours' section for
further processing. Destination changes on the input path are only tagged
and the 'srcrt' flag to ip_forward() is set to disable destination checks
and ICMP replies at this stage. The tag is going to be handled on output.
ip_output() again checks for m_flags and the 'ours' tag. If found, the
packet will be dropped back to the IP netisr where it is going to be picked
up by ip_input() again and the directly sent to the 'ours' section. When
only the destination changes, the route's 'dst' is overwritten with the
new destination from the forward m_tag. Then it jumps back at the route
lookup again and skips the firewall check because it has been marked with
M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with
'option IPFIREWALL_FORWARD' to enable it.

DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for
a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will
then inject it back into ip_input/ip_output() after it has served its time.
Dummynet packets are tagged and will continue from the next rule when they
hit the ipfw PFIL handlers again after re-injection.

BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as
they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS.

More detailed changes to the code:

conf/files
Add netinet/ip_fw_pfil.c.

conf/options
Add IPFIREWALL_FORWARD option.

modules/ipfw/Makefile
Add ip_fw_pfil.c.

net/bridge.c
Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw
is still directly invoked to handle layer2 headers and packets would
get a double ipfw when run through PFIL_HOOKS as well.

netinet/ip_divert.c
Removed divert_clone() function. It is no longer used.

netinet/ip_dummynet.[ch]
Neither the route 'ro' nor the destination 'dst' need to be stored
while in dummynet transit. Structure members and associated macros
are removed.

netinet/ip_fastfwd.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code.

netinet/ip_fw.h
Removed 'ro' and 'dst' from struct ip_fw_args.

netinet/ip_fw2.c
(Re)moved some global variables and the module handling.

netinet/ip_fw_pfil.c
New file containing the ipfw PFIL handlers and module initialization.

netinet/ip_input.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code. ip_forward() does not longer require
the 'next_hop' struct sockaddr_in argument. Disable early checks
if 'srcrt' is set.

netinet/ip_output.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code.

netinet/ip_var.h
Add ip_reass() as general function. (Used from ipfw PFIL handlers
for IPDIVERT.)

netinet/raw_ip.c
Directly check if ipfw and dummynet control pointers are active.

netinet/tcp_input.c
Rework the 'ipfw forward' to local code to work with the new way of
forward tags.

netinet/tcp_sack.c
Remove include 'opt_ipfw.h' which is not needed here.

sys/mbuf.h
Remove m_claim_next() macro which was exclusively for ipfw 'forward'
and is no longer needed.

Approved by: re (scottl)


# 1f44b0a1 14-Aug-2004 David Malone <dwmalone@FreeBSD.org>

Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSD
have already done this, so I have styled the patch on their work:

1) introduce a ip_newid() static inline function that checks
the sysctl and then decides if it should return a sequential
or random IP ID.

2) named the sysctl net.inet.ip.random_id

3) IPv6 flow IDs and fragment IDs are now always random.
Flow IDs and frag IDs are significantly less common in the
IPv6 world (ie. rarely generated per-packet), so there should
be smaller performance concerns.

The sysctl defaults to 0 (sequential IP IDs).

Reviewed by: andre, silby, mlaier, ume
Based on: NetBSD
MFC after: 2 months


# 2bde81ac 06-May-2004 Andre Oppermann <andre@FreeBSD.org>

Provide the sysctl net.inet.ip.process_options to control the processing
of IP options.

net.inet.ip.process_options=0 Ignore IP options and pass packets unmodified.
net.inet.ip.process_options=1 Process all IP options (default).
net.inet.ip.process_options=2 Reject all packets with IP options with ICMP
filter prohibited message.

This sysctl affects packets destined for the local host as well as those
only transiting through the host (routing).

IP options do not have any legitimate purpose anymore and are only used
to circumvent firewalls or to exploit certain behaviours or bugs in TCP/IP
stacks.

Reviewed by: sam (mentor)


# ab884d99 02-May-2004 Darren Reed <darrenr@FreeBSD.org>

Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg
(the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside
all of the other macros that work ok mbuf's and tag's.


# f36cfd49 07-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# ac9d7e26 25-Feb-2004 Max Laier <mlaier@FreeBSD.org>

Re-remove MT_TAGs. The problems with dummynet have been fixed now.

Tested by: -current, bms(mentor), me
Approved by: bms(mentor), sam


# 36e8826f 17-Feb-2004 Max Laier <mlaier@FreeBSD.org>

Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is
not working properly with the patch in place.

Approved by: bms(mentor)


# 1094bdca 13-Feb-2004 Max Laier <mlaier@FreeBSD.org>

This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing
them mostly with packet tags (one case is handled by using an mbuf flag
since the linkage between "caller" and "callee" is direct and there's no
need to incur the overhead of a packet tag).

This is (mostly) work from: sam

Silence from: -arch
Approved by: bms(mentor), sam, rwatson


# c76ff708 14-Nov-2003 Andre Oppermann <andre@FreeBSD.org>

Make ipstealth global as we need it in ip_fastforward too.


# 02c1c707 14-Nov-2003 Andre Oppermann <andre@FreeBSD.org>

Remove the global one-level rtcache variable and associated
complex locking and rework ip_rtaddr() to do its own rtlookup.
Adopt all its callers to this and make ip_output() callable
with NULL rt pointer.

Reviewed by: sam (mentor)


# eca8a663 11-Nov-2003 Robert Watson <rwatson@FreeBSD.org>

Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c). This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE. This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time. This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.

This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.

While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) do to memory
conservation and reduced cost of zeroing memory.

NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change. Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.

Suggestions from: bmilekic
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 252f24a2 08-Nov-2003 Sam Leffler <sam@FreeBSD.org>

divert socket fixups:

o pickup Giant in divert_packet to protect sbappendaddr since it
can be entered through MPSAFE callouts or through ip_input when
mpsafenet is 1
o add missing locking on output
o add locking to abort and shutdown
o add a ctlinput handler to invalidate held routing table references
on an ICMP redirect (may not be needed)

Supported by: FreeBSD Foundation


# 929b31dd 14-Oct-2003 Sam Leffler <sam@FreeBSD.org>

Lock ip forwarding route cache. While we're at it, remove the global
variable ipforward_rt by introducing an ip_forward_cacheinval() call
to use to invalidate the cache.

Supported by: FreeBSD Foundation


# 134ea224 23-Sep-2003 Sam Leffler <sam@FreeBSD.org>

o update PFIL_HOOKS support to current API used by netbsd
o revamp IPv4+IPv6+bridge usage to match API changes
o remove pfil_head instances from protosw entries (no longer used)
o add locking
o bump FreeBSD version for 3rd party modules

Heavy lifting by: "Max Laier" <max@love2party.net>
Supported by: FreeBSD Foundation
Obtained from: NetBSD (bits of pfil.h and pfil.c)


# 8afa2304 20-Aug-2003 Bruce M Simpson <bms@FreeBSD.org>

Add the IP_ONESBCAST option, to enable undirected IP broadcasts to be sent on
specific interfaces. This is required by aodvd, and may in future help us
in getting rid of the requirement for BPF from our import of isc-dhcp.

Suggested by: fenestro
Obtained from: BSD/OS
Reviewed by: mini, sam
Approved by: jake (mentor)


# 1e78ac21 07-Aug-2003 Jeffrey Hsu <hsu@FreeBSD.org>

1. Basic PIM kernel support
Disabled by default. To enable it, the new "options PIM" must be
added to the kernel configuration file (in addition to MROUTING):

options MROUTING # Multicast routing
options PIM # Protocol Independent Multicast

2. Add support for advanced multicast API setup/configuration and
extensibility.

3. Add support for kernel-level PIM Register encapsulation.
Disabled by default. Can be enabled by the advanced multicast API.

4. Implement a mechanism for "multicast bandwidth monitoring and upcalls".

Submitted by: Pavlin Radoslavov <pavlin@icir.org>


# 2c56e246 02-Apr-2003 Matthew N. Dodd <mdodd@FreeBSD.org>

Back out support for RFC3514.

RFC3514 poses an unacceptale risk to compliant systems.


# 09139a45 01-Apr-2003 Matthew N. Dodd <mdodd@FreeBSD.org>

Implement support for RFC 3514 (The Security Flag in the IPv4 Header).
(See: ftp://ftp.rfc-editor.org/in-notes/rfc3514.txt)

This fulfills the host requirements for userland support by
way of the setsockopt() IP_EVIL_INTENT message.

There are three sysctl tunables provided to govern system behavior.

net.inet.ip.rfc3514:

Enables support for rfc3514. As this is an
Informational RFC and support is not yet widespread
this option is disabled by default.

net.inet.ip.hear_no_evil

If set the host will discard all received evil packets.

net.inet.ip.speak_no_evil

If set the host will discard all transmitted evil packets.

The IP statistics counter 'ips_evil' (available via 'netstat') provides
information on the number of 'evil' packets recieved.

For reference, the '-E' option to 'ping' has been provided to demonstrate
and test the implementation.


# 375386e2 21-Feb-2003 Mike Silbersack <silby@FreeBSD.org>

Add the ability to limit the number of IP fragments allowed per packet,
and enable it by default, with a limit of 16.

At the same time, tweak maxfragpackets downward so that in the worst
possible case, IP reassembly can use only 1/2 of all mbuf clusters.

MFC after: 3 days
Reviewed by: hsu
Liked by: bmah


# b375c9ec 20-Nov-2002 Luigi Rizzo <luigi@FreeBSD.org>

Back out the ip_fragment() code -- it is not urgent to have it in now,
I will put it back in in a better form after 5.0 is out.

Requested by: sam, rwatson, luigi (on second thought)
Approved by: re


# 3e372e14 17-Nov-2002 Luigi Rizzo <luigi@FreeBSD.org>

Move the ip_fragment code from ip_output() to a separate function,
so that it can be reused elsewhere (there is a number of places
where it can be useful). This also trims some 200 lines from
the body of ip_output(), which helps readability a bit.

(This change was discussed a few weeks ago on the mailing lists,
Julian agreed, silence from others. It is not a functional change,
so i expect it to be ok to commit it now but i am happy to back it
out if there are objections).

While at it, fix some function headers and replace m_copy() with
m_copypacket() where applicable.

MFC after: 1 week


# bbb4330b 15-Nov-2002 Luigi Rizzo <luigi@FreeBSD.org>

Massive cleanup of the ip_mroute code.

No functional changes, but:

+ the mrouting module now should behave the same as the compiled-in
version (it did not before, some of the rsvp code was not loaded
properly);
+ netinet/ip_mroute.c is now truly optional;
+ removed some redundant/unused code;
+ changed many instances of '0' to NULL and INADDR_ANY as appropriate;
+ removed several static variables to make the code more SMP-friendly;
+ fixed some minor bugs in the mrouting code (mostly, incorrect return
values from functions).

This commit is also a prerequisite to the addition of support for PIM,
which i would like to put in before DP2 (it does not change any of
the existing APIs, anyways).

Note, in the process we found out that some device drivers fail to
properly handle changes in IFF_ALLMULTI, leading to interesting
behaviour when a multicast router is started. This bug is not
corrected by this commit, and will be fixed with a separate commit.

Detailed changes:
--------------------
netinet/ip_mroute.c all the above.
conf/files make ip_mroute.c optional
net/route.c fix mrt_ioctl hook
netinet/ip_input.c fix ip_mforward hook, move rsvp_input() here
together with other rsvp code, and a couple
of indentation fixes.
netinet/ip_output.c fix ip_mforward and ip_mcast_src hooks
netinet/ip_var.h rsvp function hooks
netinet/raw_ip.c hooks for mrouting and rsvp functions, plus
interface cleanup.
netinet/ip_mroute.h remove an unused and optional field from a struct

Most of the code is from Pavlin Radoslavov and the XORP project

Reviewed by: sam
MFC after: 1 week


# 53be11f6 20-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Fix two instances of variant struct definitions in sys/netinet:

Remove the never completed _IP_VHL version, it has not caught on
anywhere and it would make us incompatible with other BSD netstacks
to retain this version.

Add a CTASSERT protecting sizeof(struct ip) == 20.

Don't let the size of struct ipq depend on the IPDIVERT option.

This is a functional no-op commit.

Approved by: re


# 5d846453 15-Oct-2002 Sam Leffler <sam@FreeBSD.org>

Replace aux mbufs with packet tags:

o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
inpcb parameter to ip_output and ip6_output to allow the IPsec code to
locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version

Reviewed by: julian, luigi (silent), -arch, -net, darren
Approved by: julian, silence from everyone else
Obtained from: openbsd (mostly)
MFC after: 1 month


# 9daf40fe 15-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Perform a nested include of _label.h if #ifdef _KERNEL. This will
satisfy consumers of ip_var.h that need a complete definition of
struct ipq and don't include mac.h.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 549e4c9e 30-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Introduce support for Mandatory Access Control and extensible
kernel access control.

Label IP fragment reassembly queues, permitting security features to
be maintained on those objects. ipq_label will be used to manage
the reassembly of fragments into IP datagrams using security
properties. This permits policies to deny the reassembly of fragments,
as well as influence the resulting label of a datagram following
reassembly.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 69dac2ea 20-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Don't export 'struct ipq' from kernel, instead #ifdef _KERNEL. As kernel
data structures pick up security and synchronization primitives, it
becomes increasingly desirable not to arbitrarily export them via
include files to userland, as the userland applications pick up new
#include dependencies.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# ec3057db 23-Jun-2002 Luigi Rizzo <luigi@FreeBSD.org>

Remove ip_fw_fwd_addr (forgotten in previous commit)
remove some extra whitespace.


# 2b25acc1 22-Jun-2002 Luigi Rizzo <luigi@FreeBSD.org>

Remove (almost all) global variables that were used to hold
packet forwarding state ("annotations") during ip processing.
The code is considerably cleaner now.

The variables removed by this change are:

ip_divert_cookie used by divert sockets
ip_fw_fwd_addr used for transparent ip redirection
last_pkt used by dynamic pipes in dummynet

Removal of the first two has been done by carrying the annotations
into volatile structs prepended to the mbuf chains, and adding
appropriate code to add/remove annotations in the routines which
make use of them, i.e. ip_input(), ip_output(), tcp_input(),
bdg_forward(), ether_demux(), ether_output_frame(), div_output().

On passing, remove a bug in divert handling of fragmented packet.
Now it is the fragment at offset 0 which sets the divert status of
the whole packet, whereas formerly it was the last incoming fragment
to decide.

Removal of last_pkt required a change in the interface of ip_fw_chk()
and dummynet_io(). On passing, use the same mechanism for dummynet
annotations and for divert/forward annotations.

option IPFIREWALL_FORWARD is effectively useless, the code to
implement it is very small and is now in by default to avoid the
obfuscation of conditionally compiled code.

NOTES:
* there is at least one global variable left, sro_fwd, in ip_output().
I am not sure if/how this can be removed.

* I have deliberately avoided gratuitous style changes in this commit
to avoid cluttering the diffs. Minor stule cleanup will likely be
necessary

* this commit only focused on the IP layer. I am sure there is a
number of global variables used in the TCP and maybe UDP stack.

* despite the number of files touched, there are absolutely no API's
or data structures changed by this commit (except the interfaces of
ip_fw_chk() and dummynet_io(), which are internal anyways), so
an MFC is quite safe and unintrusive (and desirable, given the
improved readability of the code).

MFC after: 10 days


# 4d77a549 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# bd714208 30-Nov-2001 Ruslan Ermilov <ru@FreeBSD.org>

- Make ip_rtaddr() global, and use it to look up the correct source
address in icmp_reflect().
- Two new "struct icmpstat" members: icps_badaddr and icps_noroute.

PR: kern/31575
Obtained from: BSD/OS
MFC after: 1 week


# f0ffb944 03-Sep-2001 Julian Elischer <julian@FreeBSD.org>

Patches from Keiichi SHIMA <keiichi@iij.ad.jp>
to make ip use the standard protosw structure again.

Obtained from: Well, KAME I guess.


# 33841545 10-Jun-2001 Hajimu UMEMOTO <ume@FreeBSD.org>

Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks


# 64dddc18 01-Jun-2001 Kris Kennaway <kris@FreeBSD.org>

Add ``options RANDOM_IP_ID'' which randomizes the ID field of IP packets.
This closes a minor information leak which allows a remote observer to
determine the rate at which the machine is generating packets, since the
default behaviour is to increment a counter for each packet sent.

Reviewed by: -net
Obtained from: OpenBSD


# 1e3d5af0 19-Mar-2001 Ruslan Ermilov <ru@FreeBSD.org>

Invalidate cached forwarding route (ipforward_rt) whenever a new route
is added to the routing table, otherwise we may end up using the wrong
route when forwarding.

PR: kern/10778
Reviewed by: silence on -net


# 462b86fe 16-Mar-2001 Poul-Henning Kamp <phk@FreeBSD.org>

<sys/queue.h> makeover.


# 686cdd19 04-Jul-2000 Jun-ichiro itojun Hagino <itojun@FreeBSD.org>

sync with kame tree as of july00. tons of bug fixes/improvements.

API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
(also syntax change)


# 1c238475 21-May-2000 Jonathan Lemon <jlemon@FreeBSD.org>

Compute the checksum before handing the packet off to IPFilter.

Tested by: Cy Schubert <Cy.Schubert@uumail.gov.bc.ca>


# 664a31e4 28-Dec-1999 Peter Wemm <peter@FreeBSD.org>

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


# 6a800098 22-Dec-1999 Yoshinobu Inoue <shin@FreeBSD.org>

IPSEC support in the kernel.
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# 8948e4ba 05-Dec-1999 Archie Cobbs <archie@FreeBSD.org>

Miscellaneous fixes/cleanups relating to ipfw and divert(4):

- Implement 'ipfw tee' (finally)
- Divert packets by calling new function divert_packet() directly instead
of going through protosw[].
- Replace kludgey global variable 'ip_divert_port' with a function parameter
to divert_packet()
- Replace kludgey global variable 'frag_divert_port' with a function parameter
to ip_reass()
- style(9) fixes

Reviewed by: julian, green


# 76429de4 05-Nov-1999 Yoshinobu Inoue <shin@FreeBSD.org>

KAME related header files additions and merges.
(only those which don't affect c source files so much)

Reviewed by: cvs-committers
Obtained from: KAME project


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# 6effc713 24-Aug-1998 Doug Rabson <dfr@FreeBSD.org>

Re-implement tcp and ip fragment reassembly to not store pointers in the
ip header which can't work on alpha since pointers are too big.

Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>


# cfe8b629 22-Aug-1998 Garrett Wollman <wollman@FreeBSD.org>

Yow! Completely change the way socket options are handled, eliminating
another specialized mbuf type in the process. Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.


# 5914e1ea 12-Jul-1998 Bruce Evans <bde@FreeBSD.org>

Removed a bogus forward struct declaration.

Cleaned up ifdefs.


# f9e354df 05-Jul-1998 Julian Elischer <julian@FreeBSD.org>

Support for IPFW based transparent forwarding.
Any packet that can be matched by a ipfw rule can be redirected
transparently to another port or machine. Redirection to another port
mostly makes sense with tcp, where a session can be set up
between a proxy and an unsuspecting client. Redirection to another machine
requires that the other machine also be expecting to receive the forwarded
packets, as their headers will not have been modified.

/sbin/ipfw must be recompiled!!!

Reviewed by: Peter Wemm <peter@freebsd.org>
Submitted by: Chrisy Luke <chrisy@flix.net>


# a09bee1a 08-Jun-1998 Bruce Evans <bde@FreeBSD.org>

Fixed pedantic semantics errors (bitfields not of type int, signed int
or unsigned int (this doesn't change the struct layout, size or
alignment in any of the files changed in this commit, at least for
gcc on i386's. Using bitfields of type u_char may affect size and
alignment but not packing)).


# c977d4c7 06-Jun-1998 Julian Elischer <julian@FreeBSD.org>

clean up the changes made to ipfw over the last weeks
(should make the ipfw lkm work again)


# e256a933 05-Jun-1998 Julian Elischer <julian@FreeBSD.org>

Reverse the default sense of the IPFW/DIVERT reinjection code
so that the new behaviour is now default.
Solves the "infinite loop in diversion" problem when more than one diversion
is active.
Man page changes follow.

The new code is in -stable as the NON default option.


# bb60f459 25-May-1998 Julian Elischer <julian@FreeBSD.org>

Add optional code to change the way that divert and ipfw work together.
Prior to this change, Accidental recursion protection was done by
the diverted daemon feeding back the divert port number it got
the packet on, as the port number on a sendto(). IPFW knew not to
redivert a packet to this port (again). Processing of the ruleset
started at the beginning again, skipping that divert port.

The new semantic (which is how we should have done it the first time)
is that the port number in the sendto() is the rule number AFTER which
processing should restart, and on a recvfrom(), the port number is the
rule number which caused the diversion. This is much more flexible,
and also more intuitive. If the user uses the same sockaddr received
when resending, processing resumes at the rule number following that
that caused the diversion. The user can however select to resume rule
processing at any rule. (0 is restart at the beginning)

To enable the new code use

option IPFW_DIVERT_RESTART

This should become the default as soon as people have looked at it a bit


# 4c711e6d 19-May-1998 Pierre Beyssac <pb@FreeBSD.org>

Move (private) struct ipflow out of ip_var.h, to reduce dependencies
(for ipfw for example) on internal implementation details.
Add $Id$ where missing.


# cc3205c2 19-May-1998 David Greenman <dg@FreeBSD.org>

Moved #define of IPFLOW_HASHBITS to ip_flow.c where I think it belongs.


# 1f91d8c5 19-May-1998 David Greenman <dg@FreeBSD.org>

Added fast IP forwarding code by Matt Thomas <matt@3am-software.com> via
NetBSD, ported to FreeBSD by Pierre Beyssac <pb@fasterix.freenix.org> and
minorly tweaked by me.
This is a standard part of FreeBSD, but must be enabled with:
"sysctl -w net.inet.ip.fastforwarding=1" ...and of course forwarding must
also be enabled. This should probably be modified to use the zone
allocator for speed and space efficiency. The current algorithm also
appears to lose if the number of active paths exceeds IPFLOW_MAX (256),
in which case it wastes lots of time trying to figure out which cache
entry to drop.


# bea0f0be 06-Sep-1997 Bruce Evans <bde@FreeBSD.org>

Some staticized variables were still declared to be extern.


# 77d1915b 25-May-1997 Peter Wemm <peter@FreeBSD.org>

Connect the ipdivert div_usrreqs struct to the ip proto switch table


# a29f300e 27-Apr-1997 Garrett Wollman <wollman@FreeBSD.org>

The long-awaited mega-massive-network-code- cleanup. Part I.

This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.

2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.

3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call. Protocols should use the process pointer they are now passed.

4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.

5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.

As a result, LINT is now broken. I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 117bcae7 18-Feb-1997 Garrett Wollman <wollman@FreeBSD.org>

Convert raw IP from mondo-switch-statement-from-Hell to
pr_usrreqs. Collapse duplicates with udp_usrreq.c and
tcp_usrreq.c (calling the generic routines in uipc_socket2.c and
in_pcb.c). Calling sockaddr()_ or peeraddr() on a detached
socket now traps, rather than harmlessly returning an error; this
should never happen. Allow the raw IP buffer sizes to be
controlled via sysctl.


# 39191c8e 13-Feb-1997 Garrett Wollman <wollman@FreeBSD.org>

Provide PRC_IFDOWN and PRC_IFUP support for IP. Now, when an interface
is administratively downed, all routes to that interface (including the
interface route itself) which are not static will be deleted. When
it comes back up, and addresses remaining will have their interface routes
re-added. This solves the problem where, for example, an Ethernet interface
is downed by traffic continues to flow by way of ARP entries.


# 82c39223 21-Jan-1997 Garrett Wollman <wollman@FreeBSD.org>

Count multicast packets received for groups of which we are not
a member separately from generic ``can't forward'' packets. This
would have helped me find the previous bug much faster.


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# ca5d5e73 12-Nov-1996 Bruce Evans <bde@FreeBSD.org>

Forward-declare `struct inpcb' so that including this file doesn't cause
lots of warnings.

Should be in 2.2. Previous version shouldn't have been in 2.2.


# 82c23eba 10-Nov-1996 Bill Fenner <fenner@FreeBSD.org>

Add the IP_RECVIF socket option, which supplies a packet's incoming interface
using a sockaddr_dl.

Fix the other packet-information socket options (SO_TIMESTAMP, IP_RECVDSTADDR)
to work for multicast UDP and raw sockets as well. (They previously only
worked for unicast UDP).


# 430d30d8 25-Oct-1996 Bill Fenner <fenner@FreeBSD.org>

Don't allow reassembly to create packets bigger than IP_MAXPACKET, and count
attempts to do so.
Don't allow users to source packets bigger than IP_MAXPACKET.
Make UDP length and ipovly's protocol length unsigned short.

Reviewed by: wollman
Submitted by: (partly by) kml@nas.nasa.gov (Kevin Lahey)


# 64682bc2 23-Oct-1996 Garrett Wollman <wollman@FreeBSD.org>

Give ip_len and ip_off more natural, unsigned types.


# 5e26bd9a 15-Oct-1996 Bruce Evans <bde@FreeBSD.org>

Forward-declared `struct route' for the KERNEL case so that <net/route.h>
isn't a prerequisite.

Fixed style of ifdefs.


# 93e0e116 10-Jul-1996 Julian Elischer <julian@FreeBSD.org>

Adding changes to ipfw and the kernel to support ip packet diversion..
This stuff should not be too destructive if the IPDIVERT is not compiled in..
be aware that this changes the size of the ip_fw struct
so ipfw needs to be recompiled to use it.. more changes coming to clean this up.


# e62b8c49 26-Mar-1996 Bill Fenner <fenner@FreeBSD.org>

Make rip_input() take the header length
Move ipip_input() and rsvp_input() prototypes to ip_var.h
Remove unused prototype for rip_ip_input() from ip_var.h
Remove unused variable *opts from rip_output()


# 6c5e9bbd 30-Jan-1996 Mike Pritchard <mpp@FreeBSD.org>

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


# f708ef1b 14-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Another mega commit to staticize things.


# b7a44e34 05-Dec-1995 Garrett Wollman <wollman@FreeBSD.org>

Path MTU Discovery is now standard.


# 0312fbe9 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

New style sysctl & staticize alot of stuff.


# 3271ad14 21-Sep-1995 Garrett Wollman <wollman@FreeBSD.org>

Merge 4.4-Lite-2 by updating the version number.

Obtained from: 4.4BSD-Lite-2


# efe4b0eb 21-Sep-1995 Garrett Wollman <wollman@FreeBSD.org>

Second try: get 4.4-Lite-2 into the source tree. The conflicts don't
matter because none of our working source files are on the CSRG branch
any more.

Obtained from: 4.4BSD-Lite-2


# 5cbf3e08 18-Sep-1995 Garrett Wollman <wollman@FreeBSD.org>

Initial back-end support for IP MTU discovery, gated on MTUDISC. The support
for TCP has yet to be written.


# b124e4f2 26-Jul-1995 Garrett Wollman <wollman@FreeBSD.org>

Fix test for determining when RSVP is inactive in a router. (In this
case, multicast options are not passed to ip_mforward().) The previous
version had a wrong test, thus causing RSVP mrouters to forward RSVP messages
in violation of the spec.


# e9ce2e7d 27-Jun-1995 David Greenman <dg@FreeBSD.org>

Added function prototypes for ip_rsvp_vif_init, ip_rsvp_vif_done, and
ip_rsvp_force_done.


# 1c5de19a 13-Jun-1995 Garrett Wollman <wollman@FreeBSD.org>

Kernel side of 3.5 multicast routing code, based on work by Bill Fenner
and other work done here. The LKM support is probably broken, but it
still compiles and will be fixed later.


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# b5e8ce9f 16-Mar-1995 Bruce Evans <bde@FreeBSD.org>

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# d99c7a23 16-Mar-1995 Garrett Wollman <wollman@FreeBSD.org>

This set of patches enables IP multicasting to work under FreeBSD. I am
submitting them as context diffs for the following files:

sys/netinet/ip_mroute.c
sys/netinet/ip_var.h
sys/netinet/raw_ip.c
usr.sbin/mrouted/igmp.c
usr.sbin/mrouted/prune.c

The routine rip_ip_input in raw_ip.c is suggested by Mark Tinguely
(tinguely@plains.nodak.edu). I have been running mrouted with these patches
for over a week and nothing has seemed seriously wrong. It is being run in
two places on our network as a tunnel on one and a subnet querier on the
other. The only problem I have run into is that mrouted on the tunnel must
start up last or the pruning isn't done correctly and multicast packets
flood your subnets.

Submitted by: Soochon Radee <slr@mitre.org>


# c178994d 13-Feb-1995 Poul-Henning Kamp <phk@FreeBSD.org>

YPfix


# 5e9ae478 13-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

Shuffle some functions and variables around to make it possible for
multicast routing to be implemented as an LKM. (There's still a bit of
work to do in this area.)


# f0068c4a 06-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

Initial get-the-easy-case-working upgrade of the multicast code
to something more recent than the ancient 1.2 release contained in
4.4. This code has the following advantages as compared to
previous versions (culled from the README file for the SunOS release):

- True multicast delivery
- Configurable rate-limiting of forwarded multicast traffic on each
physical interface or tunnel, using a token-bucket limiter.
- Simplistic classification of packets for prioritized dropping.
- Administrative scoping of multicast address ranges.
- Faster detection of hosts leaving groups.
- Support for multicast traceroute (code not yet available).
- Support for RSVP, the Resource Reservation Protocol.

What still needs to be done:

- The multicast forwarder needs testing.
- The multicast routing daemon needs to be ported.
- Network interface drivers need to have the `#ifdef MULTICAST' goop ripped
out of them.
- The IGMP code should probably be bogon-tested.

Some notes about the porting process:

In some cases, the Berkeley people decided to incorporate functionality from
later releases of the multicast code, but then had to do things differently.
As a result, if you look at Deering's patches, and then look at
our code, it is not always obvious whether the patch even applies. Let
the reader beware.

I ran ip_mroute.c through several passes of `unifdef' to get rid of
useless grot, and to permanently enable the RSVP support, which we will
include as standard.

Ported by: Garrett Wollman
Submitted by: Steve Deering and Ajit Thyagarajan (among others)


# 707f139e 20-Aug-1994 Paul Richards <paul@FreeBSD.org>

Made idempotent.

Submitted by: Paul


# f23b4c91 18-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources