History log of /freebsd-current/sys/netinet/if_ether.c
Revision Date Author Comments
# fcdf9a19 13-Apr-2024 Denny Page <dennypage@me.com>

Support ARP for 802 networks

This is used by 802.3 Ethernet. (Also be used by 802.4 Token Bus and
802.5 Token Ring, but we don't support those.)

This was accidentally removed along with FDDI support in commit
0437c8e3b198, presumably because comments implied it was used only by
FDDI or Token Ring.

Fixes: 0437c8e3b198 ("Remove support for FDDI networks.")
Reviewed-by: emaste
Signed-off-by: Denny Page <dennypage@me.com>
Pull-request: https://github.com/freebsd/freebsd-src/pull/1166


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 3d0d5b21 23-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Explicitly include <net/if_private.h> in netstack

Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header. <net/if_var.h> will stop including the
header in the future.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200


# d18b4bec 31-May-2022 Arseny Smalyuk <smalukav@gmail.com>

netinet6: Fix mbuf leak in NDP

Mbufs leak when manually removing incomplete NDP records with pending packet via ndp -d.
It happens because lltable_drop_entry_queue() rely on `la_numheld`
counter when dropping NDP entries (lles). It turned out NDP code never
increased `la_numheld`, so the actual free never happened.

Fix the issue by introducing unified lltable_append_entry_queue(),
common for both ARP and NDP code, properly addressing packet queue
maintenance.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35365
MFC after: 2 weeks


# c9a5c48a 27-May-2022 Konrad Sewiłło-Jopek <kjopek@gmail.com>

arp: Implement sticky ARP mode for interfaces.

Provide sticky ARP flag for network interface which marks it as the
"sticky" one similarly to what we have for bridges. Once interface is
marked sticky, any address resolved using the ARP will be saved as a
static one in the ARP table. Such functionality may be used to prevent
ARP spoofing or to decrease latencies in Ethernet networks.

The drawbacks include potential limitations in usage of ARP-based
load-balancers and high-availability solutions such as carp(4).

The implemented option is disabled by default, therefore should not
impact the default behaviour of the networking stack.

Sponsored by: Conclusive Engineering sp. z o.o.
Reviewed By: melifaro, pauamma_gundo.com
Differential Revision: https://reviews.freebsd.org/D35314
MFC after: 2 weeks


# dd91d844 08-Apr-2022 Mark Johnston <markj@FreeBSD.org>

net: Fix LLE lock leaks

Historically, lltable_try_set_entry_addr() would release the LLE lock
upon failure. After some refactoring, it no longer does so, but
consumers were not adjusted accordingly.

Also fix a leak that can occur if lltable_calc_llheader() fails in the
ARP code, but I suspect that such a failure can only occur due to a code
bug.

Reviewed by: bz, melifaro
Reported by: pho
Fixes: 0b79b007ebfc ("[lltable] Restructure nd6 code.")
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34831


# c8ee75f2 10-Oct-2021 Gleb Smirnoff <glebius@FreeBSD.org>

Use network epoch to protect local IPv4 addresses hash.

The modification to the hash are already naturally locked by
in_control_sx. Convert the hash lists to CK lists. Remove the
in_ifaddr_rmlock. Assert the network epoch where necessary.

Most cases when the hash lookup is done the epoch is already entered.
Cover a few cases, that need entering the epoch, which mostly is
initial configuration of tunnel interfaces and multicast addresses.

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D32584


# 2144431c 08-Oct-2021 Gleb Smirnoff <glebius@FreeBSD.org>

Remove in_ifaddr_lock acquisiton to access in_ifaddrhead.

An IPv4 address is embedded into an ifaddr which is freed
via epoch. And the in_ifaddrhead is already a CK list. Use
the network epoch to protect against use after free.

Next step would be to CK-ify the in_addr hash and get rid of the...

Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D32434


# f5777c12 01-Sep-2021 orange30 <44566632+orange30@users.noreply.github.com>

net: Fix memory leaks upon arp_fillheader() failures

Free memory before return from arprequest_internal(). In in_arpinput(),
if arp_fillheader() fails, it should use goto drop.

Reviewed by: melifaro, imp, markj
MFC after: 1 week
Pull Request: https://github.com/freebsd/freebsd-src/pull/534


# 8482aa77 02-Aug-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use lltable calculated header when sending lle holdchain after successful lle resolution.

Subscribers: imp, ae, bz

Differential Revision: https://reviews.freebsd.org/D31391


# f3a3b061 02-Aug-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

[lltable] Unify datapath feedback mechamism.

Use newly-create llentry_request_feedback(),
llentry_mark_used() and llentry_get_hittime() to
request datapatch usage check and fetch the results
in the same fashion both in IPv4 and IPv6.

While here, simplify llentry_provide_feedback() wrapper
by eliminating 1 condition check.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31390


# c139b3c1 22-Feb-2021 Kristof Provost <kp@FreeBSD.org>

arp/nd: Cope with late calls to iflladdr_event

When tearing down vnet jails we can move an if_bridge out (as
part of the normal vnet_if_return()). This can, when it's clearing out
its list of member interfaces, change its link layer address.
That sends an iflladdr_event, but at that point we've already freed the
AF_INET/AF_INET6 if_afdata pointers.

In other words: when the iflladdr_event callbacks fire we can't assume
that ifp->if_afdata[AF_INET] will be set.

Reviewed by: donner@, melifaro@
MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D28860


# 0da3f8c9 11-Jan-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Bump amount of queued packets in for unresolved ARP/NDP entries to 16.

Currently default behaviour is to keep only 1 packet per unresolved entry.
Ability to queue more than one packet was added 10 years ago, in r215207,
though the default value was kep intact.

Things have changed since that time. Systems tend to initiate multiple
connections at once for a variety of reasons.
For example, recent kern/252278 bug report describe happy-eyeball DNS
behaviour sending multiple requests to the DNS server.

The primary driver for upper value for the queue length determination is
memory consumption. Remote actors should not be able to easily exhaust
local memory by sending packets to unresolved arp/ND entries.

For now, bump value to 16 packets, to match Darwin implementation.

The proper approach would be to switch the limit to calculate memory
consumption instead of packet count and limit based on memory.

We should MFC this with a variation of D22447.

Reviewers: #manpages, #network, bz, emaste

Reviewed By: emaste, gbe(doc), jilles(doc)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D28068


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# 6ad7446c 02-Jul-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Complete conversions from fib<4|6>_lookup_nh_<basic|ext> to fib<4|6>_lookup().

fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees
over pointer validness and requiring on-stack data copying.

With no callers remaining, remove fib[46]_lookup_nh_ functions.

Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25445


# 66bc03d4 02-Apr-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use interface fib for proxyarp checks.

Before the change, proxyarp checks for src and dst addresses
were performed using default fib, breaking multi-fib scenario.

PR: 245181
Submitted by: Scott Aitken (original version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24244


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 481be5de 12-Feb-2020 Randall Stewart <rrs@FreeBSD.org>

White space cleanup -- remove trailing tab's or spaces
from any line.

Sponsored by: Netflix Inc.


# b8a6e03f 07-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Widen NET_EPOCH coverage.

When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111


# e2e050c8 19-May-2019 Conrad Meyer <cem@FreeBSD.org>

Extract eventfilter declarations to sys/_eventfilter.h

This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.


# b25d74e0 08-Mar-2019 Bjoern A. Zeeb <bz@FreeBSD.org>

Improve ARP logging.

r344504 added an extra ARP_LOG() call in case of an if_output() failure.
It turns out IPv4 can be noisy. In order to not spam the console by default:
(a) add a counter for these events so people can keep better track of how
often it happens, and
(b) add a sysctl to select the default ARP_LOG log level and set it to
INFO avoiding the one (the new) DEBUG level by default.

Claim a spare (1st one after 10 years since the stats were added) in order
to not break netstat from FreeBSD 12->13 updates in the future.

Reviewed by: karels
Differential Revision: https://reviews.freebsd.org/D19490


# a4c69b8b 24-Feb-2019 Bjoern A. Zeeb <bz@FreeBSD.org>

Make arp code return (more) errors.

arprequest() is a void function and in case of error we simply
return without any feedback. In case of any local operation
or *if_output() failing no feedback is send up the stack for the
packet which triggered the arp request to be sent.
arpresolve_full() has three pre-canned possible errors returned
(if we have not yet sent enough arp requests or if we tried
often enough without success) otherwise "no error" is returned.

Make arprequest() an "internal" function arprequest_internal() which
does return a possible error to the caller. Preserve arprequest()
as a void wrapper function for external consumers.
In arpresolve_full() add an extra error checking. Use the
arprequest_internal() function and only return an error if non
of the three ones (mentioend above) are already set.

This will return possible errors all the way up the stack and
allows functions and programs to react on the send errors rather
than leaving them in the dark. Also they might get more detailed
feedback of why packets cannot be sent and they will receive it
quicker.

Reviewed by: karels, hselasky
Differential Revision: https://reviews.freebsd.org/D18904


# 3838c6a3 12-Feb-2019 Kristof Provost <kp@FreeBSD.org>

garp: Fix vnet related panic for gratuitous arp

Gratuitous ARP packets are sent from a timer, which means we don't have a vnet
context set. As a result we panic trying to send the packet.

Set the vnet context based on the interface associated with the interface
address.

To reproduce:
sysctl net.link.ether.inet.garp_rexmit_count=2
ifconfig vtnet1 10.0.0.1/24 up

PR: 235699
Reviewed by: vangyzen@
MFC after: 1 week


# a68cc388 08-Jan-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanical cleanup of epoch(9) usage in network stack.

- Remove macros that covertly create epoch_tracker on thread stack. Such
macros a quite unsafe, e.g. will produce a buggy code if same macro is
used in embedded scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as is is very misleading. They all are named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and
what's more important they now no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with: jtl, gallatin


# 945aad9c 17-Nov-2018 Bjoern A. Zeeb <bz@FreeBSD.org>

Improve the comment for arpresolve_full() in if_ether.c.
No functional changes.

MFC after: 6 weeks


# 90d99b65 17-Nov-2018 Bjoern A. Zeeb <bz@FreeBSD.org>

Retire arpresolve_addr(), which is not used anywhere, from if_ether.c.


# 5f901c92 24-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by: bz
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16147


# d7c5a620 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

ifnet: Replace if_addr_lock rwlock with epoch + mutex

Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32
4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32
4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32
4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32
4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32
4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32
4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32

After the patch

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51
5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51
5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51
5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51
5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52
5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15366


# 3a4fc8a8 13-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Remove support for the Arcnet protocol.

While Arcnet has some continued deployment in industrial controls, the
lack of drivers for any of the PCI, USB, or PCIe NICs on the market
suggests such users aren't running FreeBSD.

Evidence in the PR database suggests that the cm(4) driver (our sole
Arcnet NIC) was broken in 5.0 and has not worked since.

PR: 182297
Reviewed by: jhibbits, vangyzen
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15057


# 0437c8e3 11-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Remove support for FDDI networks.

Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewere (supporting non-existant media type is harmless).

Reviewed by: kib, jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15017


# 1435dcd9 17-Mar-2018 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration.

Current arp/nd code relies on the feedback from the datapath indicating
that the entry is still used. This mechanism is incorporated into the
arpresolve()/nd6_resolve() routines. After the inpcb route cache
introduction, the packet path for the locally-originated packets changed,
passing cached lle pointer to the ether_output() directly. This resulted
in the arp/ndp entry expire each time exactly after the configured max_age
interval. During the small window between the ARP/NDP request and reply
from the router, most of the packets got lost.

Fix this behaviour by plugging datapath notification code to the packet
path used by route cache. Unify the notification code by using single
inlined function with the per-AF callbacks.

Reported by: sthaug at nethelp.no
Reviewed by: ae
MFC after: 2 weeks


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# ff21796d 09-Aug-2017 Oleg Bulyzhin <oleg@FreeBSD.org>

Fix comment typo.


# 71949810 10-Mar-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Fix the L2 address printed in the "arp: %s moved from %*D" message.

In the r292978 struct llentry was changed and the ll_addr field become
the pointer.

PR: 217667
MFC after: 1 week


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# 8144690a 16-Feb-2017 Eric van Gyzen <vangyzen@FreeBSD.org>

Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel

inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.

Suggested by: glebius, emaste
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9625


# 2d9db0bc 01-Oct-2016 Eric van Gyzen <vangyzen@FreeBSD.org>

Add GARP retransmit capability

A single gratuitous ARP (GARP) is always transmitted when an IPv4
address is added to an interface, and that is usually sufficient.
However, in some circumstances, such as when a shared address is
passed between cluster nodes, this single GARP may occasionally be
dropped or lost. This can lead to neighbors on the network link
working with a stale ARP cache and sending packets destined for
that address to the node that previously owned the address, which
may not respond.

To avoid this situation, GARP retransmissions can be enabled by setting
the net.link.ether.inet.garp_rexmit_count sysctl to a value greater
than zero. The setting represents the maximum number of retransmissions.
The interval between retransmissions is calculated using an exponential
backoff algorithm, doubling each time, so the retransmission intervals
are: {1, 2, 4, 8, 16, ...} (seconds).

Due to the exponential backoff algorithm used for the interval
between GARP retransmissions, the maximum number of retransmissions
is limited to 16 for sanity. This limit corresponds to a maximum
interval between retransmissions of 2^16 seconds ~= 18 hours.
Increasing this limit is possible, but sending out GARPs spaced
days apart would be of little use.

Submitted by: David A. Bright <david.a.bright@dell.com>
MFC after: 1 month
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D7695


# ea17754c 21-Jul-2016 Mike Karels <karels@FreeBSD.org>

Fix per-connection L2 caching in fast path

r301217 re-added per-connection L2 caching from a previous change,
but it omitted caching in the fast path. Add it.

Reviewed By: gallatin
Approved by: gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D7239


# 484149de 03-Jun-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

Introduce a per-VNET flag to enable/disable netisr prcessing on that VNET.
Add accessor functions to toggle the state per VNET.
The base system (vnet0) will always enable itself with the normal
registration. We will share the registered protocol handlers in all
VNETs minimising duplication and management.
Upon disabling netisr processing for a VNET drain the netisr queue from
packets for that VNET.

Update netisr consumers to (de)register on a per-VNET start/teardown using
VNET_SYS(UN)INIT functionality.

The change should be transparent for non-VIMAGE kernels.

Reviewed by: gnn (, hiren)
Obtained from: projects/vnet
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6691


# 6d768226 02-Jun-2016 George V. Neville-Neil <gnn@FreeBSD.org>

This change re-adds L2 caching for TCP and UDP, as originally added in D4306
but removed due to other changes in the system. Restore the llentry pointer
to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as
appropriate.

Submitted by: Mike Karels
Differential Revision: https://reviews.freebsd.org/D6262


# 99d628d5 15-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

netinet: for pointers replace 0 with NULL.

These are mostly cosmetical, no functional change.

Found with devel/coccinelle.

Reviewed by: ae. tuexen


# 6cdb1854 01-Jan-2016 Alexander V. Chernikov <melifaro@FreeBSD.org>

Remove second EVENTHANDLER_REGISTER slipped in r292978.
Describe the reason of doing unconditional M_PREPEND in ether_output().


# 4fb3a820 30-Dec-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Implement interface link header precomputation API.

Add if_requestencap() interface method which is capable of calculating
various link headers for given interface. Right now there is support
for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
Other types are planned to support more complex calculation
(L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
(like L2 header for route w/gateway) down to the stack eliminating the
need for other lookups. It also brings us closer to more complex scenarios
like transparently handling MPLS nexthops and tunnel interfaces.
Last, but not least, it removes layering violation introduced by flowtable
code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
record. Interface link address change are handled by re-calculating
headers for all lles based on if_lladdr event. After these changes,
arpresolve()/nd6_resolve() returns full pre-calculated header for
supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
difference is that interface source mac has to be filled by OS for
AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
BPF and not pollute if_output() routines. Convert BPF to pass prepend data
via new 'struct route' mechanism. Note that it does not change
non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
It is not needed for ethernet anymore. The only remaining FDDI user is
dev/pdq mostly untouched since 2007. FDDI support was eliminated from
OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
Flowtable violates layering by saving (and not correctly managing)
rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
header data from that lle.

Differential Revision: https://reviews.freebsd.org/D4102


# ddb13598 25-Dec-2015 Kevin Lo <kevlo@FreeBSD.org>

Fix typo (s/harware/hardware/)


# d6e82913 17-Dec-2015 Steven Hartland <smh@FreeBSD.org>

Revert r292275 & r292379

glebius has concerns about these changes so reverting those can be discussed
and addressed.

Sponsored by: Multiplay


# 3a909afe 16-Dec-2015 Steven Hartland <smh@FreeBSD.org>

Fix issues introduced by r292275

* Fix panic for etherswitches which don't have a LLADDR.
* Disabled DELAY in unsolicited NDA, which needs further work.
* Fixed missing DELAY in carp_send_na.
* style(9) fix.

Reported by: kp & melifaro
X-MFC-With: r292275
MFC after: 1 month
Sponsored by: Multiplay


# 942e4b4b 16-Dec-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix ARP reply handling changed in r286955.

If source of ARP request didn't pass the routing check
(e.g. not in directly connected network), be polite and
still answer the request instead of dropping frame.

Reported by: quadro at irc@rusnet


# 52e53e2d 15-Dec-2015 Steven Hartland <smh@FreeBSD.org>

Fix lagg failover due to missing notifications

When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results is slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.

Upon receiving these events each protocol trigger the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behavour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR: 156226
MFC after: 1 month
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4111


# 9977be4a 09-Dec-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Make in_arpinput(), inp_lookup_mcast_ifp(), icmp_reflect(),
ip_dooptions(), icmp6_redirect_input(), in6_lltable_rtcheck(),
in6p_lookup_mcast_ifp() and in6_selecthlim() use new routing api.

Eliminate now-unused ip_rtaddr().
Fix lookup key fib6_lookup_nh_basic() which was lost diring merge.
Make fib6_lookup_nh_basic() and fib6_lookup_nh_extended() always
return IPv6 destination address with embedded scope. Currently
rw_gateway has it scope embedded, do the same for non-gatewayed
destinations.

Sponsored by: Yandex LLC


# f8aee88f 05-Dec-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Remove LLE read lock from IPv4 fast path.

LLE structure is mostly unchanged during its lifecycle.
To be more specific, there are 2 things relevant for fast path
lookup code:
1) link-level address change. Since r286722, these updates are performed
under AFDATA WLOCK.
2) Some sort of feedback indicating that this particular entry is used so
we re-send arp request to perform reachability verification instead of
expiring entry. The only signal that is needed from fast path is something
like binary yes/no.

The latter is solved by the following changes:
1) introduce special r_skip_req field which is read lockless by fast path,
but updated under (new) req_mutex mutex. If this field is non-zero, then
fast path will acquire lock and set it back to 0.
2) introduce simple state machine: incomplete->reachable<->verify->deleted.
Before that we implicitely had incomplete->reachable->deleted state machine,
with V_arpt_keep between "reachable" and "deleted". Verification was performed
in runtime 5 seconds before V_arpt_keep expire.
This is changed to "change state to verify 5 seconds before V_arpt_keep,
set r_skip_req to non-zero value and check it every second". If the value
is zero - then send arp verification probe.
These changes do not introduce any signifficant control plane overhead:
typically lle callout timer would fire 1 time more each V_arpt_keep (1200s)
for used lles and up to arp_maxtries (5) for dead lles.

As a result, all packets towards "reachable" lle are handled by fast path without
acquiring lle read lock.

Additional "req_mutex" is needed because callout / arpresolve_slow() or eventhandler
might keep LLE lock for signifficant amount of time, which might not be feasible
for fast path locking (e.g. having rmlock as ether AFDATA or lltable own lock).

Differential Revision: https://reviews.freebsd.org/D3688


# 1c302b58 09-Nov-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Decompose arp_ifinit() into arp_add_ifa_lle() and arp_announce_ifaddr().
Rename arp_ifinit2() into arp_announce_ifaddr().

Eliminate zeroing ifa_rtrequest: it was used for calling arp_rtrequest()
which was responsible for handling route cloning requests. It became
obsolete since r186119 (L2/L3 split).


# b13c5b5d 09-Nov-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use lladdr_event to propagate gratiotus arp.

Differential Revision: https://reviews.freebsd.org/D4019


# ddd208f7 07-Nov-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Unify setting lladdr for AF_INET[6].


# 89bc0426 07-Oct-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Fix regression from r287779, that bite me. If we call m_pullup()
unconditionally, we end up with an mbuf chain of two mbufs, which
later in in_arpreply() is rewritten from ARP request to ARP reply
and is sent out. Looks like igb(4) (at least mine, and at least
at my network) fails on such mbuf chain, so ARP reply doesn't go
out wire. Thus, make the m_pullup() call conditional, as it is
everywhere. Of course, the bug in igb(?) should be investigated,
but better first fix the head. And unconditional m_pullup() was
suboptimal, anyway.


# eec33ea0 15-Sep-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Improve logging invalid arp messages
* Remove redundant check in ip_arpinput

Suggested by: glebius
MFC after: 2 weeks


# d3cdb716 15-Sep-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Require explicitl lle unlink prior to calling llentry_delete().
This one slightly decreases time of holding afdata wlock.
* While here, make nd6_free() return void. No one has used its return value
since r186119.


# 3e7a2321 14-Sep-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Do more fine-grained locking: call eventhandlers/free_entry
without holding afdata wlock
* convert per-af delete_address callback to global lltable_delete_entry() and
more low-level "delete this lle" per-af callback
* fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures

Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3573


# deb6bda6 14-Sep-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Improve error checking for arp messages.
* Clean stale headers from if_ether.c.

Reported by: rozhuk.im at gmail.com
Reviewed by: ae
MFC after: 2 weeks


# 5a255516 19-Aug-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Split allocation and table linking for lle's.
Before that, the logic besides lle_create() was the following:
return existing if found, create if not. This behaviour was error-prone
since we had to deal with 'sudden' static<>dynamic lle changes.
This commit fixes bunch of different issues like:
- refcount leak when lle is converted to static.
Simple check case:
console 1:
while true;
do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`;
do arp -s $i 00:22:44:66:88:00 ; arp -d $i;
done;
done
console 2:
ping -f any-dead-host-in-L2
console 3:
# watch for memory consumption:
vmstat -m | awk '$1~/lltable/{print$2}'
- possible problems in arptimer() / nd6_timer() when dropping/reacquiring
lock.
New logic explicitly handles use-or-create cases in every lla_create
user. Basically, most of the changes are purely mechanical. However,
we explicitly avoid using existing lle's for interface/static LLE records.
* While here, call lle_event handlers on all real table lle change.
* Create lltable_free_entry() calling existing per-lltable
lle_free_t callback for entry deletion


# a4141c63 19-Aug-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Check value return from lle_create() for NULL.
This bug sneaked unnoticed in r286722.

Reported by: adrian


# 0c4210f9 18-Aug-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix panic when handling non-inet arp message introduced in r286825.

Submitted by: delphij


# 512e30ef 15-Aug-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Split arpresolve() into fast/slow path.

This change isolates the most common case (e.g. successful lookup)
from more complicates scenarios. It also (tries to) make code
more simple by avoiding retry: cycle.

The actual goal is to prepare code to the upcoming change that will
allow LL address retrieval without acquiring LLE lock at all.

Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D3383


# f3bfa7d1 13-Aug-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Move lle update code from from gigantic ip_arpinput() to
separate bunch of functions. The goal is to isolate actual lle
updates to permit more fine-grained locking.

Do all lle link-level update under AFDATA wlock.

Sponsored by: Yandex LLC


# 0447c136 10-Aug-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use single 'lle_timer' callout in lltable instead of
two different names of the same timer.


# 314294de 11-Aug-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Store addresses instead of sockaddrs inside llentry.
This permits us having all (not fully true yet) all the info
needed in lookup process in first 64 bytes of 'struct llentry'.

struct llentry layout:
BEFORE:
[rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]]
AFTER
[ in[6]_addr MAC .. state .. rwlock ]

Currently, address part of struct llentry has only 16 bytes for the key.
However, lltable does not restrict any custom lltable consumers with long
keys use the previous approach (store key at (lle+1)).

Sponsored by: Yandex LLC


# cc0a3c8c 29-Jul-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.

Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by: gnn (previous version)
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3149


# f0516b3c 16-Jun-2015 Ermal Luçi <eri@FreeBSD.org>

If there is a system with a bpf consumer running and a packet is wanted
to be transmitted but the arp cache entry expired, which triggers an arp request
to be sent, the bpf code might want to sleep but crash the system due
to a non sleep lock held from the arp entry not released properly.

Release the lock before calling the arp request code to solve the issue
as is done on all the other code paths.

PR: 200323
Approved by: ae, gnn(mentor)
MFC after: 1 week
Sponsored by: Netgate
Differential Revision: https://reviews.freebsd.org/D2828


# 1b5aa92c 07-Mar-2015 Andrey V. Elsukov <ae@FreeBSD.org>

lla_lookup() can directly call llentry_free() for static entries
and the last one requires to hold afdata's wlock.

PR: 197096
MFC after: 1 week


# 2575fbb8 09-Feb-2015 Randall Stewart <rrs@FreeBSD.org>

This fixes a bug in the way that the LLE timers for nd6
and arp were being used. They basically would pass in the
mutex to the callout_init. Because they used this method
to the callout system, it was possible to "stop" the callout.
When flushing the table and you stopped the running callout, the
callout_stop code would return 1 indicating that it was going
to stop the callout (that was about to run on the callout_wheel blocked
by the function calling the stop). Now when 1 was returned, it would
lower the reference count one extra time for the stopped timer, then
a few lines later delete the memory. Of course the callout_wheel was
stuck in the lock code and would then crash since it was accessing
freed memory. By using callout_init(c, 1) we always get a 0 back
and the reference counting bug does not rear its head. We do have
to make a few adjustments to the callouts themselves though to make
sure it does the proper thing if rescheduled as well as gets the lock.

Commented upon by hiren and sbruno
See Phabricator D1777 for more details.

Commented upon by hiren and sbruno
Reviewed by: adrian, jhb and bz
Sponsored by: Netflix Inc.


# d63e657c 08-Jan-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Deal with ARCNET L2 multicast mapping for IPv6 the same way as in IPv4:
handle it in arc_output() instead of nd6_storelladdr().
* Remove IFT_ARCNET check from arpresolve() since arc_output() does not
use arpresolve() to handle broadcast/multicast. This check was there
since r84931. It looks like it was not used since r89099 (initial
import of Arcnet support where multicast is handled separately).
* Remove IFT_IEEE1394 case from nd6_storelladdr() since firewire_output()
calles nd6_storelladdr() for unicast addresses only.
* Remove IFT_ARCNET case from nd6_storelladdr() since arc_output() now
handles multicast by itself.

As a result, we have the following pattern: all non-ethernet-style
media have their own multicast map handling inside their appropriate
routines. On the other hand, arpresolve() (and nd6_storelladdr()) which
meant to be 'generic' ones de-facto handles ethernet-only multicast maps.

MFC after: 3 weeks


# ed6a66ca 05-Jan-2015 Robert Watson <rwatson@FreeBSD.org>

To ease changes to underlying mbuf structure and the mbuf allocator, reduce
the knowledge of mbuf layout, and in particular constants such as M_EXT,
MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment
utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single
M_ALIGN() macro, implemented by a now-inlined m_align() function:

- Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline.
- Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align().
- Update consumers around the tree to simply use M_ALIGN().

This change eliminates a number of cases where mbuf consumers must be aware
of whether or not mbufs returned by the allocator use external storage, but
also assumptions about the size of the returned mbuf. This will make it
easier to introduce changes in how we use external storage, as well as
features such as variable-size mbufs.

Differential Revision: https://reviews.freebsd.org/D1436
Reviewed by: glebius, trasz, gnn, bz
Sponsored by: EMC / Isilon Storage Division


# 20dd8995 03-Jan-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Hide lltable implementation details in if_llatbl_var.h
* Make most of lltable_* methods 'normal' functions instead of inline
* Add lltable_get_<af|ifp>() functions to access given lltable fields
* Temporarily resurrect nd6_lookup() function


# ee7e9a4e 08-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Do not assume lle has sockaddr key after struct lle:
use llt_fill_sa_entry() llt method to store lle address in sa.
* Eliminate L3_ADDR macro and either reference IPv4/IPv6 address
directly from lle or use newly-created llt_fill_sa_entry().
* Do not store sockaddr inside arp/ndp lle anymore.


# d82ed505 08-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Simplify lle lookup/create api by using addresses instead of sockaddrs.


# 73b52ad8 07-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use llt_prepare_static_entry method to prepare valid per-af static entry.


# 0368226e 07-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Retire abstract llentry_free() in favor of lltable_drop_entry_queue()
and explicit calls to RTENTRY_FREE_LOCKED()
* Use lltable_prefix_free() in arp_ifscrub to be consistent with nd6.
* Rename <lltable_|llt>_delete function to _delete_addr() to note that
this function is used to external callers. Make this function maintain
its own locking.
* Use lookup/unlink/clear call chain from internal callers instead of
delete_addr.
* Fix LLE_DELETED flag handling


# 721cd2e0 07-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Do not enforce particular lle storage scheme:
* move lltable allocation to per-domain callbacks.
* make llentry_link/unlink functions overridable llt methods.
* make hash table traversal another overridable llt method.


# a743ccd4 07-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Add llt_clear_entry() callback which is able to do all lle
cleanup including unlinking/freeing
* Relax locking in lltable_prefix_free_af/lltable_free
* Do not pass @llt to lle free callback: it is always NULL now.
* Unify arptimer/nd6_llinfo_timer: explicitly unlock lle avoiding
unlock/lock sequinces
* Do not pass unlocked lle to nd6_ns_output(): add nd6_llinfo_get_holdsrc()
to retrieve preferred source address from lle hold queue and pass it
instead of lle.
* Finally, make nd6_create() create and return unlocked lle
* Separate defrtr handling code from nd6_free():
use nd6_check_del_defrtr() to check if we need to keep entry instead of
performing GC,
use nd6_check_recalc_defrtr() to perform actual recalc on lle removal.
* Move isRouter handling from nd6_cache_lladdr() to separate
nd6_check_router()
* Add initial code to maintain lle runtime flags in sync.


# a6e934e3 01-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Put bandaid on arptimer-derived crashed for 'arp -da' deleted records.
The proper fix will arrive after convering lltable 'delete' method.


# 9b65db85 01-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Do more fine-grained locking in lltable code: lltable_create_lle()
does actual new lle creation without extensive locking and existing
lle search.
Move lle updating code from gigantic in_arpinput() to arp_update_llle()
and some other functions.

IPv6 changes to follow.


# ce313fdd 30-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Unify lle table dump/prefix removal code.
* Rename lla_XXX -> lltable_XXX_lle to reduce number of name prefixes
used by lltable code.


# 5d14e4cd 29-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Provide rte_<get|set> methods to access rtentry for external consumers.


# 74860d4f 27-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Do not return unlocked/unreferenced lle in arpresolve/nd6_storelladdr -
return lle flags IFF needed.
Do not pass rte to arpresolve - pass is_gateway flag instead.


# 73d77028 23-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Do more fine-grained lltable locking: use table runtime lock as rare
as we can.


# 7c066c18 22-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use less-invasive approach for IF_AFDATA lock: convert into 2 locks:
use rwlock accessible via external functions
(IF_AFDATA_CFG_* -> if_afdata_cfg_*()) for all control plane tasks
use rmlock (IF_AFDATA_RUN_*) for fast-path lookups.


# 27688dfe 22-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Temporarily revert r274774.


# 2e47d2f9 21-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Mark ifaddr/rtsock static entries RLLE_VALID.


# 86b94cff 21-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Finish r274774: add more headers/fix build for non-debug case.


# 9883e41b 20-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Switch IF_AFDATA lock to rmlock


# df629abf 16-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Rework LLE code locking:
* struct llentry is now basically split into 2 pieces:
all fields within 64 bytes (amd64) are now protected by both
ifdata lock AND lle lock, e.g. you require both locks to be held
exclusively for modification. All data necessary for fast path
operations is kept here. Some fields were added:
- r_l3addr - makes lookup key liev within first 64 bytes.
- r_flags - flags, containing pre-compiled decision whether given
lle contains usable data or not. Current the only flag is RLLE_VALID.
- r_len - prepend data len, currently unused
- r_kick - used to provide feedback to control plane (see below).
All other fields are protected by lle lock.
* Add simple state machine for ARP to handle "about to expire" case:
Current model (for the fast path) is the following:
- rlock afdata
- find / rlock rte
- runlock afdata
- see if "expire time" is approaching
(time_uptime + la->la_preempt > la->la_expire)
- if true, call arprequest() and decrease la_preempt
- store MAC and runlock rte
New model (data plane):
- rlock afdata
- find rte
- check if it can be used using r_* fields only
- if true, store MAC
- if r_kick field != 0 set it to 0.
- runlock afdata
New mode (control plane):
- schedule arptimer to be called in (V_arpt_keep - V_arp_maxtries)
seconds instead of V_arpt_keep.
- on first timer invocation change state from ARP_LLINFO_REACHABLE
to ARP_LLINFO_VERIFY, sets r_kick to 1 and shedules next call in
V_arpt_rexmit (default to 1 sec).
- on subsequent timer invocations in ARP_LLINFO_VERIFY state, checks
for r_kick value: reschedule if not changed, and send arprequest()
if set to zero (e.g. entry was used).
* Convert IPv4 path to use new single-lock approach. IPv6 bits to follow.
* Slow down in_arpinput(): now valid reply will (in most cases) require
acquiring afdata WLOCK twice. This is requirement for storing changed
lle data. This change will be slightly optimized in future.
* Provide explicit hash link/unlink functions for both ipv4/ipv6 code.
This will probably be moved to generic lle code once we have per-AF
hashing callback inside lltable.
* Perform lle unlink on deletion immediately instead of delaying it to
the timer routine.
* Make r244183 more explicit: use new LLE_CALLOUTREF flag to indicate the
presence of lle reference used for safe callout calls.


# b9497d2c 15-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Move "slow path" handling from arpresolve() to newly-created
arpresolve_slow().


# b4b1367a 15-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Move lle creation/deletion from lla_lookup to separate functions:
lla_lookup(LLE_CREATE) -> lla_create
lla_lookup(LLE_DELETE) -> lla_delete
Assume lla_create to return LLE_EXCLUSIVE lock for lle.
* Rework lla_rt_output to perform all lle changes under afdata WLOCK.
* change arp_ifscrub() ackquire afdata WLOCK, the same as arp_ifinit().


# 6df8a710 07-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.

Sponsored by: Nginx, Inc.


# 8c3cfe0b 04-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Hide 'struct rtentry' and all its macro inside new header:
net/route_internal.h
The goal is to make its opaque for all code except route/rtsock and
proto domain _rmx.


# 9f65116c 25-Oct-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Increase nh_flags to be u16 thus reducing nhop payload to be 48 bytes
* Use NHF_ namespace for all nhop flags
* Rename nhop_data -> nhop_prepend
* Rename fib4_lookup_nh_extended -> fib4_lookup_nh_ext
* Add "flags" argument to fib4_lookup_nh_ext() to specify whether we want
returned nh_ext structure to be refcounted or not.


# 0652a307 24-Oct-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert arpinput() to use new routing api.


# b4e8f808 19-Oct-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Switch IPv4 output path to use new routing api.

The goals of the new API is to provide consumers with minimal
needed information, but as fast as possible. So we provide
full nexthop info copied into alighed on-cache structure
instead of rte/ia pointers, their refcounts and locks.
This does not provide solution for protecting from egress
ifp destruction, but does not make it any worse.

Current changes:

nhops:
Add fib4_lookup_prepend() function which stores either full
L2+L3 prepend info (e.g. MAC header in case of plain IPv4) or
L3 info with NH_FLAGS_L2_INCOMPLETE flag indicating that no valid L2
info exists and we have to take "slow" path.

ip_output:
Currently ip[ 46]_output consumers use 'struct route' for
the following purposes:
1) double lookup avoidance(route caching)
2) plain route caching
3) get path MTU to be able to notify source.
The former pattern is mostly used by various tunnels
(gif, gre, stf). (Actually, gre is the only remaining,
others were already converted. Their locking model did
not scale good enogh to benefit from such caching, so
we have (temporarily) removed it without any performance
loss).
Plain route caching used by SCTP is simply wrong and should be removed.
Temporary break it for now just to be able to compile.
Optimize path mtu reporting by providing it in new 'route_info' stucture.

Minimize games with @ia locking/refcounting for route lookup:
add special nhop[46]_extended structure to store more route attributes.
Pointer to given structure can be passed to fib4_lookup_prepend() to indicate
we want this info (we actually needs it for UDP and raw IP).

ether_output:
Provide light-weight ether_output2() call to deal with
transmitting L2 frame (e.g. properly handle broadcast/simloop/bridge/
other L2 hooks before actually transmitting frame by if_transmit()).
Add a hack based on new RT_NHOP ro_flag to distinguish which version should
we call. Better way is probably to add a new "if_output_frame" driver
callbacks.

Next steps:
* Convert ip_fastfwd part
* Implement auto-growing array for per-radix nexthops
* Implement LLE tracking for nexthop calculations to be able to
immediately provide all necessary info in single route lookup
for gateway routes
* Switch radix locking scheme to runtime/cfg lock
* Implement multipath support for rtsock
* Implement "tracked nexthops" for tunnels (e.g. _proper_
nexthop caching)
* Add IPv6 support for remaining parts (postponed not to
interfere with user/ae/inet6 branch)
* Consider adding "if_output_frame" driver call to
ease logical frame pushing.


# 546451a2 31-Aug-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Use macros instead of referencing struct if_data that resides in ifnet.

Sponsored by: Nginx, Inc.


# 743c072a 26-Mar-2014 Alan Somers <asomers@FreeBSD.org>

Correct ARP update handling when the routes for network interfaces are
restricted to a single FIB in a multifib system.

Restricting an interface's routes to the FIB to which it is assigned (by
setting net.add_addr_allfibs=0) causes ARP updates to fail with "arpresolve:
can't allocate llinfo for x.x.x.x". This is due to the ARP update code hard
coding it's lookup for existing routing entries to FIB 0.

sys/netinet/in.c:
When dealing with RTM_ADD (add route) requests for an interface, use
the interface's assigned FIB instead of the default (FIB 0).

sys/netinet/if_ether.c:
In arpresolve(), enhance error message generated when an
lla_lookup() fails so that the interface causing the error is
visible in logs.

tests/sys/netinet/fibs_test.sh
Clear ATF expected error.

PR: kern/167947
Submitted by: Nikolay Denev <ndenev@gmail.com> (previous version)
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation


# ea0c3776 02-Jan-2014 Andrey V. Elsukov <ae@FreeBSD.org>

lla_lookup() does modification only when LLE_CREATE is specified.
Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing
lla_lookup() without LLE_CREATE flag.

Reviewed by: glebius, adrian
MFC after: 1 week
Sponsored by: Yandex LLC


# b1b9dcae 05-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by
default from the very beginning. It was placed in wrong namespace
net.link.ether, originally it had been at another wrong namespace. It was
incorrectly documented at incorrect manual page arp(8). Since new-ARP commit,
the tunable have been consulted only on route addition, and ignored on route
deletion. Behaviour of a system with tunable turned off is not fully correct,
and has no advantages comparing to normal behavior.


# 237bf7f7 01-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Cleanup in_ifscrub(), which is just an entry to in_scrubprefix().


# 76039bc8 26-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 86bd0491 19-Aug-2013 Andre Oppermann <andre@FreeBSD.org>

Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with: trociny, glebius


# 5b7cb97c 09-Jul-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.


# e364d8c4 03-Jul-2013 Navdeep Parhar <np@FreeBSD.org>

Catch up with r238990. LLE_DELETED does not clobber everything else in
la_flags since said revision.


# 5d81d095 11-May-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Rate limit the number of remotely triggered ARP log messages
to 1 log message per second.


# 47e8d432 25-Apr-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Add const qualifier to the dst parameter of the ifnet if_output method.


# 414676ba 25-Apr-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Fix couple of mbuf leaks in incoming ARP processing.


# b1ec2940 13-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Fix problem in r238990. The LLE_LINKED flag should be tested prior to
entering llentry_free(), and in case if we lose the race, we should simply
perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread
performing arptimer(), it will remove two references from the lle instead
of one.

Reported by: Ian FREISLICH <ianf clue.co.za>


# eb1b1807 05-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


# 478df1d5 03-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Provide a sysctl switch that allows to install ARP entries
with multicast bit set. FreeBSD refuses to install such
entries since 9.0, and this broke installations running
Microsoft NLB, which are violating standards.

Tested by: Tarasov Oleg <oleg_tarasov sg-tea.com>


# ea537929 02-Aug-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Fix races between in_lltable_prefix_free(), lla_lookup(),
llentry_free() and arptimer():

o Use callout_init_rw() for lle timeout, this allows us safely
disestablish them.
- This allows us to simplify the arptimer() and make it
race safe.
o Consistently use ifp->if_afdata_lock to lock access to
linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
is attached to the hash.
- Use LLE_LINKED to avoid double unlinking via consequent
calls to llentry_free().
- Mark lle with LLE_DELETED via |= operation istead of =,
so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
consistent and provide more informative KASSERTs.

The patch is a collaborative work of all submitters and myself.

PR: kern/165863
Submitted by: Andrey Zonov <andrey zonov.org>
Submitted by: Ryan Stone <rysto32 gmail.com>
Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>


# b9aee262 01-Aug-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Some more whitespace cleanup.


# ea50c13e 31-Jul-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Some style(9) and whitespace changes.

Together with: Andrey Zonov <andrey zonov.org>


# 09fe6320 19-Jun-2012 Navdeep Parhar <np@FreeBSD.org>

- Updated TOE support in the kernel.

- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the
works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)


# 83e521ec 21-Jan-2012 Bjoern A. Zeeb <bz@FreeBSD.org>

Clean up some #endif comments removing from short sections. Add #endif
comments to longer, also refining strange ones.

Properly use #ifdef rather than #if defined() where possible. Four
#if defined(PCBGROUP) occurances (netinet and netinet6) were ignored to
avoid conflicts with eventually upcoming changes for RSS.

Reported by: bde (most)
Reviewed by: bde
MFC after: 3 days


# 44e48969 20-Jan-2012 Bjoern A. Zeeb <bz@FreeBSD.org>

Remove a superfluous INET6 check (no opt_inet6.h included anyway).

MFC after: 3 days


# aa24ae3c 08-Jan-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Make it possible to use alternative source hardware address
in the ARP datagram generated by arprequest(). If caller doesn't
supply the address, then it is either picked from CARP or hardware
address of the interface is taken.

While here, make several minor fixes:

- Hold IF_ADDR_RLOCK(ifp) while traversing address list.
- Remove not true comment.
- Access internet address and mask via in_ifaddr fields,
rather than ifaddr.


# 94901d5e 08-Jan-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Move arprequest() declaration to if_ether.h.


# 137f91e8 05-Jan-2012 John Baldwin <jhb@FreeBSD.org>

Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by: bz
MFC after: 2 weeks


# 9de96e89 29-Dec-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Don't fallback to a CARP address in BACKUP state.


# 08b68b0e 15-Dec-2011 Gleb Smirnoff <glebius@FreeBSD.org>

A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]


# 61905171 21-Nov-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Be more informative for "unknown hardware address format" message.

Submitted by: Andrzej Tobola <ato iem.pw.edu.pl>


# c9168718 20-Nov-2011 Gleb Smirnoff <glebius@FreeBSD.org>

- Reduce severity for all ARP events, that can be triggered from remote
machine to LOG_NOTICE. Exception left to "using my IP address".
- Fix multicast ARP warning: add newline and also log the bad MAC address.

Tested by: Alexander Wittig <wittigal msu.edu>


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 4659e09d 07-Jul-2011 Andrey V. Elsukov <ae@FreeBSD.org>

Add again the checking for log_arp_permanent_modify that was by accident
removed in the r186119.

PR: kern/154831
MFC after: 1 week


# 2303570f 03-Jul-2011 Andrey V. Elsukov <ae@FreeBSD.org>

ARP code reuses mbuf from ARP request to make a reply, but it does not
reset rcvif to NULL. Since rcvif is not NULL, ipfw(4) supposes that ARP
replies were received on specified interface.
Reset rcvif to NULL for ARP replies to fix this issue.

PR: kern/131817
Reviewed by: glebius
MFC after: 1 month


# f4048639 18-Jun-2011 Bjoern A. Zeeb <bz@FreeBSD.org>

Remove a these days incorrect comment left from before new-arp.

MFC after: 1 week


# e4cd31dd 21-Mar-2011 Jeff Roberson <jeff@FreeBSD.org>

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


# 6bccea7c 21-Feb-2011 Rebecca Cran <brucec@FreeBSD.org>

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


# 96561547 25-Jan-2011 Andrew Thompson <thompsa@FreeBSD.org>

When matching an incoming ARP against a bridge, ensure both interfaces belong
to the same bridge.

Submitted by: Alexander Zagrebin


# 9844b029 12-Jan-2011 Christian S.J. Peron <csjp@FreeBSD.org>

Un-break the build: use the correct format specifier for sizeof()


# 09d3f895 12-Jan-2011 George V. Neville-Neil <gnn@FreeBSD.org>

Fix several bugs in the ARP code related to improperly formatted
packets.

*) Reject requests with a protocol length not equal to 4. This is IPv4
and there is no reason to accept anything else.

*) Reject packets that have a multicast source hardware address.

*) Drop requests where the hardware address length is not equal
to the hardware address length of the interface.

Pointed out by: Rozhuk Ivan
MFC after: 1 week


# ede99017 07-Jan-2011 George V. Neville-Neil <gnn@FreeBSD.org>

Fix a memory leak in ARP queues.

Pointed out by: jhb@
MFC after: 2 weeks


# 90fdff07 07-Jan-2011 George V. Neville-Neil <gnn@FreeBSD.org>

Adjust ARP hold queue locking.

Submitted by: Rozhuk Ivan, jhb
MFC after: 2 weeks


# a98c06f1 30-Nov-2010 Gleb Smirnoff <glebius@FreeBSD.org>

Use time_uptime instead of non-monotonic time_second to drive ARP
timeouts.

Suggested by: bde


# 3e288e62 22-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


# 05939839 22-Nov-2010 Marko Zec <zec@FreeBSD.org>

Remove an apparently redundant CURVNET_SET() / CURVNET_RESTORE() pair.

MFC after: 3 days


# 31c6a003 14-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.


# e162ea60 12-Nov-2010 George V. Neville-Neil <gnn@FreeBSD.org>

Add a queue to hold packets while we await an ARP reply.

When a fast machine first brings up some non TCP networking program
it is quite possible that we will drop packets due to the fact that
only one packet can be held per ARP entry. This leads to packets
being missed when a program starts or restarts if the ARP data is
not currently in the ARP cache.

This code adds a new sysctl, net.link.ether.inet.maxhold, which defines
a system wide maximum number of packets to be held in each ARP entry.
Up to maxhold packets are queued until an ARP reply is received or
the ARP times out. The default setting is the old value of 1
which has been part of the BSD networking code since time
immemorial.

Expose the time we hold an incomplete ARP entry by adding
the sysctl net.link.ether.inet.wait, which defaults to 20
seconds, the value used when the new ARP code was added..

Reviewed by: bz, rpaulo
MFC after: 3 weeks


# 33b31db6 02-Nov-2010 John Baldwin <jhb@FreeBSD.org>

Don't leak the LLE lock if the arptimer callout is pending or inactive.

Reported by: David Rhodus
MFC after: 1 month


# 27bf126d 29-Oct-2010 Gleb Smirnoff <glebius@FreeBSD.org>

Remove meaningless XXXXX, that is a remain of comment, removed in r186200.


# 28e1f17c 29-Oct-2010 Gleb Smirnoff <glebius@FreeBSD.org>

Revert a small part of the r198301, that is entirely unrelated to the
r198301 itself. It also broke the logic of not sending more than one
ARP request per second, that consequently lead to a potential problem
of flooding network with broadcast packets.

MFC after: 1 week


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 9963e8a5 11-Aug-2010 Will Andrews <will@FreeBSD.org>

Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by: bz
Approved by: ken (mentor)


# 54bfbd51 10-Aug-2010 Will Andrews <will@FreeBSD.org>

Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default. Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by: bz, simon
Approved by: ken (mentor)
MFC after: 2 weeks


# 19291ab3 31-Jul-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

Document the mandatory argument to the arptimer() and
nd6_llinfo_timer() functions with a KASSERT().
Note: there is no need to return after panic.

In the legacy IP case, only assign the arg after the check,
in the IPv6 case, remove the extra checks for the table and
interface as they have to be there unless we freed and forgot
to cancel the timer. It doesn't matter anyway as we would
panic on the NULL pointer deref immediately and the bug is
elsewhere.
This unifies the code of both address families to some extend.

Reviewed by: rwatson
MFC after: 6 days


# 480d7c6c 06-May-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r207369:
MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH


# 82cea7e6 29-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days


# feb3a5f7 21-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r206481:

Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing the entire tables make sure that in case we
cancel a pending callout to remove the reference as well.

Reviewed by: qingli (earlier version)
MFC after: 10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
Christian Kratzer (ck cksoft.de),
Evgenii Davidov (dado korolev-net.ru)
PR: kern/144564
Configurations still affected: with options FLOWTABLE


# becba438 11-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.

Reviewed by: qingli (earlier version)
MFC after: 10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
Christian Kratzer (ck cksoft.de),
Evgenii Davidov (dado korolev-net.ru)
PR: kern/144564
Configurations still affected: with options FLOWTABLE


# fbbbfe0b 28-Jan-2010 George V. Neville-Neil <gnn@FreeBSD.org>

MFC r196797:

Add ARP statistics to the kernel and netstat.


# cbbe4a75 21-Jan-2010 Navdeep Parhar <np@FreeBSD.org>

MFC r201416:

Avoid NULL dereference in arpresolve.

Requested by: kib@


# 4cc5ccf3 11-Jan-2010 Qing Li <qingli@FreeBSD.org>

MFC r201544

An existing incomplete ARP entry would expire a subsequent
statically configured entry of the same host. This bug was
due to the expiration timer was not cancelled when installing
the static entry. Since there exist a potential race condition
with respect to timer cancellation, simply check for the
LLE_STATIC bit inside the expiration function instead of
cancelling the active timer.


# ee8a75d3 04-Jan-2010 Qing Li <qingli@FreeBSD.org>

An existing incomplete ARP entry would expire a subsequent
statically configured entry of the same host. This bug was
due to the expiration timer was not cancelled when installing
the static entry. Since there exist a potential race condition
with respect to timer cancellation, simply check for the
LLE_STATIC bit inside the expiration function instead of
cancelling the active timer.

MFC after: 5 days


# 56714599 02-Jan-2010 Navdeep Parhar <np@FreeBSD.org>

Avoid NULL dereference in arpresolve.


# f60909e3 28-Oct-2009 Qing Li <qingli@FreeBSD.org>

MFC r198418

Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by: ru


# 6cb2b4e7 23-Oct-2009 Qing Li <qingli@FreeBSD.org>

Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by: ru
MFC after: 3 days


# 0eb4d28b 20-Oct-2009 Qing Li <qingli@FreeBSD.org>

MFC 198301

In the ARP callout timer expiration function, the current time_second
is compared against the entry expiration time value (that was set based
on time_second) to check if the current time is larger than the set
expiration time. Due to the +/- timer granularity value, the comparison
returns false, causing the alternative code to be executed. The
alternative code path freed the memory without removing that entry
from the table list, causing a use-after-free bug.

Reviewed by: discussed with kmacy
Approved by: re
Verified by: rnoland, yongari


# fc023235 20-Oct-2009 Qing Li <qingli@FreeBSD.org>

In the ARP callout timer expiration function, the current time_second
is compared against the entry expiration time value (that was set based
on time_second) to check if the current time is larger than the set
expiration time. Due to the +/- timer granularity value, the comparison
returns false, causing the alternative code to be executed. The
alternative code path freed the memory without removing that entry
from the table list, causing a use-after-free bug.

Reviewed by: discussed with kmacy
MFC after: immediately
Verified by: rnoland, yongari


# 6f99a646 20-Oct-2009 Qing Li <qingli@FreeBSD.org>

MFC r198111

This patch fixes the following issues in the ARP operation:

1. There is a regression issue in the ARP code. The incomplete
ARP entry was timing out too quickly (1 second timeout), as
such, a new entry is created each time arpresolve() is called.
Therefore the maximum attempts made is always 1. Consequently
the error code returned to the application is always 0.
2. Set the expiration of each incomplete entry to a 20-second
lifetime.
3. Return "incomplete" entries to the application.
4. The return error code was incorrect.

Reviewed by: kmacy
Approved by: re


# 93704ac5 15-Oct-2009 Qing Li <qingli@FreeBSD.org>

This patch fixes the following issues in the ARP operation:

1. There is a regression issue in the ARP code. The incomplete
ARP entry was timing out too quickly (1 second timeout), as
such, a new entry is created each time arpresolve() is called.
Therefore the maximum attempts made is always 1. Consequently
the error code returned to the application is always 0.
2. Set the expiration of each incomplete entry to a 20-second
lifetime.
3. Return "incomplete" entries to the application.

Reviewed by: kmacy
MFC after: 3 days


# bb3b75e8 15-Sep-2009 Qing Li <qingli@FreeBSD.org>

MFC r197225

This patch enables the node to respond to ARP requests for
configured proxy ARP entries.

Reviewed by: bz
Approved by: re


# cd29a779 15-Sep-2009 Qing Li <qingli@FreeBSD.org>

This patch enables the node to respond to ARP requests for
configured proxy ARP entries.

Reviewed by: bz
MFC after: immediately


# 9a311445 08-Sep-2009 Navdeep Parhar <np@FreeBSD.org>

Add arp_update_event. This replaces route_arp_update_event, which
has not worked since the arp-v2 rewrite.

The event handler will be called with the llentry write-locked and
can examine la_flags to determine whether the entry is being added
or removed.

Reviewed by: gnn, kmacy
Approved by: gnn (mentor)
MFC after: 1 month


# 54fc657d 03-Sep-2009 George V. Neville-Neil <gnn@FreeBSD.org>

Add ARP statistics to the kernel and netstat.

New counters now exist for:
requests sent
replies sent
requests received
replies received
packets received
total packets dropped due to no ARP entry
entrys timed out
Duplicate IPs seen

The new statistics are seen in the netstat command
when it is given the -s command line switch.

MFC after: 2 weeks
In collaboration with: bz


# 5b628e0c 02-Sep-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r196738:
In case an upper layer protocol tries to send a packet but the
L2 code does not have the ethernet address for the destination
within the broadcast domain in the table, we remember the
original mbuf in `la_hold' in arpresolve() and send out a
different packet with an arp request.
In case there will be more upper layer packets to send we will
free an earlier one held in `la_hold' and queue the new one.

Once we get a packet in, with which we can perfect our arp table
entry we send out the original 'on hold' packet, should there
be any.
Rather than continuing to process the packet that we received,
we returned without freeing the packet that came in, which
basically means that we leaked an mbuf for every arp request
we sent.

Rather than freeing the received packet and returning, continue
to process the incoming arp packet as well.
This should (a) improve some setups, also proxy-arp, in case it was an
incoming arp request and (b) resembles the behaviour FreeBSD had
from day 1, which alignes with RFC826 "Packet reception" (merge case).

Rename 'm0' to 'hold' to make the code more understandable as
well as diffable to earlier versions more easily.

Handle the link-layer entry 'la' lock comepletely in the block
where needed and release it as early as possible, rather than
holding it longer, down to the end of the function.

Found by: pointyhat, ns1
Bug hunting session with: erwin, simon, rwatson
Tested by: simon on cluster machines
Reviewed by: ratson, kmacy, julian

Approved by: re (kib)


# cc7e9d43 01-Sep-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

In case an upper layer protocol tries to send a packet but the
L2 code does not have the ethernet address for the destination
within the broadcast domain in the table, we remember the
original mbuf in `la_hold' in arpresolve() and send out a
different packet with an arp request.
In case there will be more upper layer packets to send we will
free an earlier one held in `la_hold' and queue the new one.

Once we get a packet in, with which we can perfect our arp table
entry we send out the original 'on hold' packet, should there
be any.
Rather than continuing to process the packet that we received,
we returned without freeing the packet that came in, which
basically means that we leaked an mbuf for every arp request
we sent.

Rather than freeing the received packet and returning, continue
to process the incoming arp packet as well.
This should (a) improve some setups, also proxy-arp, in case it was an
incoming arp request and (b) resembles the behaviour FreeBSD had
from day 1, which alignes with RFC826 "Packet reception" (merge case).

Rename 'm0' to 'hold' to make the code more understandable as
well as diffable to earlier versions more easily.

Handle the link-layer entry 'la' lock comepletely in the block
where needed and release it as early as possible, rather than
holding it longer, down to the end of the function.

Found by: pointyhat, ns1
Bug hunting session with: erwin, simon, rwatson
Tested by: simon on cluster machines
Reviewed by: ratson, kmacy, julian
MFC after: 3 days


# 530c0060 01-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# df813b7e 27-Jul-2009 Qing Li <qingli@FreeBSD.org>

This patch does the following:

- Allow loopback route to be installed for address assigned to
interface of IFF_POINTOPOINT type.
- Install loopback route for an IPv4 interface addreess when the
"useloopback" sysctl variable is enabled. Similarly, install
loopback route for an IPv6 interface address when the sysctl variable
"nd6_useloopback" is enabled. Deleting loopback routes for interface
addresses is unconditional in case these sysctl variables were
disabled after an interface address has been assigned.

Reviewed by: bz
Approved by: re


# 1e77c105 16-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


# eddfbb76 14-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 2d9cfaba 25-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support. As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done. This means that one
class of reader-writer races still exists.

MFC after: 6 weeks
Reviewed by: bz


# f8574c7a 24-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Add missing unlock of if_addr_mtx when an unmatched ARP packet is received.

Reported by: lstewart
MFC after: 6 weeks


# 09d54778 24-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

In ARP input, more consistently acquire and release ifaddr references.

MFC after: 6 weeks


# 5736e6fb 23-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

After cleaning up rt_tables from vnet.h and cleaning up opt_route.h
a lot of files no longer need route.h either. Garbage collect them.
While here remove now unneeded vnet.h #includes as well.


# 8d8bc018 08-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.


# bcf11e8d 05-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# d4b5cae4 01-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Reimplement the netisr framework in order to support parallel netisr
threads:

- Support up to one netisr thread per CPU, each processings its own
workstream, or set of per-protocol queues. Threads may be bound
to specific CPUs, or allowed to migrate, based on a global policy.

In the future it would be desirable to support topology-centric
policies, such as "one netisr per package".

- Allow each protocol to advertise an ordering policy, which can
currently be one of:

NETISR_POLICY_SOURCE: packets must maintain ordering with respect to
an implicit or explicit source (such as an interface or socket).

NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work,
as well as allowing protocols to provide a flow generation function
for mbufs without flow identifers (m2flow). Falls back on
NETISR_POLICY_SOURCE if now flow ID is available.

NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for
each packet handled by netisr (m2cpuid).

- Provide utility functions for querying the number of workstreams
being used, as well as a mapping function from workstream to CPU ID,
which protocols may use in work placement decisions.

- Add explicit interfaces to get and set per-protocol queue limits, and
get and clear drop counters, which query data or apply changes across
all workstreams.

- Add a more extensible netisr registration interface, in which
protocols declare 'struct netisr_handler' structures for each
registered NETISR_ type. These include name, handler function,
optional mbuf to flow ID function, optional mbuf to CPU ID function,
queue limit, and ordering policy. Padding is present to allow these
to be expanded in the future. If no queue limit is declared, then
a default is used.

- Queue limits are now per-workstream, and raised from the previous
IFQ_MAXLEN default of 50 to 256.

- All protocols are updated to use the new registration interface, and
with the exception of netnatm, default queue limits. Most protocols
register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use
NETISR_POLICY_FLOW, and will therefore take advantage of driver-
generated flow IDs if present.

- Formalize a non-packet based interface between interface polling and
the netisr, rather than having polling pretend to be two protocols.
Provide two explicit hooks in the netisr worker for start and end
events for runs: netisr_poll() and netisr_pollmore(), as well as a
function, netisr_sched_poll(), to allow the polling code to schedule
netisr execution. DEVICE_POLLING still embeds single-netisr
assumptions in its implementation, so for now if it is compiled into
the kernel, a single and un-bound netisr thread is enforced
regardless of tunable configuration.

In the default configuration, the new netisr implementation maintains
the same basic assumptions as the previous implementation: a single,
un-bound worker thread processes all deferred work, and direct dispatch
is enabled by default wherever possible.

Performance measurement shows a marginal performance improvement over
the old implementation due to the use of batched dequeue.

An rmlock is used to synchronize use and registration/unregistration
using the framework; currently, synchronized use is disabled
(replicating current netisr policy) due to a measurable 3%-6% hit in
ping-pong micro-benchmarking. It will be enabled once further rmlock
optimization has taken place. However, in practice, netisrs are
rarely registered or unregistered at runtime.

A new man page for netisr will follow, but since one doesn't currently
exist, it hasn't been updated.

This change is not appropriate for MFC, although the polling shutdown
handler should be merged to 7-STABLE.

Bump __FreeBSD_version.

Reviewed by: bz


# 21ca7b57 05-May-2009 Marko Zec <zec@FreeBSD.org>

Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one. The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE(). Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by: julian (mentor)


# 279aa3d4 16-Apr-2009 Kip Macy <kmacy@FreeBSD.org>

Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by: rwatson


# 582b6122 15-Apr-2009 Kip Macy <kmacy@FreeBSD.org>

make LLTABLE visible to netinet


# bfe1aba4 10-Apr-2009 Marko Zec <zec@FreeBSD.org>

Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized. In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw). Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module. The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on. Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first. The framework will panic if it detects any unresolved
dependencies before completing system initialization. Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run. In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container. IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by: bz
Approved by: julian (mentor)


# 1ed81b73 06-Apr-2009 Marko Zec <zec@FreeBSD.org>

First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

At this stage, the new per-vnet initializer functions are
directly called from the existing global initialization code,
which should in most cases result in compiler inlining those
new functions, hence yielding a near-zero functional change.

Modify the existing initializer functions which are invoked via
protosw, like ip_init() et. al., to allow them to be invoked
multiple times, i.e. per each vnet. Global state, if any,
is initialized only if such functions are called within the
context of vnet0, which will be determined via the
IS_DEFAULT_VNET(curvnet) check (currently always true).

While here, V_irtualize a few remaining global UMA zones
used by net/netinet/netipsec networking code. While it is
not yet clear to me or anybody else whether this is the right
thing to do, at this stage this makes the code more readable,
and makes it easier to track uncollected UMA-zone-backed
objects on vnet removal. In the long run, it's quite possible
that some form of shared use of UMA zone pools among multiple
vnets should be considered.

Bump __FreeBSD_version due to changes in layout of structs
vnet_ipfw, vnet_inet and vnet_net.

Approved by: julian (mentor)


# d10910e6 09-Mar-2009 Bruce M Simpson <bms@FreeBSD.org>

Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD
IPv4 stack.

Diffs are minimized against p4.
PCS has been used for some protocol verification, more widespread
testing of recorded sources in Group-and-Source queries is needed.
sizeof(struct igmpstat) has changed.

__FreeBSD_version is bumped to 800070.


# 33553d6e 27-Feb-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.


# 5e96c0a1 23-Dec-2008 Kip Macy <kmacy@FreeBSD.org>

Fix missed unlock and reference drop of lle

Found by: pho


# ce9122fd 22-Dec-2008 Qing Li <qingli@FreeBSD.org>

Don't create a bogus ARP entry for 0.0.0.0.


# 897d75c9 19-Dec-2008 Qing Li <qingli@FreeBSD.org>

The proxy-arp code was broken and responds to ARP
requests for addresses that are not proxied locally.


# 00a46b31 16-Dec-2008 Kip Macy <kmacy@FreeBSD.org>

default to doing lla_lookup with shared afdata lock and returning a
shared lock on the lle - thus restoring parallel performance to
pre-arpv2 level


# 86cd829d 15-Dec-2008 Kip Macy <kmacy@FreeBSD.org>

don't unlock lle if it is NULL


# 6e6b3f7c 14-Dec-2008 Qing Li <qingli@FreeBSD.org>

This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
me maintaining that branch before the svn conversion


# 4e57bc33 06-Dec-2008 Christian S.J. Peron <csjp@FreeBSD.org>

in_rtalloc1(9) returns a locked route, so make sure that we use
RTFREE_LOCKED() here. This macro makes sure the reference count
on the route is being managed properly. This elimates another
case which results in the following message being printed to the
console:

rtfree: 0xc841ee88 has 1 refs

Reviewed by: bz
MFC after: 2 weeks


# 4b79449e 02-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# 97021c24 26-Nov-2008 Marko Zec <zec@FreeBSD.org>

Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 44e33a07 19-Nov-2008 Marko Zec <zec@FreeBSD.org>

Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions. As a rule,
initialization at instatiation for such variables should never be
introduced again from now on. Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact. In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 3ff0b213 15-Oct-2008 Marko Zec <zec@FreeBSD.org>

Remove a useless global static variable.

Approved by: bz (ad-hoc mentor)


# 8b615593 02-Oct-2008 Marko Zec <zec@FreeBSD.org>

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# de34ad3f 14-Sep-2008 Julian Elischer <julian@FreeBSD.org>

oops commit the version that compiles


# 93fcb5a2 14-Sep-2008 Julian Elischer <julian@FreeBSD.org>

Revert a part of the MRT commit that proved un-needed.
rt_check() in its original form proved to be sufficient and
rt_check_fib() can go away (as can its evil twin in_rt_check()).

I believe this does NOT address the crashes people have been seeing
in rt_check.

MFC after: 1 week


# 57a5a46e 04-Sep-2008 Giorgos Keramidas <keramida@FreeBSD.org>

Slightly reword comment and remove typos.


# 22b55ba9 31-Aug-2008 Julian Elischer <julian@FreeBSD.org>

fix tiny nti in comment


# 80b11ee4 18-Aug-2008 Philip Paeps <philip@FreeBSD.org>

Fix ARP in bridging scenarios where the bridge shares its
MAC address with one of its members (see my r180140).

Pointy hat to: philip
Submitted by: Eygene Ryabinkin <rea-fbsd@codelabs.ru>
MFC after: 3 days


# 603724d3 17-Aug-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# 59dd72d0 03-Jul-2008 Robert Watson <rwatson@FreeBSD.org>

Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly
dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific
netisr handlers to always be dispatched via a queue (deferred). Mark the
usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly
acquire Giant in those handlers.

Previously, any netisr handler not marked NETISR_MPSAFE would necessarily
run deferred and with Giant acquired. This change removes Giant
scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows
non-MPSAFE handlers to continue to force deferred dispatch so as to avoid
lock order reversals between their acqusition of Giant and any calling
context.

It is likely we will be able to remove NETISR_FORCEQUEUE once
IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no
longer be supported.

Reviewed by: bz
MFC after: 1 month
X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons,
but the rest can go back.


# 8b07e49a 09-May-2008 Julian Elischer <julian@FreeBSD.org>

Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from anay to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco


# b6ae6984 31-Dec-2007 Julian Elischer <julian@FreeBSD.org>

Don't duplicate the whole of arpresolve to arpresolve 2 for the sake
of two compares against 0. The negative effect of cache flushing
is probably more than the gain by not doing the two compares (the
value is almost certainly in register or at worst, cache).
Note that the uses of m_freem() are in error cases and m_freem()
handles NULL anyhow. So fast-path really isn't changed much at all.


# 29910a5a 17-Dec-2007 Kip Macy <kmacy@FreeBSD.org>

widen the routing event interface (arp update, redirect, and eventually pmtu change)
into separate functions

revert previous commit's changes to arpresolve and add a new interface
arpresolve2 which does arp resolution without an mbuf


# 58505389 16-Dec-2007 Kip Macy <kmacy@FreeBSD.org>

Don't panic in arpresolve if we're given a null mbuf. We could
insist that the caller just pass in an initialized mbuf even
if didn't have any data - but that seems rather contrived.


# b3e761e5 15-Dec-2007 Kip Macy <kmacy@FreeBSD.org>

Move arp update upcall to always be called for ARP replies - previous invocation
would not always get called at the appropriate times


# 8e7e854c 12-Dec-2007 Kip Macy <kmacy@FreeBSD.org>

add interface for allowing consumers to register for ARP updates,
redirects, and path MTU changes

Reviewed by: silby


# 3affb6fb 04-Dec-2007 Yaroslav Tykhiy <ytykhiy@gmail.com>

For the sake of convenience, print the name of the network interface
IPv4 address duplication was detected on.

Idea by: marck


# b9b0dac3 28-Oct-2007 Robert Watson <rwatson@FreeBSD.org>

Move towards more explicit support for various network protocol stacks
in the TrustedBSD MAC Framework:

- Add mac_atalk.c and add explicit entry point mac_netatalk_aarp_send()
for AARP packet labeling, rather than using a generic link layer
entry point.

- Add mac_inet6.c and add explicit entry point mac_netinet6_nd6_send()
for ND6 packet labeling, rather than using a generic link layer entry
point.

- Add expliict entry point mac_netinet_arp_send() for ARP packet
labeling, and mac_netinet_igmp_send() for IGMP packet labeling,
rather than using a generic link layer entry point.

- Remove previous genering link layer entry point,
mac_mbuf_create_linklayer() as it is no longer used.

- Add implementations of new entry points to various policies, largely
by replicating the existing link layer entry point for them; remove
old link layer entry point implementation.

- Make MAC_IFNET_LOCK(), MAC_IFNET_UNLOCK(), and mac_ifnet_mtx global
to the MAC Framework rather than static to mac_net.c as it is now
needed outside of mac_net.c.

Obtained from: TrustedBSD Project


# 86407646 26-Oct-2007 Robert Watson <rwatson@FreeBSD.org>

Rename 'mac_mbuf_create_from_firewall' to 'mac_netinet_firewall_send' as
we move towards netinet as a pseudo-object for the MAC Framework.

Rename 'mac_create_mbuf_linklayer' to 'mac_mbuf_create_linklayer' to
reflect general object-first ordering preference.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# 4b421e2d 07-Oct-2007 Mike Silbersack <silby@FreeBSD.org>

Add FBSDID to all files in netinet so that people can more
easily include file version information in bug reports.

Approved by: re (kensmith)


# f2565d68 10-May-2007 Robert Watson <rwatson@FreeBSD.org>

Move universally to ANSI C function declarations, with relatively
consistent style(9)-ish layout.


# 1daaa65d 22-Mar-2007 Gleb Smirnoff <glebius@FreeBSD.org>

Remove global list of all llinfo_arp entries and use a callout per
instance expiry of the ARP entries. Since we no longer abuse the IPv4
radix head lock, we can now enter arp_rtrequest() with a lock held on
an arbitrary rt_entry.

Reviewed by: bms


# d0558157 02-Feb-2007 Bruce M Simpson <bms@FreeBSD.org>

Comply with RFC 3927, by forcing ARP replies which contain a source
address within the link-local IPv4 prefix 169.254.0.0/16, to be
broadcast at link layer.

Reviewed by: fenner
MFC after: 2 weeks


# 95ebcabe 14-Jan-2007 Maxim Konovalov <maxim@FreeBSD.org>

o Increment requests counter right before send out an ARP query actually.
Otherwise the code could lead to the spurious EHOSTDOWN errors.

PR: kern/107807
Submitted by: Dmitrij Tejblum
MFC after: 1 month


# aed55708 22-Oct-2006 Robert Watson <rwatson@FreeBSD.org>

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# f7a679b2 04-Oct-2006 Gleb Smirnoff <glebius@FreeBSD.org>

Save space on stack moving token ring stuff to its own hack block.


# 9b9a52b4 04-Oct-2006 Gleb Smirnoff <glebius@FreeBSD.org>

Style rev. 1.152.


# 402865f6 23-Sep-2006 John-Mark Gurney <jmg@FreeBSD.org>

now that we don't automagicly increase the MTU of host routes, when we copy
the loopback interface, copy it's mtu also.. This means that we again have
large mtu support for local ip addresses...


# 4b97d7af 29-Jun-2006 Yaroslav Tykhiy <ytykhiy@gmail.com>

There is a consensus that ifaddr.ifa_addr should never be NULL,
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures. Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and get ready
to the system crashing honestly instead of masking possible bugs.

Suggested by: glebius, jhb, ru


# 5feebeeb 08-Jun-2006 Andrew Thompson <thompsa@FreeBSD.org>

Enable proxy ARP answers on any of the bridged interfaces if proxy record
belongs to another interface within the bridge group.

PR: kern/94408
Submitted by: Eygene A. Ryabinkin
MFC after: 1 month


# 7f2d8767 07-Mar-2006 Andrew Thompson <thompsa@FreeBSD.org>

Further refine the bridge hack in the arp code. Only do the special arp
handling for interfaces which are actually in the bridge group, ignore all
others.

MFC after: 3 days


# 235073f4 31-Jan-2006 Andrew Thompson <thompsa@FreeBSD.org>

Now that the bridge also processes Ethernet frames as itself, two arp replies
will be sent if there is an address on the bridge. Exclude the bridge from the
special arp handling.

This has been tested with all combinations of addresses on the bridge and members.

Pointed out by: Michal Mertl


# 74948aa6 29-Jan-2006 Andrew Thompson <thompsa@FreeBSD.org>

Back out of r1.148, it causes two arp replies to be sent with different mac
addresses. One for the bridged interface with the IP address assigned but then
another with the mac for the bridge itself.


# 54c427e0 12-Jan-2006 Andrew Thompson <thompsa@FreeBSD.org>

Include the bridge interface itself in the special arp handling.

PR: 90973
MFC after: 1 week


# 39393906 18-Dec-2005 Gleb Smirnoff <glebius@FreeBSD.org>

Add a knob to suppress logging of attempts to modify
permanent ARP entries.

Submitted by: Andrew Alcheyev <buddy telenet.ru>


# bd2b686f 16-Dec-2005 Ed Maste <emaste@FreeBSD.org>

Add descriptions for sysctl -d.

Approved by: glebius
Silence from: rwatson (mentor)


# e1ff74c5 07-Nov-2005 Gleb Smirnoff <glebius@FreeBSD.org>

Rework ARP retransmission algorythm so that ARP requests are
retransmitted without suppression, while there is demand for
such ARP entry. As before, retransmission is rate limited to
one packet per second. Details:
- Remove net.link.ether.inet.host_down_time
- Do not set/clear RTF_REJECT flag on route, to
avoid rt_check() returning error. We will generate error
ourselves.
- Return EWOULDBLOCK on first arp_maxtries failed
requests , and return EHOSTDOWN/EHOSTUNREACH
on further requests.
- Retransmit ARP request always, independently from return
code. Ratelimit to 1 pps.


# f69453ca 04-Oct-2005 Andrew Thompson <thompsa@FreeBSD.org>

When bridging is enabled and an ARP request is recieved on a member interface,
the arp code will search all local interfaces for a match. This triggers a
kernel log if the bridge has been assigned an address.

arp: ac:de:48:18:83:3d is using my IP address 192.168.0.142!

bridge0: flags=8041<UP,RUNNING,MULTICAST> mtu 1500
inet 192.168.0.142 netmask 0xffffff00
ether ac:de:48:18:83:3d

Silence this warning for 6.0 to stop unnecessary bug reports, the code will need
to be reworked.

Approved by: mlaier (mentor)
MFC after: 3 days


# b6de9e91 27-Sep-2005 Max Laier <mlaier@FreeBSD.org>

Remove bridge(4) from the tree. if_bridge(4) is a full functional
replacement and has additional features which make it superior.

Discussed on: -arch
Reviewed by: thompsa
X-MFC-after: never (RELENG_6 as transition period)


# fe53256d 19-Sep-2005 Andre Oppermann <andre@FreeBSD.org>

Use monotonic 'time_uptime' instead of 'time_second' as timebase
for rt->rt_rmx.rmx_expire.


# a20e2538 09-Sep-2005 Gleb Smirnoff <glebius@FreeBSD.org>

- Do not hold route entry lock, when calling arprequest(). One such
call was introduced by me in 1.139, the other one was present before.
- Do all manipulations with rtentry and la before dropping the lock.
- Copy interface address from route into local variable before dropping
the lock. Supply this copy as argument to arprequest()

LORs fixed:
http://sources.zabbadoz.net/freebsd/lor/003.html
http://sources.zabbadoz.net/freebsd/lor/037.html
http://sources.zabbadoz.net/freebsd/lor/061.html
http://sources.zabbadoz.net/freebsd/lor/062.html
http://sources.zabbadoz.net/freebsd/lor/064.html
http://sources.zabbadoz.net/freebsd/lor/068.html
http://sources.zabbadoz.net/freebsd/lor/071.html
http://sources.zabbadoz.net/freebsd/lor/074.html
http://sources.zabbadoz.net/freebsd/lor/077.html
http://sources.zabbadoz.net/freebsd/lor/093.html
http://sources.zabbadoz.net/freebsd/lor/135.html
http://sources.zabbadoz.net/freebsd/lor/140.html
http://sources.zabbadoz.net/freebsd/lor/142.html
http://sources.zabbadoz.net/freebsd/lor/145.html
http://sources.zabbadoz.net/freebsd/lor/152.html
http://sources.zabbadoz.net/freebsd/lor/158.html


# 510b360f 25-Aug-2005 Gleb Smirnoff <glebius@FreeBSD.org>

When we have a published ARP entry for some IP address, do reply on
ARP requests only on the network where this IP address belong, to.

Before this change we did replied on all interfaces. This could
lead to an IP address conflict with host we are doing ARP proxy
for.

PR: kern/75634
Reviewed by: andre


# 1ed7bf1e 11-Aug-2005 Gleb Smirnoff <glebius@FreeBSD.org>

o Fix a race between three threads: output path,
incoming ARP packet and route request adding/removing
ARP entries. The root of the problem is that
struct llinfo_arp was accessed without any locks.
To close race we will use locking provided by
rtentry, that references this llinfo_arp:
- Make arplookup() return a locked rtentry.
- In arpresolve() hold the lock provided by
rt_check()/arplookup() until the end of function,
covering all accesses to the rtentry itself and
llinfo_arp it refers to.
- In in_arpinput() do not drop lock provided by
arplookup() during first part of the function.
- Simplify logic in the first part of in_arpinput(),
removing one level of indentation.
- In the second part of in_arpinput() hold rtentry
lock while copying address.

o Fix a condition when route entry is destroyed, while
another thread is contested on its lock:
- When storing a pointer to rtentry in llinfo_arp list,
always add a reference to this rtentry, to prevent
rtentry being destroyed via RTM_DELETE request.
- Remove this reference when removing entry from
llinfo_arp list.

o Further cleanup of arptimer():
- Inline arptfree() into arptimer().
- Use official queue(3) way to pass LIST.
- Hold rtentry lock while reading its structure.
- Do not check that sdl_family is AF_LINK, but
assert this.

Reviewed by: sam
Stress test: http://www.holm.cc/stress/log/cons141.html
Stress test: http://people.freebsd.org/~pho/stress/log/cons144.html


# 9bd8ca30 09-Aug-2005 Gleb Smirnoff <glebius@FreeBSD.org>

In preparation for fixing races in ARP (and probably in other
L2/L3 mappings) make rt_check() return a locked rtentry.


# 8f867517 04-Jun-2005 Andrew Thompson <thompsa@FreeBSD.org>

Add hooks into the networking layer to support if_bridge. This changes struct
ifnet so a buildworld is necessary.

Approved by: mlaier (mentor)
Obtained from: NetBSD


# 422a115a 13-Mar-2005 Gleb Smirnoff <glebius@FreeBSD.org>

Embrace with #ifdef DEV_CARP carp-related code.


# 2ef4a436 09-Mar-2005 Gleb Smirnoff <glebius@FreeBSD.org>

Make ARP do not complain about wrong interface if correct interface
is a carp one and address matched it.

Reviewed by: brooks


# a9771948 22-Feb-2005 Gleb Smirnoff <glebius@FreeBSD.org>

Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by: mlaier
Obtained from: OpenBSD (mickey, mcbride)


# c398230b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# 067a8bab 08-Dec-2004 Max Laier <mlaier@FreeBSD.org>

More fixing of multiple addresses in the same prefix. This time do not try
to arp resolve "secondary" local addresses.

Found and submitted by: ru
With additions from: OpenBSD (rev. 1.47)
Reviewed by: ru


# d6fa5d28 25-Oct-2004 Bruce M Simpson <bms@FreeBSD.org>

Check that rt_mask(rt) is non-NULL before dereferencing it, in the
RTM_ADD case, thus avoiding a panic.

Submitted by: Iasen Kostov


# 00fcf9d1 12-Oct-2004 Robert Watson <rwatson@FreeBSD.org>

Modify the thrilling "%D is using my IP address %s!" message so that
it isn't printed if the IP address in question is '0.0.0.0', which is
used by nodes performing DHCP lookup, and so constitute a false
positive as a report of misconfiguration.


# 32439868 08-Sep-2004 Gleb Smirnoff <glebius@FreeBSD.org>

Check flag do_bridge always, even if kernel was compiled without
BRIDGE support. This makes dynamic bridge.ko working.

Reviewed by: sam
Approved by: julian (mentor)
MFC after: 1 week


# b8b33234 13-Jun-2004 Doug Rabson <dfr@FreeBSD.org>

Add a new driver to support IP over firewire. This driver is intended to
conform to the rfc2734 and rfc3146 standard for IP over firewire and
should eventually supercede the fwe driver. Right now the broadcast
channel number is hardwired and we don't support MCAP for multicast
channel allocation - more infrastructure is required in the firewire
code itself to fix these problems.


# b2a8ac7c 25-Apr-2004 Luigi Rizzo <luigi@FreeBSD.org>

Another small set of changes to reduce diffs with the new arp code.


# 491522ea 25-Apr-2004 Luigi Rizzo <luigi@FreeBSD.org>

remove a stale comment on the behaviour of arpresolve


# cfff63f1 24-Apr-2004 Luigi Rizzo <luigi@FreeBSD.org>

Start the arp timer at init time.
It runs so rarely that it makes no sense to wait until the first request.


# cd46a114 25-Apr-2004 Luigi Rizzo <luigi@FreeBSD.org>

This commit does two things:

1. rt_check() cleanup:
rt_check() is only necessary for some address families to gain access
to the corresponding arp entry, so call it only in/near the *resolve()
routines where it is actually used -- at the moment this is
arpresolve(), nd6_storelladdr() (the call is embedded here),
and atmresolve() (the call is just before atmresolve to reduce
the number of changes).
This change will make it a lot easier to decouple the arp table
from the routing table.

There is an extra call to rt_check() in if_iso88025subr.c to
determine the routing info length. I have left it alone for
the time being.

The interface of arpresolve() and nd6_storelladdr() now changes slightly:
+ the 'rtentry' parameter (really a hint from the upper level layer)
is now passed unchanged from *_output(), so it becomes the route
to the final destination and not to the gateway.
+ the routines will return 0 if resolution is possible, non-zero
otherwise.
+ arpresolve() returns EWOULDBLOCK in case the mbuf is being held
waiting for an arp reply -- in this case the error code is masked
in the caller so the upper layer protocol will not see a failure.

2. arpcom untangling
Where possible, use 'struct ifnet' instead of 'struct arpcom' variables,
and use the IFP2AC macro to access arpcom fields.
This mostly affects the netatalk code.

=== Detailed changes: ===
net/if_arcsubr.c
rt_check() cleanup, remove a useless variable

net/if_atmsubr.c
rt_check() cleanup

net/if_ethersubr.c
rt_check() cleanup, arpcom untangling

net/if_fddisubr.c
rt_check() cleanup, arpcom untangling

net/if_iso88025subr.c
rt_check() cleanup

netatalk/aarp.c
arpcom untangling, remove a block of duplicated code

netatalk/at_extern.h
arpcom untangling

netinet/if_ether.c
rt_check() cleanup (change arpresolve)

netinet6/nd6.c
rt_check() cleanup (change nd6_storelladdr)


# ac912b2d 18-Apr-2004 Luigi Rizzo <luigi@FreeBSD.org>

Replace Bcopy with 'the real thing' as in the rest of the file.


# f36cfd49 07-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# f7c5baa1 03-Apr-2004 Luigi Rizzo <luigi@FreeBSD.org>

+ arpresolve(): remove an unused argument
+ struct ifnet: remove unused fields, move ipv6-related field close
to each other, add a pointer to l3<->l2 translation tables (arp,nd6,
etc.) for future use.

+ struct route: remove an unused field, move close to each
other some fields that might likely go away in the future


# 2964fb65 21-Mar-2004 Matthew N. Dodd <mdodd@FreeBSD.org>

- Fix indentation lost by 'diff -b'.
- Un-wrap short line.


# 64bf80ce 20-Mar-2004 Matthew N. Dodd <mdodd@FreeBSD.org>

Remove interface type specific code from arprequest(), and in_arpinput().

The AF_ARP case in the (*if_output)() routine will handle the interface type
specific bits.

Obtained from: NetBSD


# e952fa39 13-Mar-2004 Matthew N. Dodd <mdodd@FreeBSD.org>

De-register.


# 31175791 23-Dec-2003 Ruslan Ermilov <ru@FreeBSD.org>

I didn't notice it right away, but check the right length too.


# 78e2d2bd 23-Dec-2003 Ruslan Ermilov <ru@FreeBSD.org>

Fix a problem introduced in revision 1.84: m_pullup() does not
necessarily return the same mbuf chain so we need to recompute
mtod() consumers after pulling up.


# 7138d65c 08-Nov-2003 Sam Leffler <sam@FreeBSD.org>

replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF
macros that expand to include assertions when the system is built
with INVARIANTS

Supported by: FreeBSD Foundation


# 7902224c 08-Nov-2003 Sam Leffler <sam@FreeBSD.org>

o add a flags parameter to netisr_register that is used to specify
whether or not the isr needs to hold Giant when running; Giant-less
operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive

Supported by: FreeBSD Foundation


# 9bf40ede 31-Oct-2003 Brooks Davis <brooks@FreeBSD.org>

Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)


# 9c63e9db 30-Oct-2003 Sam Leffler <sam@FreeBSD.org>

Overhaul routing table entry cleanup by introducing a new rtexpunge
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE). This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).

Supported by: FreeBSD Foundation


# d1dd20be 03-Oct-2003 Sam Leffler <sam@FreeBSD.org>

Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do-no know about mutex's. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
work to eliminate many held references. If not these will be resolved
prior to release.
3. ATM changes are untested.

Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)


# c3b52d64 03-Oct-2003 Bruce M Simpson <bms@FreeBSD.org>

Shorten 'bad gateway' AF_LINK message.

Submitted by: green


# beb2ced8 03-Oct-2003 Bruce M Simpson <bms@FreeBSD.org>

Make arp_rtrequest()'s 'bad gateway' messages slightly more informative,
to aid me in tracking down LLINFO inconsistencies in the routing table.

Discussed with: fenner


# b75bead1 03-Oct-2003 Bruce M Simpson <bms@FreeBSD.org>

Only delete the route if arplookup() tried to create it. Do not delete
RTF_STATIC routes. Do not check for RTF_HOST so as to avoid being DoSed
when an RTF_GENMASK route exists in the table.

Add a more verbose comment about exactly what this code does.

Submitted by: ru


# deb62e28 01-Oct-2003 Ruslan Ermilov <ru@FreeBSD.org>

By popular demand, added the "static ARP" per-interface option.


# 85cc1994 24-Sep-2003 Bruce M Simpson <bms@FreeBSD.org>

Fix a logic error in the check to see if arplookup() should free the route.

Noticed by: Mike Hogsett
Reviewed by: ru


# fedf1d01 23-Sep-2003 Bruce M Simpson <bms@FreeBSD.org>

Fix a bug in arplookup(), whereby a hostile party on a locally
attached network could exhaust kernel memory, and cause a system
panic, by sending a flood of spoofed ARP requests.

Approved by: jake (mentor)
Reported by: Apple Product Security <product-security@apple.com>


# 1cafed39 04-Mar-2003 Jonathan Lemon <jlemon@FreeBSD.org>

Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point. Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs


# a163d034 18-Feb-2003 Warner Losh <imp@FreeBSD.org>

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 022695f8 08-Feb-2003 Orion Hodson <orion@FreeBSD.org>

Avoid multiply for preemptive arp calculation since it hits every
ethernet packet sent.

Prompted by: Jeffrey Hsu <hsu@FreeBSD.org>


# 73224fb0 03-Feb-2003 Orion Hodson <orion@FreeBSD.org>

MFS 1.64.2.22: Re-enable non pre-emptive ARP requests.

Submitted by: "Diomidis Spinellis" <dds@aueb.gr>
PR: kern/46116


# 93f79889 28-Jan-2003 Jeffrey Hsu <hsu@FreeBSD.org>

Avoid lock order reversal by expanding the scope of the
AF_INET radix tree lock to cover the ARP data structures.


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# c996428c 17-Jan-2003 Jeffrey Hsu <hsu@FreeBSD.org>

SMP locking for ARP.


# a9a7a912 09-Jan-2003 Thomas Moestl <tmm@FreeBSD.org>

Clear the target hardware address field when generating an ARP request.

Reviewed by: nectar
MFC after: 1 week


# 19527d3e 31-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Introduce support for Mandatory Access Control and extensible
kernel access control.

When generating an ARP query, invoke a MAC entry point to permit the
MAC framework to label its mbuf appropriately for the interface.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 532cf61b 19-Jun-2002 Peter Wemm <peter@FreeBSD.org>

Solve the 'unregistered netisr 18' information notice with a sledgehammer.
Register the ISR early, but do not actually kick off the timer until we
see some activity. This still saves us from running the arp timers on
a system with no network cards.


# c3a2190c 14-May-2002 Kelly Yancey <kbyanc@FreeBSD.org>

Reset token-ring source routing control field on receipt of ethernet frame
without source routing information. This restores the behaviour in this
scenario to that of prior to my last commit.


# 42fdfc12 07-May-2002 Kelly Yancey <kbyanc@FreeBSD.org>

Move ISO88025 source routing information into sockaddr_dl's sdl_data
field. This returns the sdl_data field to a variable-length field. More
importantly, this prevents a easily-reproduceable data-corruption bug when
the interface name plus the hardware address exceed the sdl_data field's
original 12 byte limit. However, token-ring interfaces may still overflow
the new sdl_data field's 46 byte limit if the interface name exceeds 6
characters (since 6 characters for interface name plus 6 for hardware
address plus 34 for source routing = the size of sdl_data). Further
refinements could overcome this limitation but would break binary
compatibility; this commit only addresses fixing the bug for
commonly-occuring cases without breaking binary compatibility with the
intention that the functionality can be MFC'ed to -stable.

See message ID's (both send to -arch):
20020421013332.F87395-100000@gateway.posi.net
20020430181359.G11009-300000@gateway.posi.net
for a more thorough description of the bug addressed and how to
reproduce it.

Approved by: silence on -arch and -net
Sponsored by: NTT Multimedia Communications Labs
MFC after: 1 week


# 6008862b 04-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# f0f3379e 20-Mar-2002 Orion Hodson <orion@FreeBSD.org>

Send periodic ARP requests when ARP entries for hosts we are sending
to are about to expire. This prevents high packet rate flows from
experiencing packet drops at the sender following ARP cache entry
timeout.

PR: kern/25517
Reviewed by: luigi
MFC after: 7 days


# 4d77a549 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# c448c89c 12-Dec-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Minor style fix.


# 47891de1 05-Dec-2001 Ruslan Ermilov <ru@FreeBSD.org>

Fixed remotely exploitable DoS in arpresolve().

Easily exploitable by flood pinging the target
host over an interface with the IFF_NOARP flag
set (all you need to know is the target host's
MAC address).

MFC after: 0 days


# ec691a10 25-Oct-2001 Jonathan Lemon <jlemon@FreeBSD.org>

If we are bridging, fall back to using any inet address in the system,
irrespective of receive interface, as a last resort.

Submitted by: ru


# d8b84d9e 19-Oct-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Only examine inet addresses of the interface. This was broken in r1.83,
with the result that the system would reply to an ARP request of 0.0.0.0


# 8071913d 17-Oct-2001 Ruslan Ermilov <ru@FreeBSD.org>

Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.

Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument. Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost noone is
using it anyways).

Benefit: the following command now works. Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''. It was introduced by 4.3BSD-Reno and never
corrected.

Obtained from: BSD/OS, NetBSD
MFC after: 1 month
PR: kern/28360


# 322dcb8d 14-Oct-2001 Max Khon <fjoe@FreeBSD.org>

bring in ARP support for variable length link level addresses

Reviewed by: jdp
Approved by: jdp
Obtained from: NetBSD
MFC after: 6 weeks


# ca925d9c 28-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Add a hash table that contains the list of internet addresses, and use
this in place of the in_ifaddr list when appropriate. This improves
performance on hosts which have a large number of IP aliases.


# 75ce3221 04-Sep-2001 Alfred Perlstein <alfred@FreeBSD.org>

Fix sysctl comment field, s/the the/then the

Pointed out by: ru


# e3d123d6 03-Sep-2001 Alfred Perlstein <alfred@FreeBSD.org>

Allow disabling of "arp moved" messages.

Submitted by: Stephen Hurd <deuce@lordlegacy.org>


# 08aadfbb 15-Jun-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Do not perform arp send/resolve on an interface marked NOARP.

PR: 25006
MFC after: 2 weeks


# 4cbc8ad1 26-Mar-2001 Yaroslav Tykhiy <ytykhiy@gmail.com>

Add a missing m_pullup() before a mtod() in in_arpinput().

PR: kern/22177
Reviewed by: wollman


# 7e1cd0d2 09-Feb-2001 Luigi Rizzo <luigi@FreeBSD.org>

Sync with the bridge/dummynet/ipfw code already tested in stable.

In ip_fw.[ch] change a couple of variable and field names to
avoid having types, variables and fields with the same name.


# 41d2ba5e 05-Feb-2001 Julian Elischer <julian@FreeBSD.org>

Fix bad patch from a few days ago. It broke some bridging.


# fc2ffbe6 04-Feb-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)


# c8f8e9c1 03-Feb-2001 Julian Elischer <julian@FreeBSD.org>

Make the code act the same in the case of BRIDGE being defined, but not
turned on, and the case of it not being defined at all.
i.e. Disabling bridging re-enables some of the checks it disables.

Submitted by: "Rogier R. Mulhuijzen" <drwilco@drwilco.net>


# 3269187d 05-Jan-2001 Alfred Perlstein <alfred@FreeBSD.org>

provide a sysctl 'net.link.ether.inet.log_arp_wrong_iface' to allow one
to supress logging when ARP replies arrive on the wrong interface:
"/kernel: arp: 1.2.3.4 is on dc0 but got reply from 00:00:c5:79:d0:0c on dc1"

the default is to log just to give notice about possibly incorrectly
configured networks.


# df5e1987 25-Nov-2000 Jonathan Lemon <jlemon@FreeBSD.org>

Lock down the network interface queues. The queue mutex must be obtained
before adding/removing packets from the queue. Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.

IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue. An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.

Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.


# cc728227 13-Jul-2000 David Malone <dwmalone@FreeBSD.org>

Extra sanity check when arp proxyall is enabled. Don't send an arp
reply if the requesting machine isn't on the interface we believe
it should be. Prevents arp wars when you plug cables in the wrong
way around.

PR: 9848
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
Not objected to by: wollman


# e3975643 25-May-2000 Jake Burkholder <jake@FreeBSD.org>

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


# 740a1973 23-May-2000 Jake Burkholder <jake@FreeBSD.org>

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


# 732f6a43 11-Apr-2000 Wes Peters <wes@FreeBSD.org>

PR: kern/17872
Submitted by: csg@waterspout.com (C. Stephen Gunn)


# f72d9d83 29-Mar-2000 Joerg Wunsch <joerg@FreeBSD.org>

Peter Johnson found another log() call without a trailing newline.
All three of them have been introduced in rev 1.64, so i guess i've
got all of them now. :)

Submitted by: Peter Johnson <locke@mcs.net>


# e44d6283 28-Mar-2000 Joerg Wunsch <joerg@FreeBSD.org>

Added two missing newlines in calls to log(9).

Reported in Usenet by: locke@mcs.net (Peter Johnson)

While i was at it, prepended a 0x to the %D output, to make it clear that
the printed value is in hex (i assume %D has been chosen over %#x to
obey network byte order).


# 84365e2b 23-Mar-2000 Matthew Dillon <dillon@FreeBSD.org>

Fix parens in m_pullup() line in arp handling code. The code was
improperly doing the equivalent of (m = (function() == NULL)) instead
of ((m = function()) == NULL).

This fixes a NULL pointer dereference panic with runt arp packets.


# b149dd6c 19-Mar-2000 Larry Lile <lile@FreeBSD.org>

o Replace most magic numbers related to token ring with #defines
from iso88025.h.

o Add minimal llc support to iso88025_input.

o Clean up most of the source routing code.

* Submitted by: Nikolai Saoukh <nms@otdel-1.org>


# 76ec7b2f 10-Mar-2000 Robert Watson <rwatson@FreeBSD.org>

The function arpintr() incorrectly checks m->m_len to detect incomplete
ARP packets. This can incorrectly reject complete frames since the frame
could be stored in more than one mbuf.

The following patches fix the length comparisson, and add several
diagnostic log messages to the interrupt handler for out-of-the-norm ARP
packets. This should make ARP problems easier to detect, diagnose and
fix.

Submitted by: C. Stephen Gunn <csg@waterspout.com>
Approved by: jkh
Reviewed by: rwatson


# 242c5536 12-Feb-2000 Peter Wemm <peter@FreeBSD.org>

Clean up some loose ends in the network code, including the X.25 and ISO
#ifdefs. Clean out unused netisr's and leftover netisr linker set gunk.
Tested on x86 and alpha, including world.

Approved by: jkh


# 0b757633 18-Oct-1999 Sheldon Hearn <sheldonh@FreeBSD.org>

Append missing newline to log() message for permanent ARP modification
attempt warning, which was added in rev 1.48 .

PR: 14371
Submitted by: sec@pi.musin.de (Stefan `Sec` Zehl)


# f9083fdb 15-Sep-1999 Larry Lile <lile@FreeBSD.org>

Re-arrange the arp code so that fddi arps work properly.


# fcf11853 28-Aug-1999 Larry Lile <lile@FreeBSD.org>

It is much easier to arp if you don't truncate your arp-reply's.
[affects token-ring only]


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# dfd5dee1 06-May-1999 Peter Wemm <peter@FreeBSD.org>

Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.


# 243c4c6d 15-Apr-1999 Eivind Eklund <eivind@FreeBSD.org>

Better handling for ARP/source routing on Token Ring

Submitted by: Larry Lile <lile@stdio.com>


# fda82fc2 10-Mar-1999 Julian Elischer <julian@FreeBSD.org>

Submitted by: Larry Lile
Move the Olicom token ring driver to the officially sanctionned location of
/sys/contrib. Also fix some brokenness in the generic token ring support.

Be warned that if_dl.h has been changed and SOME programs might
like recompilation.


# 51e1b505 03-Mar-1999 Bill Paul <wpaul@FreeBSD.org>

arprequest() allocates an mbuf with m_gethdr() but does not initialize
m->m_pkthdr.rcvif to NULL. Bad arprequest(). No biscuit.


# 722012cc 20-Feb-1999 Julian Elischer <julian@FreeBSD.org>

World, I'd like you to meet the first FreeBSD token Ring driver.
This is for various Olicom cards. An IBM driver is following.
This patch also adds support to tcpdump to decode packets on tokenring.
Congratulations to the proud father.. (below)

Submitted by: Larry Lile <lile@stdio.com>


# ce02431f 16-Feb-1999 Doug Rabson <dfr@FreeBSD.org>

* Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)


# cc133fa3 19-Jan-1999 Bill Fenner <fenner@FreeBSD.org>

Fix bug in last commit (la was used uninitialized if no route was passed in).


# e7bc5f27 17-Jan-1999 Bill Fenner <fenner@FreeBSD.org>

If arpresolve() gets passed a route with a null llinfo, call
arplookup() to try again. This gets rid of at least one user's
"arpresolve: can't allocate llinfo" errors, and arplookup() gives
better error messages to help track down the problem if there really
is a problem with the routing table.


# a1a2de51 10-Jan-1999 Luigi Rizzo <luigi@FreeBSD.org>

Remove check from where arp replies are coming from -- when doing bridging,
interfaces are used in clusters so the check does not apply.


# b715f178 14-Dec-1998 Luigi Rizzo <luigi@FreeBSD.org>

Last bits (i think) of dummynet for -current.


# dd9b6dde 16-Sep-1998 Bill Fenner <fenner@FreeBSD.org>

Prevent modification of permanent ARP entries (PR kern/7649)
Ignore ARP replies from the wrong interface (discussion on mailing list)
Add interface name to a couple of error messages


# ed7509ac 11-Jun-1998 Julian Elischer <julian@FreeBSD.org>

Go through the loopback code with a broom..
Remove lots'o'hacks.
looutput is now static.

Other callers who want to use loopback to allow shortcutting
should call the special entrypoint for this, if_simloop(), which is
specifically designed for this purpose. Using looutput for this purpose
was problematic, particularly with bpf and trying to keep track
of whether one should be using the charateristics of the loopback interface
or the interface (e.g. if_ethersubr.c) that was requesting the loopback.
There was a whole class of errors due to this mis-use each of which had
hacks to cover them up.

Consists largly of hack removal :-)


# ecbb00a2 07-Jun-1998 Doug Rabson <dfr@FreeBSD.org>

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# 245086a0 23-May-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Get more details on the "arpresolve: can't allocate llinfo" bogon.

PR: 2570
Reviewed by: phk
Submitted by: fenner


# 227ee8a1 30-Mar-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by: bde


# 1d5e9e22 08-Jan-1998 Eivind Eklund <eivind@FreeBSD.org>

Make INET a proper option.

This will not make any of object files that LINT create change; there
might be differences with INET disabled, but hardly anything compiled
before without INET anyway. Now the 'obvious' things will give a
proper error if compiled without inet - ipx_ip, ipfw, tcp_debug. The
only thing that _should_ work (but can't be made to compile reasonably
easily) is sppp :-(

This commit move struct arpcom from <netinet/if_ether.h> to
<net/if_arp.h>.


# c5a1016b 19-Dec-1997 Bruce Evans <bde@FreeBSD.org>

Fixed gratuitous ANSIisms.


# 55b211e3 28-Oct-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# dd570d4d 14-May-1997 Tor Egge <tegge@FreeBSD.org>

Don't send arp request for the ip address 0.0.0.0.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# cbf5d949 15-Dec-1996 Bruce Evans <bde@FreeBSD.org>

Attempt to complete the fix in the previous revision. This version
fixes the problem reported by max.


# 97db6f8d 14-Dec-1996 John Dyson <dyson@FreeBSD.org>

Missing TAILQ mod.


# 254de4ea 15-Nov-1996 Bill Fenner <fenner@FreeBSD.org>

Reword two messages:

duplicate ip address 204.162.228.7! sent from ethernet address: 08:00:20:09:7b:1d
changed to
arp: 08:00:20:09:7b:1d is using my IP address 204.162.228.7!

and

arp info overwritten for 204.162.228.2 by 08:00:20:09:7b:1d
changed to
arp: 204.162.228.2 moved from 08:00:20:07:b6:a0 to 08:00:20:09:7b:1d

I think the new wordings are more clear and could save some support
questions.


# 4458ac71 12-Oct-1996 Bruce Evans <bde@FreeBSD.org>

Removed nested include if <sys/socket.h> from <net/if.h> and
<net/if_arp.h> and fixed the things that depended on it. The nested
include just allowed unportable programs to compile and made my
simple #include checking program report that networking code doesn't
need to include <sys/socket.h>.


# 51a109a1 21-Jun-1996 Peter Wemm <peter@FreeBSD.org>

Set the rmx.rmx_expire to 0 when creating fake ethernet addresses for the
broadcast and multicast routes, otherwise they will be expired by
arptimeout after a few minutes, reverting to " (incomplete)". This makes
the work done by rev 1.27 stay around until the route itself is deleted.
This is mainly cosmetic for 'arp' and 'netstat -r'.


# 94334d8f 20-Jun-1996 Bill Fenner <fenner@FreeBSD.org>

Use the route that's guaranteed to exist when picking a source address
for ARP requests.

The NetBSD version of this patch (see NetBSD PR kern/2381) has this change
already. This should close our PR kern/1140 .

Although it's not quite what he submitted, I got the idea from him so
Submitted by: Jin Guojun <jin@george.lbl.gov>


# 34bed8b0 12-Jun-1996 David Greenman <dg@FreeBSD.org>

Keep ether_type in network order for BPF to be consistent with other
systems.

Submitted by: Ted Lemon, Matt Thomas, and others. Retrofitted for
-current by me.


# 0453d3cb 08-Jun-1996 Bruce Evans <bde@FreeBSD.org>

Changed some memcpy()'s back to bcopy()'s.

gcc only inlines memcpy()'s whose count is constant and didn't inline
these. I want memcpy() in the kernel go away so that it's obvious that
it doesn't need to be optimized. Now it is only used for one struct
copy in si.c.


# 203f0c07 22-Mar-1996 Bill Fenner <fenner@FreeBSD.org>

Send ARP's for aliased subnets with the proper source address.
Get rid of ac->ac_ipaddr and arpwhohas() since they assume that
an interface has only one address.

Obtained from: BSD/OS 2.1, via Rich Stevens <rstevens@noao.edu>


# 7d1ba413 20-Feb-1996 Bill Fenner <fenner@FreeBSD.org>

Make the "arpresolve: can't allocate llinfo" error message
more useful by printing out the IP address it was trying to
resolve, since we're seeing so many complaints about this
error.


# cbb0b46a 05-Feb-1996 Garrett Wollman <wollman@FreeBSD.org>

Fill in the corresponding ether address of multicast and broadcast
pseudo-``ARP entries'' so arp(8) doesn't show them as `unresolved'.


# 1ce9bf88 24-Jan-1996 Poul-Henning Kamp <phk@FreeBSD.org>

Use new printf features rather than local kludges.


# 602d513c 20-Dec-1995 Garrett Wollman <wollman@FreeBSD.org>

in_proto.c: spell ``Internet'' right and put whitespace after commas.

others: start to populate the link-layer branch of the net mib, by
moving ARP to its proper place. (ARP is not a protocol family, it's an
interface layer between a medium-access layer and a protocol family.)
sysctl(8) needs to be taught about the structure of this branch, unless
Poul-Henning implements dynamic MIB exploration soon.


# 081d3b93 15-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Added a prototype.


# f708ef1b 14-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Another mega commit to staticize things.


# 885f1aa4 09-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Remove old ballast, clean up a little bit, staticize.
Add five sysctl variables that you should probably never tweak.
net.arp.t_prune: 300
net.arp.t_keep: 1200
net.arp.t_down: 20
net.arp.maxtries: 5
net.arp.useloopback: 1
net.arp.proxyall: 0

(It's net.arp because arp isn't limited to inet, though our present
implementation surely is).


# ce7609a4 02-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Completed function declarations and/or added prototypes.


# e3965e5b 22-Oct-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the last trace of arptnew()


# 66800f57 05-Oct-1995 Garrett Wollman <wollman@FreeBSD.org>

Convert ARP to use queue.h macros rather than insque/remque. While
we're at it, eliminate obsolete exposure of `struct llinfo_arp' to
the world. (This dates back to when ARP entries were not stored in
the routing table, and there was no other way for the `arp' program
to read the whole table than to grovel around in /dev/kmem.)


# efe4b0eb 21-Sep-1995 Garrett Wollman <wollman@FreeBSD.org>

Second try: get 4.4-Lite-2 into the source tree. The conflicts don't
matter because none of our working source files are on the CSRG branch
any more.

Obtained from: 4.4BSD-Lite-2


# 20dc68b2 27-Jun-1995 Garrett Wollman <wollman@FreeBSD.org>

Delete obsolete #if 0 block.


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# 748e0b0a 10-May-1995 Garrett Wollman <wollman@FreeBSD.org>

Make networking domains drop-ins, through the magic of GNU ld. (Some day,
there may even be LKMs.) Also, change the internal name of `unixdomain'
to `localdomain' since AF_LOCAL is now the preferred name of this family.
Declare netisr correctly and in the right place.


# 94a5d9b6 09-May-1995 David Greenman <dg@FreeBSD.org>

Replaced some bcopy()'s with memcpy()'s so that gcc while inline/optimize.


# f5fea3dd 26-Apr-1995 Paul Traina <pst@FreeBSD.org>

Cleanup loopback interface support.
Reviewed by: wollman


# b5e8ce9f 16-Mar-1995 Bruce Evans <bde@FreeBSD.org>

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# ef0cdf33 16-Mar-1995 Garrett Wollman <wollman@FreeBSD.org>

Add inet_ntoa() and replace ARP's private routine with same.


# 38aa9fc3 20-Feb-1995 David Greenman <dg@FreeBSD.org>

Added missing newlines to calls to log().


# b2774d00 22-Dec-1994 Garrett Wollman <wollman@FreeBSD.org>

Make arp_rtrequest() static since nobody needs to referene it any more.


# dd2e4102 22-Dec-1994 Garrett Wollman <wollman@FreeBSD.org>

Move ARP interface initialization into if_ether.c:arp_ifinit().


# 31246bc2 13-Dec-1994 Garrett Wollman <wollman@FreeBSD.org>

Update calls to rtalloc1(). Also merge rt_prflags with rt_flags.


# ac234f93 01-Nov-1994 Garrett Wollman <wollman@FreeBSD.org>

Clean up ARP error messages: format IP addresses, explain arplookup()
failures in English.


# 5df72964 11-Oct-1994 Garrett Wollman <wollman@FreeBSD.org>

Fix a bug which caused panics when attempting to change just the flags of
a route. (This still doesn't work, but it doesn't panic now.) It looks
like there may be a number of incipient bugs in this code.

Also, get ready for the time when all IP gateway routes are cloning, which
is necessary to keep proper TCP statistics.


# 623ae52e 02-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

GCC cleanup.
Reviewed by:
Submitted by:
Obtained from:


# 28e82295 01-Oct-1994 Garrett Wollman <wollman@FreeBSD.org>

Implement full proxy ARP, gated on option ARP_PROXYALL. This allows
a FreeBSD box to do proxy ARP as easily as most commercial routers do,
without messing around with (potentially variable) Ethernet addresses.
This code is really quite simple; I'm not at all sure why it wasn't
implemented in 4.4.

It might be worth stealing an interface flag (maybe IFF_LINK1) to use for
finer-grained control over which interfaces get proxy treatment. For the
moment, it's all or nothing.


# f23b4c91 18-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources