History log of /freebsd-current/sys/net/if.c
Revision Date Author Comments
# 43387b4e 06-May-2024 Kristof Provost <kp@FreeBSD.org>

if: guard against if_ioctl being NULL

There are situations where an struct ifnet has a NULL if_ioctl pointer.

For example, e6000sw creates such struct ifnets for each of its ports so it can
call into the MII code.

If there is then a link state event this calls do_link_state_change()
-> rtnl_handle_ifevent() -> dump_iface() -> get_operstate() ->
get_operstate_ether(). That wants to know if the link is up or down, so it tries
to ioctl(SIOCGIFMEDIA), which doesn't go well if if_ioctl is NULL.

Guard against this, and return EOPNOTSUPP.

PR: 275920
MFC ater: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 9a071e4e 08-Sep-2023 Dag-Erling Smørgrav <des@FreeBSD.org>

Assert that ifnet_detach_sxlock is held where needed.

Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D41770


# 2a371643 21-Jul-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Retire if_etherbpfmtap() and if_bpfmtap()

Summary:
These came in the original DrvAPI commits in 2014, and are obsoleted by
bpf_mtap_if() and ether_bpf_mtap_if(). The `_if` suffix, rather than
prefix, conveys that it's operating on the bpf of the interface, instead
than the interface itself.

Reviewed by: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D41146


# 2ff63af9 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .h pattern

Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/


# 79379355 16-Jun-2023 Alexander V. Chernikov <melifaro@FreeBSD.org>

netlink: convert to IfAPI.

Convert to IfAPI everything except `IF_AFDATA_WLOCK` usage in neigh.c.

Reviewed By: jhibbits
Differential Revision: https://reviews.freebsd.org/D40577


# c344eff9 16-Jun-2023 Alexander V. Chernikov <melifaro@FreeBSD.org>

netlink: dump interface capabilities with other interface data.

This change exports interface capabilities using the standard
Netlink attribute type, bitset, and switches `ifconfig(8)` to use
it when displaying interface data.
Bitset comes in two representations. The first one is "compact",
where the bits are exported via two arrays - "mask" listing the
"valid" bits and "values, providing the values for those bits.
The second one is more verbose, listing each bit as a separate item,
with its name, id and value. The latter option is handy when submitting
update requests.

The support for setting capabilities will be added in the upcoming diffs.

Differential Revision: https://reviews.freebsd.org/D40331


# a77facd2 01-Jun-2023 Alexander V. Chernikov <melifaro@FreeBSD.org>

ifnet: consistently call hooks when the interface gets up.

Some context on the current IPv6 interface setup & address management:

There are two data path for IPv6 initialisation in context of assigning
LL addresses:
1) Userland explicitly requests IFF_UP for the interface w/o any addresses.
if_up() then calls in6_if_up(), which calls in6_ifattach().
The latter sets up some initial ND/IN6 state and disables IPv6 for the
interface if it’s not loopback. If the interface is loopback, then it
adds ::1/128 and LL addresses via in6_ifattach_loopback().
Then, devd notification is generated (if the VNET is the default one),
which triggers rc.network ifconfig_up(), causing ifdisabled to be removed
via SIOCSIFINFO_IN6 from ifconfig. The kernel SIOCSIFINFO_IN6 handler
calls in6_if_up() once again and it assigns the interface link-local address.

2) Userland adds IPv4 or IPv6 address to the interface. SIOCAIFADDR[_IN6]
kernel handler calls IPv4/IPv6 protocol handler to add the address.
Both then call if_ioctl() with SIOCSIFADDR. Ethernet/loopback ioctl handlers
silently sets IFF_UP for the interface. Finally, if.c:ifioctl() wrapper code
compares old and new interface flags and, if IFF_UP is added, it explicitly
calls in6_if_up(), which adds link-local address if either the original
address is IPv6 or the interface is loopback.

In the latter case, “formal” interface-up notifications are missing.
The kernel does not trigger event handler event, does not call carp hook
and does not provide any userland notification.

This diff unifies the event handling in both scenarios, providing the
necessary notifications to the kernel and userland.

Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D40332
MFC after: 2 weeks


# f766d1d5 10-Apr-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Add if_maddr_empty() to check for any maddrs

if_llmaddr_count() only counts link-level multicast addresses.
hv_netvsc(4) needs to know if there are any multicast addresses. Since
hv_netvsc(4) is the only instance where this would be used, make it a
simple boolean. If others need a if_maddr_count(), that can be added in
the future.

Reviewed by: melifaro
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39493


# 56d4550c 19-Apr-2023 Alexander V. Chernikov <melifaro@FreeBSD.org>

ifnet: factor out interface renaming into a separate function.

This change is required to support interface renaming via Netlink.
No functional changes intended.

Reviewed by: zlei
Differential Revision: https://reviews.freebsd.org/D39692
MFC after: 2 weeks


# 7170774e 30-Mar-2023 Konstantin Belousov <kib@FreeBSD.org>

ifcapnv: cap_bit in ifcap2_nv_bit_names[] is bit, not index

Sponsored by: Nvidia networking


# badcb3fd 29-Mar-2023 Alexander V. Chernikov <melifaro@FreeBSD.org>

routing: fix panic when adding an interface route to the p2p interface
without and inet/inet6 addresses attached.

MFC after: 3 days


# e2427c69 16-Mar-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Add iterator to complement if_foreach()

Summary:
Sometimes an if_foreach() callback can be trivial, or need a lot of
outer context. In this case a regular `for` loop makes more sense. To
keep things hidden in the new API, use an opaque `if_iter` structure
that can still be instantiated on the stack. The current implementation
uses just a single pointer out of the 4 alotted to the opaque context,
and the cleanup does nothing, but may be used in the future.

Reviewed by: melifaro
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39138


# df2b419a 04-Mar-2023 Alexander V. Chernikov <melifaro@FreeBSD.org>

ifnet: add if_foreach_sleep() to allow ifnet iterations with sleep.

Subscribers: imp, ae, glebius

Differential Revision: https://reviews.freebsd.org/D38904


# 66bdbcd5 03-Mar-2023 Alexander V. Chernikov <melifaro@FreeBSD.org>

net: unify mtu update code

Subscribers: imp, ae, glebius

Differential Revision: https://reviews.freebsd.org/D38893


# aac2d19d 10-Feb-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Style cleanup

Summary:
Clean up style issues from IfAPI additions.

Casts to (struct ifnet *) made sense when `if_t` was a `void *`, but
since it's a `struct ifnet *` it no longer makes sense. Fix whitespace
errors, among others.

Reviewed by: kib, glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38499


# a3a76c3d 10-Feb-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Add capabilities2/capenable2 accessors

Summary:
As a stopgap measure add basic accessors for the if_capabilities2 and
if_capenable2 members to further hide the ifnet details.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, kib
Differential Revision: https://reviews.freebsd.org/D38487


# 189c3729 10-Feb-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: More accessors

Summary:
Add the following accessors needed by infiniband drivers:
* if_getaddrlen()
* if_setbroadcastaddr()
* if_resolvemulti()

With these accessors, and additional changes on the drivers' side, an
amd64 kernel can be compiled with `struct ifnet` completely hidden.

Reviewed by: melifaro
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38488


# 1e6131ba 01-Feb-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Add needed APIs for mbuf support

Summary:
Add 2 new APIs for supporting recent mbuf changes:
* 36e0a362ac added the m_snd_tag_alloc() wrapper around
if_snd_tag_alloc(). Push this down to the ifnet level.
* 4d7a1361ef adds the m_rcvif_serialize()/m_rcvif_restore() KPIs to
serialize and restore an ifnet pointer. Add the necessary wrapper to
get the index generation for this.

Reviewed By: jhb
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38340


# 2eeb8083 01-Feb-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Add iterator to loop over all interfaces

Summary:
Sometimes it's useful to iterate over all interfaces in the current
VNET, as the linuxulator does in several places.

Unlike other iterators in the IfAPI this propagates any error received
up to the caller, instead of returning a count.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38348


# d79539e6 25-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Add if_altq_is_enabled() interface.

Summary:
The only user of the ALTQ_IS_ENABLED() in a driver checks against the
ifnet queue. Abstract that all out and present the interface to check
if ALTQ is enabled on the interface.

Sponsored by: Juniper Networks, Inc.
Reviewed By: glebius
Differential Revision: https://reviews.freebsd.org/D38204


# 31cfaf19 25-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Add l2com accessor for firewire.

Summary:
Firewire is the only device driver that accesses the l2com member, all
other accesses are handled within the netstack itself.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38203


# 0d2684e1 24-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Add some more accessors

Summary:
* if_setreassignfn for wireguard.
* if_getinputfn() and if_getstartfn() for various drivers. Use the
function descriptor typedefs for these and the setters.
* vlantrunk accessor. This is used by VLAN_CAPABILITIES() used by
several drivers, as well as directly by mxge(4).
* if_pcp member accessor, used by cxgbe.
* accessors for netmap adapter.

Sponsored by: Juniper Networks, Inc.
Reviewed By: glebius
Differential Revision: https://reviews.freebsd.org/D38202


# c255d1a4 23-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Add if_llsoftc member accessors for TOEDEV

Summary:
Keep TOEDEV() macro for backwards compatibility, and add a SETTOEDEV()
macro to complement with the new accessors.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D38199


# 30af2c13 23-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Add if_get/setmaclabel() and use it.

Summary:
Port the MAC modules to use the IfAPI APIs as part of this.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D38197


# 113af4fd 24-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Add if_gettype() API and use it for vlan

Sponsored by: Juniper Networks, Inc.
Reviewed by: #network, glebius
Differential Revision: https://reviews.freebsd.org/D38198


# 053a24d1 13-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

debugnet: Add ifnet accessor to set debugnet methods

As part of the effort to hide the internals of the ifnet struct, convert
the DEBUGNET_SET() macro to use an accessor instead of directly touching
the methods member.

Reviewed by: glebius (older version)
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38105


# 2c2b37ad 13-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

ifnet/API: Move struct ifnet definition to a <net/if_private.h>

Hide the ifnet structure definition, no user serviceable parts inside,
it's a netstack implementation detail. Include it temporarily in
<net/if_var.h> until all drivers are updated to use the accessors
exclusively.

Reviewed by: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38046


# fa25dbfd 12-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

ifnet API: Change if_init() to take context argument

Some drivers, like iflib drivers, take a 'context' argument instead of a
ifnet argument, as a single interface may have multiple contexts.
Follow this scheme by passing the context argument down. Most drivers
will likely pass 'ifp' as the context.

Reviewed by: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38102


# d3467839 09-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

ifnet/API: Change if_set*bit accessors to clear first

Summary:
A common pattern has been to:

if (foo)
caps = IFCAP_FOO;
ifp->if_capenable &= ~IFCAP_FOO;
ifp->if_capenable |= caps;

which in the new order of things would be:

if (foo)
caps = IF_FOO;
if_setcapenablebits(ifp, 0, IFCAP_FOO);
if_setcapenablebits(ifp, caps, 0);

This change streamlines this into:

if (foo)
caps = IF_FOO;
if_setcapenablebits(ifp, caps, IFCAP_FOO);

Reviewed by: melifaro
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D37993


# 74abe47e 21-Dec-2022 Justin Hibbits <jhibbits@FreeBSD.org>

ifnet/DrvAPI: Implement if_setoutputfn() accessor

Fixes: eb1da3e5258238e1c0555c6a006a341df0821d8e
Sponsored by: Juniper Networks, Inc.


# eb1da3e5 09-Dec-2022 Justin Hibbits <jhibbits@FreeBSD.org>

DrvAPI: Extend driver KPI with more accessors

Summary:
Add the following accessors to hide some more netstack details:
* if_get/setcapabilities2 and *bits analogue
* if_setdname
* if_getxname
* if_transmit - wrapper for call to ifp->if_transmit()
- This required changing the existing if_transmit to
if_transmit_default, since that's its purpose.
* if_getalloctype
* if_getindex
* if_foreach_addr_type - Like if_foreach_lladdr() but for any address
family type. Used by some drivers to iterate over all AF_INET
addresses.
* if_init() - wrapper for ifp->if_init() call
* if_setinputfn
* if_setsndtagallocfn
* if_togglehwassist

Reviewers: #transport, #network, glebius, melifaro

Reviewed by: #network, melifaro
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D37664


# bfeef0d3 30-Nov-2022 Nick Reilly <nreilly@blackberry.com>

pf: fix pfi_ifnet leak on interface removal

The detach of the interface and group were leaving pfi_ifnet memory
behind. Check if the kif still has references, and clean it up if it
doesn't

On interface detach, the group deletion was notified first and then a
change notification was sent. This would recreate the group in the kif
layer. Reorder the change to before the delete.

PR: 257218
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37569


# 1bcd230f 03-Dec-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

netlink: add interface notification on link status / flags change.

* Add link-state change notifications by subscribing to ifnet_link_event.
In the Linux netlink model, link state is reported in 2 places: first is
the IFLA_OPERSTATE, which stores state per RFC2863.
The second is an IFF_LOWER_UP interface flag. As many applications rely
on the latter, reserve 1 bit from if_flags, named as IFF_NETLINK_1.
This flag is mapped to IFF_LOWER_UP in the netlink headers. This is done
to avoid making applications think this flag is actually
supported / presented in non-netlink outputs.
* Add flag change notifications, by hooking into rt_ifmsg().
In the netlink model, notification should include the bitmask for the
change flags. Update rt_ifmsg() to include such bitmask.

Differential Revision: https://reviews.freebsd.org/D37597


# 984b27d8 30-Nov-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

net: add if_allocdescr() to permit updating iface description from the kernel

Reviewed by: kp,zlei
Differential Revision: https://reviews.freebsd.org/D37566
MFC after: 2 weeks


# 9a7c520a 24-Sep-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

ifp: add if_setdescr() / if_freedesrt() methods

Add methods for setting and removing the description from the interface,
so the external users can manage it without using ioctl API.

MFC after: 2 weeks


# e18c5816 29-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

domains: use queue(9) SLIST for linked list of domains


# e7d02be1 17-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

protosw: refactor protosw and domain static declaration and load

o Assert that every protosw has pr_attach. Now this structure is
only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw. This was suggested
in 1996 by wollman@ (see 7b187005d18ef), and later reiterated
in 2006 by rwatson@ (see 6fbb9cf860dcd).
o Make struct domain hold a variable sized array of protosw pointers.
For most protocols these pointers are initialized statically.
Those domains that may have loadable protocols have spacers. IPv4
and IPv6 have 8 spacers each (andre@ dff3237ee54ea).
o For inetsw and inet6sw leave a comment noting that many protosw
entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36232


# d8b42ddc 11-Aug-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

rtsock: subscribe to ifnet eventhandlers instead of direct calls.

Stop treating rtsock as a "special" consumer and use already-provided
ifaddr arrival/departure notifications.

MFC after: 2 weeks

Test Plan:
```
21:05 [0] m@devel0 route -n monitor

-> ifconfig vtnet0.2 create

got message of size 24 on Tue Aug 9 21:05:44 2022
RTM_IFANNOUNCE: interface arrival/departure: len 24, if# 3, what: arrival

got message of size 168 on Tue Aug 9 21:05:54 2022
RTM_IFINFO: iface status change: len 168, if# 3, link: up, flags:<BROADCAST,RUNNING,SIMPLEX,MULTICAST>

-> ifconfig vtnet0.2 destroy

got message of size 24 on Tue Aug 9 21:05:54 2022
RTM_IFANNOUNCE: interface arrival/departure: len 24, if# 3, what: departure

```

Reviewed By: glebius
Differential Revision: https://reviews.freebsd.org/D36095
MFC after: 2 weeks


# b8103ca7 11-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

netinet: get interface event notifications directly via EVENTHANDLER(9)

The old mechanism of getting them via domains/protocols control input
is a relict from the previous century, when nothing like EVENTHANDLER(9)
existed yet. Retire PRC_IFDOWN/PRC_IFUP as netinet was the only one
to use them.

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36116


# 150486f6 29-Jul-2022 Zhenlei Huang <zlei.huang@gmail.com>

Introduce and use the NET_EPOCH_DRAIN_CALLBACKS() macro

Reviewed by: melifao, kp
Differential Revision: https://reviews.freebsd.org/D35968


# d6cd20cc 30-May-2022 KUROSAWA Takahiro <takahiro.kurosawa@gmail.com>

netinet6: fix ndp proxying

We could insert proxy NDP entries by the ndp command, but the host
with proxy ndp entries had not responded to Neighbor Solicitations.
Change the following points for proxy NDP to work as expected:
* join solicited-node multicast addresses for proxy NDP entries
in order to receive Neighbor Solicitations.
* look up proxy NDP entries not on the routing table but on the
link-level address table when receiving Neighbor Solicitations.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35307
MFC after: 2 weeks


# 051e7d78 17-Oct-2021 Konstantin Belousov <kib@FreeBSD.org>

Kernel-side infrastructure to implement nvlist-based set/get ifcaps

Reviewed by: hselasky, jhb, kp (previous version)
Sponsored by: NVIDIA Networking
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D32551


# 868bf821 27-Mar-2022 Kristof Provost <kp@FreeBSD.org>

if: avoid interface destroy race

When we destroy an interface while the jail containing it is being
destroyed we risk seeing a race between if_vmove() and the destruction
code, which results in us trying to move a destroyed interface.

Protect against this by using the ifnet_detach_sxlock to also covert
if_vmove() (and not just detach).

PR: 262829
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D34704


# 4d7a1361 26-Jan-2022 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif

Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.

Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33266
Fun note: git show e6abef09187a

(cherry picked from commit e1882428dcbbafd2814d7e17b977a8f686784b39)


# 80e60e23 26-Jan-2022 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: make if_index global

Now that ifindex is static to if.c we can unvirtualize it. For lifetime
of an ifnet its index never changes. To avoid leaking foreign interfaces
the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI
filter their returned value on curvnet. Since if_vmove() no longer
changes the if_index, inline ifindex_alloc() and ifindex_free() into
if_alloc() and if_free() respectively.

API wise the only change is that now minimum interface index can be
greater than 1. The holes in interface indexes were always allowed.

Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33672

(cherry picked from commit 91f44749c6feb50f39af8805dd803e860f0418f1)


# d461deea 03-May-2022 Marko Zec <zec@FreeBSD.org>

VNET: Revert "ifnet: make if_index global"

This reverts commit 91f44749c6feb50f39af8805dd803e860f0418f1.

Devirtualization of V_if_index and V_ifindex_table was rushed into
the tree lacking proper context, discussion, and declaration of intent,
so I'm backing it out as harmful to VNET on the following grounds:

1) The change repurposed the decades-old and stable if_index KBI for
new, unclear goals which were omitted from the commit note.

2) The change opened up a new resource exhaustion vector where any vnet
could starve the system of ifnet indices, including vnet0.

3) To circumvent the newly introduced problem of separating ifnets
belonging to different vnets from the globalized ifindex_table, the
author introduced sysctl_ifcount() which does a linear traversal over
the (potentially huge) global ifnet list just to return a simple upper
bound on existing ifnet indices.

4) The change effectively led to nonuniform ifnet index allocation
among vnets.

5) The commit note clearly stated that the patch changed the implicit
if_index ABI contract where ifnet indices were assumed to be starting
from one. The commit note also included a correct observation that
holes in interface indices were always allowed, but failed to declare
that the userland-observable ifindex tables could now include huge
empty spans even under modest operating conditions.

6) The author had an earlier proposal in the works which did not
affect per-vnet ifnet lists (D33265) but which he abandoned without
providing the rationale behind his decision to do so, at the expense
of sacrificing the vnet isolation contract and if_index ABI / KBI.

Furthermore, the author agreed to back out his changes himself and
to follow up with a proposal for a less intrusive alternative, but
later silently declined to act. Therefore, I decided to resolve the
status-quo by backing this out myself. This in no way precludes a
future proposal aiming to mitigate ifnet-removal related system
crashes or panics to be accepted, provided it would not unnecessarily
compromise the goal of as strict as possible isolation between vnets.

Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex


# 6c741ffb 03-May-2022 Marko Zec <zec@FreeBSD.org>

Revert "mbuf: do not restore dying interfaces"

This reverts commit 703e533da5e2e4743d38bbf4605fec041bc69976.

Revert "ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif"

This reverts commit e1882428dcbbafd2814d7e17b977a8f686784b39.

Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex


# 1a15a383 09-Apr-2022 Gordon Bergling <gbe@FreeBSD.org>

net: Fix a typo in a source code comment

- s/peform/perform/

MFC after: 3 days


# 964b8f8b 28-Jan-2022 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: garbage collect unused function ifaddr_byindex().

Last use was removed in 5adea417d49.


# e1882428 26-Jan-2022 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif

Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.

Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33266
Fun note: git show e6abef09187a


# 91f44749 26-Jan-2022 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: make if_index global

Now that ifindex is static to if.c we can unvirtualize it. For lifetime
of an ifnet its index never changes. To avoid leaking foreign interfaces
the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI
filter their returned value on curvnet. Since if_vmove() no longer
changes the if_index, inline ifindex_alloc() and ifindex_free() into
if_alloc() and if_free() respectively.

API wise the only change is that now minimum interface index can be
greater than 1. The holes in interface indexes were always allowed.

Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33672


# 54712fc4 24-Jan-2022 Gleb Smirnoff <glebius@FreeBSD.org>

if_vmove: improve restoration in cloner's ifgroup membership

* Do a single call into if_clone.c instead of two. The cloner
can't disappear since the interface sits on its list.
* Make restoration smarter - check that cloner with same name
exists in the new vnet.

Differential revision: https://reviews.freebsd.org/D33941


# 5adea417 11-Feb-2021 Ryan Stone <rstone@FreeBSD.org>

Fix ifa refcount leak in ifa_ifwithnet()

In 4f6c66cc9c75c8, ifa_ifwithnet() was changed to no longer
ifa_ref() the returned ifaddr, and instead the caller was required
to stay in the net_epoch for as long as they wanted the ifaddr
to remain valid. However, this missed the case where an AF_LINK
lookup would call ifaddr_byindex(), which still does ifa_ref()
the ifaddr. This would cause a refcount leak.

Fix this by inlining the relevant parts of ifaddr_byindex() here,
with the ifa_ref() call removed. This also avoids an unnecessary
entry and exit from the net_epoch for this case.

I've audited all in-tree consumers of ifa_ifwithnet() that could
possibly perform an AF_LINK lookup and confirmed that none of them
will expect the ifaddr to have a reference that they need to
release.

MFC after: 2 months
Sponsored by: Dell Inc
Differential Revision: https://reviews.freebsd.org/D28705
Reviewed by: melifaro


# e735fa32 09-Dec-2021 Mateusz Guzik <mjg@FreeBSD.org>

net/if.c: plug set-but-not-unused vars

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 7e0bba4d 04-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: make V_if_index static to if.c

This requires moving net.link.generic sysctl declaration from if_mib.c
to if.c. Ideally if_mib.c needs just to be merged to if.c, but they
have different license texts.

Differential revision: https://reviews.freebsd.org/D33263


# d74b7bae 04-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet_byindex() actually requires network epoch

Sweep over potentially unsafe calls to ifnet_byindex() and wrap them
in epoch. Most of the code touched remains unsafe, as the returned
pointer is being used after epoch exit. Mark that with a comment.

Validate the index argument inside the function, reducing argument
validation requirement from the callers and making V_if_index
private to if.c.

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D33263


# 7b40b00f 04-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: merge ifindex_alloc(), ifnet_setbyindex(), if_grow() and call magic

Now it is possible to just merge all this complexity into single
linear function. Note that IFNET_WLOCK() is a sleepable lock, so
we can M_WAITOK and epoch_wait_preempt().

Reviewed by: melifaro, bz, kp
Differential revision: https://reviews.freebsd.org/D33262


# 6ff4cac2e 04-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: initial if_grow() shall always succeed

So let's just call malloc() directly. This also avoids hidden
doubling of default V_if_indexlim.

Reviewed by: melifaro, bz, kp
Differential revision: https://reviews.freebsd.org/D33261


# 450394af 04-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: use ck_pr(3) store & load setting ifnet pointer in ifindex

The lockless access to the array is protected by the network epoch.

Reviewed by: bz, kp
Differential revision: https://reviews.freebsd.org/D33260


# 8062e575 04-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: allocate index at the end of if_alloc_domain()

Now that if_alloc_domain() never fails and actually doesn't
expose ifnet to outside we can eliminate IFNET_HOLD and two
step index allocation.

Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33259


# db0ac6de 02-Dec-2021 Cy Schubert <cy@FreeBSD.org>

Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"

This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing
changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.


# 9e93d2b3 02-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: enable & fix if_debug build

Fixes: ce40632a316c5


# 3bc40f39 22-Nov-2021 Gleb Smirnoff <glebius@FreeBSD.org>

if_free: add a comment explaining why ifindex_free() is performed here


# fe499a84 22-Nov-2021 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: merge if_destroy() and if_free_internal() into one

New function has more meaningful name if_free_deferred() and has
its header comment fixed to reflect reality. NFC


# 4787572d 22-Nov-2021 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: make if_alloc_domain() never fail

The last consumer of if_com_alloc() is firewire. It never fails
to allocate. Most likely the if_com_alloc() KPI will go away
together with if_fwip(), less likely new consumers of if_com_alloc()
will be added, but they would need to follow the no fail KPI.


# 1e3ca25d 22-Nov-2021 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: make if_alloc_domain() static


# ce40632a 22-Nov-2021 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: append if_debug.c to if.c

With this change if_index can become static. There is nothing
that if_debug.c would want to isolate from if.c. Potentially
if.c wants to share everything with if_debug.c.

Move Bjoern's copyright to if.c.

Reviewed by: bz


# 8a6f38c8 22-Nov-2021 Gleb Smirnoff <glebius@FreeBSD.org>

ifnet: garbage collect drbr_*_drv().

They were left in 62d76917b8678 but after years proved not to be useful.


# 5a4e46f6 14-Nov-2021 Mateusz Guzik <mjg@FreeBSD.org>

net: whack "set but not used" warnings in net/if.c

Sponsored by: Rubicon Communications, LLC ("Netgate")


# b1e6a792 10-Sep-2021 Mark Johnston <markj@FreeBSD.org>

net: Enter a net epoch around protocol if_up/down notifications

When traversing a list of interface addresses, we need to be in a net
epoch section, and protocol ctlinput routines need a stable reference to
the address.

Reported by: syzbot+3219af764ead146a3a4e@syzkaller.appspotmail.com
Reviewed by: kp, melifaro
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31889


# 01ad0c00 25-Jul-2021 Kristof Provost <kp@FreeBSD.org>

net: disallow MTU changes on bridge member interfaces

if_bridge member interfaces should always have the same MTU as the
bridge itself, so disallow MTU changes on interfaces that are part of an
if_bridge.

Reviewed by: donner
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31304


# 4fb3e0bb 23-Jun-2021 Rozhuk Ivan <rozhuk.im@gmail.com>

devctl: add RENAME devctl event for IFNET

Add devd event on network iface rename.

Reviewed by: imp@,asomers@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30839


# ad22ba2b 12-May-2021 Mark Johnston <markj@FreeBSD.org>

if: Remove unnecessary validation in the SIOCSIFNAME handler

A successful copyinstr() call guarantees that the returned string is
nul-terminated. Furthermore, the removed check would harmlessly compare
an uninitialized byte with '\0' if the new name is shorter than
IFNAMESIZ - 1.

Reported by: KMSAN
MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# ed93deba 11-May-2021 John Baldwin <jhb@FreeBSD.org>

Remove a write-only variable.

While refactoring an earlier series of changes during review, the
'saved_data' variable stopped being used at the bottom of if_ioctl().

Suggested by: brooks
Reviewed by: brooks, imp, kib
Fixes: d17e0940f79f Rework compat shims in ifioctl().
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D30197


# 9c87db4b 05-May-2021 John Baldwin <jhb@FreeBSD.org>

Group all compat shim structures together to consolidate #ifdef's.

Reviewed by: brooks, kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D29894


# 01e9cbc4 05-May-2021 John Baldwin <jhb@FreeBSD.org>

Use thunks for compat ioctls using struct ifgroupreq.

Reviewed by: brooks, kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D29893


# d61d98f4 05-May-2021 John Baldwin <jhb@FreeBSD.org>

Add freebsd32 compat shims for SIOC[GS]DRVSPEC.

Reviewed by: brooks, kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D29892


# d17e0940 05-May-2021 John Baldwin <jhb@FreeBSD.org>

Rework compat shims in ifioctl().

Centralize logic for handling compat ioctls into two blocks of code at
the start and end of the ioctl routine. This avoids the conversion
logic being spread out both in multiple blocks in ifioctl as well as
various helper functions.

Reviewed by: brooks, kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D29891


# 8e8f1cc9 23-Apr-2021 Mark Johnston <markj@FreeBSD.org>

Re-enable network ioctls in capability mode

This reverts a portion of 274579831b61 ("capsicum: Limit socket
operations in capability mode") as at least rtsol and dhcpcd rely on
being able to configure network interfaces while in capability mode.

Reported by: bapt, Greg V
Sponsored by: The FreeBSD Foundation


# 27457983 07-Apr-2021 Mark Johnston <markj@FreeBSD.org>

capsicum: Limit socket operations in capability mode

Capsicum did not prevent certain privileged networking operations,
specifically creation of raw sockets and network configuration ioctls.
However, these facilities can be used to circumvent some of the
restrictions that capability mode is supposed to enforce.

Add capability mode checks to disallow network configuration ioctls and
creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET
internet sockets.

Reviewed by: oshogbo
Discussed with: emaste
Reported by: manu
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29423


# 25bfa448 21-Mar-2021 Adrian Chadd <adrian@FreeBSD.org>

Add device and ifnet logging methods, similar to device_printf / if_printf

* device_printf() is effectively a printf
* if_printf() is effectively a LOG_INFO

This allows subsystems to log device/netif stuff using different log levels,
rather than having to invent their own way to prefix unit/netif names.

Differential Revision: https://reviews.freebsd.org/D29320
Reviewed by: imp


# 092f3f08 06-Mar-2021 Tai-hwa Liang <avatar@FreeBSD.org>

net: fixing a memory leak in if_deregister_com_alloc()

Drain the callbacks upon if_deregister_com_alloc() such that the
if_com_free[type] won't be nullified before if_destroy().

Taking fwip(4) as an example, before this fix, kldunload if_fwip will
go through the following:

1. fwip_detach()
2. if_free() -> schedule if_destroy() through NET_EPOCH_CALL
3. fwip_detach() returns
4. firewire_modevent(MOD_UNLOAD) -> if_deregister_com_alloc()
5. kernel complains about:
Warning: memory type fw_com leaked memory on destroy (1 allocations, 64 bytes leaked).
6. EPOCH runs if_destroy() -> if_free_internal()i

By this time, if_com_free[if_alloctype] is NULL since it's already
nullified by if_deregister_com_alloc(); hence, firewire_free() won't
have a chance to release the allocated fw_com.

Reviewed by: hselasky, glebius
MFC after: 2 weeks


# 7563019b 22-Feb-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Add if_try_ref() to simplify refcount handling inside epoch.

When we have an ifp pointer and the code is running inside epoch,
epoch guarantees the pointer will not be freed.
However, the following case can still happen:

* in thread 1 we drop to refcount=0 for ifp and schedule its deletion.
* in thread 2 we use this ifp and reference it
* destroy callout kicks in
* unhappy user reports a bug

This can happen with the current implementation of ifnet_byindex_ref(),
as we're not holding any locks preventing ifnet deletion by a parallel thread.

To address it, add if_try_ref(), allowing to return failure when
referencing ifp with refcount=0.
Additionally, enforce existing if_ref() is with KASSERT to provide a
cleaner error in such scenarios.

Finally, fix ifnet_byindex_ref() by using if_try_ref() and returning NULL
if the latter fails.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28836


# 600eade2 16-Feb-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Add ifa_try_ref() to simplify ifa handling inside epoch.

More and more code migrates from lock-based protection to the NET_EPOCH
umbrella. It requires some logic changes, including, notably, refcount
handling.

When we have an `ifa` pointer and we're running inside epoch we're
guaranteed that this pointer will not be freed.
However, the following case can still happen:
* in thread 1 we drop to 0 refcount for ifa and schedule its deletion.
* in thread 2 we use this ifa and reference it
* destroy callout kicks in
* unhappy user reports bug

To address it, new `ifa_try_ref()` function is added, allowing to return
failure when we try to reference `ifa` with 0 refcount.
Additionally, existing `ifa_ref()` is enforced with `KASSERT` to provide
cleaner error in such scenarious.

Reviewed By: rstone, donner
Differential Revision: https://reviews.freebsd.org/D28639
MFC after: 1 week


# 6d2a10d9 08-Feb-2021 Kristof Provost <kp@FreeBSD.org>

Widen ifnet_detach_sxlock coverage

Widen the ifnet_detach_sxlock to cover the entire vnet sysuninit code.
This ensures that we can't end up having the vnet_sysuninit free the UDP
pcb while the detach code is running and trying to purge the UDP pcb.

MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28530


# f9e0752e 12-Jan-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Create new in6_purgeifaddr() which purges bound ifa prefix if
it gets unused.

Currently if_purgeifaddrs() uses in6_purgeaddr() to remove IPv6
ifaddrs. in6_purgeaddr() does not trrigger prefix removal if
number of linked ifas goes to 0, as this is a low-level function.
As a result, if_purgeifaddrs() purges all IPv4/IPv6 addresses but
keeps corresponding IPv6 prefixes.

Fix this by creating higher-level wrapper which handles unused
prefix usecase and use it in if_purgeifaddrs().

Differential revision: https://reviews.freebsd.org/D28128


# 7f883a9b 01-Dec-2020 Kristof Provost <kp@FreeBSD.org>

net: Revert vnet/epair cleanup race mitigation

Revert the mitigation code for the vnet/epair cleanup race (done in r365457).
r368237 introduced a more reliable fix.

MFC after: 2 weeks
Sponsored by: Modirum MDPay


# e133271f 01-Dec-2020 Kristof Provost <kp@FreeBSD.org>

if: Fix panic when destroying vnet and epair simultaneously

When destroying a vnet and an epair (with one end in the vnet) we often
panicked. This was the result of the destruction of the epair, which destroys
both ends simultaneously, happening while vnet_if_return() was moving the
struct ifnet to its home vnet. This can result in a freed ifnet being re-added
to the home vnet V_ifnet list. That in turn panics the next time the ifnet is
used.

Prevent this race by ensuring that vnet_if_return() cannot run at the same time
as if_detach() or epair_clone_destroy().

PR: 238870, 234985, 244703, 250870
MFC after: 2 weeks
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D27378


# cd853791 27-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

Make MAXPHYS tunable. Bump MAXPHYS to 1M.

Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225


# bca0e1d2 25-Nov-2020 Kristof Provost <kp@FreeBSD.org>

if: Fix non-VIMAGE build

if_link_ifnet() and if_unlink_ifnet() are needed even when VIMAGE is not
enabled.

MFC after: 2 weeks
Sponsored by: Modirum MDPay


# a779388f 25-Nov-2020 Kristof Provost <kp@FreeBSD.org>

if: Protect V_ifnet in vnet_if_return()

When we terminate a vnet (i.e. jail) we move interfaces back to their home
vnet. We need to protect our access to the V_ifnet CK_LIST.

We could enter NET_EPOCH, but if_detach_internal() (called from if_vmove())
waits for net epoch callback completion. That's not possible from NET_EPOCH.
Instead, we take the IFNET_WLOCK, build a list of the interfaces that need to
move and, once we've released the lock, move them back to their home vnet.

We cannot hold the IFNET_WLOCK() during if_vmove(), because that results in a
LOR between ifnet_sx, in_multi_sx and iflib ctx lock.

Separate out moving the ifp into or out of V_ifnet, so we can hold the lock as
we do the list manipulation, but do not hold it as we if_vmove().

Reviewed by: melifaro
MFC after: 2 weeks
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D27279


# a60100fd 25-Nov-2020 Kristof Provost <kp@FreeBSD.org>

if: Remove ifnet_rwlock

It no longer serves any purpose, as evidenced by the fact that we never take it
without ifnet_sxlock.

Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D27278


# bad6b236 08-Nov-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Move all ifaddr route creation business logic to net/route/route_ifaddr.c

Differential Revision: https://reviews.freebsd.org/D26318


# c1aedfcb 28-Sep-2020 Ed Maste <emaste@FreeBSD.org>

add SIOCGIFDATA ioctl

For interfaces that do not support SIOCGIFMEDIA (for which there are
quite a few) the only fallback is to query the interface for
if_data->ifi_link_state. While it's possible to get at if_data for an
interface via getifaddrs(3) or sysctl, both are heavy weight mechanisms.

SIOCGIFDATA is a simple ioctl to retrieve this fast with very little
resource use in comparison. This implementation mirrors that of other
similar ioctls in FreeBSD.

Submitted by: Roy Marples <roy@marples.name>
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D26538


# a969635b 08-Sep-2020 Kristof Provost <kp@FreeBSD.org>

net: mitigate vnet / epair cleanup races

There's a race where dying vnets move their interfaces back to their original
vnet, and if_epair cleanup (where deleting one interface also deletes the other
end of the epair). This is commonly triggered by the pf tests, but also by
cleanup of vnet jails.

As we've not yet been able to fix the root cause of the issue work around the
panic by not dereferencing a NULL softc in epair_qflush() and by not
re-attaching DYING interfaces.

This isn't a full fix, but makes a very common panic far less likely.

PR: 244703, 238870
Reviewed by: lutz_donnerhacke.de
MFC after: 4 days
Differential Revision: https://reviews.freebsd.org/D26324


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# 96ad26ee 04-Aug-2020 Mark Johnston <markj@FreeBSD.org>

Remove free_domain() and uma_zfree_domain().

These functions were introduced before UMA started ensuring that freed
memory gets placed in domain-local caches. They no longer serve any
purpose since UMA now provides their functionality by default. Remove
them to simplyify the kernel memory allocator interfaces a bit.

Reviewed by: cem, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25937


# e1c05fd2 21-Jul-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Transition from rtrequest1_fib() to rib_action().

Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib,
in6_rtrequest, rtrequest_fib> and their uses and switch to
to rib_action(). This is part of the new routing KPI.

Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25546


# 72587123 19-Jul-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Temporarly revert r363319 to unbreak the build.

Reported by: CI
Pointy hat to: melifaro


# 8cee15d9 19-Jul-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Transition from rtrequest1_fib() to rib_action().

Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib,
in6_rtrequest, rtrequest_fib> and their uses and switch to
to rib_action(). This is part of the new routing KPI.

Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25546


# 2bbab0af 23-May-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use epoch(9) for rtentries to simplify control plane operations.

Currently the only reason of refcounting rtentries is the need to report
the rtable operation details immediately after the execution.
Delaying rtentry reclamation allows to stop refcounting and simplify the code.
Additionally, this change allows to reimplement rib_lookup_info(), which
is used by some of the customers to get the matching prefix along
with nexthops, in more efficient way.

The change keeps per-vnet rtzone uma zone. It adds nh_vnet field to
nhop_priv to be able to reliably set curvnet even during vnet teardown.
Rest of the reference counting code will be removed in the D24867 .

Differential Revision: https://reviews.freebsd.org/D24866


# 8ad798ae 03-Mar-2020 Brooks Davis <brooks@FreeBSD.org>

Expose ifr_buffer_get_(buffer|length) outside if.c.

This is a preparatory commit for D23933.

Reviewed by: jhb


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# e87c4940 24-Feb-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Although most of the NIC drivers are epoch ready, due to peer pressure
switch over to opt-in instead of opt-out for epoch.

Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks
itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch
when processing its packets.

Now this will create recursive entrance in epoch in >90% network
drivers, but will guarantee safeness of the transition.

Mark several tested drivers as IFF_KNOWSEPOCH.

Reviewed by: hselasky, jeff, bz, gallatin
Differential Revision: https://reviews.freebsd.org/D23674


# 10108cb6 17-Feb-2020 Bjoern A. Zeeb <bz@FreeBSD.org>

Partially revert VNET change and expand VNET structure.

Revert parts of r353274 replacing vnet_state with a shutdown flag.

Not having the state flag for the current SI_SUB_* makes it harder to debug
kernel or module panics related to VNET bringup or teardown.
Not having the state also does not allow us to check for other dependency
levels between components, e.g. for moving interfaces.

Expand the VNET structure with the new boolean flag indicating that we are
doing a shutdown of a given vnet and update the vnet magic cookie for the
change.

Update libkvm to compile with a bool in the kernel struct.

Bump __FreeBSD_version for (external) module builds to more easily detect
the change.

Reviewed by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23097


# cd0be8b2 06-Feb-2020 Jeff Roberson <jeff@FreeBSD.org>

Temporarily force IFF_NEEDSEPOCH until drivers have been resolved.

Recent network epoch changes have left some drivers unexpectedly broken
and there is not yet a consensus on the correct fix. This is patch is
a minor performance impact until we can agree on the correct path
forward.

Reviewed by: core, network, imp, glebius, hselasky
Differential Revision: https://reviews.freebsd.org/D23515


# 2888eb40 17-Jan-2020 Eugene Grosbein <eugen@FreeBSD.org>

ifa_maintain_loopback_route: adjust debugging output

Correction after r333476:

- write this as LOG_DEBUG again instead of LOG_INFO;
- get back function name into the message;
- error may be ESRCH if an address is removed in process (by carp f.e.),
not only ENOENT;
- expression complexity grows, so try making it more readable.

MFC after: 1 week


# 2a4bd982 14-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Introduce NET_EPOCH_CALL() macro and use it everywhere where we free
data based on the network epoch. The macro reverses the argument
order of epoch_call(9) - first function, then its argument. NFC


# 97168be8 14-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanically substitute assertion of in_epoch(net_epoch_preempt) to
NET_EPOCH_ASSERT(). NFC


# 3264dcad 14-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

- Move global network epoch definition to epoch.h, as more different
subsystems tend to need to know about it, and including if_var.h is
huge header pollution for them. Polluting possible non-network
users with single symbol seems much lesser evil.
- Remove non-preemptible network epoch. Not used yet, and unlikely
to get used in close future.


# c7bab2a7 08-Jan-2020 Kyle Evans <kevans@FreeBSD.org>

if_vmove: return proper error status

if_vmove can fail if it lost a race and the vnet's already been moved. The
callers (and their callers) can generally cope with this, but right now
success is assumed. Plumb out the ENOENT from if_detach_internal if it
happens so that the error's properly reported to userland.

Reviewed by: bz, kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22780


# 5fcb2832 02-Jan-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Plug loopback idaddr refcount leak.

Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22980


# 3f197b13 20-Dec-2019 Mark Johnston <markj@FreeBSD.org>

Deduplicate code between if_delgroup() and if_delgroups().

Fix some style in if_addgroup(). No functional change intended.

Reviewed by: hselasky
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22892


# 718ef55e 20-Dec-2019 Mark Johnston <markj@FreeBSD.org>

Fix a memory leak in if_delgroups() introduced in r334118.

PR: 242712
Submitted by: ghuckriede@blackberry.com
MFC after: 3 days


# 3232273f 24-Nov-2019 Bjoern A. Zeeb <bz@FreeBSD.org>

Allow kernel to compile without BPF.

r297816 added some bpf magic for VIMAGE unconditionally which no longer
allows kernels to compile without bpf (but with other networking).
Add the missing ifdef checks and allow a kernel to compile without bpf
again.

PR: 242136
Reported by: dave mischler.com
MFC after: 2 weeks


# 7993a104 22-Nov-2019 Conrad Meyer <cem@FreeBSD.org>

Add explicit SI_SUB_EPOCH

Add explicit SI_SUB_EPOCH, after SI_SUB_TASKQ and before SI_SUB_SMP
(EARLY_AP_STARTUP). Rename existing "SI_SUB_TASKQ + 1" to SI_SUB_EPOCH.

epoch(9) consumers cannot epoch_alloc() before SI_SUB_EPOCH:SI_ORDER_SECOND,
but likely should allocate before SI_SUB_SMP. Prior to this change,
consumers (well, epoch itself, and net/if.c) just open-coded the
SI_SUB_TASKQ + 1 order to match epoch.c, but this was fragile.

Reviewed by: mmacy
Differential Revision: https://reviews.freebsd.org/D22503


# 9352fab6 13-Nov-2019 Gleb Smirnoff <glebius@FreeBSD.org>

In if_siocaddmulti() enter VNET.

Reported & tested by: garga


# 0839aa5c 29-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

There is a long standing problem with multicast programming for NICs
and IPv6. With IPv6 we may call if_addmulti() in context of processing
of an incoming packet. Usually this is interrupt context. While most
of the NIC drivers are able to reprogram multicast filters without
sleeping, some of them can't. An example is e1000 family of drivers.
With iflib conversion the problem was somewhat hidden. Iflib processes
packets in private taskqueue, so going to sleep doesn't trigger an
assertion. However, the sleep would block operation of the driver and
following incoming packets would fill the ring and eventually would
start being dropped. Enabling epoch for the full time of a packet
processing again started to trigger assertions for e1000.

Fix this problem once and for all using a general taskqueue to call
if_ioctl() method in all cases when if_addmulti() is called in a
non sleeping context. Note that nobody cares about returned value.

Reviewed by: hselasky, kib
Differential Revision: https://reviews.freebsd.org/D22154


# 19e09f44 21-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Remove obsoleted KPIs that were used to access interface address lists.


# 7790c8c1 17-Oct-2019 Conrad Meyer <cem@FreeBSD.org>

Split out a more generic debugnet(4) from netdump(4)

Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport. It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).

It is mostly a verbatim code lift from netdump(4). Netdump(4) remains
the only consumer (until the rest of this patch series lands).

The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c. The
separation is not perfect and future improvement is welcome. Supporting
INET6 is a long-term goal.

Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.

The only functional change here is the mbuf allocation / tracking. Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time. If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.

No other functional change intended.

Reviewed by: markj (earlier version)
Some discussion with: emaste, jhb
Objection from: marius
Differential Revision: https://reviews.freebsd.org/D21421


# b46d70fd 16-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

do_link_state_change() is executed in taskqueue context and in
general is allowed to sleep. Don't enter the epoch for the
whole duration. If some event handlers need the epoch, they
should handle that theirselves.

Discussed with: hselasky


# 270b83b9 14-Oct-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

The two functions ifnet_byindex() and ifnet_byindex_locked() are exactly the
same after the network stack was epochified. Merge the two into one function
and cleanup all uses of ifnet_byindex_locked().

While at it:
- Add branch prediction macros.
- Make sure the ifnet pointer is only deferred once,
also when code optimisation is disabled.

Sponsored by: Mellanox Technologies


# 93cfeb0e 15-Oct-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Exclude the network link eventhandler from epochification after r353292.

This fixes the following assert when "options RATELIMIT" is used:
panic()
malloc()
sysctl_add_oid()
tcp_rl_ifnet_link()
do_link_state_change()
taskqueue_run_locked()

Sponsored by: Mellanox Technologies


# 416a1d1e 14-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

if_delmulti() is never called without ifp argument, assert this instead
of doing a useless search through interfaces.


# fb3fc771 10-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Add two extra functions that basically give count of addresses
on interface. Such function could been implemented on top of
the if_foreach_llm?addr(), but several drivers need counting,
so avoid copy-n-paste inside the drivers.


# 826857c8 10-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Provide new KPI for network drivers to access lists of interface
addresses. The KPI doesn't reveal neither how addresses are stored,
how the access to them is synchronized, neither reveal struct ifaddr
and struct ifmaddr.

Reviewed by: gallatin, erj, hselasky, philip, stevek
Differential Revision: https://reviews.freebsd.org/D21943


# 1e80e4f2 08-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Remove epoch assertion from if_setlladdr(). Originally this function was
protected by IF_ADDR_LOCK(), which was a mutex, so that two simultaneous
if_setlladdr() can't execute. Later it was switched to IF_ADDR_RLOCK(),
likely by a mistake. Later it was switched to NET_EPOCH_ENTER(). Then I
incorrectly added NET_EPOCH_ASSERT() here.

In reality ifp->if_addr never goes away and never changes its length. So,
doing bcopy() in it is always "safe", meaning it won't dereference a wrong
pointer or write into someone's else memory. Of course doing two bcopy() in
parallel would result in a mess of two addresses, but net epoch doesn't
protect against that, neither IF_ADDR_RLOCK() did.

So for now, just remove the assertion and leave for later a proper fix.

Reported by: markj


# e9dc46cc 08-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

In DIAGNOSTIC block of if_delmulti_ifma_flags() enter the network epoch.
This quickly plugs the regression from r353292. The locking of multicast
definitely needs a broader review today...

Reported by: pho, dhw


# b8a6e03f 07-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Widen NET_EPOCH coverage.

When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111


# 204e2f30 07-Oct-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Factor out VNET shutdown check into an own vnet structure field.
Remove the now obsolete vnet_state field. This greatly simplifies the
detection of VNET shutdown and avoids code duplication.

Discussed with: bz@
MFC after: 1 week
Sponsored by: Mellanox Technologies


# dd902d01 25-Sep-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Add debugging facility EPOCH_TRACE that checks that epochs entered are
properly nested and warns about recursive entrances. Unlike with locks,
there is nothing fundamentally wrong with such use, the intent of tracer
is to help to review complex epoch-protected code paths, and we mean the
network stack here.

Reviewed by: hselasky
Sponsored by: Netflix
Pull Request: https://reviews.freebsd.org/D21610


# 247cf566 17-Sep-2019 Konstantin Belousov <kib@FreeBSD.org>

Add SIOCGIFDOWNREASON.

The ioctl(2) is intended to provide more details about the cause of
the down for the link.

Eventually we might define a comprehensive list of codes for the
situations. But interface also allows the driver to provide free-form
null-terminated ASCII string to provide arbitrary non-formalized
information. Sample implementation exists for mlx5(4), where the
string is fetched from firmware controlling the port.

Reviewed by: hselasky, rrs
Sponsored by: Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21527


# 40b1c921 12-Sep-2019 Kyle Evans <kevans@FreeBSD.org>

SIOCSIFNAME: Do nothing if we're not actually changing

Instead of throwing EEXIST, just succeed if the name isn't actually
changing. We don't need to trigger departure or any of that because there's
no change from consumers' perspective.

PR: 240539
Reviewed by: brooks
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D21618


# 0dbdf041 28-Jun-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Need to wait for epoch callbacks to complete before detaching a
network interface.

This particularly manifests itself when an INP has multicast options
attached during a network interface detach. Then the IPv4 and IPv6
leave group call which results from freeing the multicast address, may
access a freed ifnet structure. These are the steps to reproduce:

service mdnsd onestart # installed from ports

ifconfig epair create
ifconfig epair0a 0/24 up
ifconfig epair0a destroy

Tested by: pho @
MFC after: 1 week
Sponsored by: Mellanox Technologies


# fb3bc596 24-May-2019 John Baldwin <jhb@FreeBSD.org>

Restructure mbuf send tags to provide stronger guarantees.

- Perform ifp mismatch checks (to determine if a send tag is allocated
for a different ifp than the one the packet is being output on), in
ip_output() and ip6_output(). This avoids sending packets with send
tags to ifnet drivers that don't support send tags.

Since we are now checking for ifp mismatches before invoking
if_output, we can now try to allocate a new tag before invoking
if_output sending the original packet on the new tag if allocation
succeeds.

To avoid code duplication for the fragment and unfragmented cases,
add ip_output_send() and ip6_output_send() as wrappers around
if_output and nd6_output_ifp, respectively. All of the logic for
setting send tags and dealing with send tag-related errors is done
in these wrapper functions.

For pseudo interfaces that wrap other network interfaces (vlan and
lagg), wrapper send tags are now allocated so that ip*_output see
the wrapper ifp as the ifp in the send tag. The if_transmit
routines rewrite the send tags after performing an ifp mismatch
check. If an ifp mismatch is detected, the transmit routines fail
with EAGAIN.

- To provide clearer life cycle management of send tags, especially
in the presence of vlan and lagg wrapper tags, add a reference count
to send tags managed via m_snd_tag_ref() and m_snd_tag_rele().
Provide a helper function (m_snd_tag_init()) for use by drivers
supporting send tags. m_snd_tag_init() takes care of the if_ref
on the ifp meaning that code alloating send tags via if_snd_tag_alloc
no longer has to manage that manually. Similarly, m_snd_tag_rele
drops the refcount on the ifp after invoking if_snd_tag_free when
the last reference to a send tag is dropped.

This also closes use after free races if there are pending packets in
driver tx rings after the socket is closed (e.g. from tcpdrop).

In order for m_free to work reliably, add a new CSUM_SND_TAG flag in
csum_flags to indicate 'snd_tag' is set (rather than 'rcvif').
Drivers now also check this flag instead of checking snd_tag against
NULL. This avoids false positive matches when a forwarded packet
has a non-NULL rcvif that was treated as a send tag.

- cxgbe was relying on snd_tag_free being called when the inp was
detached so that it could kick the firmware to flush any pending
work on the flow. This is because the driver doesn't require ACK
messages from the firmware for every request, but instead does a
kind of manual interrupt coalescing by only setting a flag to
request a completion on a subset of requests. If all of the
in-flight requests don't have the flag when the tag is detached from
the inp, the flow might never return the credits. The current
snd_tag_free command issues a flush command to force the credits to
return. However, the credit return is what also frees the mbufs,
and since those mbufs now hold references on the tag, this meant
that snd_tag_free would never be called.

To fix, explicitly drop the mbuf's reference on the snd tag when the
mbuf is queued in the firmware work queue. This means that once the
inp's reference on the tag goes away and all in-flight mbufs have
been queued to the firmware, tag's refcount will drop to zero and
snd_tag_free will kick in and send the flush request. Note that we
need to avoid doing this in the middle of ethofld_tx(), so the
driver grabs a temporary reference on the tag around that loop to
defer the free to the end of the function in case it sends the last
mbuf to the queue after the inp has dropped its reference on the
tag.

- mlx5 preallocates send tags and was using the ifp pointer even when
the send tag wasn't in use. Explicitly use the ifp from other data
structures instead.

- Sprinkle some assertions in various places to assert that received
packets don't have a send tag, and that other places that overwrite
rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer.

Reviewed by: gallatin, hselasky, rgrimes, ae
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20117


# e2e050c8 19-May-2019 Conrad Meyer <cem@FreeBSD.org>

Extract eventfilter declarations to sys/_eventfilter.h

This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.


# 2ad7ed6e 19-May-2019 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix rt_ifa selection during loopback route insertion process.
Currently such routes are added with a link-level IFA, which is
plain wrong. Only after the insertion they get fixed by the special
link_rtrequest() ifa handler. This behaviour complicates routing code
and makes ifa selection more complex.
Streamline this process by explicitly moving link_rtrequest() logic
to the pre-insertion rt_getifa_fib() ifa selector. Avoid calling all
this logic in the loopback route case by explicitly specifying
proper rt_ifa inside the ifa_maintain_loopback_route().§

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20076


# 7687707d 22-Apr-2019 Andrew Gallatin <gallatin@FreeBSD.org>

Track device's NUMA domain in ifnet & alloc ifnet from NUMA local memory

This commit adds new if_alloc_domain() and if_alloc_dev() methods to
allocate ifnets. When called with a domain on a NUMA machine,
ifalloc_domain() will record the NUMA domain in the ifnet, and it will
allocate the ifnet struct from memory which is local to that NUMA
node. Similarly, if_alloc_dev() is a wrapper for if_alloc_domain
which uses a driver supplied device_t to call ifalloc_domain() with
the appropriate domain.

Note that the new if_numa_domain field fits in an alignment pad in
struct ifnet, and so does not alter the size of the structure.

Reviewed by: glebius, kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19930


# bc6f170e 22-Jan-2019 Brooks Davis <brooks@FreeBSD.org>

Rework CASE_IOC_IFGROUPREQ() to require a case before the macro.

This is more compatible with formatting tools and looks more normal.

Reported by: jhb (on a different review)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D18442


# a68cc388 08-Jan-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanical cleanup of epoch(9) usage in network stack.

- Remove macros that covertly create epoch_tracker on thread stack. Such
macros a quite unsafe, e.g. will produce a buggy code if same macro is
used in embedded scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as is is very misleading. They all are named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and
what's more important they now no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with: jtl, gallatin


# 21ee61a6 08-Jan-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Remove part of comment that doesn't match reality.


# 85107355 30-Nov-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Adapt the fix in r341008 to correctly work with EBR.

IFNET_RLOCK_NOSLEEP() is epoch_enter_preempt() in FreeBSD 12+. Holding
it in sysctl_rtsock() doesn't protect us from ifnet unlinking, because
unlinking occurs with IFNET_WLOCK(), that is rw_wlock+sx_xlock, and it
doesn check that concurrent code is running in epoch section. But while
we are in epoch section, we should be able to do access to ifnet's
fields, even it was unlinked. Thus do not change if_addr and if_hw_addr
fields in ifnet_detach_internal() to NULL, since rtsock code can do
access to these fields and this is allowed while it is running in epoch
section.

This should fix the race, when ifnet_detach_internal() unlinks ifnet
after we checked it for IFF_DYING in sysctl_dumpentry.

Move free(ifp->if_hw_addr) into ifnet_free_internal(). Also remove the
NULL check for ifp->if_description, since free(9) can correctly handle
NULL pointer.

MFC after: 1 week


# a716ad4a 27-Nov-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Fix possible panic during ifnet detach in rtsock.

The panic can happen, when some application does dump of routing table
using sysctl interface. To prevent this, set IFF_DYING flag in
if_detach_internal() function, when ifnet under lock is removed from
the chain. In sysctl_rtsock() take IFNET_RLOCK_NOSLEEP() to prevent
ifnet detach during routes enumeration. In case, if some interface was
detached in the time before we take the lock, add the check, that ifnet
is not DYING. This prevents access to memory that could be freed after
ifnet is unlinked.

PR: 227720, 230498, 233306
Reviewed by: bz, eugen
MFC after: 1 week
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D18338


# b79aa45e 13-Nov-2018 Gleb Smirnoff <glebius@FreeBSD.org>

For compatibility KPI functions like if_addr_rlock() that used to have
mutexes but now are converted to epoch(9) use thread-private epoch_tracker.
Embedding tracker into ifnet(9) or ifnet derived structures creates a non
reentrable function, that will fail miserably if called simultaneously from
two different contexts.
A thread private tracker will provide a single tracker that would allow to
call these functions safely. It doesn't allow nested call, but this is not
expected from compatibility KPIs.

Reviewed by: markj


# 25c6ab1b 02-Nov-2018 Kristof Provost <kp@FreeBSD.org>

Notify that the ifnet will go away, even on vnet shutdown

pf subscribes to ifnet_departure_event events, so it can clean up the
ifg_pf_kif and if_pf_kif pointers in the ifnet.
During vnet shutdown interfaces could go away without sending the event,
so pf ends up cleaning these up as part of its shutdown sequence, which
happens after the ifnet has already been freed.

Send the ifnet_departure_event during vnet shutdown, allowing pf to
clean up correctly.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D17500


# 88d57e9e 22-Oct-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Resolve deadlock between epoch(9) and various network interface
SX-locks, during if_purgeaddrs(), by not allowing to hold the epoch
read lock over typical network IOCTL code paths. This is a regression
issue after r334305.

Reviewed by: ae (network)
Differential revision: https://reviews.freebsd.org/D17647
MFC after: 1 week
Sponsored by: Mellanox Technologies


# 8251c68d 21-Oct-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Add KPI that can be used by tunneling interfaces to handle IP addresses
appearing and disappearing on the host system.

Such handling is need, because tunneling interfaces must use addresses,
that are configured on the host as ingress addresses for tunnels.
Otherwise the system can send spoofed packets with source address, that
belongs to foreign host.

The KPI uses ifaddr_event_ext event to implement addresses tracking.
Tunneling interfaces register event handlers and then they are
notified by the kernel, when an address disappears or appears.

ifaddr_event_compat() handler from if.c replaced by srcaddr_change_event()
in the ip_encap.c

MFC after: 1 month
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D17134


# 64d63b1e 21-Oct-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Add ifaddr_event_ext event. It is similar to ifaddr_event, but the
handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL,
and the pointer to ifaddr. Also ifaddr_event now is implemented using
ifaddr_event_ext handler.

MFC after: 3 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D17100


# 1687b1ab 29-Sep-2018 Michael Tuexen <tuexen@FreeBSD.org>

For changing the MTU on tun/tap devices, it should not matter whether it
is done via using ifconfig, which uses a SIOCSIFMTU ioctl() command, or
doing it using a TUNSIFINFO/TAPSIFINFO ioctl() command.
Without this patch, for IPv6 the new MTU is not used when creating routes.
Especially, when initiating TCP connections after increasing the MTU,
the old MTU is still used to compute the MSS.
Thanks to ae@ and bz@ for helping to improve the patch.

Reviewed by: ae@, bz@
Approved by: re (kib@)
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D17180


# 77ad07b6 21-Aug-2018 Matt Macy <mmacy@FreeBSD.org>

fix copy/paste error when clearing ifma flag

CID: 1395119
Reported by: vangyzen


# 32d2623a 16-Aug-2018 Navdeep Parhar <np@FreeBSD.org>

Add the ability to look up the 3b PCP of a VLAN interface. Use it in
toe_l2_resolve to fill up the complete vtag and not just the vid.

Reviewed by: kib@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D16752


# f9be0386 15-Aug-2018 Matt Macy <mmacy@FreeBSD.org>

Fix in6_multi double free

This is actually several different bugs:
- The code is not designed to handle inpcb deletion after interface deletion
- add reference for inpcb membership
- The multicast address has to be removed from interface lists when the refcount
goes to zero OR when the interface goes away
- decouple list disconnect from refcount (v6 only for now)
- ifmultiaddr can exist past being on interface lists
- add flag for tracking whether or not it's enqueued
- deferring freeing moptions makes the incpb cleanup code simpler but opens the
door wider still to races
- call inp_gcmoptions synchronously after dropping the the inpcb lock

Fundamentally multicast needs a rewrite - but keep applying band-aids for now.

Tested by: kp
Reported by: novel, kp, lwhsu


# 5f901c92 24-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by: bz
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16147


# 98a8fdf6 09-Jul-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Deduplicate the code.

Add generic function if_tunnel_check_nesting() that does check for
allowed nesting level for tunneling interfaces and also does loop
detection. Use it in gif(4), gre(4) and me(4) interfaces.

Differential Revision: https://reviews.freebsd.org/D16162


# 2da19677 07-Jul-2018 Sean Bruno <sbruno@FreeBSD.org>

struct ifmediareq *ifmrp is only used in the COMPAT_FREEBSD32 parts of
ifioctl(). Move it inside the proper #ifdef. This was throwing a valid
"Assigned but unused" warning with gcc.

Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16063


# 6573d758 03-Jul-2018 Matt Macy <mmacy@FreeBSD.org>

epoch(9): allow preemptible epochs to compose

- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
there's no longer any benefit to dropping the pcbinfo lock
and trying to do so just adds an error prone branchfest to
these functions
- Remove cases of same function recursion on the epoch as
recursing is no longer free.
- Remove the the TAILQ_ENTRY and epoch_section from struct
thread as the tracker field is now stack or heap allocated
as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066


# 91d6c9b9 30-May-2018 Matt Macy <mmacy@FreeBSD.org>

if_setlladdr: don't call ioctl in epoch context

PR: 228612
Reported by: markj


# 1ebec5fa 28-May-2018 Matt Macy <mmacy@FreeBSD.org>

route: fix missed ref adds
- ensure that we bump the ifa ref whenever we add a reference
- defer freeing epoch protected references until after the if_purgaddrs
loop


# 5328b11c 24-May-2018 Matt Macy <mmacy@FreeBSD.org>

if_delgroups: add missed unlock introduced by r334118


# 4f6c66cc 23-May-2018 Matt Macy <mmacy@FreeBSD.org>

UDP: further performance improvements on tx

Cumulative throughput while running 64
netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15409


# f6cb0dea 19-May-2018 Matt Macy <mmacy@FreeBSD.org>

net: fix uninitialized variable warning


# d7c5a620 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

ifnet: Replace if_addr_lock rwlock with epoch + mutex

Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32
4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32
4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32
4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32
4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32
4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32
4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32

After the patch

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51
5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51
5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51
5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51
5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52
5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15366


# f2d19f98 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

epoch(9): allocate net epochs earlier in boot


# d71e30de 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

epoch: move epoch variables to read mostly section


# 70398c2f 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

epoch(9): Make epochs non-preemptible by default

There are risks associated with waiting on a preemptible epoch section.
Change the name to make them not be the default and document the issue
under CAVEATS.

Reported by: markj


# 5e68a3df 17-May-2018 Matt Macy <mmacy@FreeBSD.org>

epoch: add non-preemptible "critical" variant

adds:
- epoch_enter_critical() - can be called inside a different epoch,
starts a section that will acquire any MTX_DEF mutexes or do
anything that might sleep.
- epoch_exit_critical() - corresponding exit call
- epoch_wait_critical() - wait variant that is guaranteed that any
threads in a section are running.
- epoch_global_critical - an epoch_wait_critical safe epoch instance

Requested by: markj
Approved by: sbruno


# 5c30b378 10-May-2018 Matt Macy <mmacy@FreeBSD.org>

Allow different bridge types to coexist

if_bridge has a lot of limitations that make it scale poorly to higher data
rates. In my projects/VPC branch I leverage the bridge interface between
layers for my high speed soft switch as well as for purposes of stacking
in general.

Reviewed by: sbruno@
Approved by: sbruno@
Differential Revision: https://reviews.freebsd.org/D15344


# 20f8d7bc 10-May-2018 Dag-Erling Smørgrav <des@FreeBSD.org>

Slight cleanup of interface event logging.

Make if_printf() use vlog() instead of vprintf(). This means it can no
longer return the number of characters printed, as it used to, but every
single call to if_printf() in the entire kernel ignores the return value
anyway; just return 0 so we don't have to change the prototype.

Consistently use if_printf() throughout sys/net/if.c, instead of a
mixture of if_printf() and log().

In ifa_maintain_loopback_route(), don't needlessly log an error if we
either failed to add a route because it already existed or failed to
remove one because it did not. We still return an error code, though.

MFC after: 1 week


# 7bf272a6 10-May-2018 Matt Macy <mmacy@FreeBSD.org>

Allocate epoch for networking at startup

Additionally add CK to include paths for modules

Approved by: sbruno@


# b6f6f880 06-May-2018 Matt Macy <mmacy@FreeBSD.org>

r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl
to sleep on commands to the NIC when updating multicast filters. More generally this permitted
driver's to use an sx as a softc lock. Unfortunately this change introduced a race whereby a
a multicast update would still be queued for deletion when ifconfig deleted the interface
thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses
on the interface.

Synchronously remove all external references to a multicast address before enqueueing for delete.

Reported by: lwhsu
Approved by: sbruno


# e5054602 05-May-2018 Mark Johnston <markj@FreeBSD.org>

Import the netdump client code.

This is a component of a system which lets the kernel dump core to
a remote host after a panic, rather than to a local storage device.
The server component is available in the ports tree. netdump is
particularly useful on diskless systems.

The netdump(4) man page contains some details describing the protocol.
Support for configuring netdump will be added to dumpon(8) in a future
commit. To use netdump, the kernel must have been compiled with the
NETDUMP option.

The initial revision of netdump was written by Darrell Anderson and
was integrated into Sandvine's OS, from which this version was derived.

Reviewed by: bdrewery, cem (earlier versions), julian, sbruno
MFC after: 1 month
X-MFC note: use a spare field in struct ifnet
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D15253


# f3e1324b 02-May-2018 Stephen Hurd <shurd@FreeBSD.org>

Separate list manipulation locking from state change in multicast

Multicast incorrectly calls in to drivers with a mutex held causing drivers
to have to go through all manner of contortions to use a non sleepable lock.
Serialize multicast updates instead.

Submitted by: mmacy <mmacy@mattmacy.io>
Reviewed by: shurd, sbruno
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14969


# 3edb7f4e 25-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Translate 32-bit ifmedia requests into native ones.

We use transformation rather than accessors as virtually ever driver
implements SIOCGIFMEDIA and all would have to be touched.

Keep the code readable by always performing copies and (possiably no-op)
transforms.

Reviewed by: jhb, kib
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14996


# 3a4fc8a8 13-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Remove support for the Arcnet protocol.

While Arcnet has some continued deployment in industrial controls, the
lack of drivers for any of the PCI, USB, or PCIe NICs on the market
suggests such users aren't running FreeBSD.

Evidence in the PR database suggests that the cm(4) driver (our sole
Arcnet NIC) was broken in 5.0 and has not worked since.

PR: 182297
Reviewed by: jhibbits, vangyzen
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15057


# 0437c8e3 11-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Remove support for FDDI networks.

Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewere (supporting non-existant media type is harmless).

Reviewed by: kib, jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15017


# 8a4a4a43 06-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Remove the thread argument from ifr_buffer_*() accessors.

They are always used in a context where curthread is the correct thread.
This makes them more similar to the ifr_data_get_ptr() accessor.


# e7fdc72e 06-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

ifconf(): correct handling of sockaddrs smaller than struct sockaddr.

Portable programs that use SIOCGIFCONF (e.g. traceroute) assume
that each pseudo ifreq is of length MAX(sizeof(struct ifreq),
sizeof(ifr_name) + ifr_addr.sa_len). For short sockaddrs we copied
too much from the source sockaddr resulting in a heap leak.

I believe only one such sockaddr exists (struct sockaddr_sco which
is 8 bytes) and it is unclear if such sockaddrs end up on interfaces
in practice. If it did, the result would be an 8 byte heap leak on
current architectures.

admbugs: 869
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 3 days
Security: kernel heap leak
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14981


# 6469bdcd 06-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Move most of the contents of opt_compat.h to opt_global.h.

opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941


# 756181b8 05-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Add 32-bit compat for ioctls that take struct ifgroupreq.

Use an accessor to access ifgr_group and ifgr_groups.

Use an macro CASE_IOC_IFGROUPREQ(cmd) in place of case statements such
as "case SIOCAIFGROUP:". This avoids poluting the switch statements
with large numbers of #ifdefs.

Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14960


# 2443045f 05-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

ifconf(): Always zero the whole struct ifreq.

The previous split of zeroing ifr_name and ifr_addr seperately is safe
on current architectures, but would be unsafe if pointers were larger
than 8 bytes. Combining the zeroing adds no real cost (a few
instructions) and makes the security property easier to verify.

Reviewed by: kib, emaste
Obtained from: CheriBSD
MFC after: 3 days
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14912


# 8708f1bd 30-Mar-2018 Brooks Davis <brooks@FreeBSD.org>

Document and enforce assumptions about struct (in6_)ifreq.

- The two types must be type-punnable for shared members of ifr_ifru.
This allows compatibility accessors to be shared.

- There must be no padding gap between ifr_name and ifr_ifru. This is
assumed in tcpdump's use of SIOCGIFFLAGS output which attempts to be
broadly portable. This is true for all current architectures, but very
large (256-bit) fat-pointers could violate this invariant.

Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14910


# 541d96aa 30-Mar-2018 Brooks Davis <brooks@FreeBSD.org>

Use an accessor function to access ifr_data.

This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size). This is believed to be sufficent to
fully support ifconfig on 32-bit systems.

Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14900


# 69f0fecb 28-Mar-2018 Brooks Davis <brooks@FreeBSD.org>

Remove infrastructure for token-ring networks.

Reviewed by: cem, imp, jhb, jmallett
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14875


# f8f65519 27-Mar-2018 Brooks Davis <brooks@FreeBSD.org>

Fix a whitespace bug missed in refactoring prior to r331641.

MFC with: r331641


# 86d2ef16 27-Mar-2018 Brooks Davis <brooks@FreeBSD.org>

Fix access to ifru_buffer on freebsd32.

Make all kernel accesses to ifru_buffer go via access functions
which take the process ABI into account and use an appropriate union
to access members in the correct place in struct ifreq.

Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14846


# f1379734 27-Mar-2018 Konstantin Belousov <kib@FreeBSD.org>

Allow to specify PCP on packets not belonging to any VLAN.

According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be
considered as untagged, and only PCP and DEI values from the VLAN tag
are meaningful. See for instance
https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html.

Make it possible to specify PCP value for outgoing packets on an
ethernet interface. When PCP is supplied, the tag is appended, VLAN
id set to 0, and PCP is filled by the supplied value. The code to do
VLAN tag encapsulation is refactored from the if_vlan.c and moved into
if_ethersubr.c.

Drivers might have issues with filtering VID 0 packets on
receive. This bug should be fixed for each driver.

Reviewed by: ae (previous version), hselasky, melifaro
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D14702


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 9f23a54e 05-Nov-2017 Eugene Grosbein <eugen@FreeBSD.org>

Allow a process to assign an IP address to local ppp interface
even if kernel routing table already has a route to the address in question
installed by some routing daemon (PR 223129).

Also, allow loopback route deletion when stopping a VIMAGE jail (PR 222647).

PR: 222647, 223129
Reviewed by: gnn
Approved by: avg (mentor), mav (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D12747


# 0f3af041 04-Sep-2017 Sepherosa Ziehau <sephe@FreeBSD.org>

if: Add ioctls to get RSS key and hash type/function.

It will be needed by hn(4) to configure its RSS key and hash
type/function in the transparent VF mode in order to match VF's
RSS settings. The description of the transparent VF mode and
the RSS hash value issue are here:
https://svnweb.freebsd.org/base?view=revision&revision=322299
https://svnweb.freebsd.org/base?view=revision&revision=322485

These are generic enough to promise two independent IOCs instead
of abusing SIOCGDRVSPEC.

Setting RSS key and hash type/function is a different story,
which probably requires more discussion.

Comment about UDP_{IPV4,IPV6,IPV6_EX} were only in the patch
in the review request; these hash types are standardized now.

Reviewed by: gallatin
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12174


# ddae5750 10-May-2017 Ravi Pokala <rpokala@FreeBSD.org>

Persistently store NIC's hardware MAC address, and add a way to retrive it

The MAC address reported by `ifconfig ${nic} ether' does not always match
the address in the hardware, as reported by the driver during attach. In
particular, NICs which are components of a lagg(4) interface all report the
same MAC.

When attaching, the NIC driver passes the MAC address it read from the
hardware as an argument to ether_ifattach(). Keep a second copy of it, and
create ioctl(SIOCGHWADDR) to return it. Teach `ifconfig' to report it along
with the active MAC address.

PR: 194386
Reviewed by: glebius
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D10609


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# efe3b0de 27-Feb-2017 Gleb Smirnoff <glebius@FreeBSD.org>

Remove SVR4 (System V Release 4) binary compatibility support.

UNIX System V Release 4 is operating system released in 1988. It ceased
to exist in early 2000-s.


# d0b2cad1 31-Jan-2017 Stephen J. Kiernan <stevek@FreeBSD.org>

Add the folowing set accessor functions for recently-added members of ifnet
structure:

if_gethwtsomax(), if_sethwtsomax() - if_hw_tsomax
if_gethwtsomaxsegcount(), if_sethwtsomaxsegcount() - if_hw_tsomaxsegcount
if_gethwtsomaxsegsize(), if_sethwtsomaxsegsize() - if_hw_tsomaxsegsize

Update em and vnic drivers which had already been coverted to use accessor
functions for the other ifnet structure members.

Reviewed by: erj
Approved by: sjg (mentor)
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D8544


# 2bbd06fc 28-Jan-2017 Andriy Voskoboinyk <avos@FreeBSD.org>

Garbage collect IFT_IEEE80211 (but leave the define for possible reuse)

This interface type ("a parent interface of wlanX") is not used since
r287197

Reviewed by: adrian, glebius
Differential Revision: https://reviews.freebsd.org/D9308


# 6597559e 28-Jan-2017 Dexuan Cui <dexuan@FreeBSD.org>

ifnet: move the new ifnet_event EVENTHANDLER_DECLARE to net/if_var.h

Thank glebius for pointing this out:
"The network stuff shall not be added to sys/eventhandler.h"

Reviewed by: David_A_Bright_DELL.com, sephe, glebius
Approved by: sephe (mentor)
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D9345


# 338e227a 25-Jan-2017 Luiz Otavio O Souza <loos@FreeBSD.org>

After the in_control() changes in r257692, an existing address is
(intentionally) deleted first and then completely added again (so all the
events, announces and hooks are given a chance to run).

This cause an issue with CARP where the existing CARP data structure is
removed together with the last address for a given VHID, which will cause
a subsequent fail when the address is later re-added.

This change fixes this issue by adding a new flag to keep the CARP data
structure when an address is not being removed.

There was an additional issue with IPv6 CARP addresses, where the CARP data
structure would never be removed after a change and lead to VHIDs which
cannot be destroyed.

Reviewed by: glebius
Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)


# 92a6859b 24-Jan-2017 Dexuan Cui <dexuan@FreeBSD.org>

ifnet: introduce event handlers for ifup/ifdown events

Hyper-V's NIC SR-IOV implementation needs a Hyper-V synthetic NIC and
a VF NIC to work together, mainly to support seamless live migration.

When the VF device becomes UP (or DOWN), the synthetic NIC driver needs
to switch the data path from the synthetic NIC to the VF (or the opposite).

So the synthetic NIC driver needs to know when a VF device is becoming
UP or DOWN and hence the patch is made.

Reviewed by: sephe
Approved by: sephe (mentor)
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8963


# cc5bb78b 05-Jan-2017 Sepherosa Ziehau <sephe@FreeBSD.org>

if: Defer the if_up until the ifnet.if_ioctl is called.

This ensures the interface is initialized by the interface driver
before it can be used by the rest of the system.

Reviewed by: jhb, karels, gnn
MFC after: 3 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8905


# 368bf0c2 11-Oct-2016 Sepherosa Ziehau <sephe@FreeBSD.org>

ifnet: Use if_link_state snapshot to invoke ifnet_link_event

So that everyone in this task have consistent view of link state.

Reviewed by: ae
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8214


# 32e0ade6 24-Jul-2016 Gleb Smirnoff <glebius@FreeBSD.org>

Partially revert r257696/r257713, which have an issue with writing to user
controlled address. Restore the old code that emulated OSIOCGIFCONF in if.c.

Noticed by: C Turt


# a29c7aeb 28-Jun-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

Several device drivers call if_alloc() and then do further checks and
will cal if_free() in case of conflict, error, ..
if_free() however sets the VNET instance from the ifp->if_vnet which
was not yet initialized but would only in if_attach(). Fix this by
setting the curvnet from where we allocate the interface in if_alloc().
if_attach() will later overwrite this as needed. We do not set the home_vnet
early on as we only want to prevent the if_free() panic but not change any
of the other housekeeping, e.g., triggered through ifioctl()s.

Reviewed by: brooks
Approved by: re (gjb)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D7010


# d3f6f80f 22-Jun-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

After r302054 unloading an network interface driver on a kernel
without VIMAGE support would dereference a NULL point unconditionally
leading to a panic. Wrap the entire VIMAGE related code with #ifdefs
rather than just the decision making part to save an extra bit of
resources.

Reported by: np
Sponsored by: The FreeBSD Foundation
MFC After: 13 days
Approved by: re (marius)


# 89856f7e 21-Jun-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747


# 2d5ad99a 06-Jun-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

After tearing down the interface per-"domain" bits, set the data area
to NULL to avoid it being mis-treated on a possible re-attach but also
to get a clean NULL pointer derefence in case of errors due to
unexpected race conditions elsewhere in the code, e.g., callouts.

Obtained from: projects/vnet
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation


# d117fd80 06-Jun-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

Similarly to r301505 protect the removal of the ifa from the if_addrhead
by a lock (as well as the check that the list is not empty).

Obtained from: projects/vnet
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation


# f22d78c0 06-Jun-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

In if_purgeaddrs() we cannot hold the lock over the entire loop
due to called functions (as in other parts of the stack, leave a comment).
Put around a lock the removal of the ifa from the list however to
reduce the possible race with other places.

Obtained from: projects/vnet
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation


# c169d9fe 28-May-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

In if_attachdomain1() there does not seem to be any reason
to use TRYLOCK rather than just acquire the lock, so just do that.

Reviewed by: markj
Obtained from: projects/vnet
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6578


# dbd2ee46 25-May-2016 Nick Hibma <n_hibma@FreeBSD.org>

Change net.link.log_promisc_mode_change to a read-only tunable

PR: 166255
Submitted by: eugen.grosbein.net
Obtained from: hselasky
MFC after: 3 days


# ad4e9116 18-May-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

Rather than having the if_vmove() code intermixed in the vnet_destroy()
function in vnet.c move it to if.c where it logically belongs and put
it under a VNET_SYSUNINIT() call.
To not change the current behaviour make sure it runs first thing
during teardown. In the future this will allow us more flexibility
on changing the order on when we want to get rid of interfaces.

Stop exporting if_vmove() and make it file static.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6438


# 4c7070db 17-May-2016 Scott Long <scottl@FreeBSD.org>

Import the 'iflib' API library for network drivers. From the author:

"iflib is a library to eliminate the need for frequently duplicated device
independent logic propagated (poorly) across many network drivers."

Participation is purely optional. The IFLIB kernel config option is
provided for drivers that want to transition between legacy and iflib
modes of operation. ixl and ixgbe driver conversions will be committed
shortly. We hope to see participation from the Broadcom and maybe
Chelsio drivers in the near future.

Submitted by: mmacy@nextbsd.org
Reviewed by: gallatin
Differential Revision: D5211


# 1ef3d54d 15-May-2016 Don Lewis <truckman@FreeBSD.org>

When handling SIOCSIFNAME ensure that the new interface name is NUL
terminated. Reject the rename attempt if the name is too long.

MFC after: 1 week


# 6d07c157 12-May-2016 Nick Hibma <n_hibma@FreeBSD.org>

Allow silencing of 'promiscuous mode enabled/disabled' messages.

PR: 166255
Submitted by: eugen.grosbein.net
Obtained from: eugen.grosbein.net
MFC after: 1 week


# a4641f4e 03-May-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/net*: minor spelling fixes.

No functional change.


# 155d72c4 15-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/net* : for pointers replace 0 with NULL.

Mostly cosmetical, no functional change.

Found with devel/coccinelle.


# 05fc4164 11-Apr-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

During if_vmove() we call if_detach_internal() which in turn calls the event
handler notifying about interface departure and one of the consumers will
detach if_bpf.
There is no way for us to re-attach this easily as the DLT and hdrlen are
only given on interface creation.
Add a function to allow us to query the DLT and hdrlen from a current
BPF attachment and after if_attach_internal() manually re-add the if_bpf
attachment using these values.

Found by panics triggered by nd6 packets running past BPF_MTAP() with no
proper if_bpf pointer on the interface.

Also add a basic DDB show function to investigate the if_bpf attachment
of an interface.

Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5896


# 1f12da0e 22-Jan-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

Just checkpoint the WIP in order to be able to make the tree update
easier. Note: this is currently not in a usable state as certain
teardown parts are not called and the DOMAIN rework is missing.
More to come soon and find its way to head.

Obtained from: P4 //depot/user/bz/vimage/...
Sponsored by: The FreeBSD Foundation


# 4fb3a820 30-Dec-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Implement interface link header precomputation API.

Add if_requestencap() interface method which is capable of calculating
various link headers for given interface. Right now there is support
for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
Other types are planned to support more complex calculation
(L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
(like L2 header for route w/gateway) down to the stack eliminating the
need for other lookups. It also brings us closer to more complex scenarios
like transparently handling MPLS nexthops and tunnel interfaces.
Last, but not least, it removes layering violation introduced by flowtable
code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
record. Interface link address change are handled by re-calculating
headers for all lles based on if_lladdr event. After these changes,
arpresolve()/nd6_resolve() returns full pre-calculated header for
supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
difference is that interface source mac has to be filled by OS for
AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
BPF and not pollute if_output() routines. Convert BPF to pass prepend data
via new 'struct route' mechanism. Note that it does not change
non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
It is not needed for ethernet anymore. The only remaining FDDI user is
dev/pdq mostly untouched since 2007. FDDI support was eliminated from
OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
Flowtable violates layering by saving (and not correctly managing)
rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
header data from that lle.

Differential Revision: https://reviews.freebsd.org/D4102


# f501e6f1 22-Dec-2015 Bjoern A. Zeeb <bz@FreeBSD.org>

If vnets are torn down while ifconfig runs an ioctl to say, destroy an
epair(4), we may hit if_detach_internal() without holding a lock and by
the time we aquire it the interface might be gone.
We should not panic() in this case as it is our fault for not holding
the lock all the way. It is not ideal to return silently without error
to user space, but other callers will all ignore the return values so
do not change the entire KPI for little benefit for now.
The ifp will be dealt with one way or another still.

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D4529


# d6e82913 17-Dec-2015 Steven Hartland <smh@FreeBSD.org>

Revert r292275 & r292379

glebius has concerns about these changes so reverting those can be discussed
and addressed.

Sponsored by: Multiplay


# 52e53e2d 15-Dec-2015 Steven Hartland <smh@FreeBSD.org>

Fix lagg failover due to missing notifications

When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results is slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.

Upon receiving these events each protocol trigger the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behavour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR: 156226
MFC after: 1 month
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4111


# ef91a976 25-Nov-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Overhaul if_enc(4) and make it loadable in run-time.

Use hhook(9) framework to achieve ability of loading and unloading
if_enc(4) kernel module. INET and INET6 code on initialization registers
two helper hooks points in the kernel. if_enc(4) module uses these helper
hook points and registers its hooks. IPSEC code uses these hhook points
to call helper hooks implemented in if_enc(4).


# 8ad43f2d 14-Nov-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Move iflladdr_event eventhandler invocation to if_setlladdr.

Suggested by: glebius


# b13c5b5d 09-Nov-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use lladdr_event to propagate gratiotus arp.

Differential Revision: https://reviews.freebsd.org/D4019


# bb3d23fd 01-Nov-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix lladdr change propagation for on vlans on top of it.
Fix lladdr update when setting mac address manually.
Fix lladdr_event for slave ports addition.

MFC after: 4 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D4004


# 59c180c3 16-Sep-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Unify loopback route switching:
* prepare gateway before insertion
* use RTM_CHANGE instead of explicit find/change route
* Remove fib argument from ifa_switch_loopback_route added in r264887:
if old ifp fib differes from new one, that the caller
is doing something wrong
* Make ifa_*_loopback_route call single ifa_maintain_loopback_route().


# 441f9243 04-Sep-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Constantify lookup key in ifa_ifwith* functions.
Some places in our network stack already have const
arguments (like if_output() routines and LLE functions).

Code using ifa_ifwith (and similar functins) along with
LLE/_output functions is currently bound to use tricks
like __DECONST(). Provide a cleaner way by making sockaddr
lookup key really constant.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D3464


# 4bdf0b6a 08-Aug-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

MFP r274295:

* Move interface route cleanup to route.c:rt_flushifroutes()
* Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
to use new rt_foreach_fib() instead of hand-rolling cycles.


# 22a93840 20-Jul-2015 Marko Zec <zec@FreeBSD.org>

Prevent null-pointer dereferencing.

MFC after: 3 days


# 966ab68d 23-Jun-2015 Dimitry Andric <dim@FreeBSD.org>

Fix endless recursion in sys/net/if.c's drbr_inuse_drv(), found by clang
3.7.0.

Reviewed by: marcel


# eb7e25b2 07-Apr-2015 Eric Joyner <erj@FreeBSD.org>

ifmedia changes:

- Extend the number of available subtypes for Ethernet media by using some
of the ifmedia word's option bits to help denote subtypes. As a result, the
number of possible Ethernet subtype values increases from 31 to 511.

- Use some of those new values to define new media types.

- lacp_compose_key() recgonizes the new Ethernet media types added.
(Change made as required by a comment in if_media.h)

- New ioctl, SIOGIFXMEDIA, to handle getting the new extended media types.
SIOCGIFMEDIA is retained for backwards compatibility.

- Changes to ifconfig to allow it to handle the new extended media types.

Submitted by: mike@karels.net (original), hselasky
Reviewed by: jfvogel, gnn, hselasky
Approved by: jfvogel (mentor), gnn (mentor)
Differential Revision: http://reviews.freebsd.org/D1965


# b57d9721 12-Mar-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Add if_input_default() method, that will be used for if_input
initialization, when no input method specified before if_attach().

This prevents panics when if_input() method called directly e.g.
from bpf(4) code.

PR: 192426
Reviewed by: glebius
MFC after: 1 week


# c92a456b 02-Mar-2015 Hiroki Sato <hrs@FreeBSD.org>

Fix group membership of cloned interfaces when one is moved by
if_vmove().

In if_vmove(), if_detach_internal() and if_attach_internal() were
called in series to detach and reattach the interface. When
detaching, if_delgroup() was called and the interface leaves all of
the group membership. And then upon attachment, if_addgroup(ifp,
IFG_ALL) was called and it joined only "all" group again.

This had a problem. Normally, a cloned interface automatically joins
a group whose name is ifc_name of the cloner in addition to "all"
upon creation. However, if_vmove() removed the membership and did
not restore upon attachment.

Differential Revision: https://reviews.freebsd.org/D1859


# 5d14e4cd 29-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Provide rte_<get|set> methods to access rtentry for external consumers.


# 1be1588a 29-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Make ifa_add_loopback_route() prepare gw before insertion.
* Temporarily move ifa_switch_loopback_route() implementation to route.c


# 73d77028 23-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Do more fine-grained lltable locking: use table runtime lock as rare
as we can.


# 7c066c18 22-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use less-invasive approach for IF_AFDATA lock: convert into 2 locks:
use rwlock accessible via external functions
(IF_AFDATA_CFG_* -> if_afdata_cfg_*()) for all control plane tasks
use rmlock (IF_AFDATA_RUN_*) for fast-path lookups.


# 27688dfe 22-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Temporarily revert r274774.


# 9883e41b 20-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Switch IF_AFDATA lock to rmlock


# 7f948f12 16-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Finish r274175: do control plane MTU tracking.

Update route MTU in case of ifnet MTU change.
Add new RTF_FIXEDMTU to track explicitly specified MTU.

Old behavior:
ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU.
User has to manually update all routes.
ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU.
However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu
gets updated.

New behavior:
ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs gets updated
with new MTU unless RTF_FIXEDMTU flag set on them.
ifconfig em0 mtu 9000->1500 -> all routes in all fibs gets updated with new
MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu.

route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag.
route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag.

PR: 194238
MFC after: 1 month
CR: D1125


# 3c7c188c 10-Nov-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Fix some minor TSO issues:
- Improve description of TSO limits.
- Remove a not needed KASSERT()
- Remove some not needed variable casts.

Sponsored by: Mellanox Technologies
Discussed with: lstewart @
MFC after: 1 week


# 4ea05db8 09-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Use standard mtx(9), rwlock(9), sx(9) system initialization macros
instead of doing initialization manually.

Sponsored by: Nginx, Inc.
Sponsored by: Netflix


# 1398ffe5 08-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
to use new rt_foreach_fib() instead of hand-rolling cycles.


# f4507b71 08-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

ifindex_alloc_locked() never fails and doesn't have no-lock version,
so change the prototype.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 833e8dc5 07-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove struct arpcom. It is unused by most interface types, that allocate
it, except Ethernet, where it carried ng_ether(4) pointer.
For now carry the pointer in if_l2com directly.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# e6abef09 07-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove useless structure ifindex_entry.

Sponsored by: Nginx, Inc.
Sponsored by: Netflix


# 1a75e3b2 06-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Make checks for rt_mtu generic:

Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce
route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking
might be an option in some situation, it is not feasible to do MTU checks
there: generic (or per-domain) routing code is perfectly capable of doing
this.

We currrently have 3 places where MTU is altered:

1) route addition.
In this case domain overrides radix _addroute callback (in[6]_addroute)
and all necessary checks/fixes are/can be done there.

2) route change (especially, GW change).
In this case, there are no explicit per-domain calls, but one can
override rte by setting ifa_rtrequest hook to domain handler
(inet6 does this).

3) ifconfig ifaceX mtu YYYY
In this case, we have no callbacks, but ip[6]_output performes runtime
checks and decreases rt_mtu if necessary.

Generally, the goals are to be able to handle all MTU changes in
control plane, not in runtime part, and properly deal with increased
interface MTU.

This commit changes the following:
* removes hooks setting MTU from drivers side
* adds proper per-doman MTU checks for case 1)
* adds generic MTU check for case 2)

* The latter is done by using new dom_ifmtu callback since
if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size.
However, IPv6 mtu might be different from if_mtu one (e.g. default 1280)
for some cases, so we need an abstract way to know maximum MTU size
for given interface and domain.
* moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies
user-supplied data which must be checked.
* removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to
use this functions on new non-inserted rte.

More changes will follow soon.

MFC after: 1 month
Sponsored by: Yandex LLC


# 8c3cfe0b 04-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Hide 'struct rtentry' and all its macro inside new header:
net/route_internal.h
The goal is to make its opaque for all code except route/rtsock and
proto domain _rmx.


# 61dc4344 23-Oct-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Move if_get_counter initialization from if_attach into if_alloc.
Also, initialize all counters before ifnet will become available in the system.
This fixes possible access to uninitialized ifned fields.

PR: 194550


# 112f50ff 28-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Finally, convert counters in struct ifnet to counter(9).

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 9fd573c3 22-Sep-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Improve transmit sending offload, TSO, algorithm in general.

The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

Reviewed by: adrian, rmacklem
Sponsored by: Mellanox Technologies
MFC after: 1 week


# 56b61ca2 19-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove ifq_drops from struct ifqueue. Now queue drops are accounted in
struct ifnet if_oqdrops.

Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops
is simply removed from them. There were no API to read this statistic.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# d2a707cd 18-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove a bunch of methods that are superseded by if_inc_counter().

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 1b7fb1d9 18-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

While not too late rename 'ifnet_counter' to 'ift_counter'. One of the
imporant moments that we discussed with Marcel and Anuranjan was that
a converted driver should return false for 'grep ifnet if_driver.c' :)

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 35853c2c 18-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Add a function to set if_get_counter method for an ifnet. To be used
in the drivers that are already converted to "Juniper drvapi". This
can be revisited in future.


# 277e067a 18-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

While not too late rename if_get_counter_compat() to if_get_counter_default().
The compat counters will go away, but the function will remain in its place,
and in all places where it is going to be called.

Discussed with: melifaro


# 0b7b006c 18-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Add if_inc_counter(), a generic method to update ifnet(9) counter
w/o dereferencing the struct.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 72f31000 13-Sep-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Revert r271504. A new patch to solve this issue will be made.

Suggested by: adrian @


# eb93b77a 13-Sep-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Improve transmit sending offload, TSO, algorithm in general.

The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 4f8585e0 11-Sep-2014 Alan Somers <asomers@FreeBSD.org>

Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and
ifa_ifwithdstaddr. For the sake of backwards compatibility, the new
arguments were added to new functions named ifa_ifwithnet_fib and
ifa_ifwithdstaddr_fib, while the old functions became wrappers around the
new ones that passed RT_ALL_FIBS for the fib argument. However, the
backwards compatibility is not desired for FreeBSD 11, because there are
numerous other incompatible changes to the ifnet(9) API. We therefore
decided to remove it from head but leave it in place for stable/9 and
stable/10. In addition, this commit adds the fib argument to
ifa_ifwithbroadaddr for consistency's sake.

sys/sys/param.h
Increment __FreeBSD_version

sys/net/if.c
sys/net/if_var.h
sys/net/route.c
Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib
versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute.

sys/net/route.c
sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_options.c
sys/netinet/ip_output.c
sys/netinet6/nd6.c
Fixup calls of modified functions.

share/man/man9/ifnet.9
Document changed API.

CR: https://reviews.freebsd.org/D458
MFC after: Never
Sponsored by: Spectra Logic


# 09a8241f 30-Aug-2014 Gleb Smirnoff <glebius@FreeBSD.org>

It is actually possible to have if_t a typedef to non-void type,
and keep both converted to drvapi and non-converted drivers
compilable.

o Make if_t typedef to struct ifnet *.
o Remove shim functions.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# e6485f73 31-Aug-2014 Gleb Smirnoff <glebius@FreeBSD.org>

o Remove struct if_data from struct ifnet. Now it is merely API structure
for route(4) socket and ifmib(4) sysctl.
o Move fields from if_data to ifnet, but keep all statistic counters
separate, since they should disappear later.
o Provide function if_data_copy() to fill if_data, utilize it in routing
socket and ifmib handler.
o Provide overridable ifnet(9) method to fetch counters. If no provided,
if_get_counters_compat() would be used, that returns old counters.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# af371fc6 16-Aug-2014 Roger Pau Monné <royger@FreeBSD.org>

net: move interface removal notification up in if_detach_internal

This is needed to prevent having interfaces with ifp->if_addr == NULL
on bridge interfaces. Moving the notification event handlers up makes
sure the interfaces are removed before doing any more cleanup.

Sponsored by: Citrix Systems R&D
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D598

net/if.c
- Move interface removal notification up in if_detach_internal.


# 9753faf5 29-Jul-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Garbage collect couple of unused fields from struct ifaddr:
- ifa_claim_addr() unused since removal of NetAtalk
- ifa_metric seems to be never utilized, always a copy of if_metric


# c29a3321 16-Jul-2014 Kevin Lo <kevlo@FreeBSD.org>

Deprecate m_act. Use m_nextpkt always.


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 62d76917 02-Jun-2014 Marcel Moolenaar <marcel@FreeBSD.org>

Introduce a procedural interface to the ifnet structure. The new
interface allows the ifnet structure to be defined as an opaque
type in NIC drivers. This then allows the ifnet structure to be
changed without a need to change or recompile NIC drivers.

Put differently, NIC drivers can be written and compiled once and
be used with different network stack implementations, provided of
course that those network stack implementations have an API and
ABI compatible interface.

This commit introduces the 'if_t' type to replace 'struct ifnet *'
as the type of a network interface. The 'if_t' type is defined as
'void *' to enable the compiler to perform type conversion to
'struct ifnet *' and vice versa where needed and without warnings.
The functions that implement the API are the only functions that
need to have an explicit cast.

The MII code has been converted to use the driver API to avoid
unnecessary code churn. Code churn comes from having to work with
both converted and unconverted drivers in correlation with having
callback functions that take an interface. By converting the MII
code first, the callback functions can be defined so that the
compiler will perform the typecasts automatically.

As soon as all drivers have been converted, the if_t type can be
redefined as needed and the API functions can be fix to not need
an explicit cast.

The immediate benefactors of this change are:
1. Juniper Networks - The network stack implementation in Junos
is entirely different from FreeBSD's one and this change
allows Juniper to build "stock" NIC drivers that can be used
in combination with both the FreeBSD and Junos stacks.
2. FreeBSD - This change opens the door towards changing ifnet
and implementing new features and optimizations in the network
stack without it requiring a change in the many NIC drivers
FreeBSD has.

Submitted by: Anuranjan Shukla <anshukla@juniper.net>
Reviewed by: glebius@
Obtained from: Juniper Networks, Inc.


# 2f308a34 29-May-2014 Alan Somers <asomers@FreeBSD.org>

Fix unintended KBI change from r264905. Add _fib versions of
ifa_ifwithnet() and ifa_ifwithdstaddr() The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.

sys/net/if_var.h
sys/net/if.c
Add legacy-compatible functions as described above. Ensure legacy
behavior when RT_ALL_FIBS is passed as fibnum.

sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
Call with _fib() functions if we must use a specific fib, or the
legacy functions otherwise.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
Improve the udp_dontroute test. The bug that this test exercises is
that ifa_ifwithnet() will return the wrong address, if multiple
interfaces have addresses on the same subnet but with different
fibs. The previous version of the test only considered one possible
failure mode: that ifa_ifwithnet_fib() might fail to find any
suitable address at all. The new version also checks whether
ifa_ifwithnet_fib() finds the correct address by checking where the
ARP request goes.

Reported by: bz, hrs
Reviewed by: hrs
MFC after: 1 week
X-MFC-with: 264905
Sponsored by: Spectra Logic


# 36d55f0f 26-Apr-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Unify sa_equal() macro usage.

MFC after: 2 weeks


# 0cfee0c2 24-Apr-2014 Alan Somers <asomers@FreeBSD.org>

Fix subnet and default routes on different FIBs on the same subnet.

These two bugs are closely related. The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr. Those
functions will only return an address whose interface fib equals the
argument.

sys/net/route.c
Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
arguments.

sys/netinet/in.c
Update in_addprefix to consider the interface fib when adding
prefixes. This will prevent it from not adding a subnet route when
one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
In some cases it there wasn't a clear specific fib number to use.
In others, I was unable to test those functions so I chose
RT_DEFAULT_FIB to minimize divergence from current behavior. I will
fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
Revert r263738. The udp_dontroute test was right all along.
However, bugs kern/187550 and kern/187553 cancelled each other out
when it came to this test. Because of kern/187553, ifa_ifwithnet
searched the default fib instead of the requested one, but because
of kern/187550, there was an applicable subnet route on the default
fib. The new test added in r263738 doesn't work right, however. I
can verify with dtrace that ifa_ifwithnet returned the wrong address
before I applied this commit, but route(8) miraculously found the
correct interface to use anyway. I don't know how.

Clear expected failure messages for kern/187550 and kern/187552.

PR: kern/187550
PR: kern/187552
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic


# 0489b891 24-Apr-2014 Alan Somers <asomers@FreeBSD.org>

Fix host and network routes for new interfaces when net.add_addr_allfibs=0

sys/net/route.c
In rtinit1, use the interface fib instead of the process fib. The
latter wasn't very useful because ifconfig(8) is usually invoked
with the default process fib. Changing ifconfig(8) to use setfib(2)
would be redundant, because it already sets the interface fib.

tests/sys/netinet/fibs_test.sh
Clear the expected ATF failure

sys/net/if.c
Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib

sys/netinet/in.c
sys/net/if_var.h
Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
in_scrubprefix. Pass it the interface fib.

PR: kern/187549
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation


# 209579ae 17-Apr-2014 Rick Macklem <rmacklem@FreeBSD.org>

For NFS mounts using rsize,wsize=65536 over TSO enabled
network interfaces limited to 32 transmit segments, there
are two known issues.
The more serious one is that for an I/O of slightly less than 64K,
the net device driver prepends an ethernet header, resulting in a
TSO segment slightly larger than 64K. Since m_defrag() copies this
into 33 mbuf clusters, the transmit fails with EFBIG.
A tester indicated observing a similar failure using iSCSI.

The second less critical problem is that the network
device driver must copy the mbuf chain via m_defrag()
(m_collapse() is not sufficient), resulting in measurable overhead.

This patch reduces the default size of if_hw_tsomax
slightly, so that the first issue is avoided.
Fixing the second issue will require a way for the
network device driver to inform tcp_output() that it
is limited to 32 transmit segments.

Reported and tested by: csforgeron@gmail.com, markus.gebert@hostpoint.ch
MFC after: 2 weeks


# d565e5f2 10-Apr-2014 Bruce M Simpson <bms@FreeBSD.org>

In if_freemulti(), relax the paranoid KASSERT() on ifma->ifma_protospec.

This KASSERT() existed as a sanity check that upper layers in the network
stack (e.g. inet, inet6) had released their reference to the underlying
driver's multicast memberships (ifmultiaddr{}). However it assumes the
lifecycle of the driver membership corresponds to the lifecycle of the
network layer membership.

In the submitter's case, ieee80211_ioctl_updatemulti() attempts to
reprogram the (parent, physical) ifnet{} memberships in response
to a change in membership on the (child, virtual) VAP ifnet, using
a batched update mechanism. These updates happen independently from
the network layer, causing a "false negative" assertion failure.

There are possibly other use cases where this KASSERT() may be triggered
by other networking stack activity (e.g. where a nesting relationship
exists between multiple ifnet{} instances). This suggests that further
review of FreeBSD's approach to nested ifnet relationships is needed.

MFC after: 6 weeks
Submitted by: adrian@


# 95fbe4d0 18-Jan-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Simplify filling sockaddr_dl structure for if_resolvemulti()
callback providers. link_init_sdl() function can be used to
fill most of the parameters. Use caller stack instead of
allocation / freing memory for each request. Do not drop support
for extra-long (probably non-existing) link-layer protocols by
introducing link_alloc_sdl() (used by if_resolvemulti() callback)
and link_free_sdl() (used by caller).
Since this change breaks KBI, MFC requires slightly different approach
(link_init_sdl() auto-allocating buffer if necessary to handle cases
with unmodified if_resolvemulti() callers).

MFC after: 2 weeks


# 955a2deb 07-Jan-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Remove dead code.

Reported by: Coverity
Coverity CID: 1018057
MFC after: 2 weeks


# 50da3e88 07-Jan-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Teach every SIOCGIFSTATUS provider to fill in ifs->ascii anyway.
Remove old bits of data concat for 'ascii' field.
Remove special SIOCGIFSTATUS handling from if.c (which Coverity yells at).

Reported by: Coverity
Coverity CID: 1147174
MFC after: 2 weeks


# 555036b5 10-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Remove never used ioctls that originate from KAME. The proof
of their zero usage was exp-run from misc/183538.


# af50ea38 04-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Axe IFF_SMART. Fortunately this layering violating flag was never used,
it was just declared.


# 5fb009bd 05-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Drop support for historic ioctls and also undefine them, so that code
that checks their presence via ifdef, won't use them.

Bump __FreeBSD_version as safety measure.


# 9a6356bc 05-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

In complemence to ifa_add_loopback_route() and ifa_del_loopback_route()
provide function ifa_switch_loopback_route() that will be used in case when
an interface address used for a loopback route goes away, but we have another
interface address with same address value and want to preserve loopback
route.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 7caf4ab7 15-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

- Utilize counter(9) to accumulate statistics on interface addresses. Add
four counters to struct ifaddr. This kills '+=' on a variables shared
between processors for every packet.
- Nuke struct if_data from struct ifaddr.
- In ip_input() do not put a reference on ifaddr, instead update statistics
right now in place and do IN_IFADDR_RUNLOCK(). These removes atomic(9)
for every packet. [1]
- To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in
rtsock.c fill if_data fields using counter_u64_fetch().
- Accidentially fix bug in COMPAT_32 version of NET_RT_IFLISTL, which
took if_data not from the ifaddr, but from ifaddr's ifnet. [2]

Submitted by: melifaro [1], pluknet[2]
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 67420bda 15-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Remove ifa_mtx. It was used only in one place in kernel, and ifnet's
ifaddr lock can substitute it there.

Discussed with: melifaro, ae
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 46758960 15-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Remove ifa_init() and provide ifa_alloc() that will allocate and setup
struct ifaddr internally.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 1a05c762 10-Sep-2013 Dag-Erling Smørgrav <des@FreeBSD.org>

Fix the length calculation for the final block of a sendfile(2)
transmission which could be tricked into rounding up to the nearest
page size, leaking up to a page of kernel memory. [13:11]

In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR
and SIOCSIFNETMASK at the socket layer rather than pass them on to the
link layer without validation or credential checks. [SA-13:12]

Prevent cross-mount hardlinks between different nullfs mounts of the
same underlying filesystem. [SA-13:13]

Security: CVE-2013-5666
Security: FreeBSD-SA-13:11.sendfile
Security: CVE-2013-5691
Security: FreeBSD-SA-13:12.ifioctl
Security: CVE-2013-5710
Security: FreeBSD-SA-13:13.nullfs
Approved by: re


# 719fb725 14-Jul-2013 Craig Rodrigues <rodrigc@FreeBSD.org>

PR: 168520 170096
Submitted by: adrian, zec

Fix multiple kernel panics when VIMAGE is enabled in the kernel.
These fixes are based on patches submitted by Adrian Chadd and Marko Zec.

(1) Set curthread->td_vnet to vnet0 in device_probe_and_attach() just before calling
device_attach(). This fixes multiple VIMAGE related kernel panics
when trying to attach Bluetooth or USB Ethernet devices because
curthread->td_vnet is NULL.

(2) Set curthread->td_vnet in if_detach(). This fixes kernel panics when detaching networking
interfaces, especially USB Ethernet devices.

(3) Use VNET_DOMAIN_SET() in ng_btsocket.c

(4) In ng_unref_node() set curthread->td_vnet. This fixes kernel panics
when detaching Netgraph nodes.


# 8aa99373 04-Jun-2013 John Baldwin <jhb@FreeBSD.org>

Fix build with both INET and INET6 disabled.


# 3c914c54 02-Jun-2013 Andre Oppermann <andre@FreeBSD.org>

Allow drivers to specify a maximum TSO length in bytes if they are
limited in the amount of data they can handle at once.

Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to
change the limit.

The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything
less wouldn't be very useful anymore. The upper limit is still at
IP_MAXPACKET (65536 bytes). Raising it requires further auditing of
the IPv4/v6 code path's as the length field in the IP header would
overflow leading to confusion in firewalls and others packet handler on
the real size of the packet.

The placement into "struct ifnet" is a bit hackish but the best place
that was found. When the stack/driver boundary is updated it should
be handled in a better way.

Submitted by: cperciva (earlier version)
Reviewed by: cperciva
Tested by: cperciva
MFC after: 1 week (using spare struct members to preserve ABI)


# f89d4c3a 06-May-2013 Andre Oppermann <andre@FreeBSD.org>

Back out r249318, r249320 and r249327 due to a heisenbug most
likely related to a race condition in the ipi_hash_lock with
the exact cause currently unknown but under investigation.


# 47e8d432 25-Apr-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Add const qualifier to the dst parameter of the ifnet if_output method.


# e8b3186b 09-Apr-2013 Andre Oppermann <andre@FreeBSD.org>

Change certain heavily used network related mutexes and rwlocks to
reside on their own cache line to prevent false sharing with other
nearby structures, especially for those in the .bss segment.

NB: Those mutexes and rwlocks with variables next to them that get
changed on every invocation do not benefit from their own cache line.
Actually it may be net negative because two cache misses would be
incurred in those cases.


# 3034f43f 08-Mar-2013 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix long-standing issue with interface routes being unprotected:
Use RTM_PINNED flag to mark route as immutable.
Forbid deleting immutable routes without special rtrequest1_fib() flag.
Adding interface address with prefix already in route table is handled
by atomically deleting old prefix and adding interface one.

Discussed with: andre, eri
MFC after: 3 weeks


# 24421c1c 11-Feb-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Resolve source address selection in presense of CARP. Add a couple
of helper functions:

- carp_master() - boolean function which is true if an address
is in the MASTER state.
- ifa_preferred() - boolean function that compares two addresses,
and is aware of CARP.

Utilize ifa_preferred() in ifa_ifwithnet().

The previous version of patch also changed source address selection
logic in jails using carp_master(), but we failed to negotiate this part
with Bjoern. May be we will approach this problem again later.

Reported & tested by: Anton Yuzhaninov <citrin citrin.ru>
Sponsored by: Nginx, Inc


# 813ee737 19-Oct-2012 Andre Oppermann <andre@FreeBSD.org>

Update to previous r241688 to use __func__ instead of spelled out function
name in log(9) message.

Suggested by: glebius


# 15134be8 18-Oct-2012 Andre Oppermann <andre@FreeBSD.org>

Use LOG_WARNING level in in_attachdomain1() instead of printf().

Submitted by: vijju.singh-at-gmail.com


# c9b652e3 18-Oct-2012 Andre Oppermann <andre@FreeBSD.org>

Mechanically remove the last stray remains of spl* calls from net*/*.
They have been Noop's for a long time now.


# d6d3f01e 08-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Merge the projects/pf/head branch, that was worked on for last six months,
into head. The most significant achievements in the new code:

o Fine grained locking, thus much better performance.
o Fixes to many problems in pf, that were specific to FreeBSD port.

New code doesn't have that many ifdefs and much less OpenBSDisms, thus
is more attractive to our developers.

Those interested in details, can browse through SVN log of the
projects/pf/head branch. And for reference, here is exact list of
revisions merged:

r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330,
r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656,
r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782,
r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868,
r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223,
r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456,
r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505,
r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168,
r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230,
r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398,
r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548,
r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672,
r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169,
r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442,
r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522,
r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661,
r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.

I'd like to thank people who participated in early testing:

Tested by: Florian Smeets <flo freebsd.org>
Tested by: Chekaluk Vitaly <artemrts ukr.net>
Tested by: Ben Wilber <ben desync.com>
Tested by: Ian FREISLICH <ianf cloudseed.co.za>


# 8db4528b 28-May-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Invoke group attach/detach handlers after releasing locks,
so that event subscribers could use M_WAITOK. This should reduce
lock contention as well.


# 7702d401 20-Apr-2012 Andrew Thompson <thompsa@FreeBSD.org>

Add linkstate to bridge(4), set the link to up when at least one underlying
interface is up, otherwise the link is down.

This, among other things, allows carp to work on a bridge.

Prodded by: glebius
Tested by: Alexander Lunev


# ddf32010 17-Apr-2012 Andrew Thompson <thompsa@FreeBSD.org>

Remove KASSERTS, they do not add any value here since the pointer is about to
be derefernced anyway.


# 4ecf274b 08-Feb-2012 Sergey Kandaurov <pluknet@FreeBSD.org>

g/c last bit of old ipv6 prefix management.

Reviewed by: bz
Obtained from: NetBSD, net/if.h, rev 1.80


# e8aa8bdd 04-Feb-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Fix typo in r231010.

Submitted by: linimon


# cad582b7 05-Feb-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Better comment for ifa_init(), ifa_ref(), ifa_free().


# adc7231d 05-Feb-2012 Gleb Smirnoff <glebius@FreeBSD.org>

In ifa_init() initialize if_data.ifi_datalen. This would be
required after upcoming changes from bz@.

Discussed with: bz


# 137f91e8 05-Jan-2012 John Baldwin <jhb@FreeBSD.org>

Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by: bz
MFC after: 2 weeks


# f08535f8 20-Dec-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Restore a feature that was present in 5.x and 6.x, and was cleared in
7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP
preemption, while it is running its bulk update.

However, reimplement the feature in more elegant manner, that is
partially inspired by newer OpenBSD:

- Rename term "suppression" to "demotion", to match with OpenBSD.
- Keep a global demotion factor, that can be raised by several
conditions, for now these are:
- interface goes down
- carp(4) has problems with ip_output() or ip6_output()
- pfsync performs bulk update
- Unlike in OpenBSD the demotion factor isn't a counter, but
is actual value added to advskew. The adjustment values for
particular error conditions are also configurable, and their
defaults are maximum advskew value, so a single failure bumps
demotion to maximum. This is for POLA compatibility, and should
satisfy most users.
- Demotion factor is a writable sysctl, so user can do
foot shooting, if he desires to.


# 08b68b0e 15-Dec-2011 Gleb Smirnoff <glebius@FreeBSD.org>

A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]


# f26fa169 09-Dec-2011 Brooks Davis <brooks@FreeBSD.org>

Remove the unused if_free_type() function.

X-MFC after: never


# c0ba290b 22-Nov-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Improve logging:
- don't hardcode function name
- use LOG_DEBUG for such a debug message
- print error value


# d745c852 06-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


# 35fd7bc0 02-Jul-2011 Bjoern A. Zeeb <bz@FreeBSD.org>

Add infrastructure to allow all frames/packets received on an interface
to be assigned to a non-default FIB instance.

You may need to recompile world or ports due to the change of struct ifnet.

Submitted by: cjsp
Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
(original versions)
Reviewed by: julian
Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru)
MFC after: 2 weeks
X-MFC: use spare in struct ifnet


# 23519598 28-Jun-2011 Sergey Kandaurov <pluknet@FreeBSD.org>

Update ifc_len field of struct ifconf passed for the ioctl SIOCGIFCONF32
(i.e. under COMPAT_FREEBSD32) in case ifconf() returned success to match
the native SIOCGIFCONF behavior.

PR: kern/158369
Reported by: Paul Procacci <pprocacci att gmail com>
MFC after: 1 week


# 4c506522 04-Apr-2011 Gleb Smirnoff <glebius@FreeBSD.org>

When removing ifnets, we should first remove the reference to ifnet
from the interface index, then decrease refcount, not vice versa.

Otherwise there is a race (reproducible) when if_free_internal()
contests on IFNET_WLOCK(), and we got a zero-refed ifnet in the
index for a long time. It may be picked by some other thread,
that runs ifnet_byindex_ref(), who takes the ifnet from index,
and bumps refcount. When reader drops the lock, if_free_internal()
proceeds with free. Then reader tries to free it a second time.


# e4cd31dd 21-Mar-2011 Jeff Roberson <jeff@FreeBSD.org>

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


# 1fb51a12 16-Feb-2011 Bjoern A. Zeeb <bz@FreeBSD.org>

Mfp4 CH=177274,177280,177284-177285,177297,177324-177325

VNET socket push back:
try to minimize the number of places where we have to switch vnets
and narrow down the time we stay switched. Add assertions to the
socket code to catch possibly unset vnets as seen in r204147.

While this reduces the number of vnet recursion in some places like
NFS, POSIX local sockets and some netgraph, .. recursions are
impossible to fix.

The current expectations are documented at the beginning of
uipc_socket.c along with the other information there.

Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
Tested by: zec

Tested by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 2 weeks


# 0028e524 11-Feb-2011 Bjoern A. Zeeb <bz@FreeBSD.org>

Mfp4 CH=177255:

Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS.

Change the syntax to match KASSERT() to allow more flexible panic
messages rather than having a printf with hardcoded arguments
before panic.

Adjust the few assertions we have to the new format (and enhance
the output).

Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb

MFC after: 2 weeks


# 5f3b301a 24-Jan-2011 John Baldwin <jhb@FreeBSD.org>

Fix a LOR by dropping the global ifnet locks while allocating a new ifnet
table in if_grow(). The order of the SYSINIT's for ifnet state were swapped
so that the various locks were initialized before being used.

Reviewed by: pluknet, bz
MFC after: 2 weeks


# f88910cd 12-Jan-2011 Matthew D Fleming <mdf@FreeBSD.org>

sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.

Commit the net* piece.


# 3e288e62 22-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


# 31c6a003 14-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.


# a38de013 25-Oct-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

Factor out DDB commands from r204145, r204279 into if_debug.c for further
enhancements (1). Switch to a standard 2-clause BSD license for this (2).

Unfortunately we have to un-static the ifindex_table for this but do not
publicly export it.

Suggested by: rwatson (1) a while back.
Approved by: thompsa (2) for the change from r204279.
MFC after: 6 days


# 9af74f3d 21-Oct-2010 Sergey Kandaurov <pluknet@FreeBSD.org>

Reshuffle SIOCGIFCONF32 handler from r155224.

- move all the chunks into one file, which allows to hide SIOCGIFCONF32
global definition as well.
- replace __amd64__ with proper COMPAT_FREEBSD32 around.
- handle 32bit capacity before going into the handler itself instead of
doing internal 32bit specific changes within it (e.g. as it's done for
SIOCGDEFIFACE32_IN6).
- use explicitely sized types for ABI compat.

Approved by: kib (mentor)
MFC after: 2 weeks


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 4d369413 10-Sep-2010 Matthew D Fleming <mdf@FreeBSD.org>

Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function. While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.


# d3c351c5 13-Aug-2010 Marko Zec <zec@FreeBSD.org>

When moving an ethernet ifnet from one vnet to another, destroy the
associated ng_ether netgraph node in the current vnet, and create a
new one in the target vnet.

Reviewed by: julian
MFC after: 3 days


# 9963e8a5 11-Aug-2010 Will Andrews <will@FreeBSD.org>

Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by: bz
Approved by: ken (mentor)


# 54bfbd51 10-Aug-2010 Will Andrews <will@FreeBSD.org>

Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default. Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by: bz, simon
Approved by: ken (mentor)
MFC after: 2 weeks


# cd292f12 27-Jul-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

Return NULL rather than 0 for a pointer.

MFC after: 3 days


# dd62f5c0 25-Jun-2010 Qing Li <qingli@FreeBSD.org>

MFC r208553

This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

Approved by: re (bz)


# 0ed6142b 25-May-2010 Qing Li <qingli@FreeBSD.org>

This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after: 3 days


# 7c61d493 24-May-2010 Andrew Thompson <thompsa@FreeBSD.org>

MFC r202588

Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR: kern/142927
Submitted by: Nikolay Denev

MFC r202611

Do not hold the lock over if_setlladdr() as it calls into the interface driver
init routine.


# 480d7c6c 06-May-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r207369:
MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH


# e50d35e6 03-May-2010 Maxim Sobolev <sobomax@FreeBSD.org>

Add new tunable 'net.link.ifqmaxlen' to set default send interface
queue length. The default value for this parameter is 50, which is
quite low for many of today's uses and the only way to modify this
parameter right now is to edit if_var.h file. Also add read-only
sysctl with the same name, so that it's possible to retrieve the
current value.

MFC after: 1 month


# 82cea7e6 29-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days


# 0919a8fb 27-Apr-2010 Xin LI <delphij@FreeBSD.org>

MFC r206637:

When an underlying ioctl(2) handler returns an error, our ioctl(2)
interface considers that it hits a fatal error, and will not copyout
the request structure back for _IOW and _IOWR ioctls, keeping them
untouched.

The previous implementation of the SIOCGIFDESCR ioctl intends to
feed the buffer length back to userland. However, if we return
an error, the feedback would be defeated and ifconfig(8) would
trap into an infinite loop.

This commit changes SIOCGIFDESCR to set buffer field to NULL to
indicate the previous ENAMETOOLONG case.

Reported by: bschmidt


# 1ed532bb 21-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r206470:

In if_detach_internal() we cannot hold the af_data lock over the
dom_ifdetach() calls as they might sleep for callout_drain().
Do as we do in if_attachdomain1() [r121470] and handle
if_afdata_initialized earlier and call dom_ifdetach() unlocked.

Discussed with: rwatson


# b0cf9f5f 21-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r206469:

In if_detach_internal() only try to do the detach run if if_attachdomain1()
has actually succeeded to initialize and attach. There is a theoretical
possibility to drop out early in if_attachdomain1() leaving the array
uninitialized if we cannot get the lock.

Discussed with: rwatson


# 57d84848 14-Apr-2010 Xin LI <delphij@FreeBSD.org>

When an underlying ioctl(2) handler returns an error, our ioctl(2)
interface considers that it hits a fatal error, and will not copyout
the request structure back for _IOW and _IOWR ioctls, keeping them
untouched.

The previous implementation of the SIOCGIFDESCR ioctl intends to
feed the buffer length back to userland. However, if we return
an error, the feedback would be defeated and ifconfig(8) would
trap into an infinite loop.

This commit changes SIOCGIFDESCR to set buffer field to NULL to
indicate the previous ENAMETOOLONG case.

Reported by: bschmidt
MFC after: 2 weeks


# d8c13659 11-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

In if_detach_internal() we cannot hold the af_data lock over the
dom_ifdetach() calls as they might sleep for callout_drain().
Do as we do in if_attachdomain1() [r121470] and handle
if_afdata_initialized earlier and call dom_ifdetach() unlocked.

Discussed with: rwatson
MFC after: 10 days


# 318c3213 11-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

In if_detach_internal() only try to do the detach run if if_attachdomain1()
has actually succeeded to initialize and attach. There is a theoretical
possibility to drop out early in if_attachdomain1() leaving the array
uninitialized if we cannot get the lock.

Discussed with: rwatson
MFC after: 10 days


# 1386abc0 27-Mar-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r204279:

Use the DB_SHOW_ALL_COMMAND() macro to register the formerly 'show ifnets'
in the db_show_all_table as 'show all ifnets' and with that follow the
convention for showing complete lists.

Submitted by: thompsa


# 0519db72 27-Mar-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r204145:

Start to implement ifnet DDB support:
- 'show ifnets' prints a list of ifnet *s per virtual network stack,
- 'show ifnet <struct ifnet *>' prints fields matching the given ifp.

We do not yet print the complete set of fields and might want to
factor this out to an extra if_debug.c file in case this grows
a lot[1]. We may also want to grow 'show ifnet <if_xname>' support[1].

Suggested by: rwatson [1]
Reviewed by: rwatson


# 9bdad327 27-Mar-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r204142:

Enhance a panic string to contain more useful debugging information.


# a5a931b3 25-Feb-2010 Xin LI <delphij@FreeBSD.org>

MFC 203052:

Add interface description capability as inspired by OpenBSD. Thanks for
rwatson@, jhb@, brooks@ and others for feedback to the old implementation!

Sponsored by: iXsystems, Inc.


# 7405f23c 24-Feb-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

Use the DB_SHOW_ALL_COMMAND() macro to register the formerly 'show ifnets'
in the db_show_all_table as 'show all ifnets' and with that follow the
convention for showing complete lists.

Submitted by: thompsa
MFC after: 3 days


# c9fdacda 20-Feb-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

Start to implement ifnet DDB support:
- 'show ifnets' prints a list of ifnet *s per virtual network stack,
- 'show ifnet <struct ifnet *>' prints fields matching the given ifp.

We do not yet print the complete set of fields and might want to
factor this out to an extra if_debug.c file in case this grows
a lot[1]. We may also want to grow 'show ifnet <if_xname>' support[1].

Sponsored by: ISPsystem
Suggested by: rwatson [1]
Reviewed by: rwatson
MFC after: 5 days


# 58606037 20-Feb-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

Enhance a panic string to contain more useful debugging information.

Sponsored by: ISPsystem
Reviewed by: rwatson
MFC after: 5 days


# 3ddba633 31-Jan-2010 Shteryana Shopova <syrinx@FreeBSD.org>

MFC r202935:
While flushing the multicast filter of an interface, do not zero the relevant
ifmultiaddr structures' reference to the parent interface, unless the parent
interface is really detaching. While here, program only link layer multicast
filters to a wlan's hardware parent interface.

PR: kern/142391, kern/142392
Reviewed by: sam, rpaulo, bms


# 215940b3 26-Jan-2010 Xin LI <delphij@FreeBSD.org>

Revised revision 199201 (add interface description capability as inspired
by OpenBSD), based on comments from many, including rwatson, jhb, brooks
and others.

Sponsored by: iXsystems, Inc.
MFC after: 1 month


# 93ec7edc 24-Jan-2010 Shteryana Shopova <syrinx@FreeBSD.org>

While flushing the multicast filter of an interface, do not zero the relevant
ifmultiaddr structures' reference to the parent interface, unless the parent
interface is really detaching. While here, program only link layer multicast
filters to a wlan's hardware parent interface.

PR: kern/142391, kern/142392
Reviewed by: sam, rpaolo, bms
MFC after: 1 week


# 52c240aa 22-Jan-2010 Brooks Davis <brooks@FreeBSD.org>

MFC r201350:

The devices that supported EVFILT_NETDEV kqueue filters were removed in
r195175. Remove all definitions, documentation, and usage.

The change of function signature for vlan_link_state() was not merged to
maintain the ABI.


# ea4ca115 18-Jan-2010 Andrew Thompson <thompsa@FreeBSD.org>

Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR: kern/142927
Submitted by: Nikolay Denev


# 02bcb7ec 05-Jan-2010 John Baldwin <jhb@FreeBSD.org>

MFC 201196:
Change vlan interfaces to cope more usefully with the parent interface being
renamed. Previously the vlan interfaces would lose their configuration as if
the parent interface had been physically removed. Now vlan interfaces ignore
rename events.
- Add a new ifnet flag (IFF_RENAMING) that is set while an ifnet is being
renamed. This flag can be checked in ifnet departure/arrival event
handlers to treat rename events differently.
- Change the ifnet departure event handler in the if_vlan(4) driver to
ignore departure events due to a trunk interface being renamed.


# a6fffd6c 31-Dec-2009 Brooks Davis <brooks@FreeBSD.org>

The devices that supported EVFILT_NETDEV kqueue filters were removed in
r195175. Remove all definitions, documentation, and usage.

fifo_misc.c:
Remove all kqueue tests as fifo_io.c performs all those that
would have remained.

Reviewed by: rwatson
MFC after: 3 weeks
X-MFC note: don't change vlan_link_state() function signature


# 5428776e 29-Dec-2009 John Baldwin <jhb@FreeBSD.org>

Change vlan interfaces to cope more usefully with the parent interface being
renamed. Previously the vlan interfaces would lose their configuration as if
the parent interface had been physically removed. Now vlan interfaces ignore
rename events.
- Add a new ifnet flag (IFF_RENAMING) that is set while an ifnet is being
renamed. This flag can be checked in ifnet departure/arrival event
handlers to treat rename events differently.
- Change the ifnet departure event handler in the if_vlan(4) driver to
ignore departure events due to a trunk interface being renamed.

Reviewed by: brooks, rwatson
MFC after: 1 week


# 34605f85 30-Nov-2009 John Baldwin <jhb@FreeBSD.org>

Remove if_timer/if_watchdog now that they are no longer used. The space
used by if_timer is reserved for expanding if_index to an int in the
future.

Reviewed by: rwatson, brooks


# 1a9d4dda 12-Nov-2009 Xin LI <delphij@FreeBSD.org>

Revert revision 199201 for now as it has introduced a kernel vulnerability
and requires more polishing.


# 41c8c6e8 11-Nov-2009 Xin LI <delphij@FreeBSD.org>

Add interface description capability as inspired by OpenBSD.

MFC after: 3 months


# 8cb7f8f8 20-Sep-2009 Qing Li <qingli@FreeBSD.org>

MFC r197364

A wrong variable is used when setting up the interface
address route, which broke source address selection in
some code paths.

Submitted by: noted by bz
Reviewed by: hrs
Approved by: re (kib)


# 46e7f983 20-Sep-2009 Qing Li <qingli@FreeBSD.org>

A wrong variable is used when setting up the interface
address route, which broke source address selection in
some code paths.

Submitted by: noted by bz
Reviewed by: hrs
MFC after: immediately


# 553a7dec 15-Sep-2009 Qing Li <qingli@FreeBSD.org>

MFC r197227

Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by: bz
Approved by: re


# 9bb7d0f4 15-Sep-2009 Qing Li <qingli@FreeBSD.org>

Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by: bz
MFC after: immediately


# d6f7f21c 28-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge r196559 from head to stable/8:

Add IFNET_HOLD reserved pointer value for the ifindex ifnet array,
which allows an index to be reserved for an ifnet without making
the ifnet available for management operations. Use this in if_alloc()
while the ifnet lock is released between initial index allocation and
completion of ifnet initialization.

Add ifindex_free() to centralize the implementation of releasing an
ifindex value. Use in if_free() and if_vmove(), as well as when
releasing a held index in if_alloc().

Reviewed by: bz

Approved by: re (kib)


# 57d231bb 28-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge r196553 from head to stable/8:

Break out allocation of new ifindex values from if_alloc() and if_vmove(),
and centralize in a single function ifindex_alloc(). Assert the
IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc(). This does not
close all known races in this code.

Reviewed by: bz

Approved by: re (kib)


# b569420a 28-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge r196510 from head to stable/8:

Make if_grow static -- it's not used outside of if.c, and with the
internals destined to change, it's better if it remains that way.

Approved by: re (kib)


# 3ef94f2b 28-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge r196481 from head to stable/8:

Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian

Approved by: re (kib)


# d6976e05 28-Aug-2009 Marko Zec <zec@FreeBSD.org>

MFC r196504:

When moving ifnets from one vnet to another, and the ifnet
has ifaddresses of AF_LINK type which thus have an embedded
if_index "backpointer", we must update that if_index backpointer
to reflect the new if_index that our ifnet just got assigned.

This change affects only options VIMAGE builds.

Submitted by: bz
Reviewed by: bz
Approved by: re (rwatson), julian (mentor)

Approved by: re (rwatson)


# ed2dabfc 26-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Add IFNET_HOLD reserved pointer value for the ifindex ifnet array,
which allows an index to be reserved for an ifnet without making
the ifnet available for management operations. Use this in if_alloc()
while the ifnet lock is released between initial index allocation and
completion of ifnet initialization.

Add ifindex_free() to centralize the implementation of releasing an
ifindex value. Use in if_free() and if_vmove(), as well as when
releasing a held index in if_alloc().

Reviewed by: bz
MFC after: 3 days


# 61f6986b 25-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Break out allocation of new ifindex values from if_alloc() and if_vmove(),
and centralize in a single function ifindex_alloc(). Assert the
IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc(). This does not
close all known races in this code.

Reviewed by: bz
MFC after: 3 days


# 8e937462 23-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Make if_grow static -- it's not used outside of if.c, and with the
internals destined to change, it's better if it remains that way.

MFC after: 3 days


# 52db6805 24-Aug-2009 Marko Zec <zec@FreeBSD.org>

When moving ifnets from one vnet to another, and the ifnet
has ifaddresses of AF_LINK type which thus have an embedded
if_index "backpointer", we must update that if_index backpointer
to reflect the new if_index that our ifnet just got assigned.

This change affects only options VIMAGE builds.

Submitted by: bz
Reviewed by: bz
Approved by: re (rwatson), julian (mentor)


# 77dfcdc4 23-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian
MFC after: 3 days


# d326ff49 14-Aug-2009 Marko Zec <zec@FreeBSD.org>

MFC r196230:

Appease VNET_DEBUG - in if_vmove we temporarily switch i.e.
recurse from one vnet to another which is OK, so no need
to flood the console with warnings here.

Approved by: re (rwatson), julian (mentor)

Approved by: re (rwatson)


# 9abb4862 14-Aug-2009 Marko Zec <zec@FreeBSD.org>

Appease VNET_DEBUG - in if_vmove we temporarily switch i.e.
recurse from one vnet to another which is OK, so no need
to flood the console with warnings here.

Approved by: re (rwatson), julian (mentor)


# 530c0060 01-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# be31e5e7 26-Jul-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Make the in-kernel logic for the SIOCSIFVNET, SIOCSIFRVNET ioctls
(ifconfig ifN (-)vnet <jname|jid>) work correctly.

Move vi_if_move to if.c and split it up into two functions(*),
one for each ioctl.

In the reclaim case, correctly set the vnet before calling if_vmove.

Instead of silently allowing a move of an interface from the current
vnet to the current vnet, return an error. (*)

There is some duplicate interface name checking before actually moving
the interface between network stacks without locking and thus race
prone. Ideally if_vmove will correctly and automagically handle these
in the future.

Suggested by: rwatson (*)
Approved by: re (kib)


# d0728d71 23-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
occur each time a network stack is instantiated and destroyed. In the
!VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
For the VIMAGE case, we instead use SYSINIT's to track their order and
properties on registration, using them for each vnet when created/
destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
previously, as well as its dependency scheme: we now just use the
SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
they want init functions to be called for each virtual network stack
rather than just once at boot, compiling down to DOMAIN_SET() in the
non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
of modinfo or DOMAIN_SET() for init/uninit events. In some cases,
convert modular components from using modevent to using sysinit (where
appropriate). In some cases, do minor rejuggling of SYSINIT ordering
to make room for or better manage events.

Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup)
Discussed with: jhb, bz, julian, zec
Reviewed by: bz
Approved by: re (VIMAGE blanket)


# 006e9db4 19-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Normalize field naming for struct vnet, fix two debugging printfs that
print them.

Reviewed by: bz
Approved by: re (kensmith, kib)


# 5ee847d3 19-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Reimplement and/or implement vnet list locking by replacing a mostly
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use). Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.

Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions. Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.

Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.

Update various consumers of these KPIs based on whether they may sleep
or not.

Reviewed by: bz
Approved by: re (kib)


# 7afcbc18 17-Jul-2009 Jamie Gritton <jamie@FreeBSD.org>

Remove the interim vimage containers, struct vimage and struct procg,
and the ioctl-based interface that supported them.

Approved by: re (kib), bz (mentor)


# 1e77c105 16-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


# eddfbb76 14-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 6cb7f168 29-Jun-2009 Brooks Davis <brooks@FreeBSD.org>

Remove support for the /dev/net/* per-interface devices. They serve
little purpose and are unused in the base system.

The IOCTL functionality is entirely duplicated and routing sockets
provide a richer interface than the kqueue functionality.

Further, it is not practical for these devices to be made sensible in
the face of VIMAGE.

Bump __FreeBSD_version on the off chance that there is any code out
there that actually uses this stuff.

Reviewed by: rwatson
Discussed with: bz, zec
Approved by: re@ (kensmith)


# 395cbe82 27-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Remove unnecessary include of kdb.h that snuck in during ifaddr refcount
work.

Reported by: pluknet <pluknet at gmail.com>
Approved by: re (kib)


# f9ef96ca 25-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Define four wrapper functions for interface address locking,
if_addr_rlock() and if_addr_runlock() for regular address lists, and
if_maddr_rlock() and if_maddr_runlock() for multicast address lists.

We will use these in various kernel modules to avoid encoding specific
type and locking strategy information into modules that currently use
IF_ADDR_LOCK() and IF_ADDR_UNLOCK() directly.

MFC after: 6 weeks


# 3baaf297 24-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

In if_setlladdr(), use IF_ADDR_LOCK() and ifaddr references to improve
the safety of link layer address manipulation.

MFC after: 6 weeks


# 8c0fec80 23-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references. The following routines now return references:

ifaddr_byindex
ifa_ifwithaddr
ifa_ifwithbroadaddr
ifa_ifwithdstaddr
ifa_ifwithnet
ifaof_ifpforaddr
ifa_ifwithroute
ifa_ifwithroute_fib
rt_getifa
rt_getifa_fib
IFP_TO_IA
ip_rtaddr
in6_ifawithifp
in6ifa_ifpforlinklocal
in6ifa_ifpwithaddr
in6_ifadd
carp_iamatch6
ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc). In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit. Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by: bz
Obtained from: Apple, Inc. (portions)
MFC after: 6 weeks (portions)


# a877d0cf 23-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Remove duplicate #include <net/route.h> from the middle of the file.


# b58ea5f3 22-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Move virtualization of routing related variables into their own
Vimage module, which had been there already but now is stateful.

All variables are now file local; so this further limits the global
spreading of routing related things throughout the kernel.

Add a missing function local variable in case of MPATHing.

Reviewed by: zec


# 8896f83a 22-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Add a new function, ifa_ifwithaddr_check(), which rather than returning
a pointer to an ifaddr matching the passed socket address, returns a
boolean indicating whether one was present. In the (near) future,
ifa_ifwithaddr() will return a referenced ifaddr rather than a raw
ifaddr pointer, and the new wrapper will allow callers that care only
about the boolean condition to avoid having to free that reference.

MFC after: 3 weeks


# bed56bb5 22-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

After the update to fxp(4) in r194573 we should no longer need
this DELAY(100) hack introduced in r56938.

Thanks to: yongari
MFC after: 6 weeks
X-MFC note: not before the fxp(4) changes


# 1099f828 21-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Clean up common ifaddr management:

- Unify reference count and lock initialization in a single function,
ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Instead of using a u_int protected by a mutex to refcount(9) for
reference count management.

The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.

MFC after: 3 weeks


# e40bae9a 21-Jun-2009 Roman Divacky <rdivacky@FreeBSD.org>

Switch cmd argument to u_long. This matches what if_ethersubr.c does and
allows the code to compile cleanly on amd64 with clang.

Reviewed by: rwatson
Approved by: ed (mentor)


# d659538f 15-Jun-2009 Sam Leffler <sam@FreeBSD.org>

r193336 moved ifq_detach to if_free which broke if_alloc followed
by if_free (w/o doing if_attach); move ifq_attach to if_alloc and
rename ifq_attach/detach to ifq_init/ifq_delete to better identify
their purpose

Reviewed by: jhb, kmacy


# 679e1390 15-Jun-2009 Jamie Gritton <jamie@FreeBSD.org>

Manage vnets via the jail system. If a jail is given the boolean
parameter "vnet" when it is created, a new vnet instance will be created
along with the jail. Networks interfaces can be moved between prisons
with an ioctl similar to the one that moves them between vimages.
For now vnets will co-exist under both jails and vimages, but soon
struct vimage will be going away.

Reviewed by: zec, julian
Approved by: bz (mentor)


# 259d2d54 11-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

carp(4) allows people to share a set of IP addresses and can only
use IPv4/v6 for inter-node communication (according to my reading).

Properly wrap the carp callouts in INET || INET6 and refelect this
in sys/conf/files as well. While in theory this should be ok,
it might be a bit optimistic to think that carp could build with
inet6 only[1].

Discussed with: mlaier [1]


# d8b0556c 10-Jun-2009 Konstantin Belousov <kib@FreeBSD.org>

Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by: jhb [1]
Noted by: jhb [2]
Reviewed by: jhb
Tested by: rnoland


# 8d8bc018 08-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.


# bc29160d 08-Jun-2009 Marko Zec <zec@FreeBSD.org>

Introduce an infrastructure for dismantling vnet instances.

Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state. The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.

While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers. Many of such issues are
already known and will be incrementaly fixed over the next weeks in
smaller incremental commits.

Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels. Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.

Bump __FreeBSD_version to 800097.
Reviewed by: bz, julian
Approved by: rwatson, kib (re), julian (mentor)


# bcf11e8d 05-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# c9dd3717 02-Jun-2009 Sam Leffler <sam@FreeBSD.org>

move ifq_detach from if_detach to if_free; this permits callers to
reference if_snd in the period between detach+free which helps simplify
detach code

Reviewed by: jhb, rwatson


# c2c2a7c1 01-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Convert the two dimensional array to be malloced and introduce
an accessor function to get the correct rnh pointer back.

Update netstat to get the correct pointer using kvm_read()
as well.

This not only fixes the ABI problem depending on the kernel
option but also permits the tunable to overwrite the kernel
option at boot time up to MAXFIBS, enlarging the number of
FIBs without having to recompile. So people could just use
GENERIC now.

Reviewed by: julian, rwatson, zec
X-MFC: not possible


# feb08d06 30-May-2009 Marko Zec <zec@FreeBSD.org>

Introduce an interm userland-kernel API for creating vnets and
assigning ifnets from one vnet to another. Deletion of vnets is not
yet supported.

The interface is implemented as an ioctl extension so that no syscalls
had to be introduced. This should be acceptable given that the new
interface will be used for a short / interim period only, until the
new jail management framwork gains the capability of managing vnets.
This method for managing vimages / vnets has been in use for the past
7 years without any observable issues.

The userland tool to be used in conjunction with the interim API can be
found in p4: //depot/projects/vimage-commit2/src/usr.sbin/vimage/... and
will most probably never get commited to svn.

While here, bump copyright notices in kern_vimage.c and vimage.h to
cover work done in year 2009.

Approved by: julian (mentor)
Discussed with: bz, rwatson


# 67da1f3d 22-May-2009 Marko Zec <zec@FreeBSD.org>

Set ifp->if_afdata_initialized to 0 while holding IF_AFDATA_LOCK on ifp,
not after the lock has been released.

Reviewed by: bz
Discussed with: rwatson


# e0c14af9 22-May-2009 Marko Zec <zec@FreeBSD.org>

Introduce the if_vmove() function, which will be used in the future
for reassigning ifnets from one vnet to another.

if_vmove() works by calling a restricted subset of actions normally
executed by if_detach() on an ifnet in the current vnet, and then
switches to the target vnet and executes an appropriate subset of
if_attach() actions there.

if_attach() and if_detach() have become wrapper functions around
if_attach_internal() and if_detach_internal(), where the later
variants have an additional argument, a flag indicating whether a
full attach or detach sequence is to be executed, or only a
restricted subset suitable for moving an ifnet from one vnet to
another. Hence, if_vmove() will not call if_detach() and if_attach()
directly, but will call the if_detach_internal() and
if_attach_internal() variants instead, with the vmove flag set.

While here, staticize ifnet_setbyindex() since it is not referenced
from outside of sys/net/if.c.

Also rename ifccnt field in struct vimage to ifcnt, and do some minor
whitespace garbage collection where appropriate.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by: bz, rwatson, brooks?
Approved by: julian (mentor)


# 21ca7b57 05-May-2009 Marko Zec <zec@FreeBSD.org>

Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one. The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE(). Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by: julian (mentor)


# f6dfe47a 30-Apr-2009 Marko Zec <zec@FreeBSD.org>

Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance. Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables. As an example, V_ifnet becomes:

options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet
default build: vnet_net_0._ifnet
options VIMAGE_GLOBALS: ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

INIT_VNET_NET(ifp->if_vnet); becomes

struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals. If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet. options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod. SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by: bz, rwatson
Approved by: julian (mentor)


# 8bd015a1 23-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

As with ifnet_byindex_ref(), don't return IFF_DYING interfaces from
ifunit_ref(). ifunit() continues to return them.

MFC after: 3 weeks


# 6064c5d3 23-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Add ifunit_ref(), a version of ifunit(), that returns not just an
interface pointer, but also a reference to it.

Modify ifioctl() to use ifunit_ref(), holding the reference until
all ioctls, etc, have completed.

This closes a class of reader-writer races in which interfaces
could be removed during long-running ioctls, leading to crashes.
Many other consumers of ifunit() should now use ifunit_ref() to
avoid similar races.

MFC after: 3 weeks


# 111c6b61 23-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

During if_detach(), invoke if_dead() to set the ifnet's function
pointers to "dead" implementations that no-op rather than invoking
the device driver. This would generally be unexpected and
possibly quite badly handled by most device drivers after
if_detach() has completed.

Reviewed by: bms
MFC after: 3 weeks


# d6f157ea 23-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Move portions of data structure initialization from if_attach() to
if_alloc(), and portions of data structure destruction from if_detach()
to if_free(). These changes leave more of the struct ifnet in a
safe-to-access condition between alloc and attach, and between detach
and free, and focus on attach/detach as stack usage events rather than
data structure initialization.

Affected fields include the linkstate task queue, if_afdata lock,
address lists, kqueue state, and MAC labels. ifq_attach() ifq_detach()
are not moved as ifq_attach() may use a queue length set by the device
driver between if_alloc() and if_attach().

MFC after: 3 weeks


# 242a8e72 23-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Add a new interface flag, IFF_DYING, which is set when a device driver
calls if_free(), and remains set if the refcount is elevated. IF_DYING
skips the bit in the if_flags bitmask previously used by IFF_NEEDSGIANT,
so that an MFC can be done without changing which bit is used, as
IFF_NEEDSGIANT is still present in 7.x.

ifnet_byindex_ref() checks for IFF_DYING and returns NULL if it is set,
preventing new references from by acquired by index, preventing
monitoring sysctls from seeing it. Other lookup mechanisms currently
do not check IFF_DYING, but may need to in the future.

MFC after: 3 weeks


# 27d37320 21-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Start to address a number of races relating to use of ifnet pointers
after the corresponding interface has been destroyed:

(1) Add an ifnet refcount, ifp->if_refcount. Initialize it to 1 in
if_alloc(), and modify if_free_type() to decrement and check the
refcount.

(2) Add new if_ref() and if_rele() interfaces to allow kernel code
walking global interface lists to release IFNET_[RW]LOCK() yet
keep the ifnet stable. Currently, if_rele() is a no-op wrapper
around if_free(), but this may change in the future.

(3) Add new ifnet field, if_alloctype, which caches the type passed
to if_alloc(), but unlike if_type, won't be changed by drivers.
This allows asynchronous free's of the interface after the
driver has released it to still use the right type. Use that
instead of the type passed to if_free_type(), but assert that
they are the same (might have to rethink this if that doesn't
work out).

(4) Add a new ifnet_byindex_ref(), which looks up an interface by
index and returns a reference rather than a pointer to it.

(5) Fix if_alloc() to fully initialize the if_addr_mtx before hooking
up the ifnet to global lists.

(6) Modify sysctls in if_mib.c to use ifnet_byindex_ref() and release
the ifnet when done.

When this change is MFC'd, it will need to replace if_ispare fields
rather than adding new fields in order to avoid breaking the binary
interface. Once this change is MFC'd, if_free_type() should be
removed, as its 'type' argument is now optional.

This refcount is not appropriate for counting mbuf pkthdr references,
and also not for counting entry into the device driver via ifnet
function pointers. An rmlock may be appropriate for the latter.
Rather, this is about ensuring data structure stability when reaching
an ifnet via global ifnet lists and tables followed by copy in or out
of userspace.

MFC after: 3 weeks
Reported by: mdtancsa
Reviewed by: brooks


# ab5ed8a5 21-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Acquire the interface address list lock over some iterations over
if_addrhead. This closes some reader-writer races associated with
the address list.

MFC after: 2 weeks


# 7cc5b47f 16-Apr-2009 Kip Macy <kmacy@FreeBSD.org>

export if_qflush for use by driver if_qflush routines
only set ifp->if_{transmit, qflush} if not already set
KASSERT that neither or both are set


# 4d79e3d5 15-Apr-2009 Marko Zec <zec@FreeBSD.org>

In the !VIMAGE_GLOBALS case, make sure not to call vnet_net_iattach()
both via the vnet_mod_register() framework and then directly, but only
once.

Reviewed by: bz
Approved by: julian (mentor)


# aee3056f 13-Apr-2009 Kip Macy <kmacy@FreeBSD.org>

call default if_qflush on ifq if default method isn't used by the driver


# bfe1aba4 10-Apr-2009 Marko Zec <zec@FreeBSD.org>

Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized. In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw). Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module. The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on. Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first. The framework will panic if it detects any unresolved
dependencies before completing system initialization. Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run. In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container. IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by: bz
Approved by: julian (mentor)


# 8623f9fd 10-Apr-2009 Max Laier <mlaier@FreeBSD.org>

Follow up for r190895 It's not only the "all" group that is affected, but
all groups on the given interface.

PR: kern/130977, kern/131310
MFC after: 3 days (%vnet)


# 876d5c03 10-Apr-2009 Max Laier <mlaier@FreeBSD.org>

Remove interfaces from IFG_ALL on detach. This cures a couple of pf panics
when using the "self" keyword in tables or as ()-style host address and
fixes "ifconfig -g all" output.

PR: kern/130977, kern/131310
Submitted by: Mikolaj Golub
MFC after: 3 days


# 1ed81b73 06-Apr-2009 Marko Zec <zec@FreeBSD.org>

First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

At this stage, the new per-vnet initializer functions are
directly called from the existing global initialization code,
which should in most cases result in compiler inlining those
new functions, hence yielding a near-zero functional change.

Modify the existing initializer functions which are invoked via
protosw, like ip_init() et. al., to allow them to be invoked
multiple times, i.e. per each vnet. Global state, if any,
is initialized only if such functions are called within the
context of vnet0, which will be determined via the
IS_DEFAULT_VNET(curvnet) check (currently always true).

While here, V_irtualize a few remaining global UMA zones
used by net/netinet/netipsec networking code. While it is
not yet clear to me or anybody else whether this is the right
thing to do, at this stage this makes the code more readable,
and makes it easier to track uncollected UMA-zone-backed
objects on vnet removal. In the long run, it's quite possible
that some form of shared use of UMA zone pools among multiple
vnets should be considered.

Bump __FreeBSD_version due to changes in layout of structs
vnet_ipfw, vnet_inet and vnet_net.

Approved by: julian (mentor)


# a51f44a7 28-Mar-2009 Sam Leffler <sam@FreeBSD.org>

enable setting the mac address of 802.11 devices


# bc3977f1 20-Mar-2009 Jamie Gritton <jamie@FreeBSD.org>

Call the interface's if_ioctl from ifioctl(), if the protocol didn't
handle the ioctl. There are other paths that already call it, but this
allows for a non-interface socket (like AF_LOCAL which ifconfig now
uses) to use a broader class of interface ioctls.

Approved by: bz (mentor), rwatson


# e5adda3d 15-Mar-2009 Robert Watson <rwatson@FreeBSD.org>

Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free. This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile. They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

if_ar
if_axe
if_aue
if_cdce
if_cue
if_kue
if_ray
if_rue
if_rum
if_sr
if_udav
if_ural
if_zyd

Drivers that were already disabled because of tty changes:

if_ppp
if_sl

Discussed on: arch@


# c0c9ea90 14-Mar-2009 Sam Leffler <sam@FreeBSD.org>

remove stray ;


# 33553d6e 27-Feb-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.


# b89e82dd 05-Feb-2009 Jamie Gritton <jamie@FreeBSD.org>

Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise. The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family. For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes). Remove the explicit
jailed() checks that preceded many of the function calls.

Approved by: bz (mentor)


# eb322a6f 23-Jan-2009 John Baldwin <jhb@FreeBSD.org>

Only start the if_slowtimo timer (which drives the if_watchdog methods of
network interfaces) if we have at least one interface with an if_watchdog
routine.

MFC after: 2 weeks


# 6241d13a 18-Dec-2008 Kip Macy <kmacy@FreeBSD.org>

if_rtdel is always called with the RADIX_NODE_HEAD lock held


# d24c444c 17-Dec-2008 Kip Macy <kmacy@FreeBSD.org>

add ifnet_byindex_locked to allow for use of IFNET_RLOCK


# c368cff7 16-Dec-2008 Kip Macy <kmacy@FreeBSD.org>

avoid trying to acquire a shared lock while holding an exclusive lock
by making the ifnet lock acquisition exclusive


# 991f8615 16-Dec-2008 Kip Macy <kmacy@FreeBSD.org>

convert ifnet and afdata locks from mutexes to rwlocks


# 6e6b3f7c 14-Dec-2008 Qing Li <qingli@FreeBSD.org>

This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
me maintaining that branch before the svn conversion


# 40eb85e7 11-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Whitespace changes only - tabs must have been converted to spaces
somehow, when moving the code from p4 to svn.

Sponsored by: The FreeBSD Foundation


# 385195c0 10-Dec-2008 Marko Zec <zec@FreeBSD.org>

Conditionally compile out V_ globals while instantiating the appropriate
container structures, depending on VIMAGE_GLOBALS compile time option.

Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instatiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out. Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively

#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif

Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.

Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs. This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.

Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.

De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps. PF virtualization will be done
separately, most probably after next PF import.

Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw. Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 21b14a75 09-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

It does not make much sense to include net/route.h twice.
Remove one #include.


# 653735c4 09-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Add rwlock.h (and lock.h for that) to keep no-INET kernels compiling
after RADIX_NODE_HEAD_{,UN}LOCK() were added. Must have been "learned"
by pollution before (most likely: route.h -> radix.h -> rwlock.h)


# 4b79449e 02-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# 413628a7 29-Nov-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addtion to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking,..

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the afore mentioned and in kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
on cluster machines as well as all the testers and people
who provided feedback the last months on freebsd-jail and
other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by: (see above)
MFC after: 3 months (this is just so that I get the mail)
X-MFC Before: 7.2-RELEASE if possible


# 97021c24 26-Nov-2008 Marko Zec <zec@FreeBSD.org>

Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 14443589 24-Nov-2008 Sam Leffler <sam@FreeBSD.org>

use consistent style


# db7f0b97 21-Nov-2008 Kip Macy <kmacy@FreeBSD.org>

- bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
allow drivers to efficiently manage multiple hardware queues
(i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.


# 44e33a07 19-Nov-2008 Marko Zec <zec@FreeBSD.org>

Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions. As a rule,
initialization at instatiation for such variables should never be
introduced again from now on. Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact. In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 5a97c9d4 06-Nov-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Include if_arp.h for IFP2AC so that the netgraph parts in if.c
are happy even if compiled without INET or INET6.

MFC after: 2 months


# 1ede983c 23-Oct-2008 Dag-Erling Smørgrav <des@FreeBSD.org>

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 8b615593 02-Oct-2008 Marko Zec <zec@FreeBSD.org>

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 6bfa9a2d 27-Sep-2008 Ed Schouten <ed@FreeBSD.org>

Replace all calls to minor() with dev2unit().

After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by: kib


# d3ce8327 26-Sep-2008 Ed Schouten <ed@FreeBSD.org>

Remove unit2minor() use from kernel code.

When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macro's.
The unit2minor() and minor2unit() macro's were no-ops.

We'd better not remove these four macro's from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by: kib


# f0c04221 24-Aug-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Make the checks for ptp interfaces in ifa_ifwithdstaddr() and
ifa_ifwithnet() look more similar by comparing the pointer to NULL
in both cases.

MFC after: 3 months


# 516993d4 19-Aug-2008 Andrew Thompson <thompsa@FreeBSD.org>

ifnet_setbyindex() is only used locally, go back to being static.


# ac957cd2 19-Aug-2008 Julian Elischer <julian@FreeBSD.org>

A bunch of formatting fixes brough to light by, or created by the Vimage commit
a few days ago.


# 603724d3 17-Aug-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# 02f4879d 26-Jun-2008 Robert Watson <rwatson@FreeBSD.org>

Introduce locking around use of ifindex_table, whose use was previously
unsynchronized. While races were extremely rare, we've now had a
couple of reports of panics in environments involving large numbers of
IPSEC tunnels being added very quickly on an active system.

- Add accessor functions ifnet_byindex(), ifaddr_byindex(),
ifdev_byindex() to replace existing accessor macros. These functions
now acquire the ifnet lock before derefencing the table.
- Add IFNET_WLOCK_ASSERT().
- Add static accessor functions ifnet_setbyindex(), ifdev_setbyindex(),
which set values in the table either asserting of acquiring the ifnet
lock.
- Use accessor functions throughout if.c to modify and read
ifindex_table.
- Rework ifnet attach/detach to lock around ifindex_table modification.

Note that these changes simply close races around use of ifindex_table,
and make no attempt to solve the probem of disappearing ifnets. Further
refinement of this work, including with respect to ifindex_table
resizing, is still required.

In a future change, the ifnet lock should be converted from a mutex to an
rwlock in order to reduce contention.

Reviewed and tested by: brooks


# d94ccb09 16-May-2008 Brooks Davis <brooks@FreeBSD.org>

The if_check() function performed three actions:
- verified that the ifp->if_snd.ifq_mtx was initalized for
all attached interfaces. This was pointless because it was
initalized for all interfaces in if_attach() so I've removed it.
- Checked that ifp->if_snd.ifq_maxlen is initalized and set it to
ifqmaxlen if unset. This makes more sense in if_attach() so
I moved it there.
- The first call of if_slowtimo(). Delete if_check() and call
if_slowtimo() directly from the SYSINIT().


# 8b07e49a 09-May-2008 Julian Elischer <julian@FreeBSD.org>

Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from anay to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco


# ae0615f6 19-Apr-2008 Brooks Davis <brooks@FreeBSD.org>

Delay the global registration of the struct ifnet in if_alloc() until after
we're certain the allocation will entierly succeed. This fixes a leak in a
fairly unlikely case.

Reported by: vijay singh <vijjus at rocketmail dot com>
MFC after: 1 week


# fb27dd1d 25-Mar-2008 Sam Leffler <sam@FreeBSD.org>

expose if_purgemaddrs, it will be used by the vap code unless someone
redesigns the mcast support code in the next few weeks

MFC after: 3 weeks


# 237fdd78 16-Mar-2008 Robert Watson <rwatson@FreeBSD.org>

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink


# b9175c45 07-Mar-2008 Robert Watson <rwatson@FreeBSD.org>

Move IFF_NEEDSGIANT warning from if_ethersubr.c to if.c so it is displayed
for all network interfaces, not just ethernet-like ones.

Upgrade it to a louder WARNING and be explicit that the flag is obsolete.
Support for IFF_NEEDSGIANT will be removed in a few months (see arch@ for
details) and will not appear in 8.0.

Upgrade if_watchdog to a WARNING.


# 30d239bc 24-Oct-2007 Robert Watson <rwatson@FreeBSD.org>

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# 33d2bb9c 27-Jul-2007 Robert Watson <rwatson@FreeBSD.org>

First in a series of changes to remove the now-unused Giant compatibility
framework for non-MPSAFE network protocols:

- Remove debug_mpsafenet variable, sysctl, and tunable.
- Remove NET_NEEDS_GIANT() and associate SYSINITSs used by it to force
debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel.
- Remove logic to automatically flag interrupt handlers as non-MPSAFE if
debug.mpsafenet is set for an INTR_TYPE_NET handler.
- Remove logic to automatically flag netisr handlers as non-MPSAFE if
debug.mpsafenet is set.
- Remove references in a few subsystems, including NFS and Cronyx drivers,
which keyed off debug_mpsafenet to determine various aspects of their own
locking behavior.
- Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into
no-op's, as their entire behavior was determined by the value in
debug_mpsafenet.
- Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.

Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still
present in subsystems, and will be removed in followup commits.

Reviewed by: bz, jhb
Approved by: re (kensmith)


# a45cbf12 16-May-2007 Brooks Davis <brooks@FreeBSD.org>

Update the comments on if_alloc(), if_free(), if_free_type(), and
if_attach.

Remove a comment about pre-3.0 network drivers from if_attach().

Be a bit more consistant about whitespace near comments.


# 18242d3b 16-Apr-2007 Andrew Thompson <thompsa@FreeBSD.org>

Rename the trunk(4) driver to lagg(4) as it is too similar to vlan trunking.

The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.

The lagg(4) driver provides link aggregation, failover and fault tolerance.

Discussed on: current@


# b47888ce 09-Apr-2007 Andrew Thompson <thompsa@FreeBSD.org>

Add the trunk(4) driver for providing link aggregation, failover and fault
tolerance. This driver allows aggregation of multiple network interfaces as
one virtual interface using a number of different protocols/algorithms.

failover - Sends traffic through the secondary port if the master becomes
inactive.
fec - Supports Cisco Fast EtherChannel.
lacp - Supports the IEEE 802.3ad Link Aggregation Control Protocol
(LACP) and the Marker Protocol.
loadbalance - Static loadbalancing using an outgoing hash.
roundrobin - Distributes outgoing traffic using a round-robin scheduler
through all active ports.

This code was obtained from OpenBSD and this also includes 802.3ad LACP support
from agr(4) in NetBSD.


# 75ae0c01 27-Mar-2007 Bruce M Simpson <bms@FreeBSD.org>

Fix a case where hardware removal of an interface caused an attempt to
announce an ll_ifma which has gone away. Add a KASSERT to catch regressions.

Bug found by: Tom Uffner


# 5896d124 19-Mar-2007 Bruce M Simpson <bms@FreeBSD.org>

Fix tinderbox; ng_ether needs to see if_findmulti().


# ec002fee 19-Mar-2007 Bruce M Simpson <bms@FreeBSD.org>

Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from: p4 branch bms_netdev
Reviewed by: andre
Sponsored by: Garance A Drosehn
Idea from: NetBSD
MFC after: 1 month


# 40d8a302 21-Feb-2007 Bruce M Simpson <bms@FreeBSD.org>

Fix a bug in if_findmulti(), whereby it would not find (and thus delete)
a link-layer multicast group membership.
Such memberships are needed in order to support protocols such as
IS-IS without putting the interface into PROMISC or ALLMULTI modes.

sa_equal() is not OK for comparing sockaddr_dl as it has deeper structure
than a simple byte array, so add sa_dl_equal() and use that instead.

Reviewed by: rwatson
Verified with: /usr/sbin/mtest
Bug found by: Jouke Witteveen
MFC after: 2 weeks


# c18ffdc8 30-Nov-2006 Gleb Smirnoff <glebius@FreeBSD.org>

The recent issues with em(4) interface has shown that the old 4.4BSD
if_watchdog/if_timer interface doesn't fit modern SMP network
stack design.

Device drivers that need watchdog to monitor their hardware should
implement it theirselves.

Eventually the if_watchdog/if_timer API will be removed. For now,
warn that driver uses it.

Reviewed by: scottl


# acd3428b 06-Nov-2006 Robert Watson <rwatson@FreeBSD.org>

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# aed55708 22-Oct-2006 Robert Watson <rwatson@FreeBSD.org>

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# 773725a2 06-Sep-2006 Andre Oppermann <andre@FreeBSD.org>

Fix the socket option IP_ONESBCAST by giving it its own case in ip_output()
and skip over the normal IP processing.

Add a supporting function ifa_ifwithbroadaddr() to verify and validate the
supplied subnet broadcast address.

PR: kern/99558
Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru>
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days


# 6b7330e2 09-Jul-2006 Sam Leffler <sam@FreeBSD.org>

Revise network interface cloning to take an optional opaque
parameter that can specify configuration parameters:
o rev cloner api's to add optional parameter block
o add SIOCCREATE2 that accepts parameter data
o rev vlan support to use new api (maintain old code)

Reviewed by: arch@


# 4b97d7af 29-Jun-2006 Yaroslav Tykhiy <ytykhiy@gmail.com>

There is a consensus that ifaddr.ifa_addr should never be NULL,
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures. Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and get ready
to the system crashing honestly instead of masking possible bugs.

Suggested by: glebius, jhb, ru


# 457f48e6 21-Jun-2006 Gleb Smirnoff <glebius@FreeBSD.org>

- First initialize ifnet, and then insert it into global
list.
- First remove from global list, then start destroying.

PR: kern/97679
Submitted by: Alex Lyashkov <shadow itt.net.ru>
Reviewed by: rwatson, brooks


# 0dad3f0e 19-Jun-2006 Max Laier <mlaier@FreeBSD.org>

Import interface groups from OpenBSD. This allows to group interfaces in
order to - for example - apply firewall rules to a whole group of
interfaces. This is required for importing pf from OpenBSD 3.9

Obtained from: OpenBSD (with changes)
Discussed on: -net (back in April)


# affcaf78 11-Jun-2006 Max Khon <fjoe@FreeBSD.org>

Fix KASSERT conditions in if_deregister_com_alloc().


# f3b90d48 31-May-2006 Andrew Thompson <thompsa@FreeBSD.org>

Announce all interfaces to devd on attach/detach. This adds a new devctl
notification so all interfaces including pseudo are reported. When netif
creates the clones at startup devctl_disable has not been turned off yet so the
interfaces will not be initialised twice, enforce this by adding an explicit
order between rc.d/netif and rc.d/devd.

This change allows actions to taken in userland when an interface is cloned
and the pseudo interface will be automatically configured if a ifconfig_<int>=""
line exists in rc.conf.

Reviewed by: brooks
No objections on: net


# 93a69f57 21-Mar-2006 Gleb Smirnoff <glebius@FreeBSD.org>

No direct call to carp_ifdetach() anymore. It is called by
event handler.

PR: kern/82908
Submitted by: Dan Lukes <dan obluda.cz>


# 19cf0498 02-Feb-2006 Paul Saab <ps@FreeBSD.org>

Implement SIOCGIFCONF for 32bit binaries.


# 75ee267c 30-Jan-2006 Gleb Smirnoff <glebius@FreeBSD.org>

Merge the //depot/user/yar/vlan branch into CVS. It contains some collective
work by yar, thompsa and myself. The checksum offloading part also involves
work done by Mihail Balikov.

The most important changes:

o Instead of global linked list of all vlan softc use a per-trunk
hash. The size of hash is dynamically adjusted, depending on
number of entries. This changes struct ifnet, replacing counter
of vlans with a pointer to trunk structure. This change is an
improvement for setups with big number of VLANs, several interfaces
and several CPUs. It is a small regression for a setup with a single
VLAN interface.
An alternative to dynamic hash is a per-trunk static array with
4096 entries, which is a compile time option - VLAN_ARRAY. In my
experiments the array is not an improvement, probably because such
a big trunk structure doesn't fit into CPU cache.
o Introduce an UMA zone for VLAN tags. Since drivers depend on it,
the zone is declared in kern_mbuf.c, not in optional vlan(4) driver.
This change is a big improvement for any setup utilizing vlan(4).
o Use rwlock(9) instead of mutex(9) for locking. We are the first
ones to do this! :)
o Some drivers can do hardware VLAN tagging + hardware checksum
offloading. Add an infrastructure for this. Whenever vlan(4) is
attached to a parent or parent configuration is changed, the flags
on vlan(4) interface are updated.

In collaboration with: yar, thompsa
In collaboration with: Mihail Balikov <mihail.balikov interbgc.com>


# 83ec464f 23-Jan-2006 Yaroslav Tykhiy <ytykhiy@gmail.com>

Be consistent in checking ifa->ifa_addr for NULL.

Found by: Coverity Prevent (tm)
MFC after: 3 days


# 4a0d6638 11-Nov-2005 Ruslan Ermilov <ru@FreeBSD.org>

- Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.


# d09ed26f 11-Nov-2005 Ruslan Ermilov <ru@FreeBSD.org>

- Make IFP2ENADDR() a pointer to IF_LLADDR() rather than another
copy of Ethernet address.

- Change iso88025_ifattach() and fddi_ifattach() to accept MAC
address as an argument, similar to ether_ifattach(), to make
this work.


# b5c8bd59 02-Oct-2005 Yaroslav Tykhiy <ytykhiy@gmail.com>

Clean up consistency checks in if_setflag():
. use KASSERT for all checks so that the source of an error can be detected;
. use __func__ instead of spelling function name each time;
. fix a typo.


# 7aebc5e8 02-Oct-2005 Yaroslav Tykhiy <ytykhiy@gmail.com>

Log a message about entering or leaving permanently promiscuous mode,
as it is done for usual promiscuous mode already. This info is important
because promiscuous mode in the hands of a malicious party can jeopardize
the whole network.


# b1c53bc9 18-Sep-2005 Robert Watson <rwatson@FreeBSD.org>

Take a first cut at cleaning up ifnet removal and multicast socket
panics, which occur when stale ifnet pointers are left in struct
moptions hung off of inpcbs:

- Add in_ifdetach(), which matches in6_ifdetach(), and allows the
protocol to perform early tear-down on the interface early in
if_detach().

- Annotate that if_detach() needs careful consideration.

- Remove calls to in_pcbpurgeif0() in the handling of SIOCDIFADDR --
this is not the place to detect interface removal! This also
removes what is basically a nasty (and now unnecessary) hack.

- Invoke in_pcbpurgeif0() from in_ifdetach(), in both raw and UDP
IPv4 sockets.

It is now possible to run the msocket_ifnet_remove regression test
using HEAD without panicking.

MFC after: 3 days


# 0a53be46 12-Sep-2005 Robert Watson <rwatson@FreeBSD.org>

In netkqfilter(), return EINVAL instead of 1 (EPERM) when a filter type
is requested on a network interface file descriptor that is non-applicable.

MFC after: 3 days


# 62313e4c 04-Sep-2005 Sam Leffler <sam@FreeBSD.org>

reclaim sbuf and clear lock on error in ifconf

Submitted by: Ted Unangst
Reviewed by: rwatson
MFC after: 3 days


# dc7c539e 18-Aug-2005 Brooks Davis <brooks@FreeBSD.org>

When we started calling if_findindex() from if_alloc() with an empty
struct ifnet most of if_findindex() become a complex no-op. Remove it
and replace it with a corrected version of the four line for loop it
devolved to plus some error handling. This should probably be replaced
with subr_unit at some point.

Switch from checking ifaddr_byindex to ifnet_byindex when looking for
empty indexes. Since we're doing this from if_alloc/if_free, we can
only be sure that ifnet_byindex will be correct. This fixes panics when
loading the ef(4) module. The panics were caused by the fact that
if_alloc was called four time before if_attach was called and thus
ifaddr_byindex was not set and the same unit was allocated again. This
in turn caused the first if_attach to fail because the ifp was not the
one in ifnet_byindex(ifp->if_index).

Reported by: "Wojciech A. Koszek" <dunstan at freebsd dot czest dot pl>
PR: kern/84987
MFC After: 1 day


# 7cf30146 16-Aug-2005 Brooks Davis <brooks@FreeBSD.org>

- Move IF_ADDR_LOCK_DESTROY(ifp) from if_free to if_free_type.
- Add a note that additions should be made to if_free_type and not
if_free to help avoid this in the future.

This apparently fixes a use after free in if_bridge and may fix bugs
in other direct if_free_type consumers.

Reported by: thompsa


# 292ee7be 09-Aug-2005 Robert Watson <rwatson@FreeBSD.org>

Rename IFF_RUNNING to IFF_DRV_RUNNING, IFF_OACTIVE to IFF_DRV_OACTIVE,
and move both flags from ifnet.if_flags to ifnet.if_drv_flags, making
and documenting the locking of these flags the responsibility of the
device driver, not the network stack. The flags for these two fields
will be mutually exclusive so that they can be exposed to user space as
though they were stored in the same variable.

Provide #defines to provide the old names #ifndef _KERNEL, so that user
applications (such as ifconfig) can use the old flag names. Using the
old names in a device driver will result in a compile error in order to
help device driver writers adopt the new model.

When exposing the interface flags to user space, via interface ioctls
or routing sockets, or the two fields together. Since the driver flags
cannot currently be set for user space, no new logic is currently
required to handle this case.

Add some assertions that general purpose network stack routines, such
as if_setflags(), are not improperly used on driver-owned flags.

With this change, a large number of very minor network stack races are
closed, subject to correct device driver locking. Most were likely
never triggered.

Driver sweep to follow; many thanks to pjd and bz for the line-by-line
review they gave this patch.

Reviewed by: pjd, bz
MFC after: 7 days


# 456d182d 06-Aug-2005 Sam Leffler <sam@FreeBSD.org>

destroy lock _before_ free'ing the structure it resides in


# 6da3131a 04-Aug-2005 John Baldwin <jhb@FreeBSD.org>

Initialize the if_addr mutex in if_alloc() rather than waiting until
if_attach(). This allows ethernet drivers to use it in their routines
to program their MAC filters before ether_ifattach() is called (de(4) is
one such driver). Also, the if_addr mutex is destroyed in if_free()
rather than if_detach(), so there was another potential bug in that a
driver that failed during attach and called if_free() without having
called ether_ifattach() would have tried to destroy an uninitialized mutex.

Reported by: Holm Tiffe holm at freibergnet dot de
Discussed with: rwatson


# c3b31afd 02-Aug-2005 Robert Watson <rwatson@FreeBSD.org>

Protect link layer network interface multicast address list manipulation
using ifp->if_addr_mtx:

- Initialize if_addr_mtx when ifnet is initialized.

- Destroy if_addr_mtx when ifnet is torn down.

- Rename ifmaof_ifpforaddr() to if_findmulti(); assert if_addr_mtx.
Staticize.

- Extract ifmultiaddr allocation and initialization into if_allocmulti();
accept a 'mflags' argument to indicate whether or not sleeping is
permitted. This centralizes error handling and address duplication.

- Extract ifmultiaddr tear-down and deallocation in if_freemulti().

- Re-structure if_addmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Make use of non-sleeping allocations. Annotate the fact that we only
generate routing socket events for explicit address addition, not
implicit link layer address addition.

- Re-structure if_delmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Annotate the lack of a routing socket event for implicit link layer
address removal.

- De-spl all and sundry.

Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after: 1 week


# 2432c31c 19-Jul-2005 Robert Watson <rwatson@FreeBSD.org>

In multicast routines:

Compare pointers with NULL rather than treating them as booleans.

Compare pointers with NULL rather than 0 to make it more clear
they are pointers.

Assign pointers value of NULL rather than 0 to make it more clear
they are pointers.

MFC after: 3 days


# d8d5b10e 19-Jul-2005 Robert Watson <rwatson@FreeBSD.org>

Rename equal() macro to sa_equal(), which matches the definitions
of sa_equal() in other files, and makes it more clear what equal()
is comparing.

MFC after: 3 days


# 52023244 14-Jul-2005 Max Laier <mlaier@FreeBSD.org>

Move eventhandler for 'ifnet_departure_event' at the end of the progress.
Some of the (IPv6) cleanup functions send packets to inform peers of the
departure. These packets confused users of ifnet_departure_event (pf at the
moment).

PR: kern/80627
Tested by: Divacky Roman
MFC after: 1 week


# 1a3b6859 14-Jul-2005 Yaroslav Tykhiy <ytykhiy@gmail.com>

MFp4:

- Introduce a helper function if_setflag() containing the code common
to ifpromisc() and if_allmulti() instead of duplicating the code poorly,
with different bugs.
- Call ifp->if_ioctl() in a consistent way: always use more compatible C
syntax and check whether ifp->if_ioctl is not NULL prior to the call.

MFC after: 1 month


# 571dcd15 01-Jul-2005 Suleiman Souhlal <ssouhlal@FreeBSD.org>

Fix the recent panics/LORs/hangs created by my kqueue commit by:

- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by: jmg
Approved by: re (scottl), scottl (mentor override)
Pointyhat to: ssouhlal
Will be happy: everyone


# 1436936a 17-Jun-2005 Brooks Davis <brooks@FreeBSD.org>

Spelling/grammer fixes in comment.

Reported by: Hans Petter Selasky <hselasky at c2i dot net>
Approved by: re (ifnet blanked)


# 28ef2db4 11-Jun-2005 Brooks Davis <brooks@FreeBSD.org>

Return NULL instead of a bogus pointer from if_alloc when if_com_alloc
fails.

Move detaching the ifnet from the ifindex_table into if_free so we can
both keep the sanity checks and actually delete the ifnets. [0]

Reported by: gallatin [0]
Approved by: re (blanket)


# fc74a9f9 10-Jun-2005 Brooks Davis <brooks@FreeBSD.org>

Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.

Reviewed by: sobomax, sam


# 9d80a330 06-Jun-2005 Brooks Davis <brooks@FreeBSD.org>

Send link state change notifications to /dev/devctl. This is needed to
start the OpenBSD dhclient when links come up.


# 8f867517 04-Jun-2005 Andrew Thompson <thompsa@FreeBSD.org>

Add hooks into the networking layer to support if_bridge. This changes struct
ifnet so a buildworld is necessary.

Approved by: mlaier (mentor)
Obtained from: NetBSD


# 45778b37 25-May-2005 Peter Edwards <peadar@FreeBSD.org>

Separate out address-detaching part of if_detach into if_purgeaddrs,
so if_tap doesn't need to rely on locally-rolled code to do same.

The observable symptom of if_tap's bzero'ing the address details
was a crash in "ifconfig tap0" after an if_tap device was closed.

Reported By: Matti Saarinen (mjsaarin at cc dot helsinki dot fi)


# 68a3482f 20-Apr-2005 Gleb Smirnoff <glebius@FreeBSD.org>

Do not call all link state callbacks directly, but schedule
a taskqueue(9) task. This fixes LORs and adds possibility
to serve such events pseudorecursively, when link state
change of interface causes subsequent change on other
interfaces.

Sponsored by: Rambler
Reviewed by: sam, brooks, mux


# fbd24c5e 14-Apr-2005 Colin Percival <cperciva@FreeBSD.org>

Zero the ifr.ifr_name buffer in ifconf() in order to avoid
accidental disclosure of kernel memory to userland.

Security: FreeBSD-SA-05:04.ifconf


# d4d22970 20-Mar-2005 Gleb Smirnoff <glebius@FreeBSD.org>

ifma_protospec is a pointer. Use NULL when assigning or compating it.


# 5515c2e7 11-Mar-2005 Gleb Smirnoff <glebius@FreeBSD.org>

Add a sysctl net.link.log_link_state_change, which allows to
suppress logging of interface link state changes.

Requested by: sam, kan


# bc9d2991 25-Feb-2005 Brooks Davis <brooks@FreeBSD.org>

Change the definition of struct if_data's member ifi_epoch from wall
clock time to uptime because wall clock time may go backwards.

This is a change in the API which will impact SNMP agents who are using
ifi_epoch to set RFC2233's ifCounterDiscontinuityTime. None are know to
exist today. This will not impact applications that are using the
<index, epoch> tuple to verify interface uniqueness except that it
eliminates a race which could lead to a false assumption of uniqueness.

Because this is a behavior change, bump __FreeBSD_version.

Discussed with: re (jhb, scottl)
MFC after: 3 days
Pointed out by: pkh (way back at EuroBSDCon)
Pointy hat: brooks


# 8b25904e 22-Feb-2005 Gleb Smirnoff <glebius@FreeBSD.org>

Typo in comment.


# 4d96314f 22-Feb-2005 Gleb Smirnoff <glebius@FreeBSD.org>

- In if_link_state_change() extract function body from if-block, to improve
readability.
- Call carp_carpdev_state() from if_link_state_change() if interface has
associated CARP interface.

Sponsored by: Rambler


# a9771948 22-Feb-2005 Gleb Smirnoff <glebius@FreeBSD.org>

Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by: mlaier
Obtained from: OpenBSD (mickey, mcbride)


# b0b4b28b 12-Feb-2005 Xin LI <delphij@FreeBSD.org>

Validate ifc->ifc_len before submitting its incarnation to sbuf_new,
which will finally lead to kernel panic.

Security: This prevents a local (root-launched) DoS
Submitted by: Wojciech A. Koszek [dunstan at freebsd czest pl]
PR: 77421
MFC After: 1 week


# 8b02df24 29-Jan-2005 Gleb Smirnoff <glebius@FreeBSD.org>

Log changes of link state.

Reviewed by: rwatson


# 1c7899c7 07-Jan-2005 Gleb Smirnoff <glebius@FreeBSD.org>

This change adds reliability for Ethernet trunks built with ng_one2many:

- Introduce another ng_ether(4) callback ng_ether_link_state_p, which
is called from if_link_state_change(), every time link is changed.
- In ng_ether_link_state() send netgraph control message notifying
of link state change to a node connected to "lower" hook.

Reviewed by: sam
MFC after: 2 weeks


# c398230b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# 94f5c9cf 07-Dec-2004 Sam Leffler <sam@FreeBSD.org>

Cleanup link state change notification:
o add new if_link_state_change routine that deals with link state changes
o change mii to use if_link_state_change


# 69fb23b7 30-Nov-2004 Max Laier <mlaier@FreeBSD.org>

Implement the check I was talking about in the previous message already.
Introduce domain_init_status to keep track of the init status of the domains
list (surprise). 0 = uninitialized, 1 = initialized/unpopulated, 2 =
initialized/done. Higher values can be used to support late addition of
domains which right now "works", but is potential dangerous. I choose to
only give a warning when doing so.

Use domain_init_status with if_attachdomain[1]() to ensure that we have a
complete domains list when we init the if_afdata array. Store the current
value of domain_init_status in if_afdata_initialized. This way we can update
if_afdata after a new protocol has been added (once that is allowed).

Submitted by: se (with changes)
Reviewed by: julian, glebius, se
PR: kern/73321 (partly)


# 6237419d 23-Nov-2004 Robert Watson <rwatson@FreeBSD.org>

Assign if_broadcastaddr to NULL not 0 in if_attach().

Printf() a warning if if_attachdomain() is called more than once on an
interface to generate some noise on mailing lists when this occurs.

Fix up style in if_start(), where spaces crept in instead of tabs at
some point.

MFC after: 1 week
MFC note: Not the printf().


# 0b762445 30-Oct-2004 Robert Watson <rwatson@FreeBSD.org>

Move if_handoff() from an inline in if_var.h to a function to if.c
in orden to harden the ABI for 5.x; this will permit us to modify
the locking in the ifnet packet dispatch without requiring drivers
to be recompiled.

MFC after: 3 days
Discussed at: EuroBSDCon Developer's Summit


# 31302ebf 19-Oct-2004 Robert Watson <rwatson@FreeBSD.org>

Define IFF_LOCKGIANT() and IFF_UNLOCKGIANT() macros, which conditionally
acquire Giant if the passed interface has IFF_NEEDSGIANT set on it.
Modify calls into (ifp)->if_ioctl() in if.c to use these macros in order
to ensure that Giant is held.

MFC after: 3 days
Bumped into by: jmg


# 5ed8cedc 21-Sep-2004 Brian Feldman <green@FreeBSD.org>

Call sbuf_finish() before sbuf_data() so as to not panic the system.


# 4dcf2bbb 22-Sep-2004 Brooks Davis <brooks@FreeBSD.org>

Fix a LOR where ifconf() used copyout while holding a mutex. This LOR
was seen when configuring addresses on interfaces using ifconfig. This
patch has been verified to work with over eight thousand addresses
assigned to an interface.

LOR id: 031


# 71672bb6 17-Sep-2004 Brooks Davis <brooks@FreeBSD.org>

Log the renaming of an interface. This should make it easier to follow
kernel log files.


# 55287f2a 07-Sep-2004 Brooks Davis <brooks@FreeBSD.org>

Re-add ifi_epoch, to struct if_data, this time replacing ifi_unused
to avoid ABI changes. It is set to the last time the interface
counters were zeroed, currently the time if_attach() was called. It is
intentended to be a valid value for RFC2233's ifCounterDiscontinuityTime
and to make it easier for applications to verify that the interface they
find at a given index is the one that was there last time they looked.

Due to space constraints ifi_epoch is a time_t rather then a struct
timeval. SNMP would prefer higher precision, but this unlikely to be
useful in practice.


# 9b90387d 06-Sep-2004 John-Mark Gurney <jmg@FreeBSD.org>

don't call f_detach if the filter has alread removed the knote.. This
happens when a proc exits, but needs to inform the user that this has
happened.. This also means we can remove the check for detached from
proc and sig f_detach functions as this is doing in kqueue now...

MFC after: 5 days


# 4ff62bd9 01-Sep-2004 Brooks Davis <brooks@FreeBSD.org>

Back out ifi_epoch. The ABI breakage is too disruptive this close to
5-STABLE. ifi_epoch will shortly be reintroduced with less precistion
using the space currently allocated to ifi_unused.


# 7b21048c 01-Sep-2004 Max Laier <mlaier@FreeBSD.org>

Fix an assertion when if_down()ing a ALTQ managed interface. The lock should
have been in place all the time the mtx_assert in the ALTQ code just
discovered the shortcoming.

PR: i386/71195
Tested by: Bettan (PR originator), myself
MFC after: 5 days


# 9e734b44 01-Sep-2004 Brooks Davis <brooks@FreeBSD.org>

Use a spare byte in struct if_data to store the structure size without
increasing it. Add code to ifconfig to use this size to find the
sockaddr_dl after the struct if_data in the routing message. This
allows struct if_data to grow (up to 255 bytes) without breaking
ifconfig.

Submitted by: peter


# 1fc4519b 30-Aug-2004 Brooks Davis <brooks@FreeBSD.org>

Add a new variable, ifi_epoch, to struct if_data. It is set to the last
time the interface counters were zeroed, currently the time if_attach()
was called. It is indentended to be a valid value for RFC2233's
ifCounterDiscontinuityTime and to make it easier for applications to
verify that the interface they find at a given index is the one that was
there last time they looked.

An if_epoch "compatability" macro has not been created as ifi_epoch has
never been a member of struct ifnet.

Approved by: andre, bms, wollman


# b9907cd4 27-Aug-2004 Brooks Davis <brooks@FreeBSD.org>

When detaching an interface, don't leave an obsolete pointer to the
soon to be deleted struct ifnet around.

PR: kern/52260
MFC After: 3 days


# ad3b9257 15-Aug-2004 John-Mark Gurney <jmg@FreeBSD.org>

Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks. Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by: green, rwatson (both earlier versions)


# 3f35d515 06-Aug-2004 Peter Pentchev <roam@FreeBSD.org>

Do not attempt to clean up data that has not been initialized yet.
This fixes two kernel panics on boot when the xl driver fails to
allocate bus/port/memory resources.

Reviewed by: silence on -net


# af5e59bf 27-Jul-2004 Robert Watson <rwatson@FreeBSD.org>

Add a new network interface flag, IFF_NEEDSGIANT, which will allow
device drivers to declare that the ifp->if_start() method implemented
by the driver requires Giant in order to operate correctly.

Add a 'struct task' to 'struct ifnet' that can be used to execute a
deferred ifp->if_start() in the event that if_start needs to be called
in a Giant-free environment. To do this, introduce if_start(), a
wrapper function for ifp->if_start(). If the interface can run MPSAFE,
it directly dispatches into the interface start routine. If it can't
run MPSAFE, we're running with debug.mpsafenet != 0, and Giant isn't
currently held, the task is queued to execute in a swi holding Giant
via if_start_deferred().

Modify if_handoff() to use if_start() instead of direct dispatch.
Modify 802.11 to use if_start() instead of direct dispatch.

This is intended to provide increased compatibility for non-MPSAFE
network device drivers in the presence of Giant-free operation via
asynchronous dispatch. However, this commit does not mark any network
interfaces as IFF_NEEDSGIANT.


# 8bbfdc98 18-Jul-2004 Robert Watson <rwatson@FreeBSD.org>

Gratuitous whitespace change to un-wrap a short line.


# f889d2ef 22-Jun-2004 Brooks Davis <brooks@FreeBSD.org>

Major overhaul of pseudo-interface cloning. Highlights include:

- Split the code out into if_clone.[ch].
- Locked struct if_clone. [1]
- Add a per-cloner match function rather then simply matching names of
the form <name><unit> and <name>.
- Use the match function to allow creation of <interface>.<tag>
vlan interfaces. The old way is preserved unchanged!
- Also the match function to allow creation of stf(4) interfaces named
stf0, stf, or 6to4. This is the only major user visible change in
that "ifconfig stf" creates the interface stf rather then stf0 and
does not print "stf0" to stdout.
- Allow destroy functions to fail so they can refuse to delete
interfaces. Currently, we forbid the deletion of interfaces which
were created in the init function, particularly lo0, pflog0, and
pfsync0. In the case of lo0 this was a panic implementation so it
does not count as a user visiable change. :-)
- Since most interfaces do not need the new functionality, an family of
wrapper functions, ifc_simple_*(), were created to wrap old style
cloner functions.
- The IF_CLONE_INITIALIZER macro is replaced with a new incompatible
IFC_CLONE_INITIALIZER and ifc_simple consumers use IFC_SIMPLE_DECLARE
instead.

Submitted by: Maurycy Pawlowski-Wieronski <maurycy at fouk.org> [1]
Reviewed by: andre, mlaier
Discussed on: net


# 89c9c53d 16-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


# 4cb655c0 14-Jun-2004 Max Laier <mlaier@FreeBSD.org>

Transform tbr_dequeue into a function pointer in order to build drivers with
ALTQ enabled versions of IFQ_* macros by default, as requested by serveral
others. This is a follow-up to the quick fix I committed yesterday which
turned off the ALTQ checks for non-ALTQ kernels.


# 02b199f1 13-Jun-2004 Max Laier <mlaier@FreeBSD.org>

Link ALTQ to the build and break with ABI for struct ifnet. Please recompile
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
seperate commit. Same with the driver changes, which need case by case
evaluation.

__FreeBSD_version bump will follow.

Tested-by: (i386)LINT


# 3fefbff0 24-Apr-2004 Luigi Rizzo <luigi@FreeBSD.org>

arpcom untangling:

consistently with the rest of the code, use IFP2AC(ifp) to access
the arpcom structure given the ifp.

In this case also fix a difference in assumptions WRT the rest of
the net/ sources: it is not the 'struct *softc' that starts with a
'struct arpcom', but a 'struct arpcom' that starts with a
'struct ifnet'


# f4247b59 19-Apr-2004 Luigi Rizzo <luigi@FreeBSD.org>

Fix a recently introduced panic in if_detach() by delaying
the invalidation of ifindex_table[] entry. Probably this
code should be moved even further down, but for the time being
let's do it this way.


# 8614fb12 18-Apr-2004 Max Laier <mlaier@FreeBSD.org>

Make if_(un)route static in if.c as they are called from if_up/if_down only.
This is also cleanup to make locking easier.

Reviewed by: luigi
Approved by: bms(mentor)


# 9046571f 16-Apr-2004 Luigi Rizzo <luigi@FreeBSD.org>

Use if_link instead of the alias if_list, and change a for() into
the TAILQ_FOREACH() form.

Comment the need to store the same info (mac address for ethernet-type
devices) in two different places.

No functional changes. Even the compiler output should be unmodified
by this change.


# 9b98ee2c 16-Apr-2004 Luigi Rizzo <luigi@FreeBSD.org>

Consistently use ifaddr_byindex() to access the link-level address
of an interface. No functional change.

On passing, comment a likely bug in net/rtsock.c:sysctl_ifmalist()
which, if confirmed, would deserve to be fixed and MFC'ed


# f36cfd49 07-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# bc1470f1 12-Mar-2004 Brooks Davis <brooks@FreeBSD.org>

Don't allow interfaces to be renamed to the empty string.
While I'm here, errors aren't bools.

Pointed out by: hmp


# 196f7f54 12-Mar-2004 Brooks Davis <brooks@FreeBSD.org>

Remove if_withname. It came in with the KAME import, but never got
used. Should someone need its functionality, it's a really expensive
implementation of:
ifnet_byindex(sdl->sdl_index)

Reviewed by: bde, ume


# 25a4adce 25-Feb-2004 Max Laier <mlaier@FreeBSD.org>

Bring eventhandler callbacks for pf.
This enables pf to track dynamic address changes on interfaces (dailup) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.

Approved by: bms(mentor)


# dc08ffec 21-Feb-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Device megapatch 4/6:

Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.


# 913e410e 20-Feb-2004 Yaroslav Tykhiy <ytykhiy@gmail.com>

Minor beautifications related to style(9) and code consistency.
No functional changes.


# efb4018b 20-Feb-2004 Yaroslav Tykhiy <ytykhiy@gmail.com>

Improve the SIOCSIFCAP handler a bit:
- allow for ifp->if_ioctl being NULL, as the rest of ifioctl() does;
- give the interface driver a chance to report a error to the caller;
- don't forget to update ifp->if_lastchange upon successful modification
of interface operation parameters.


# 36c19a57 03-Feb-2004 Brooks Davis <brooks@FreeBSD.org>

Add the kernel side of network interface renaming support.

The basic process is to send a routing socket announcement that the
interface has departed, change if_xname, update the sockaddr_dl
associated with the interface, and announce the arrival of the interface
on the routing socket.

As part of this change, ifunit() is greatly simplified by testing
if_xname directly. if_clone_destroy() now uses if_dname to look up the
cloner for the interface and if_dunit to identify the unit number.

Reviewed by: ru, sam (concept)
Vincent Jardin <vjardin AT free.fr>
Max Laier <max AT love2party.net>


# ccb82468 02-Feb-2004 Brooks Davis <brooks@FreeBSD.org>

More macro cleanup. Use the system roundup2() macro instead of making
our own ROUNDUP() macro.

Suggested by: bde


# a8773564 27-Jan-2004 Brooks Davis <brooks@FreeBSD.org>

Cleanup malloc() use in if_attach():
- malloc() returns a void* and does not need a cast
- when called with M_WAITOK, malloc() can not return NULL so don't
check for that case. The result of the check was bogus anyway since
it would leave the interface broken.


# 8abaf585 26-Jan-2004 Brooks Davis <brooks@FreeBSD.org>

Clean up macro usage in if_attach():
- Use the system offsetof macro rather then making out own.
- undef ROUND after we use it rather then polluting the whole file.


# 12b8b80e 23-Jan-2004 Ruslan Ermilov <ru@FreeBSD.org>

Don't panic if there are more than 255 interfaces in the system.


# 5d7252af 26-Dec-2003 Brian Feldman <green@FreeBSD.org>

Don't truncate the interface name in ifunit(). It's now possible to query
"very long interface names", e.g.:
ndis_atheros0: flags=8847<UP,BROADCAST,DEBUG,RUNNING,SIMPLEX,MULTICAST> mtu 1500


# 9bf40ede 31-Oct-2003 Brooks Davis <brooks@FreeBSD.org>

Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)


# 13fb40df 30-Oct-2003 Brooks Davis <brooks@FreeBSD.org>

Replace a couple printfs with if_printfs.


# 234a35c7 24-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

Since dp->dom_ifattach calls malloc() with M_WAITOK, we cannot
use mutex lock directly here. Protect ifp->if_afdata instead.

Reported by: grehan


# 72fd1b6a 23-Oct-2003 Dag-Erling Smørgrav <des@FreeBSD.org>

Clean up whitespace, remove "register" keyword, ANSIfy.
No functional changes.


# e115574c 22-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

protect by IFNET_RLOCK.


# 31b1bfe1 17-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

- add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from: KAME


# 212bd869 16-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

AF_LINK sockaddr has to be attached to ifp->if_addrlist until the
end, as many of the code assumes that TAILQ_FIRST(ifp->if_addrlist)
is non-null.

Submitted by: itojun


# d1dd20be 03-Oct-2003 Sam Leffler <sam@FreeBSD.org>

Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do-no know about mutex's. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
work to eliminate many held references. If not these will be resolved
prior to release.
3. ATM changes are untested.

Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)


# ed692400 28-Sep-2003 Poul-Henning Kamp <phk@FreeBSD.org>

I don't know from where the notion that device driver should or
even could call VOP_REVOKE() on vnodes associated with its dev_t's
has originated, but it stops right here.

If there are things people belive destroy_dev() needs to learn how to
do, please tell me about it, preferably with a reproducible test case.

Include <sys/uio.h> in bluetooth code rather than rely on <sys/vnode.h>
to do so.

The fact that some of the USB code needs to include <sys/vnode.h>
still disturbs me greatly, but I do not have time to chase that.


# 89eaef50 19-Jul-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

Disabling multicast on vlan interface caused kernel panic.

PR: kern/40723
Submitted by: Hideki ONO <ono@kame.net>
MFC after: 1 week


# 51da11a2 29-Apr-2003 Mark Murray <markm@FreeBSD.org>

Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.


# 31566c96 20-Mar-2003 John Baldwin <jhb@FreeBSD.org>

Use td->td_ucred instead of td->td_proc->p_ucred.


# d42ee4e4 09-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Note that MAJOR_AUTO is now the default if d_maj is not initialized. This
is more robust and prevents the hijacking of /dev/console for the typical
mistake.

Remove unneeded MAJOR_AUTO uses, it is only needed explicitly now if the
driver source has cross-branch compatibility to old releases.


# 182a9f74 03-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Make nokqfilter() return the correct return value.

Ditch the D_KQFILTER flag which was used to prevent calling NULL pointers.


# 7ac40f5f 02-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by: re(scottl)


# 7e1f8a0b 28-Feb-2003 Maxime Henrion <mux@FreeBSD.org>

Make the network /dev entries use MAJOR_AUTO.


# a163d034 18-Feb-2003 Warner Losh <imp@FreeBSD.org>

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 6cdcc159 23-Jan-2003 Max Khon <fjoe@FreeBSD.org>

- add support for IPX (tested with mount -t nwfs and mars_nwe),
IP fast forwarding, SIOCGIFADDR, setting hardware address (not currently
enabled in cm driver), multicasts (experimental)
- add ARC_MAX_DATA, use IF_HANDOFF, remove arc_sprintf() and some unused
variables
- if_simloop logic is made more similar to ethernet
- drop not ours packets early (if we are not in promiscous mode)

Submitted by: mark tinguely (partially)


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 956b0b65 23-Dec-2002 Jeffrey Hsu <hsu@FreeBSD.org>

SMP locking for radix nodes.


# b30a244c 21-Dec-2002 Jeffrey Hsu <hsu@FreeBSD.org>

SMP locking for ifnet list.


# 19fc74fb 18-Dec-2002 Jeffrey Hsu <hsu@FreeBSD.org>

Lock up ifaddr reference counts.


# 0f43e1aa 15-Nov-2002 Sam Leffler <sam@FreeBSD.org>

Back out rev 1.150; things are more complicated than this.


# 10ed96fd 15-Nov-2002 Sam Leffler <sam@FreeBSD.org>

if_attach should not sleep; change malloc's M_WAITOK to M_NOWAIT


# fa882e87 24-Sep-2002 Brooks Davis <brooks@FreeBSD.org>

Add a new helper function if_printf() modeled on device_printf(). The
function takes a struct ifnet pointer followed by the usual printf
arguments and prints "<interfacename>: " before the results of printf.
Since this is the primary form of printf calls in network device drivers
and accounts for most uses of the ifnet menber if_unit, this
significantly simplifies many printf()s.


# 6e82956c 19-Aug-2002 Juli Mallett <jmallett@FreeBSD.org>

Clean up a comment talking about C strings, which are terminated with the
ASCII NUL character (0, or '\0' in C).


# ffb079be 19-Aug-2002 Maxim Sobolev <sobomax@FreeBSD.org>

Implement user-setable promiscuous mode (a new `promisc' flag for ifconfig(8)).
Also, for all interfaces in this mode pass all ethernet frames to upper layer,
even those not addressed to our own MAC, which allows packets encapsulated
in those frames be processed with packet filters (ipfw(8) et al).

Emphatically requested by: Anton Turygin <pa3op@ukr-link.net>
Valuable suggestions by: fenner


# 62f76486 18-Aug-2002 Maxim Sobolev <sobomax@FreeBSD.org>

Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid
breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in
SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's.

Reviewed by: -hackers, -net


# 8f293a63 01-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Introduce support for Mandatory Access Control and extensible
kernel access control.

Introduce two ioctls, SIOCGIFMAC, SIOCSIFMAC, which permit user
processes to manage the MAC labels on network interfaces. Note
that this is part of the user process API/ABI that will be revised
prior to 5.0-RELEASE.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# e70cd263 31-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the interface management code so that MAC labels are
properly maintained on network interfaces (struct ifnet). In
particular, invoke entry points when interfaces are created and
removed. MAC policies may initialized the label interface based
on a variety of factors, including the interface name.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 13990766 02-Jul-2002 Jonathan Mini <mini@FreeBSD.org>

Check retifma for NULL before using it.

PR: kern/9391
Submitted by: Assar Westerlund <assar@sics.se>
MFC after: 3 days


# ae5a19be 25-May-2002 Brooks Davis <brooks@FreeBSD.org>

Move all unit number management cloned interfaces into the cloning
code. The reverts the API change which made the <if>_clone_destory()
functions return an int instead of void bringing us into closer
alignment with NetBSD.

Reviewed by: net (a long time ago)


# 88ff5695 18-Apr-2002 SUZUKI Shinsuke <suz@FreeBSD.org>

just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD.
(based on freebsd4-snap-20020128)

Reviewed by: ume
MFC after: 1 week


# d637e989 10-Apr-2002 Peter Wemm <peter@FreeBSD.org>

Add missing 'struct ifreq ifr;' that was forgotten in the last commit.


# ee0a4f7e 09-Apr-2002 SUZUKI Shinsuke <suz@FreeBSD.org>

fixed a kernel crash when enabling multicast on vlan interface
owing to a NULL argument to vlan_ioctl() at if_allmulti().

Reviewed by: ume
MFC after: 1 week


# 6008862b 04-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# 44731cab 01-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# c61cd599 01-Apr-2002 Hajimu UMEMOTO <ume@FreeBSD.org>

Make `route add -inet6 default ::1 -ifp gif0' work actually.
The change between 1.13 and 1.14 is specific to AF_INET.

MFC after: 1 week


# 929ddbbb 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# 3b16e7b2 11-Mar-2002 Maxime Henrion <mux@FreeBSD.org>

Simplify the interface cloning framework by handling unit
unit allocation with a bitmap in the generic layer. This
allows us to get rid of the duplicated rman code in every
clonable interface.

Reviewed by: brooks
Approved by: phk


# 0346e973 05-Mar-2002 Brian Feldman <green@FreeBSD.org>

Use revoke_and_destroy_dev() instead of destroy_dev() when removing /dev/net
pseudo-devices when an interface goes away. Otherwise, an open /dev/net/foo0
when the interface is removed can cause a crash.

Not objected to by: jlemon


# b75496fe 04-Mar-2002 Brooks Davis <brooks@FreeBSD.org>

Change the network interface cloning API so the destroy function returns
an int errorcode instead of void in preperation for merging cloning of
the loopback device.

Submitted by: mux
MFC after: 2 weeks


# a854ed98 27-Feb-2002 John Baldwin <jhb@FreeBSD.org>

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


# c0933269 25-Feb-2002 Peter Wemm <peter@FreeBSD.org>

Fix a warning by pulling prototype for arp_ifinit() into scope.
Then fix cast the correct value into an incorrect value, which was not
detected due to the missing prototype (but was harmless anyway).


# b2c08f43 18-Feb-2002 Luigi Rizzo <luigi@FreeBSD.org>

When the local link address is changed, send out gratuitous ARPs
to notify other nodes about the address change. Otherwise, they
might try and keep using the old address until their arp table
entry times out and the address is refreshed.

Maybe this ought to be done for INET6 addresses as well but i have
no idea how to do it. It should be pretty straightforward though.

MFC-after: 10 days


# 7b6edd04 18-Jan-2002 Ruslan Ermilov <ru@FreeBSD.org>

Introduce an interface announcement message for the routing
socket so that routing daemons and other interested parties
know when an interface is attached/detached.

PR: kern/33747
Obtained from: NetBSD
MFC after: 2 weeks


# de593450 17-Oct-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Add a SIOCGIFINDEX ioctl, which returns the index of a named interface.
This will be used to more efficiently support if_nametoindex(3).


# 10930aad 17-Oct-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Cleanup ifunit(), so it uses the dev_named() function to map an interface
name into a device.


# 8071913d 17-Oct-2001 Ruslan Ermilov <ru@FreeBSD.org>

Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.

Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument. Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost noone is
using it anyways).

Benefit: the following command now works. Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''. It was introduced by 4.3BSD-Reno and never
corrected.

Obtained from: BSD/OS, NetBSD
MFC after: 1 month
PR: kern/28360


# 66afbd68 17-Oct-2001 Ruslan Ermilov <ru@FreeBSD.org>

Revision 1.13 corresponded to CSRG revision 8.4.
Revision 1.59 corresponded to CSRG revision 8.5.


# 05153c61 16-Oct-2001 Bill Fenner <fenner@FreeBSD.org>

if_index is the highest interface index in the system, not the next
available index.


# 322dcb8d 14-Oct-2001 Max Khon <fjoe@FreeBSD.org>

bring in ARP support for variable length link level addresses

Reviewed by: jdp
Approved by: jdp
Obtained from: NetBSD
MFC after: 6 weeks


# d2b4566a 11-Oct-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Fix the ``WARNING: Driver mistake: repeat make_dev'', caused by using
the wrong index variable within a loop. I have no idea how this managed
to work on my test box.

Spotted by: fenner


# ffb5a104 10-Oct-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Move device nodes into a /dev/net/ directory, to avoid conflict with
existing devices (e.g.: tunX). This may need a little more thought.

Create a /dev/netX alias for devices. net0 is reserved.

Allow wiring of net aliases in /boot/device.hints of the form:
hint.net.1.dev="lo0"
hint.net.12.ether="00:a0:c9:c9:9d:63"


# 9a2a57a1 29-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Add ability to attach knotes to network devices.
Introduce EVFILT_NETDEV to report network device changes.


# f13ad206 28-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Introduce network device nodes. Network devices will now automatically
appear in /dev. Interface hardware ioctls (not protocol or routing) can
be performed on the descriptor. The SIOCGIFCONF ioctl may be performed
on the special /dev/network node.


# 016da741 18-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Add two fields to the ifnet structure indicating what extra capabilities
a network device has, and which ones are enabled.


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# a8637146 06-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Fix another shortcircuit return() statement that I missed.


# 2defe5cd 06-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Fix sense of comparison in space test. Also eliminate a compile
warning and remove a previously existing off-by-one error.


# f9132ceb 05-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Wrap array accesses in macros, which also happen to be lvalues:

ifnet_addrs[i - 1] -> ifaddr_byindex(i)
ifindex2ifnet[i] -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.


# 0b59d917 05-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Cosmetic cleanups and rearrangement for code to come. There should be
no functional change in this commit.


# 30aad87d 02-Jul-2001 Brooks Davis <brooks@FreeBSD.org>

Add kernel infrastructure for network device cloning.

Reviewed by: ru, ume
Obtained from: NetBSD
MFC after: 1 week


# 33841545 10-Jun-2001 Hajimu UMEMOTO <ume@FreeBSD.org>

Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks


# 4f3c11a6 27-Apr-2001 Bill Fenner <fenner@FreeBSD.org>

Better handling of ioctl(SIOCSIFFLAGS) failing in ifpromisc():
- Don't print the "promiscuous mode (enabled|disabled)" on failure
- Restore the reference count on failure


# b7bffa71 04-Apr-2001 Yaroslav Tykhiy <ytykhiy@gmail.com>

Change the type of the VLAN interface from IFT_PROPVIRTUAL,
which was a temporary hack, to IFT_L2VLAN, which is the type
assigned by IANA.


# 2c514a31 02-Apr-2001 Brian Somers <brian@FreeBSD.org>

If ifpromisc() fails the SIOCSIFFLAGS ioctl, put ifp->if_flags
back the way we found them.


# 5e980e22 28-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Use mtx_initiaalized() rather than violating the internals of the mutex
structure.


# ccb7cc8d 27-Mar-2001 Yaroslav Tykhiy <ytykhiy@gmail.com>

Don't bypass notifying a corresponding interface
when leaving a link-layer multicast group.

PR: kern/22176
Reviewed by: wollman


# 91421ba2 20-Feb-2001 Robert Watson <rwatson@FreeBSD.org>

o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.

Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project


# 6817526d 06-Feb-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Convert if_multiaddrs from LIST to TAILQ so that it can be traversed
backwards in the three drivers which want to do that.

Reviewed by: mikeh


# 37d40066 04-Feb-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Another round of the <sys/queue.h> FOREACH transmogriffer.

Created with: sed(1)
Reviewed by: md5(1)


# fc2ffbe6 04-Feb-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)


# 22f29826 03-Feb-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Use <sys/queue.h> macro api rather than fondle its implementation detals.

Created with: /usr/bin/sed
Reviewed by: /sbin/md5


# 62b119ca 30-Jan-2001 Jason Evans <jasone@FreeBSD.org>

Revert mutex initialization check to look at mtx_description.

Pointed out by: jlemon, jhb


# 0cde2e34 21-Jan-2001 Jason Evans <jasone@FreeBSD.org>

Move most of sys/mutex.h into kern/kern_mutex.c, thereby making the mutex
inline functions non-inlined. Hide parts of the mutex implementation that
should not be exposed.

Make sure that WITNESS code is not executed during boot until the mutexes
are fully initialized by SI_SUB_MUTEX (the original motivation for this
commit).

Submitted by: peter


# 7cc0979f 08-Dec-2000 David Malone <dwmalone@FreeBSD.org>

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


# df5e1987 25-Nov-2000 Jonathan Lemon <jlemon@FreeBSD.org>

Lock down the network interface queues. The queue mutex must be obtained
before adding/removing packets from the queue. Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.

IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue. An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.

Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.


# db7e3af1 15-Oct-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unneeded #include <machine/clock.h>


# 41b3e8e5 04-Oct-2000 Jun-ichiro itojun Hagino <itojun@FreeBSD.org>

make sure we have root priv on SIOCSIFPHY*. from thorpej@netbsd


# 66ce51ce 14-Aug-2000 Archie Cobbs <archie@FreeBSD.org>

Export the functionality of SIOCSIFLLADDR with if_setlladdr()
and add some more rigorous sanity checking in the process.

Reviewed by: freebsd-net


# 978ee2ed 15-Jul-2000 Jun-ichiro itojun Hagino <itojun@FreeBSD.org>

improve route/nd cache cleanup on interface removal.
CAVEAT: haven't really tested it yet, please report


# c38d75f4 30-Jun-2000 Archie Cobbs <archie@FreeBSD.org>

Previous commit didn't work; this time really fix it.


# 6ec86086 29-Jun-2000 Archie Cobbs <archie@FreeBSD.org>

Fix kernel build breakage when 'device ether' was not included.


# e1e1452d 26-Jun-2000 Archie Cobbs <archie@FreeBSD.org>

Make the ng_ether(4) node type dynamically loadable like the rest.
This means 'options NETGRAPH' is no longer necessary in order to get
netgraph-enabled Ethernet interfaces. This supports loading/unloading
the ng_ether.ko and attaching/detaching the Ethernet interface in any
order.

Add two new hooks 'upper' and 'lower' to allow access to the protocol
demux engine and the raw device, respectively. This enables bridging
to be defined as a netgraph node, if so desired.

Reviewed by: freebsd-net@freebsd.org


# b106252c 16-Jun-2000 Bill Paul <wpaul@FreeBSD.org>

Implement SIOCSIFLLADDR, which allows you to change the link-level
address on an interface. This basically allows you to do what my
little setmac module/utility does via ifconfig. This involves the
following changes:

socket.h: define SIOCSIFLLADDR
if.c: add support for SIOCSIFLLADDR, which resets the values in
the arpcom struct and sockaddr_dl for the specified interface.
Note that if the interface is already up, we need to down/up
it in order to program the underlying hardware's receive filter.
ifconfig.c: add lladdr command
ifconfig.8: document lladdr command

You can now force the MAC address on any ethernet interface to be
whatever you want. (The change is not sticky across reboots of course:
we don't actually reprogram the EEPROM or anything.) Actually, you
can reprogram the MAC address on other kinds of interfaces too; this
shouldn't be ethernet-specific (though at the moment it's limited to
6 bytes of address data).

Nobody ran up to me and said "this is the politically correct way to
do this!" so I don't want to hear any complaints from people who think
I could have done it more elegantly. Consider yourselves lucky I didn't
do it by having ifconfig tread all over /dev/kmem.


# d91a068e 21-Apr-2000 Guido van Rooij <guido@FreeBSD.org>

IOCGIFCONF once and for all. Sometimes the ifc_len variable
would be returned with a wrong value.
While we're here, get rid of unnecessary panic call.

PR: 17311, 12996, 14457
Submitted by: Patrick Bihan-Faou <patrick@mindstep.com>,
Kris Kennaway <kris@FreeBSD.org>


# b3f1e629 28-Feb-2000 Guido van Rooij <guido@FreeBSD.org>

This fixes a problem where the SIOCGIFCONF ioctl goes wrong. This
is triggered when qmail is used with INET6 enabled. The bug
manifests itself in that the space variable can become negative
and that in the comparison in the guards of the 2 loops, this was
not noticed because sizeof() returns an unsigned and thus the signed
variable gets promoted to unsigned. I decided not to make space
unsigned because I think we should guard against this from happening.
Thus panic() in case space becomes negative.

Approved by: jkh


# 3411310d 01-Feb-2000 Yoshinobu Inoue <shin@FreeBSD.org>

Add workaround for fxp issue at interface initialization with IPv6.

Some LAN card chip for fxp is known to cause problem at
interface initialization with IPv6 enabled. It happens at
some delicate timing.
And also, just adding some DELAY before IPv6 address
autoconfiguration is known to avoid the problem.

Complete fix is changing the driver not to use interrupt at
multicast filter initialization, but trying such change in
this stage will be dangerous.

So I add some DELAY() only inside #ifdef INET6 part,
as temporal workaround only for 4.0.

Approbed by: jkh

Noticed by: Mattias Pantzare <pantzer@ludd.luth.se>

Obtained from: openbsd-tech mailing list


# 48f71763 24-Jan-2000 Ruslan Ermilov <ru@FreeBSD.org>

Notify user processes about interface's MTU change.

Reviewed by: wollman, freebsd-net


# 0d0f9d1e 30-Dec-1999 Yoshinobu Inoue <shin@FreeBSD.org>

Prevent kernel panic at ifconfig up after Note PC resume.

Submitted by: imp, kuriyama
Reviewed by: imp


# 5500d3be 16-Dec-1999 Warner Losh <imp@FreeBSD.org>

Two more fixes to if_detach. These are generic to all interfaces and
do not pollute the interface further.

o Run if_detach at splnet().
o Creatively swipe the relevant parts of the netatm atm_nif_detach
which will delete the relevant references to the interface going
away.


# 8b7805e4 13-Dec-1999 Boris Popov <bp@FreeBSD.org>

Allow ifunit() routine to understand names like ed0f2. Also
fix a bug caused by using bcmp() instead of strcmp().

Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>


# aa6be122 10-Dec-1999 Warner Losh <imp@FreeBSD.org>

Add some gross ad-hock hacks to increase stability of if_detach:
o be more careful about clearing addresses (this isn't a kludge)
o For AF_INET interfaces, call SIOCDIFFADDR to remove last(?) bit
of cruft.

Special cases for AF_INET shouldn't be here, but I didn't see a good
generic way of doing this. If I missed something, please let me know.

This gross hack makes pccard ejection stable for ethernet cards.

Submitted by: Atushi Onoe-san


# cfa1ca9d 07-Dec-1999 Yoshinobu Inoue <shin@FreeBSD.org>

udp IPv6 support, IPv6/IPv4 tunneling support in kernel,
packet divert at kernel for IPv6/IPv4 translater daemon

This includes queue related patch submitted by jburkhol@home.com.

Submitted by: queue related patch from jburkhol@home.com
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# 82cd038d 21-Nov-1999 Yoshinobu Inoue <shin@FreeBSD.org>

KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# cd9e4cab 30-Aug-1999 Sheldon Hearn <sheldonh@FreeBSD.org>

For every "promiscuous mode enabled" message printed for an interface,
print a matching "disabled" message when we drop out of promiscuous
mode for that interface.

Discussed on the freebsd-hackers mailing list.


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# aab3beee 06-Aug-1999 Brian Somers <brian@FreeBSD.org>

Define IF_MAXMTU and IF_MINMTU and don't allow MTUs with
out-of-range values.

``comparison is always 0'' warnings are silly !

Ok'd by: wollman, dg
Advised by: bde


# 413dd0ba 19-Jun-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Add a new interface ioctl, to return "aux status".

This is inteded for to allow ifconfig to print various unstructured
information from an interface.

The data is returned from the kernel in ASCII form, see the comment in
if.h for some technicalities.

Canonical cut&paste example to be found in if_tun.c

Initial use:
Now tun* interfaces tell the PID of the process which opened them.

Future uses could be (volounteers welcome!):
Have ppp/slip interfaces tell which tty they use.
Make sync interfaces return their media state: red/yellow/blue
alarm, timeslot assignment and so on.
Make ethernets warn about missing heartbeats and/or cables


# 2f55ead7 06-Jun-1999 Poul-Henning Kamp <phk@FreeBSD.org>

typo in previous commit


# cf4b9371 06-Jun-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce IFF_SMART bit.

This means that the driver will add/delete routes when it knows it is
up/down, rather than have the generic code belive it is up if configured.

This is probably most useful for serial lines, although many PHY chips
could probably tell us if we're connected to the cable/hub as well.


# 75c13541 28-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


# f711d546 27-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


# 8ba5bdae 26-Apr-1999 Peter Wemm <peter@FreeBSD.org>

Protect the ifinit() function's internals with splimp() for safety since
it used to be that way. I'm not sure that it's needed, but it does
walk the ifp list..

Incidently, there's nothing to sanity check the ifq_maxlen on loaded
interfaces..


# 6182fdbd 16-Apr-1999 Peter Wemm <peter@FreeBSD.org>

Bring the 'new-bus' to the i386. This extensively changes the way the
i386 platform boots, it is no longer ISA-centric, and is fully dynamic.
Most old drivers compile and run without modification via 'compatability
shims' to enable a smoother transition. eisa, isapnp and pccard* are
not yet using the new resource manager. Once fully converted, all drivers
will be loadable, including PCI and ISA.

(Some other changes appear to have snuck in, including a port of Soren's
ATA driver to the Alpha. Soren, back this out if you need to.)

This is a checkpoint of work-in-progress, but is quite functional.

The bulk of the work was done over the last few years by Doug Rabson and
Garrett Wollman.

Approved by: core


# 4add131e 19-Feb-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Since ifru_flags is a short, we can fit in a copy of the flags
before they got changed. This can help eliminate much of the
gymnastics drivers do in their ioctl routines to figure this out.

Remove commented out IFF_NOTRAILERS


# e0ea20bc 01-Feb-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Print a message if the driver didn't initialize ifq_maxlen.
Drivers should be updated if they get flagged by this message.

(The reason this is important is because we do not have a way
to catch this mistake for interfaces added after ifinit() runs.)


# e8c2601d 16-Dec-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Generalize the if_up() and if_down() functions under the names
if_route() and if_unroute().

This is first step towards sanitizing IFF_UP and IFF_RUNNING


# 2127f260 04-Dec-1998 Archie Cobbs <archie@FreeBSD.org>

Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>


# c7323482 12-Aug-1998 Bill Paul <wpaul@FreeBSD.org>

One-liner: add a call to the underlying device driver's SIOCDELMULTI
ioctl() routine at the end of if_delmulti() so that interfaces with
hardware multicast filtering can update their filters in a timely
manner.

If the interface doesn't support hardware multicast filtering, then
reception of multicast frames is done using 'promiscious mode' or
'capture all multicast frames' mode and software filtering in the
kernel. In this case, it doesn't matter if if_delmulti() ever does
an SCIODELMULTI on the interface or not: if MULTICAST support is
enabled, then we join the 'all hosts' group when the interface is
configured, and remain in it until the interface is brought down.
Without hardware filtering, joining one group means joining all
groups, so it makes no difference if we call the SIOCDELMULTI
routine.

If the interface does support hardware multicast filtering, then
by not reprogramming the hardware filter in if_delmulti(), we have
to wait until somebody calls if_setmulti(), during which time the
interface is receiving frames for multicast groups in which we are
no longer interested.


# 8a261b8f 20-Jul-1998 Doug Rabson <dfr@FreeBSD.org>

Make sure the link level sockaddr size is rounded up correctly on alpha.


# 01f0fef3 08-Jun-1998 Julian Elischer <julian@FreeBSD.org>

Don't let ifunit() modify the string passed as an argument.
it may be in the text segment and write protected.


# ecbb00a2 07-Jun-1998 Doug Rabson <dfr@FreeBSD.org>

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# 98b9590e 06-Apr-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Use getmicrotime() for if_lastchange, 10msec is plenty precision.


# 5591b823d 16-Dec-1997 Eivind Eklund <eivind@FreeBSD.org>

Make COMPAT_43 and COMPAT_SUNOS new-style options.


# 55b211e3 28-Oct-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# a1c995b6 12-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# d7189ec6 07-Oct-1997 Joerg Wunsch <joerg@FreeBSD.org>

Ooops, this should have made it into the same commit, but didn't.

Introduce the SIOC[SG]IFGENERIC hooks that can be used to pass an
arbritrary ioctl subcommand into an interface driver. Surprisingly
enough, there was no provision for this already present (except of the
option of abusing SIOC[SG]IFMEDIA for this).

The idea is that an interface driver can establish ioctl subcommands
of its own that can't be meaningfully interpreted by the upper layer
interface ioctl function. Something like this is required to
implement a clean solution of passing down things like CHAP secrets or
PPP options to the /sys/net/if_sppp* files. (Yes, my CHAP is now
finally working with it, but i gotta update my kernel to the new
callout interface before being able to commit _that_.)

Reviewed by: peter [long ago, actually]


# bc6d9b80 07-Sep-1997 Joerg Wunsch <joerg@FreeBSD.org>

Fix a typo that becomes apparent when compiling without COMPAT_443.

Submitted by: Tony Kimball <Anthony.Kimball@East.Sun.COM>


# 4d1d4912 01-Sep-1997 Bruce Evans <bde@FreeBSD.org>

Added used #include - don't depend on <sys/mbuf.h> including
<sys/malloc.h> (unless we only use the bogusly shared M*WAIT flags).


# 7ed8f465 27-Aug-1997 Julian Elischer <julian@FreeBSD.org>

Add a per-interface-address pointer to a function that can be supplied
by a protocol, to detirmine if an address matches the net this address
is part of. This is needed by protocols for which netmasks
"just don't work", for example appletalk.

Also add the code in appletalk to make use of this new feature.
Thsi fixes one of the longest standing bugs in appletalk.
The inability to talk to machines to which the path is via a router
which is on a different net, but the same netrange, as your interface.
Protocols that do not supply this function (e.g. IP) should not be affected.


# 7e2a6151 22-Aug-1997 Julian Elischer <julian@FreeBSD.org>

add some comments while trying to understand why appletalk
gets some things wrong.
(part of my continuing "comment it as you understand it" effort :)


# 57af7922 07-Jul-1997 Julian Elischer <julian@FreeBSD.org>

Don't add an item to the multicast linked list if it's already
on the list.


# a912e453 03-May-1997 Peter Wemm <peter@FreeBSD.org>

add SIOC{S,G}IFMEDIA ioctl support


# a29f300e 27-Apr-1997 Garrett Wollman <wollman@FreeBSD.org>

The long-awaited mega-massive-network-code- cleanup. Part I.

This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.

2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.

3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call. Protocols should use the process pointer they are now passed.

4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.

5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.

As a result, LINT is now broken. I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.


# 51a53488 24-Mar-1997 Bruce Evans <bde@FreeBSD.org>

Don't include <sys/ioctl.h> in the kernel. Stage 2: include
<sys/sockio.h> instead of <sys/ioctl.h> in network files.


# 4a26224c 14-Feb-1997 Garrett Wollman <wollman@FreeBSD.org>

Send RTM_IFINFO messages whenever promiscuous and all-multicast
modes are enabled or disabled.


# 176395b2 12-Feb-1997 Garrett Wollman <wollman@FreeBSD.org>

Implement PRC_IFUP a la PRC_IFDOWN so that protocols know when an interface
has come bacl up (and can referse actions taken as a result of downing).


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 477180fb 13-Jan-1997 Garrett Wollman <wollman@FreeBSD.org>

Use the new if_multiaddrs list for multicast addresses rather than the
previous hackery involving struct in_ifaddr and arpcom. Get rid of the
abominable multi_kludge. Update all network interfaces to use the
new machanism. Distressingly few Ethernet drivers program the multicast
filter properly (assuming the hardware has one, which it usually does).


# b2053118 08-Jan-1997 Garrett Wollman <wollman@FreeBSD.org>

Fix typo. I hate waking up at 4:45 in the morning...


# 373f88ed 08-Jan-1997 Garrett Wollman <wollman@FreeBSD.org>

Fix a few oversights in the new multicast membership interface.


# 1158dfb7 07-Jan-1997 Garrett Wollman <wollman@FreeBSD.org>

Checkpoint the beginnings of the new kernel interface for
multicast group memberships. This is not actually operative
at the moment (a lot of other code still needs to be changed), but
this seemed like a useful reference point to check in so that
others (i.e. Bill Fenner) have fair warning of where we are going.


# 59562606 13-Dec-1996 Garrett Wollman <wollman@FreeBSD.org>

Convert the interface address and IP interface address structures
to TAILQs. Fix places which referenced these for no good reason
that I can see (the references remain, but were fixed to compile
again; they are still questionable).


# 29412182 11-Dec-1996 Garrett Wollman <wollman@FreeBSD.org>

Use queue macros for the list of interfaces. Next stop: ifaddrs!


# 381dd1d2 06-Aug-1996 Julian Elischer <julian@FreeBSD.org>

Submitted by: archie@whistle.com
This is a patch to sys/net/if.c. What it does is patch the algorithm
for finding an IP address on an interface which most closely matches
a given IP address. The problem with it is when no address matches,
and you have to just pick one at random. Then the code ends up picking
the last IP address in the list. This patch changes things so it
picks up the first address instead.
Usually the first address is more useful as the later ones are aliases.


# bbd17bf8 30-Jul-1996 Garrett Wollman <wollman@FreeBSD.org>

Add better support for retrieving management information from network
interfaces. This creates two new tables in the net.link.generic branch
of the MIB; one contains (essentially) `ifdata' structures, and the other
contains a blob provided by the interface (and presumably used to
implement link-layer-specific MIB variables). A number of things
have been moved around in the `ifnet' and `ifdata' structures, so
NEW VERSIONS OF ifconfig(8) AND routed(8) ARE REQUIRED. (A simple
recompile is all that's necessary.)

I have a sample program which uses this interface for those interested
in making use of it.


# fcd6781a 24-Jul-1996 Garrett Wollman <wollman@FreeBSD.org>

Fix a bug in ifa_ifwithnet() which caused a page fault in bcmp()
when attepmting to add certain types of routes. This problem
only manifested itself in the presence of unconfigured point-to-point
interfaces.

Noticed by: Chuck Cranor <chuck@maria.wustl.edu>


# 2c37256e 11-Jul-1996 Garrett Wollman <wollman@FreeBSD.org>

Modify the kernel to use the new pr_usrreqs interface rather than the old
pr_usrreq mechanism which was poorly designed and error-prone. This
commit renames pr_usrreq to pr_ousrreq so that old code which depended on it
would break in an obvious manner. This commit also implements the new
interface for TCP, although the old function is left as an example
(#ifdef'ed out). This commit ALSO fixes a longstanding bug in the
TCP timer processing (introduced by davidg on 1995/04/12) which caused
timer processing on a TCB to always stop after a single timer had
expired (because it misinterpreted the return value from tcp_usrreq()
to indicate that the TCB had been deleted). Finally, some code
related to polling has been deleted from if.c because it is not
relevant t -current and doesn't look at all like my current code.


# a614bfe0 12-Jun-1996 Gary Palmer <gpalmer@FreeBSD.org>

Since the updates to ifnet.if_lastchange are so rare (relatively
speaking), go for the extra accuracy and call microtime() to get
the current time.

Pointed Out By: bde


# e39a0280 10-Jun-1996 Gary Palmer <gpalmer@FreeBSD.org>

Change the use if ifnet.if_lastchange to be more in line with
SNMP requirements. Update description of ifnet.if_lastchange in if.h
to indicate this.


# 58639ed3 05-Jun-1996 Garrett Wollman <wollman@FreeBSD.org>

Don't allow trailing garbage after the unit number in ifunit().


# 2ee45d7d 11-Mar-1996 David Greenman <dg@FreeBSD.org>

Move or add #include <queue.h> in preparation for upcoming struct socket
changes.


# a9ad85b0 08-Feb-1996 Garrett Wollman <wollman@FreeBSD.org>

If a slow input queue was defined by the driver, initialize it.


# 9b44ff22 06-Feb-1996 Garrett Wollman <wollman@FreeBSD.org>

Clean up Ethernet drivers:
- fill in and use ifp->if_softc
- use if_bpf rather than private cookie variables
- change bpf interface to take advantage of this
- call ether_ifattach() directly from Ethernet drivers
- delete kludge in if_attach() that did this indirectly


# 1ce9bf88 24-Jan-1996 Poul-Henning Kamp <phk@FreeBSD.org>

Use new printf features rather than local kludges.


# 602d513c 20-Dec-1995 Garrett Wollman <wollman@FreeBSD.org>

in_proto.c: spell ``Internet'' right and put whitespace after commas.

others: start to populate the link-layer branch of the net mib, by
moving ARP to its proper place. (ARP is not a protocol family, it's an
interface layer between a medium-access layer and a protocol family.)
sysctl(8) needs to be taught about the structure of this branch, unless
Poul-Henning implements dynamic MIB exploration soon.


# 3bda9f9b 09-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Staticize, clean lint.


# 4a5f1499 04-Dec-1995 David Greenman <dg@FreeBSD.org>

all:
Removed ifnet.if_init and ifnet.if_reset as they are generally unused.
Change the parameter passed to if_watchdog to be a ifnet * rather than
a unit number. All of this is an attempt to move toward not needing an
array of softc pointers (which is usually static in size) to point to
the driver softc.

if_ed.c:
Changed some of the argument passing to some functions to make a little
more sense.

if_ep.c, if_vx.c:
Killed completely bogus use of if_timer. It was being set in such a way
that the interface was being reset once per second (blech!).


# 27501cb6 18-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Added bogus casts to avoid warnings.

Continued cleaning up sysinit stuff.


# 9e52b982 03-Oct-1995 Garrett Wollman <wollman@FreeBSD.org>

Import of 4.4-Lite-2 sys/net to make merge and examination easier. Since we
are not on the vendor branch for any of these files, the conflicts shown make
no matter.

Obtained from: 4.4BSD-Lite-2


# e9a30d00 27-Sep-1995 Garrett Wollman <wollman@FreeBSD.org>

Add newline at end of log message and reduce log level to INFO from NOTICE.


# 963e4c2a 22-Sep-1995 Garrett Wollman <wollman@FreeBSD.org>

Fix BPf to generate a header mbuf for writes.
Fix loopback and discard interfaces to understand BPF writes.
(These two from Bill Fenner to fix PR 512.)

Move ifpromisc() from bpf.c to if.c as suggested by comment in BPF.
Send a notice to the log when promiscuous mode is enabled.


# 4590fd3a 09-Sep-1995 David Greenman <dg@FreeBSD.org>

Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.


# 2b14f991 28-Aug-1995 Julian Elischer <julian@FreeBSD.org>

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


# 523a02aa 27-Jun-1995 David Greenman <dg@FreeBSD.org>

Don't skip point-to-point interfaces if the netmask==0 (the netmask
should be completely ignored for point-to-point interfaces).
For point-to-point interfaces, route based on the destination address,
not the local address.

Submitted by: Peter Wemm


# 3740e2ad 14-Jun-1995 David Greenman <dg@FreeBSD.org>

Took out P2P_LOCALADDR_SHARE option and made it standard.


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# b2af64fd 26-May-1995 David Greenman <dg@FreeBSD.org>

Added a fix for a bug which caused the wrong interface to be selected
for broadcasts if point-to-point links shared the same IP address as
the ethernet. The fix must be enabled with P2P_LOCALADDR_SHARE option
in the kernel config file. This will someday likely be standard, but
there isn't sufficient time before release to determine if there are
any interoperability problems with routed and/or gated.

Reviewed by: Garrett Wollman, and me
Submitted by: Peter Wemm


# 55088a1c 24-Feb-1995 David Greenman <dg@FreeBSD.org>

In ifa_ifwithdstaddr() when walking through ifa structs associated with
a point-to-point link, don't attempt a comparison if the pointer to the
destination sockaddr is NULL (i.e. it has not been set/initialized).


# 73c2ab46 29-Dec-1994 David Greenman <dg@FreeBSD.org>

Moved declaration of ifnet pointer out of the header file and into the
.c file where it belongs. Bezeroed some uninitialized malloc data.


# 074c4a4e 21-Dec-1994 Garrett Wollman <wollman@FreeBSD.org>

Add generic part of generic multiple-physical-interface support (the
successor of IFF_ALTPHYS).


# b0dd7a6b 10-Nov-1994 Guido van Rooij <guido@FreeBSD.org>

Remove redundant stuff. Amazing that they actually solved a bug found in
1.1.5.1, and oversaw this thang.


# 9448326f 07-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

Mostly Cosmetics. Some of the procedures in if_sl.c was void, but should
be int. I made them int, and let them return 0. Will have to find out
what the return-val is used for.


# 2624cf89 05-Oct-1994 Garrett Wollman <wollman@FreeBSD.org>

A number of bug-fixes inspired by Mark Treacy:
- Allow PPP to run multicasts natively.
- Deal properly with lots of similarly-named interfaces.
- Don't sign-extend if_flags.

NB: the last fix (to rtsock.c) must be reversed when we expand if_flags to a
reasonable size.

Submitted by: Mark Treacy


# fe95e21f 15-Sep-1994 Poul-Henning Kamp <phk@FreeBSD.org>

Made the kernel compile even without "ether".


# f23b4c91 18-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 66025569 08-Aug-1994 David Greenman <dg@FreeBSD.org>

On second thought, better restrict the mtu to between 72-65535...strange
things happen otherwise.


# 75ee03cb 08-Aug-1994 David Greenman <dg@FreeBSD.org>

Enforce the mtu to between the range 1-65535 before calling the driver
ioctl routine.


# a7028af7 08-Aug-1994 David Greenman <dg@FreeBSD.org>

Added ioctl support for SIOCGIFMTU and SIOCSIFMTU. These set the per-
interface MTU.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources