History log of /freebsd-current/sys/netinet/in.c
Revision Date Author Comments
# 56f78600 19-Mar-2024 Gleb Smirnoff <glebius@FreeBSD.org>

carp: check CARP status in in_localip_fib(), in6_localip_fib()

Don't report a BACKUP CARP address as local. These two functions are used
only by source address validation for input packets, controlled by sysctls
net.inet.ip.source_address_validation and
net.inet6.ip6.source_address_validation. For this purpose we definitely
want to treat BACKUP addresses as non local.

This change is conservative and doesn't modify compat in_localip() and
in6_localip(). They are used more widely than the FIB-aware versions.
The change would modify the notion of ipfw(4) 'me' keyword. There might
be other consequences as in_localip() is used by various tunneling
protocols.

PR: 277349


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 215bab79 25-Jul-2023 Shivank Garg <shivank@freebsd.org>

mac_ipacl: new MAC policy module to limit jail/vnet IP configuration

The mac_ipacl policy module enables fine-grained control over IP address
configuration within VNET jails from the base system.
It allows the root user to define rules governing IP addresses for
jails and their interfaces using the sysctl interface.

Requested by: multiple
Sponsored by: Google, Inc. (GSoC 2019)
MFC after: 2 months
Reviewed by: bz, dch (both earlier versions)
Differential Revision: https://reviews.freebsd.org/D20967


# bb06a80c 29-Jun-2023 Alexander V. Chernikov <melifaro@FreeBSD.org>

netinet[6]: make in[6]_control use ucred instead of td.

Reviewed by: markj, zlei
Differential Revision: https://reviews.freebsd.org/D40793
MFC after: 2 weeks


# ca185047 25-Apr-2023 Alexander V. Chernikov <melifaro@FreeBSD.org>

lltable: properly set expire time to 0 for static IPv4 entries.

MFC after: 2 weeks


# 3d0d5b21 23-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Explicitly include <net/if_private.h> in netstack

Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header. <net/if_var.h> will stop including the
header in the future.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200


# 1e9482f4 08-Oct-2022 Alexander Motin <mav@FreeBSD.org>

inet: Simplify if_multiaddrs iteration.

Similar to 2cd6ad766eb23 for inet6 drop ifma_restart use, creating more
problems than solving. It is no longer needed after epoch introduction.

While there, add NULL check for ifma_ifp in igmp_change_state(), that
sometimes caused panics on interface destruction.

MFC after: 2 weeks


# f375bf0e 25-Sep-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

netinet: pass cred instead of the curthread to ifaddr manipulation funcs.

Pass the credentials directly to the functions, so non-ioctl kernel
users can also performan address manipulations.

MFC after: 2 weeks


# 7b3440fc 29-Aug-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

Revert "routing: install prefix and loopback routes using new nhop-based KPI."

Temporarily revert the commit to unblock testing.

This reverts commit a1b59379db7d879551118b921f6e9692b4bf200c.


# a1b59379 08-Aug-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

routing: install prefix and loopback routes using new nhop-based KPI.

Construct the desired hexthops directly instead of using the
"translation" layer in form of filling rt_addrinfo data.
Simplify V_rt_add_addr_allfibs handling by using recently-added
rib_copy_route() to propagate the routes to the non-primary address
fibs.

MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D36166


# f277746e 12-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

protosw: change prototype for pr_control

For some reason protosw.h is used during world complation and userland
is not aware of caddr_t, a relic from the first version of C. Broken
buildworld is good reason to get rid of yet another caddr_t in kernel.

Fixes: 886fc1e80490fb03e72e306774766cbb2c733ac6


# b8103ca7 11-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

netinet: get interface event notifications directly via EVENTHANDLER(9)

The old mechanism of getting them via domains/protocols control input
is a relict from the previous century, when nothing like EVENTHANDLER(9)
existed yet. Retire PRC_IFDOWN/PRC_IFUP as netinet was the only one
to use them.

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36116


# fb8ef16b 21-Jul-2022 Mike Karels <karels@FreeBSD.org>

IPv4: correct limit on loopback_prefix

Commit efe58855f3ea allowed the net.inet.ip.loopback_prefix value
to be 32. However, with a 32-bit mask, 127.0.0.1 is not included
in the reserved loopback range, which should not be allowed.
Change the max prefix length to 31.


# efe58855 24-May-2022 Mike Karels <karels@FreeBSD.org>

IPv4: experimental changes to allow net 0/8, 240/4, part of 127/8

Combined changes to allow experimentation with net 0/8 (network 0),
240/4 (Experimental/"Class E"), and part of the loopback net 127/8
(all but 127.0/16). All changes are disabled by default, and can be
enabled by the following sysctls:

net.inet.ip.allow_net0=1
net.inet.ip.allow_net240=1
net.inet.ip.loopback_prefixlen=16

When enabled, the corresponding addresses can be used as normal
unicast IP addresses, both as endpoints and when forwarding.

Add descriptions of the new sysctls to inet.4.

Add <machine/param.h> to vnet.h, as CACHE_LINE_SIZE is undefined in
various C files when in.h includes vnet.h.

The proposals motivating this experimentation can be found in

https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-0
https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-240
https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-127

Reviewed by: rgrimes, pauamma_gundo.com; previous versions melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D35741


# 77001f9b 30-May-2022 KUROSAWA Takahiro <takahiro.kurosawa@gmail.com>

lltable: introduce the llt_post_resolved callback

In order to decrease ifdef INET/INET6s in the lltable implementation,
introduce the llt_post_resolved callback and implement protocol-dependent
code in the protocol-dependent part.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35322
MFC after: 2 weeks


# 990a6d18 08-Apr-2022 Mark Johnston <markj@FreeBSD.org>

net: Fix memory leaks in lltable_calc_llheader() error paths

Also convert raw epoch_call() calls to lltable_free_entry() calls, no
functional change intended. There's no need to asynchronously free the
LLEs in that case to begin with, but we might as well use the lltable
interfaces consistently.

Noticed by code inspection; I believe lltable_calc_llheader() failures
do not generally happen in practice.

Reviewed by: bz
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34832


# ff3a85d3 25-Dec-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

[lltable] Add per-family lltable getters.

Introduce a new function, lltable_get(), to retrieve lltable pointer
for the specified interface and family.
Use it to avoid all-iftable list traversal when adding or deleting
ARP/ND records.

Differential Revision: https://reviews.freebsd.org/D33660
MFC after: 2 weeks


# eb8dcdea 26-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

jail: network epoch protection for IP address lists

Now struct prison has two pointers (IPv4 and IPv6) of struct
prison_ip type. Each points into epoch context, address count
and variable size array of addresses. These structures are
freed with network epoch deferred free and are not edited in
place, instead a new structure is allocated and set.

While here, the change also generalizes a lot (but not enough)
of IPv4 and IPv6 processing. E.g. address family agnostic helpers
for kern_jail_set() are provided, that reduce v4-v6 copy-paste.

The fast-path prison_check_ip[46]_locked() is also generalized
into prison_ip_check() that can be executed with network epoch
protection only.

Reviewed by: jamie
Differential revision: https://reviews.freebsd.org/D33339


# 2f35e7d9 10-Nov-2021 Mike Karels <karels@FreeBSD.org>

kernel: partially revert e9efb1125a15, default inet mask

When no mask is supplied to the ioctl adding an Internet interface
address, revert to using the historical class mask rather than a
single default. Similarly for the NFS bootp code.

MFC after: 3 weeks
Reviewed by: melifaro glebius
Differential Revision: https://reviews.freebsd.org/D32951


# 9c89392f 12-Nov-2021 Gleb Smirnoff <glebius@FreeBSD.org>

Add in_localip_fib(), in6_localip_fib().

Check if given address/FIB exists locally.

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D32913


# 20d59403 26-Oct-2021 Mike Karels <karels@FreeBSD.org>

kernel: deprecate Internet Class A/B/C

Hide historical Class A/B/C macros unless IN_HISTORICAL_NETS is defined;
define it for user level. Define IN_MULTICAST separately from IN_CLASSD,
and use it in pf instead of IN_CLASSD. Stop using class for setting
default masks when not specified; instead, define new default mask
(24 bits). Warn when an Internet address is set without a mask.

MFC after: 1 month
Reviewed by: cy
Differential Revision: https://reviews.freebsd.org/D32708


# c8ee75f2 10-Oct-2021 Gleb Smirnoff <glebius@FreeBSD.org>

Use network epoch to protect local IPv4 addresses hash.

The modification to the hash are already naturally locked by
in_control_sx. Convert the hash lists to CK lists. Remove the
in_ifaddr_rmlock. Assert the network epoch where necessary.

Most cases when the hash lookup is done the epoch is already entered.
Cover a few cases, that need entering the epoch, which mostly is
initial configuration of tunnel interfaces and multicast addresses.

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D32584


# 2144431c 08-Oct-2021 Gleb Smirnoff <glebius@FreeBSD.org>

Remove in_ifaddr_lock acquisiton to access in_ifaddrhead.

An IPv4 address is embedded into an ifaddr which is freed
via epoch. And the in_ifaddrhead is already a CK list. Use
the network epoch to protect against use after free.

Next step would be to CK-ify the in_addr hash and get rid of the...

Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D32434


# fd076593 05-Sep-2021 Mike Karels <karels@FreeBSD.org>

Change lowest address on subnet (host 0) not to broadcast by default.

The address with a host part of all zeros was used as a broadcast long
ago, but the default has been all ones since 4.3BSD and RFC1122. Until
now, we would broadcast the host zero address as well as the configured
address. Change to not broadcasting that address by default, but add a
sysctl (net.inet.ip.broadcast_lowest) to re-enable it. Note that the
correct way to use the zero address for broadcast would be to configure
it as the broadcast address for the network.

See https:/datatracker.ietf.org/doc/draft-schoen-intarea-lowest-address/
and the discussion in https://reviews.freebsd.org/D19316. Note, Linux
now implements this.

Reviewed by: rgrimes, tuexen; melifaro (previous version)
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D31861


# 4b631fc8 06-Sep-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

routing: fix source address selection rules for IPv4 over IPv6.

Current logic always selects an IFA of the same family from the
outgoing interfaces. In IPv4 over IPv6 setup there can be just
single non-127.0.0.1 ifa, attached to the loopback interface.

Create a separate rt_getifa_family() to handle entire ifa selection
for the IPv4 over IPv6.

Differential Revision: https://reviews.freebsd.org/D31868
MFC after: 1 week


# 936f4a42 03-Sep-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

lltable: do not require prefix lookup when checking lle allocation rules.

With the new FIB_ALGO infrastructure, nearly all subsystems use
fib[46]_lookup() functions, which provides lockless lookups.
A number of places remains that uses old-style lookup functions, that
still requires RIB read lock to return the result. One of such places
is arp processing code.
FIB_ALGO implementation makes some tradeoffs, resulting in (relatively)
prolonged periods of holding RIB_WLOCK. If the lock is held and datapath
competes for it, the RX ring may get blocked, ending in traffic delays and losses.
As currently arp processing is performed directly in the interrupt handler,
handling ARP replies triggers the problem descibed above when the amount of
ARP replies is high.

To be more specific, prior to creating new ARP entry, routing lookup for the entry
address in interface fib is executed. The following conditions are the verified:

1. If lookup returns an empty result, or the resulting prefix is non-directly-reachable,
failure is returned. The only exception are host routes w/ gateway==address.
2. If the routing lookup returns different interface and non-host route,
we want to support the use case of having multiple interfaces with the same prefix.
In fact, the current code just checks if the returned prefix covers target address
(always true) and effectively allow allocating ARP entries for any directly-reachable prefix,
regardless of its interface.

Change the code to perform the following:

1) use fib4_lookup() to get the nexthop, instead of requesting exact prefix.
2) Rewrite first condition check using nexthop flags (1:1 match)
3) Rewrite second condition to check for interface addresses matching target address on
the input interface.

Differential Revision: https://reviews.freebsd.org/D31824
Reviewed by: ae
MFC after: 1 week
PR: 257965


# 620cf65c 24-Aug-2021 Artem Khramov <akhramov@pm.me>

netinet: prevent NULL pointer dereference in in_aifaddr_ioctl()

It appears that maliciously crafted ifaliasreq can lead to NULL
pointer dereference in in_aifaddr_ioctl(). In order to replicate
that, one needs to

1. Ensure that carp(4) is not loaded

2. Issue SIOCAIFADDR call setting ifra_vhid field of the request
to a negative value.

A repro code would look like this.

int main() {
struct ifaliasreq req;
struct sockaddr_in sin, mask;
int fd, error;

bzero(&sin, sizeof(struct sockaddr_in));
bzero(&mask, sizeof(struct sockaddr_in));

sin.sin_len = sizeof(struct sockaddr_in);
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = inet_addr("192.168.88.2");

mask.sin_len = sizeof(struct sockaddr_in);
mask.sin_family = AF_INET;
mask.sin_addr.s_addr = inet_addr("255.255.255.0");

fd = socket(AF_INET, SOCK_DGRAM, 0);
if (fd < 0)
return (-1);

memset(&req, 0, sizeof(struct ifaliasreq));
strlcpy(req.ifra_name, "lo0", sizeof(req.ifra_name));
memcpy(&req.ifra_addr, &sin, sin.sin_len);
memcpy(&req.ifra_mask, &mask, mask.sin_len);
req.ifra_vhid = -1;

return ioctl(fd, SIOCAIFADDR, (char *)&req);
}

To fix, discard both positive and negative vhid values in
in_aifaddr_ioctl, if carp(4) is not loaded. This prevents NULL pointer
dereference and kernel panic.

Reviewed by: imp@
Pull Request: https://github.com/freebsd/freebsd-src/pull/530


# f3a3b061 02-Aug-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

[lltable] Unify datapath feedback mechamism.

Use newly-create llentry_request_feedback(),
llentry_mark_used() and llentry_get_hittime() to
request datapatch usage check and fetch the results
in the same fashion both in IPv4 and IPv6.

While here, simplify llentry_provide_feedback() wrapper
by eliminating 1 condition check.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31390


# 8e8f1cc9 23-Apr-2021 Mark Johnston <markj@FreeBSD.org>

Re-enable network ioctls in capability mode

This reverts a portion of 274579831b61 ("capsicum: Limit socket
operations in capability mode") as at least rtsol and dhcpcd rely on
being able to configure network interfaces while in capability mode.

Reported by: bapt, Greg V
Sponsored by: The FreeBSD Foundation


# 27457983 07-Apr-2021 Mark Johnston <markj@FreeBSD.org>

capsicum: Limit socket operations in capability mode

Capsicum did not prevent certain privileged networking operations,
specifically creation of raw sockets and network configuration ioctls.
However, these facilities can be used to circumvent some of the
restrictions that capability mode is supposed to enforce.

Add capability mode checks to disallow network configuration ioctls and
creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET
internet sockets.

Reviewed by: oshogbo
Discussed with: emaste
Reported by: manu
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29423


# 9fdbf7ee 16-Feb-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Make in_localip_more() fib-aware.

It fixes loopback route installation for the interfaces
in the different fibs using the same prefix.

Reviewed By: donner
PR: 189088
Differential Revision: https://reviews.freebsd.org/D28673
MFC after: 1 week


# 130aebba 19-Jan-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Further refactor IPv4 interface route creation.

* Fix bug with /32 aliases introduced in 81728a538d24.
* Explicitly document business logic for IPv4 ifa routes.
* Remove remnants of rtinit()
* Deduplicate ifa->route prefix code by moving it into ia_getrtprefix()
* Deduplicate conditional check for ifa_maintain_loopback_route() by
moving into ia_need_loopback_route()
* Remove now-unused flags argument from in_addprefix().

Reviewed by: donner
PR: 252883
Differential Revision: https://reviews.freebsd.org/D28246


# 81728a53 08-Jan-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Split rtinit() into multiple functions.

rtinit[1]() is a function used to add or remove interface address prefix routes,
similar to ifa_maintain_loopback_route().
It was intended to be family-agnostic. There is a problem with this approach
in reality.

1) IPv6 code does not use it for the ifa routes. There is a separate layer,
nd6_prelist_(), providing interface for maintaining interface routes. Its part,
responsible for the actual route table interaction, mimics rtenty() code.

2) rtinit tries to combine multiple actions in the same function: constructing
proper route attributes and handling iterations over multiple fibs, for the
non-zero net.add_addr_allfibs use case. It notably increases the code complexity.

3) dstaddr handling. flags parameter re-uses RTF_ flags. As there is no special flag
for p2p connections, host routes and p2p routes are handled in the same way.
Additionally, mapping IFA flags to RTF flags makes the interface pretty messy.
It make rtinit() to clash with ifa_mainain_loopback_route() for IPV4 interface
aliases.

4) rtinit() is the last customer passing non-masked prefixes to rib_action(),
complicating rib_action() implementation.

5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive"
ifa messages in certain corner cases.

To address all these points, the following has been done:

* rtinit() has been split into multiple functions:
- Route attribute construction were moved to the per-address-family functions,
dealing with (2), (3) and (4).
- funnction providing net.add_addr_allfibs handling and route rtsock notificaions
is the new routing table inteface.
- rtsock ifa notificaion has been moved out as well. resulting set of funcion are only
responsible for the actual route notifications.

Side effects:
* /32 alias does not result in interface routes (/32 route and "host" route)
* RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses

Differential revision: https://reviews.freebsd.org/D28186


# d68cf57b 07-Jan-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Refactor rt_addrmsg() and rt_routemsg().

Summary:
* Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally.
* Fix rt_routemsg() and multipath by accepting nexthop instead of interface pointer.
* Refactor rtsock_routemsg(): avoid accessing rtentry fields directly.
* Simplify in_addprefix() by moving prefix search to a separate function.

Reviewers: #network

Subscribers: imp, ae, bz

Differential Revision: https://reviews.freebsd.org/D28011


# 6952c3e1 14-Oct-2020 Andrey V. Elsukov <ae@FreeBSD.org>

Implement SIOCGIFALIAS.

It is lightweight way to check if an IPv4 address exists.

Submitted by: Roy Marples
Reviewed by: gnn, melifaro
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26636


# 3f740d43 13-Oct-2020 Andrey V. Elsukov <ae@FreeBSD.org>

Join to AllHosts multicast group again when adding an existing IPv4 address.

When SIOCAIFADDR ioctl configures an IPv4 address that is already exist,
it removes old ifaddr. When this IPv4 address is only one configured on
the interface, this also leads to leaving from AllHosts multicast group.
Then an address is added again, but due to the bug, this doesn't lead
to joining to AllHosts multicast group.

Submitted by: yannis.planus_alstomgroup.com
Reviewed by: gnn
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D26757


# fedeb08b 03-Oct-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Introduce scalable route multipath.

This change is based on the nexthop objects landed in D24232.

The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
relative weights and a dataplane-optimized structure to enable
efficient nexthop selection.

Simular to the nexthops, nexthop groups are immutable. Dataplane part
gets compiled during group creation and is basically an array of
nexthop pointers, compiled w.r.t their weights.

With this change, `rt_nhop` field of `struct rtentry` contains either
nexthop or nexthop group. They are distinguished by the presense of
NHF_MULTIPATH flag.
All dataplane lookup functions returns pointer to the nexthop object,
leaving nexhop groups details inside routing subsystem.

User-visible changes:

The change is intended to be backward-compatible: all non-mpath operations
should work as before with ROUTE_MPATH and net.route.multipath=1.

All routes now comes with weight, default weight is 1, maximum is 2^24-1.

Current maximum multipath group width is statically set to 64.
This will become sysctl-tunable in the followup changes.

Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1

route add -6 2001:db8::/32 2001:db8::2 -weight 10
route add -6 2001:db8::/32 2001:db8::3 -weight 20

netstat -6On

Nexthop groups data

Internet6:
GrpIdx NhIdx Weight Slots Gateway Netif Refcnt
1 ------- ------- ------- --------------------------------------- --------- 1
13 10 1 2001:db8::2 vlan2
14 20 2 2001:db8::3 vlan2

Next steps:
* Land outbound hashing for locally-originated routes ( D26523 ).
* Fix net/bird multipath (net/frr seems to work fine)
* Add ROUTE_MPATH to GENERIC
* Set net.route.multipath=1 by default

Tested by: olivier
Reviewed by: glebius
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D26449


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# 3689652c 10-Aug-2020 Hans Petter Selasky <hselasky@FreeBSD.org>

Make sure the multicast release tasks are properly drained when
destroying a VNET or a network interface.

Else the inm release tasks, both IPv4 and IPv6 may cause a panic
accessing a freed VNET or network interface.

Reviewed by: jmg@
Discussed with: bz@
Differential Revision: https://reviews.freebsd.org/D24914
MFC after: 1 week
Sponsored by: Mellanox Technologies


# 481be5de 12-Feb-2020 Randall Stewart <rrs@FreeBSD.org>

White space cleanup -- remove trailing tab's or spaces
from any line.

Sponsored by: Netflix Inc.


# 2a4bd982 14-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Introduce NET_EPOCH_CALL() macro and use it everywhere where we free
data based on the network epoch. The macro reverses the argument
order of epoch_call(9) - first function, then its argument. NFC


# b8a6e03f 07-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Widen NET_EPOCH coverage.

When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111


# 6c1c6ae5 04-Apr-2019 Rodney W. Grimes <rgrimes@FreeBSD.org>

Use IN_foo() macros from sys/netinet/in.h inplace of handcrafted code

There are a few places that use hand crafted versions of the macros
from sys/netinet/in.h making it difficult to actually alter the
values in use by these macros. Correct that by replacing handcrafted
code with proper macro usage.

Reviewed by: karels, kristof
Approved by: bde (mentor)
MFC after: 3 weeks
Sponsored by: John Gilmore
Differential Revision: https://reviews.freebsd.org/D19317


# 49cf58e5 23-Jan-2019 Mark Johnston <markj@FreeBSD.org>

Style.

Reviewed by: bz
MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# c06cc56e 23-Jan-2019 Mark Johnston <markj@FreeBSD.org>

Fix an LLE lookup race.

After the afdata read lock was converted to epoch(9), readers could
observe a linked LLE and block on the LLE while a thread was
unlinking the LLE. The writer would then release the lock and schedule
the LLE for deferred free, allowing readers to continue and potentially
schedule the LLE timer. By the point the timer fires, the structure is
freed, typically resulting in a crash in the callout subsystem.

Fix the problem by modifying the lookup path to check for the LLE_LINKED
flag upon acquiring the LLE lock. If it's not set, the lookup fails.

PR: 234296
Reviewed by: bz
Tested by: sbruno, Victor <chernov_victor@list.ru>,
Mike Andrews <mandrews@bit0.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18906


# a68cc388 08-Jan-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanical cleanup of epoch(9) usage in network stack.

- Remove macros that covertly create epoch_tracker on thread stack. Such
macros a quite unsafe, e.g. will produce a buggy code if same macro is
used in embedded scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as is is very misleading. They all are named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and
what's more important they now no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with: jtl, gallatin


# 64d63b1e 21-Oct-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Add ifaddr_event_ext event. It is similar to ifaddr_event, but the
handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL,
and the pointer to ifaddr. Also ifaddr_event now is implemented using
ifaddr_event_ext handler.

MFC after: 3 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D17100


# 59b2022f 15-Aug-2018 Luiz Otavio O Souza <loos@FreeBSD.org>

Late style follow up on r312770.

Submitted by: glebius
X-MFC with: r312770
MFC after: 3 days


# 5f901c92 24-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by: bz
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16147


# acf673ed 17-Jul-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Move invoking of callout_stop(&lle->lle_timer) into llentry_free().

This deduplicates the code a bit, and also implicitly adds missing
callout_stop() to in[6]_lltable_delete_entry() functions.

PR: 209682, 225927
Submitted by: hselasky (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D4605


# 4f6c66cc 23-May-2018 Matt Macy <mmacy@FreeBSD.org>

UDP: further performance improvements on tx

Cumulative throughput while running 64
netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15409


# f6960e20 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

netinet silence warnings


# d7c5a620 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

ifnet: Replace if_addr_lock rwlock with epoch + mutex

Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32
4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32
4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32
4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32
4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32
4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32
4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32

After the patch

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51
5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51
5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51
5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51
5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52
5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15366


# 514ef08c 15-May-2018 Brooks Davis <brooks@FreeBSD.org>

Unwrap a line that no longer requires wrapping.


# 423349f6 15-May-2018 Brooks Davis <brooks@FreeBSD.org>

Remove stray tabs from in_lltable_dump_entry().


# b6f6f880 06-May-2018 Matt Macy <mmacy@FreeBSD.org>

r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl
to sleep on commands to the NIC when updating multicast filters. More generally this permitted
driver's to use an sx as a softc lock. Unfortunately this change introduced a race whereby a
a multicast update would still be queued for deletion when ifconfig deleted the interface
thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses
on the interface.

Synchronously remove all external references to a multicast address before enqueueing for delete.

Reported by: lwhsu
Approved by: sbruno


# f3e1324b 02-May-2018 Stephen Hurd <shurd@FreeBSD.org>

Separate list manipulation locking from state change in multicast

Multicast incorrectly calls in to drivers with a mutex held causing drivers
to have to go through all manner of contortions to use a non sleepable lock.
Serialize multicast updates instead.

Submitted by: mmacy <mmacy@mattmacy.io>
Reviewed by: shurd, sbruno
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14969


# 1435dcd9 17-Mar-2018 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration.

Current arp/nd code relies on the feedback from the datapath indicating
that the entry is still used. This mechanism is incorporated into the
arpresolve()/nd6_resolve() routines. After the inpcb route cache
introduction, the packet path for the locally-originated packets changed,
passing cached lle pointer to the ether_output() directly. This resulted
in the arp/ndp entry expire each time exactly after the configured max_age
interval. During the small window between the ARP/NDP request and reply
from the router, most of the packets got lost.

Fix this behaviour by plugging datapath notification code to the packet
path used by route cache. Unify the notification code by using single
inlined function with the per-AF callbacks.

Reported by: sthaug at nethelp.no
Reviewed by: ae
MFC after: 2 weeks


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 3e85b721 16-May-2017 Ed Maste <emaste@FreeBSD.org>

Remove register keyword from sys/ and ANSIfy prototypes

A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by: cem, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10193


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# 8144690a 16-Feb-2017 Eric van Gyzen <vangyzen@FreeBSD.org>

Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel

inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.

Suggested by: glebius, emaste
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9625


# 338e227a 25-Jan-2017 Luiz Otavio O Souza <loos@FreeBSD.org>

After the in_control() changes in r257692, an existing address is
(intentionally) deleted first and then completely added again (so all the
events, announces and hooks are given a chance to run).

This cause an issue with CARP where the existing CARP data structure is
removed together with the last address for a given VHID, which will cause
a subsequent fail when the address is later re-added.

This change fixes this issue by adding a new flag to keep the CARP data
structure when an address is not being removed.

There was an additional issue with IPv6 CARP addresses, where the CARP data
structure would never be removed after a change and lead to VHIDs which
cannot be destroyed.

Reviewed by: glebius
Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)


# 2d9db0bc 01-Oct-2016 Eric van Gyzen <vangyzen@FreeBSD.org>

Add GARP retransmit capability

A single gratuitous ARP (GARP) is always transmitted when an IPv4
address is added to an interface, and that is usually sufficient.
However, in some circumstances, such as when a shared address is
passed between cluster nodes, this single GARP may occasionally be
dropped or lost. This can lead to neighbors on the network link
working with a stale ARP cache and sending packets destined for
that address to the node that previously owned the address, which
may not respond.

To avoid this situation, GARP retransmissions can be enabled by setting
the net.link.ether.inet.garp_rexmit_count sysctl to a value greater
than zero. The setting represents the maximum number of retransmissions.
The interval between retransmissions is calculated using an exponential
backoff algorithm, doubling each time, so the retransmission intervals
are: {1, 2, 4, 8, 16, ...} (seconds).

Due to the exponential backoff algorithm used for the interval
between GARP retransmissions, the maximum number of retransmissions
is limited to 16 for sanity. This limit corresponds to a maximum
interval between retransmissions of 2^16 seconds ~= 18 hours.
Increasing this limit is possible, but sending out GARPs spaced
days apart would be of little use.

Submitted by: David A. Bright <david.a.bright@dell.com>
MFC after: 1 month
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D7695


# 11f2a7cd 18-Aug-2016 Ryan Stone <rstone@FreeBSD.org>

Fix unlocked access to ifnet address list

in_broadcast() was iterating over the ifnet address list without
first taking an IF_ADDR_RLOCK. This could cause a panic if a
concurrent operation modified the list.

Reviewed by: bz
MFC after: 2 months
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D7227


# 90cc51a1 18-Aug-2016 Ryan Stone <rstone@FreeBSD.org>

Don't iterate over the ifnet addr list in ip_output()

For almost every packet that is transmitted through ip_output(),
a call to in_broadcast() was made to decide if the destination
IP was a broadcast address. in_broadcast() iterates over the
ifnet's address to find a source IP matching the subnet of the
destination IP, and then checks if the IP is a broadcast in that
subnet.

This is completely redundant as we have already performed the
route lookup, so the source IP is already known. Just use that
address to directly check whether the destination IP is a
broadcast address or not.

MFC after: 2 months
Sponsored By: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D7266


# 89856f7e 21-Jun-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747


# 2769d062 26-Apr-2016 Conrad Meyer <cem@FreeBSD.org>

in_lltable_alloc and in6 copy: Don't leak LLE in error path

Fix a memory leak in error conditions introduced in r292978.

Reported by: Coverity
CIDs: 1347009, 1347010
Sponsored by: EMC / Isilon Storage Division


# 9a1b64d5 04-Jan-2016 Alexander V. Chernikov <melifaro@FreeBSD.org>

Add rib_lookup_info() to provide API for retrieving individual route
entries data in unified format.

There are control plane functions that require information other than
just next-hop data (e.g. individual rtentry fields like flags or
prefix/mask). Given that the goal is to avoid rte reference/refcounting,
re-use rt_addrinfo structure to store most rte fields. If caller wants
to retrieve key/mask or gateway (which are sockaddrs and are allocated
separately), it needs to provide sufficient-sized sockaddrs structures
w/ ther pointers saved in passed rt_addrinfo.

Convert:
* lltable new records checks (in_lltable_rtcheck(),
nd6_is_new_addr_neighbor().
* rtsock pre-add/change route check.
* IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because
1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should
not be multiple host routes for such hosts 2) if we have multiple
routes we should inspect them (which is not done). 3) the entire idea
of abusing KRT as storage for ND proxy seems odd. Userland programs
should be used for that purpose).


# 4fb3a820 30-Dec-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Implement interface link header precomputation API.

Add if_requestencap() interface method which is capable of calculating
various link headers for given interface. Right now there is support
for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
Other types are planned to support more complex calculation
(L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
(like L2 header for route w/gateway) down to the stack eliminating the
need for other lookups. It also brings us closer to more complex scenarios
like transparently handling MPLS nexthops and tunnel interfaces.
Last, but not least, it removes layering violation introduced by flowtable
code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
record. Interface link address change are handled by re-calculating
headers for all lles based on if_lladdr event. After these changes,
arpresolve()/nd6_resolve() returns full pre-calculated header for
supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
difference is that interface source mac has to be filled by OS for
AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
BPF and not pollute if_output() routines. Convert BPF to pass prepend data
via new 'struct route' mechanism. Note that it does not change
non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
It is not needed for ethernet anymore. The only remaining FDDI user is
dev/pdq mostly untouched since 2007. FDDI support was eliminated from
OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
Flowtable violates layering by saving (and not correctly managing)
rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
header data from that lle.

Differential Revision: https://reviews.freebsd.org/D4102


# f8aee88f 05-Dec-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Remove LLE read lock from IPv4 fast path.

LLE structure is mostly unchanged during its lifecycle.
To be more specific, there are 2 things relevant for fast path
lookup code:
1) link-level address change. Since r286722, these updates are performed
under AFDATA WLOCK.
2) Some sort of feedback indicating that this particular entry is used so
we re-send arp request to perform reachability verification instead of
expiring entry. The only signal that is needed from fast path is something
like binary yes/no.

The latter is solved by the following changes:
1) introduce special r_skip_req field which is read lockless by fast path,
but updated under (new) req_mutex mutex. If this field is non-zero, then
fast path will acquire lock and set it back to 0.
2) introduce simple state machine: incomplete->reachable<->verify->deleted.
Before that we implicitely had incomplete->reachable->deleted state machine,
with V_arpt_keep between "reachable" and "deleted". Verification was performed
in runtime 5 seconds before V_arpt_keep expire.
This is changed to "change state to verify 5 seconds before V_arpt_keep,
set r_skip_req to non-zero value and check it every second". If the value
is zero - then send arp verification probe.
These changes do not introduce any signifficant control plane overhead:
typically lle callout timer would fire 1 time more each V_arpt_keep (1200s)
for used lles and up to arp_maxtries (5) for dead lles.

As a result, all packets towards "reachable" lle are handled by fast path without
acquiring lle read lock.

Additional "req_mutex" is needed because callout / arpresolve_slow() or eventhandler
might keep LLE lock for signifficant amount of time, which might not be feasible
for fast path locking (e.g. having rmlock as ether AFDATA or lltable own lock).

Differential Revision: https://reviews.freebsd.org/D3688


# 7c4676dd 13-Nov-2015 Randall Stewart <rrs@FreeBSD.org>

This fixes several places where callout_stops return is examined. The
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped. Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.

MFC after: 1 Month with associated callout change.


# ddd208f7 07-Nov-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Unify setting lladdr for AF_INET[6].


# 26a60575 17-Oct-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix deletion of ifaddr lle entries when deleting prefix from interface in
down state.

Regression appeared in r287789, where the "prefix has no corresponding
installed route" case was forgotten. Additionally, lltable_delete_addr()
was called with incorrect byte order (default is network for lltable code).
While here, improve comments on given cases and byte order.

PR: 203573
Submitted by: phk


# 4a336ef4 26-Sep-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

rtsock requests for deleting interface address lles started to return EPERM
instead of old "ignore-and-return 0" in r287789. This broke arp -da /
ndp -cn behavior (they exit on rtsock command failure). Fix this by
translating LLE_IFADDR to RTM_PINNED flag, passing it to userland and
making arp/ndp ignore these entries in batched delete.

MFC after: 2 weeks


# 59c180c3 16-Sep-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Unify loopback route switching:
* prepare gateway before insertion
* use RTM_CHANGE instead of explicit find/change route
* Remove fib argument from ifa_switch_loopback_route added in r264887:
if old ifp fib differes from new one, that the caller
is doing something wrong
* Make ifa_*_loopback_route call single ifa_maintain_loopback_route().


# 3e7a2321 14-Sep-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Do more fine-grained locking: call eventhandlers/free_entry
without holding afdata wlock
* convert per-af delete_address callback to global lltable_delete_entry() and
more low-level "delete this lle" per-af callback
* fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures

Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3573


# 5a255516 19-Aug-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Split allocation and table linking for lle's.
Before that, the logic besides lle_create() was the following:
return existing if found, create if not. This behaviour was error-prone
since we had to deal with 'sudden' static<>dynamic lle changes.
This commit fixes bunch of different issues like:
- refcount leak when lle is converted to static.
Simple check case:
console 1:
while true;
do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`;
do arp -s $i 00:22:44:66:88:00 ; arp -d $i;
done;
done
console 2:
ping -f any-dead-host-in-L2
console 3:
# watch for memory consumption:
vmstat -m | awk '$1~/lltable/{print$2}'
- possible problems in arptimer() / nd6_timer() when dropping/reacquiring
lock.
New logic explicitly handles use-or-create cases in every lla_create
user. Basically, most of the changes are purely mechanical. However,
we explicitly avoid using existing lle's for interface/static LLE records.
* While here, call lle_event handlers on all real table lle change.
* Create lltable_free_entry() calling existing per-lltable
lle_free_t callback for entry deletion


# 0447c136 10-Aug-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use single 'lle_timer' callout in lltable instead of
two different names of the same timer.


# 314294de 11-Aug-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Store addresses instead of sockaddrs inside llentry.
This permits us having all (not fully true yet) all the info
needed in lookup process in first 64 bytes of 'struct llentry'.

struct llentry layout:
BEFORE:
[rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]]
AFTER
[ in[6]_addr MAC .. state .. rwlock ]

Currently, address part of struct llentry has only 16 bytes for the key.
However, lltable does not restrict any custom lltable consumers with long
keys use the previous approach (store key at (lle+1)).

Sponsored by: Yandex LLC


# 11cdad98 09-Aug-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Partially merge r274887,r275334,r275577,r275578,r275586 to minimize
differences between projects/routing and HEAD.

This commit tries to keep code logic the same while changing underlying
code to use unified callbacks.

* Add llt_foreach_entry method to traverse all entries in given llt
* Add llt_dump_entry method to export particular lle entry in sysctl/rtsock
format (code is not indented properly to minimize diff). Will be fixed
in the next commits.
* Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle.
* Add llt_fill_sa_entry method to export address in the lle to sockaddr
format.
* Add llt_hash method to use in generic hash table support code.
* Add llt_free_entry method which is used in llt_prefix_free code.

* Prepare for fine-grained locking by separating lle unlink and deletion in
lltable_free() and lltable_prefix_free().

* Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable'
access by external callers.

* Remove @llt agrument from lle_free() lle callback since it was unused.
* Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting.
* Switch to per-af hashing code.
* Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to
in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method.
Update description from these functions.
* Use unified lltable_free_entry() function instead of per-af one.

Reviewed by: ae


# 6e4cd746 08-Aug-2015 Marius Strobl <marius@FreeBSD.org>

Fix compilation after r286457 w/o INVARIANTS or INVARIANT_SUPPORT.


# cc0a3c8c 29-Jul-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.

Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by: gnn (previous version)
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3149


# 28ebe80c 17-Apr-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Provide functions to determine presence of a given address
configured on a given interface.

Discussed with: np
Sponsored by: Nginx, Inc.


# 2575fbb8 09-Feb-2015 Randall Stewart <rrs@FreeBSD.org>

This fixes a bug in the way that the LLE timers for nd6
and arp were being used. They basically would pass in the
mutex to the callout_init. Because they used this method
to the callout system, it was possible to "stop" the callout.
When flushing the table and you stopped the running callout, the
callout_stop code would return 1 indicating that it was going
to stop the callout (that was about to run on the callout_wheel blocked
by the function calling the stop). Now when 1 was returned, it would
lower the reference count one extra time for the stopped timer, then
a few lines later delete the memory. Of course the callout_wheel was
stuck in the lock code and would then crash since it was accessing
freed memory. By using callout_init(c, 1) we always get a 0 back
and the reference counting bug does not rear its head. We do have
to make a few adjustments to the callouts themselves though to make
sure it does the proper thing if rescheduled as well as gets the lock.

Commented upon by hiren and sbruno
See Phabricator D1777 for more details.

Commented upon by hiren and sbruno
Reviewed by: adrian, jhb and bz
Sponsored by: Netflix Inc.


# 3a749863 05-Jan-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Allocate hash tables separately
* Make llt_hash() callback more flexible
* Default hash size and hashing method is now per-af
* Move lltable allocation to separate function


# b44a7d5d 03-Jan-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Use unified code for deleting entry by sockaddr instead of per-af one.
* Remove now unused llt_delete_addr callback.


# 20dd8995 03-Jan-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Hide lltable implementation details in if_llatbl_var.h
* Make most of lltable_* methods 'normal' functions instead of inline
* Add lltable_get_<af|ifp>() functions to access given lltable fields
* Temporarily resurrect nd6_lookup() function


# 73db4e00 03-Jan-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Finish r275628: remove remaining 'base' references.


# ee7e9a4e 08-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Do not assume lle has sockaddr key after struct lle:
use llt_fill_sa_entry() llt method to store lle address in sa.
* Eliminate L3_ADDR macro and either reference IPv4/IPv6 address
directly from lle or use newly-created llt_fill_sa_entry().
* Do not store sockaddr inside arp/ndp lle anymore.


# d82ed505 08-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Simplify lle lookup/create api by using addresses instead of sockaddrs.


# 73b52ad8 07-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use llt_prepare_static_entry method to prepare valid per-af static entry.


# 0368226e 07-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Retire abstract llentry_free() in favor of lltable_drop_entry_queue()
and explicit calls to RTENTRY_FREE_LOCKED()
* Use lltable_prefix_free() in arp_ifscrub to be consistent with nd6.
* Rename <lltable_|llt>_delete function to _delete_addr() to note that
this function is used to external callers. Make this function maintain
its own locking.
* Use lookup/unlink/clear call chain from internal callers instead of
delete_addr.
* Fix LLE_DELETED flag handling


# 721cd2e0 07-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Do not enforce particular lle storage scheme:
* move lltable allocation to per-domain callbacks.
* make llentry_link/unlink functions overridable llt methods.
* make hash table traversal another overridable llt method.


# a743ccd4 07-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Add llt_clear_entry() callback which is able to do all lle
cleanup including unlinking/freeing
* Relax locking in lltable_prefix_free_af/lltable_free
* Do not pass @llt to lle free callback: it is always NULL now.
* Unify arptimer/nd6_llinfo_timer: explicitly unlock lle avoiding
unlock/lock sequinces
* Do not pass unlocked lle to nd6_ns_output(): add nd6_llinfo_get_holdsrc()
to retrieve preferred source address from lle hold queue and pass it
instead of lle.
* Finally, make nd6_create() create and return unlocked lle
* Separate defrtr handling code from nd6_free():
use nd6_check_del_defrtr() to check if we need to keep entry instead of
performing GC,
use nd6_check_recalc_defrtr() to perform actual recalc on lle removal.
* Move isRouter handling from nd6_cache_lladdr() to separate
nd6_check_router()
* Add initial code to maintain lle runtime flags in sync.


# 9b65db85 01-Dec-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Do more fine-grained locking in lltable code: lltable_create_lle()
does actual new lle creation without extensive locking and existing
lle search.
Move lle updating code from gigantic in_arpinput() to arp_update_llle()
and some other functions.

IPv6 changes to follow.


# ce313fdd 30-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Unify lle table dump/prefix removal code.
* Rename lla_XXX -> lltable_XXX_lle to reduce number of name prefixes
used by lltable code.


# 4c9df1c5 23-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix r274855: use proper unlock method.


# 73d77028 23-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Do more fine-grained lltable locking: use table runtime lock as rare
as we can.


# 9479029b 22-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Add lltable llt_hash callback
* Move lltable items insertions/deletions to generic llt code.


# 7c066c18 22-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Use less-invasive approach for IF_AFDATA lock: convert into 2 locks:
use rwlock accessible via external functions
(IF_AFDATA_CFG_* -> if_afdata_cfg_*()) for all control plane tasks
use rmlock (IF_AFDATA_RUN_*) for fast-path lookups.


# 27688dfe 22-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Temporarily revert r274774.


# 8f465f66 22-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert &in_ifaddr_lock to dual-locking model:
use rwlock accessible via external functions
(IN_IFADDR_CFG_* -> in_ifaddr_cfg_*()) for all control plane tasks
use rmlock (IN_IFADDR_RUN_*) for fast-path lookups.


# 86b94cff 21-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Finish r274774: add more headers/fix build for non-debug case.


# 9883e41b 20-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Switch IF_AFDATA lock to rmlock


# f9723c77 20-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Simplify API: use new NHOP_LOOKUP_AIFP flag to select what ifp
we need to return.
Rename fib[64]_lookup_nh_basic to fib[64]_lookup_nh, add flags
fields for all relevant functions.


# df629abf 16-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Rework LLE code locking:
* struct llentry is now basically split into 2 pieces:
all fields within 64 bytes (amd64) are now protected by both
ifdata lock AND lle lock, e.g. you require both locks to be held
exclusively for modification. All data necessary for fast path
operations is kept here. Some fields were added:
- r_l3addr - makes lookup key liev within first 64 bytes.
- r_flags - flags, containing pre-compiled decision whether given
lle contains usable data or not. Current the only flag is RLLE_VALID.
- r_len - prepend data len, currently unused
- r_kick - used to provide feedback to control plane (see below).
All other fields are protected by lle lock.
* Add simple state machine for ARP to handle "about to expire" case:
Current model (for the fast path) is the following:
- rlock afdata
- find / rlock rte
- runlock afdata
- see if "expire time" is approaching
(time_uptime + la->la_preempt > la->la_expire)
- if true, call arprequest() and decrease la_preempt
- store MAC and runlock rte
New model (data plane):
- rlock afdata
- find rte
- check if it can be used using r_* fields only
- if true, store MAC
- if r_kick field != 0 set it to 0.
- runlock afdata
New mode (control plane):
- schedule arptimer to be called in (V_arpt_keep - V_arp_maxtries)
seconds instead of V_arpt_keep.
- on first timer invocation change state from ARP_LLINFO_REACHABLE
to ARP_LLINFO_VERIFY, sets r_kick to 1 and shedules next call in
V_arpt_rexmit (default to 1 sec).
- on subsequent timer invocations in ARP_LLINFO_VERIFY state, checks
for r_kick value: reschedule if not changed, and send arprequest()
if set to zero (e.g. entry was used).
* Convert IPv4 path to use new single-lock approach. IPv6 bits to follow.
* Slow down in_arpinput(): now valid reply will (in most cases) require
acquiring afdata WLOCK twice. This is requirement for storing changed
lle data. This change will be slightly optimized in future.
* Provide explicit hash link/unlink functions for both ipv4/ipv6 code.
This will probably be moved to generic lle code once we have per-AF
hashing callback inside lltable.
* Perform lle unlink on deletion immediately instead of delaying it to
the timer routine.
* Make r244183 more explicit: use new LLE_CALLOUTREF flag to indicate the
presence of lle reference used for safe callout calls.


# b4b1367a 15-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Move lle creation/deletion from lla_lookup to separate functions:
lla_lookup(LLE_CREATE) -> lla_create
lla_lookup(LLE_DELETE) -> lla_delete
Assume lla_create to return LLE_EXCLUSIVE lock for lle.
* Rework lla_rt_output to perform all lle changes under afdata WLOCK.
* change arp_ifscrub() ackquire afdata WLOCK, the same as arp_ifinit().


# 6df8a710 07-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.

Sponsored by: Nginx, Inc.


# 064b1bdb 06-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert lle rtchecks to use new routing API.
For inet/ case, this involves reverting r225947
which seem to be pretty strange commit and should
be reverted in HEAD ad well.


# 8c3cfe0b 04-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Hide 'struct rtentry' and all its macro inside new header:
net/route_internal.h
The goal is to make its opaque for all code except route/rtsock and
proto domain _rmx.


# cc45ae40 20-Sep-2014 Hiroki Sato <hrs@FreeBSD.org>

Add a change missing in r271916.


# a7f77a39 22-Aug-2014 Xin LI <delphij@FreeBSD.org>

Restore historical behavior of in_control, which, when no matching address
is found, the first usable address is returned for legacy ioctls like
SIOCGIFBRDADDR, SIOCGIFDSTADDR, SIOCGIFNETMASK and SIOCGIFADDR.

While there also fix a subtle issue that a caller from a jail asking for
INADDR_ANY may get the first IP of the host that do not belong to the jail.

Submitted by: glebius
Differential Revision: https://reviews.freebsd.org/D667


# 5af464bb 31-Jul-2014 Steven Hartland <smh@FreeBSD.org>

Ensure that IP's added to CARP always use the CARP MAC

Previously there was a race condition between the address addition
and associating it with the CARP which resulted in the interface
MAC, instead of the CARP MAC, being used for a brief amount of time.

This caused "is using my IP address" warnings as well as data being
sent to the wrong machine due to incorrect ARP entries being recorded
by other devices on the network.


# d34165f7 31-Jul-2014 Steven Hartland <smh@FreeBSD.org>

Only check error if one could have been generated


# 9753faf5 29-Jul-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Garbage collect couple of unused fields from struct ifaddr:
- ifa_claim_addr() unused since removal of NetAtalk
- ifa_metric seems to be never utilized, always a copy of if_metric


# 7278b62a 29-Apr-2014 Alan Somers <asomers@FreeBSD.org>

Fix a panic when removing an IP address from an interface, if the same address
exists on another interface. The panic was introduced by change 264887, which
changed the fibnum parameter in the call to rtalloc1_fib() in
ifa_switch_loopback_route() from RT_DEFAULT_FIB to RT_ALL_FIBS. The solution
is to use the interface fib in that call. For the majority of users, that will
be equivalent to the legacy behavior.

PR: kern/189089
Reported by: neel
Reviewed by: neel
MFC after: 3 weeks
X-MFC with: 264887
Sponsored by: Spectra Logic


# 0cfee0c2 24-Apr-2014 Alan Somers <asomers@FreeBSD.org>

Fix subnet and default routes on different FIBs on the same subnet.

These two bugs are closely related. The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr. Those
functions will only return an address whose interface fib equals the
argument.

sys/net/route.c
Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
arguments.

sys/netinet/in.c
Update in_addprefix to consider the interface fib when adding
prefixes. This will prevent it from not adding a subnet route when
one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
In some cases it there wasn't a clear specific fib number to use.
In others, I was unable to test those functions so I chose
RT_DEFAULT_FIB to minimize divergence from current behavior. I will
fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
Revert r263738. The udp_dontroute test was right all along.
However, bugs kern/187550 and kern/187553 cancelled each other out
when it came to this test. Because of kern/187553, ifa_ifwithnet
searched the default fib instead of the requested one, but because
of kern/187550, there was an applicable subnet route on the default
fib. The new test added in r263738 doesn't work right, however. I
can verify with dtrace that ifa_ifwithnet returned the wrong address
before I applied this commit, but route(8) miraculously found the
correct interface to use anyway. I don't know how.

Clear expected failure messages for kern/187550 and kern/187552.

PR: kern/187550
PR: kern/187552
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic


# 0489b891 24-Apr-2014 Alan Somers <asomers@FreeBSD.org>

Fix host and network routes for new interfaces when net.add_addr_allfibs=0

sys/net/route.c
In rtinit1, use the interface fib instead of the process fib. The
latter wasn't very useful because ifconfig(8) is usually invoked
with the default process fib. Changing ifconfig(8) to use setfib(2)
would be redundant, because it already sets the interface fib.

tests/sys/netinet/fibs_test.sh
Clear the expected ATF failure

sys/net/if.c
Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib

sys/netinet/in.c
sys/net/if_var.h
Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
in_scrubprefix. Pass it the interface fib.

PR: kern/187549
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation


# e06e816f 06-Apr-2014 Kevin Lo <kevlo@FreeBSD.org>

Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks.
Tested with vlc and a test suite [1].

[1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz

Reviewed by: jhb, glebius, adrian


# 743c072a 26-Mar-2014 Alan Somers <asomers@FreeBSD.org>

Correct ARP update handling when the routes for network interfaces are
restricted to a single FIB in a multifib system.

Restricting an interface's routes to the FIB to which it is assigned (by
setting net.add_addr_allfibs=0) causes ARP updates to fail with "arpresolve:
can't allocate llinfo for x.x.x.x". This is due to the ARP update code hard
coding it's lookup for existing routing entries to FIB 0.

sys/netinet/in.c:
When dealing with RTM_ADD (add route) requests for an interface, use
the interface's assigned FIB instead of the default (FIB 0).

sys/netinet/if_ether.c:
In arpresolve(), enhance error message generated when an
lla_lookup() fails so that the interface causing the error is
visible in logs.

tests/sys/netinet/fibs_test.sh
Clear ATF expected error.

PR: kern/167947
Submitted by: Nikolay Denev <ndenev@gmail.com> (previous version)
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation


# a49b317c 15-Jan-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix refcount leak on netinet ifa.

Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Yandex LLC


# d375edc9 09-Jan-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Simplify inet alias handling code: if we're adding/removing alias which
has the same prefix as some other alias on the same interface, use
newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg().

This eliminates the following rtsock messages:

Pinned RTM_ADD for prefix (for alias addition).
Pinned RTM_DELETE for prefix (for alias withdrawal).

Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24):

before commit, addition:

got message of size 116 on Fri Jan 10 14:13:15 2014
RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

got message of size 192 on Fri Jan 10 14:13:15 2014
RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
10.0.0.0 10.0.0.2 (255) ffff ffff ff

after commit, addition:

got message of size 116 on Fri Jan 10 13:56:26 2014
RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255

before commit, wihdrawal:

got message of size 192 on Fri Jan 10 13:58:59 2014
RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
10.0.0.0 10.0.0.2 (255) ffff ffff ff

got message of size 116 on Fri Jan 10 13:58:59 2014
RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

adter commit, withdrawal:

got message of size 116 on Fri Jan 10 14:14:11 2014
RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong
(and requires some hacks to keep prefix in route table on RTM_DELETE).

I've tested this change with quagga (no change) and bird (*).

bird alias handling is already broken in *BSD sysdep code, so nothing
changes here, too.

I'm going to MFC this change if there will be no complains about behavior
change.

While here, fix some style(9) bugs introduced by r260488
(pointed by glebius and bde).

Sponsored by: Yandex LLC
MFC after: 4 weeks


# e2d14d93 02-Jan-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with
LLE_CREATE flag.

MFC after: 1 week


# 9706c950 29-Dec-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Fix couple of bugs from r257692 related to scan of address list on
an interface:
- in in_control() skip over not AF_INET addresses.
- in in_aifaddr_ioctl() and in_difaddr_ioctl() do correct check
of address family, w/o accessing memory beyond struct ifaddr.

Sponsored by: Nginx, Inc.


# c1f7c3f5 17-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

In r257692 I intentionally deleted code that handled P2P interfaces
with equal addresses on both sides. It appeared that OpenVPN uses
such configutations.

Submitted by: trociny


# 555036b5 10-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Remove never used ioctls that originate from KAME. The proof
of their zero usage was exp-run from misc/183538.


# 77b89ad8 06-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Provide compat layer for OSIOCAIFADDR.


# 821b5caf 06-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Fix my braino in r257692. For SIOCG*ADDR we don't need exact match on
specified address, actually in most cases the address isn't specified.

Reported by: peter


# 6224cd89 05-Nov-2013 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Fix build on GCC.


# f7a39160 05-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Rewrite in_control(), so that it is comprehendable without getting mad.

o Provide separate functions for SIOCAIFADDR and for SIOCDIFADDR, with
clear code flow from beginning to the end. After that the rest of
in_control() gets very small and clear.
o Provide sx(9) lock to protect against parallel ioctl() invocations.
o Reimplement logic from r201282, that tried to keep localhost route in
table when multiple P2P interfaces with same local address are created
and deleted.

Discussed with: pluknet, melifaro
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# b1b9dcae 05-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by
default from the very beginning. It was placed in wrong namespace
net.link.ether, originally it had been at another wrong namespace. It was
incorrectly documented at incorrect manual page arp(8). Since new-ARP commit,
the tunable have been consulted only on route addition, and ignored on route
deletion. Behaviour of a system with tunable turned off is not fully correct,
and has no advantages comparing to normal behavior.


# 237bf7f7 01-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Cleanup in_ifscrub(), which is just an entry to in_scrubprefix().


# c3322cb9 28-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 46758960 15-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Remove ifa_init() and provide ifa_alloc() that will allocate and setup
struct ifaddr internally.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 5b7cb97c 09-Jul-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.


# 1571132f 21-Apr-2013 Oleg Bulyzhin <oleg@FreeBSD.org>

Plug static llentry leak (ipv4 & ipv6 were affected).

PR: kern/172985
MFC after: 1 month


# 9711a168 31-Jan-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Retire struct sockaddr_inarp.

Since ARP and routing are separated, "proxy only" entries
don't have any meaning, thus we don't need additional field
in sockaddr to pass SIN_PROXY flag.

New kernel is binary compatible with old tools, since sizes
of sockaddr_inarp and sockaddr_in match, and sa_family are
filled with same value.

The structure declaration is left for compatibility with
third party software, but in tree code no longer use it.

Reviewed by: ru, andre, net@


# 8a1163e8 03-Jan-2013 Peter Wemm <peter@FreeBSD.org>

Temporarily revert rev 244678. This is causing loopback problems with
the lo (loopback) interfaces.


# 468e45f3 25-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

The SIOCSIFFLAGS ioctl handler runs if_up()/if_down() that notify
all interested parties in case if interface flag IFF_UP has changed.

However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR
and SIOCAIFADDR_IN6 can, too. The actual |= is done not in the protocol
code, but in code of interface drivers. To fix this historical layering
violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the
IFF_UP flag, and if it did, run the if_up() handler.

This fixes configuring an address under CARP control on an interface
that was initially !IFF_UP.

P.S. I intentionally omitted handling the IFF_SMART flag. This flag was
never ever used in any driver since it was introduced, and since it
means another layering violation, it should be garbage collected instead
of pretended to be supported.


# 3e6c8b53 24-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Minor style(9) changes:
- Remove declaration in initializer.
- Add empty line between logical blocks.


# 7db496de 19-Aug-2012 Randall Stewart <rrs@FreeBSD.org>

Though I disagree, I conceed to jhb & Rui. Note
that we still have a problem with this whole structure of
locks and in_input.c [it does not lock which it should not, but
this *can* lead to crashes]. (I have seen it in our SQA
testbed.. besides the one with a refcnt issue that I will
have SQA work on next week ;-)


# 94248791 16-Aug-2012 Randall Stewart <rrs@FreeBSD.org>

Ok jhb, lets move the ifa_free() down to the bottom to
assure that *all* tables and such are removed before
we start to free. This won't protect the Hash in ip_input.c
but in theory should protect any other uses that *do* use locks.

MFC after: 1 week (or more)


# 18474982 16-Aug-2012 Randall Stewart <rrs@FreeBSD.org>

Its never a good idea to double free the same
address.

MFC after: 1 week (after the other commits ahead of this gets MFC'd)


# ea537929 02-Aug-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Fix races between in_lltable_prefix_free(), lla_lookup(),
llentry_free() and arptimer():

o Use callout_init_rw() for lle timeout, this allows us safely
disestablish them.
- This allows us to simplify the arptimer() and make it
race safe.
o Consistently use ifp->if_afdata_lock to lock access to
linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
is attached to the hash.
- Use LLE_LINKED to avoid double unlinking via consequent
calls to llentry_free().
- Mark lle with LLE_DELETED via |= operation istead of =,
so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
consistent and provide more informative KASSERTs.

The patch is a collaborative work of all submitters and myself.

PR: kern/165863
Submitted by: Andrey Zonov <andrey zonov.org>
Submitted by: Ryan Stone <rysto32 gmail.com>
Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>


# b9aee262 01-Aug-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Some more whitespace cleanup.


# ea50c13e 31-Jul-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Some style(9) and whitespace changes.

Together with: Andrey Zonov <andrey zonov.org>


# 09fe6320 19-Jun-2012 Navdeep Parhar <np@FreeBSD.org>

- Updated TOE support in the kernel.

- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the
works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)


# 90b357f6 10-Apr-2012 Gleb Smirnoff <glebius@FreeBSD.org>

M_DONTWAIT is a flag from historical mbuf(9)
allocator, not malloc(9) or uma(9) flag.


# a93cda78 23-Feb-2012 Kip Macy <kmacy@FreeBSD.org>

When using flowtable llentrys can outlive the interface with which they're associated
at which the lle_tbl pointer points to freed memory and the llt_free pointer is no longer
valid.

Move the free pointer in to the llentry itself and update the initalization sites.

MFC after: 2 weeks


# 81d5d46b 03-Feb-2012 Bjoern A. Zeeb <bz@FreeBSD.org>

Add multi-FIB IPv6 support to the core network stack supplementing
the original IPv4 implementation from r178888:

- Use RT_DEFAULT_FIB in the IPv4 implementation where noticed.
- Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in
the NFS code.
- Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack
where applicable.
- Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally
prevent multiple initializations of callouts in in6_inithead().
- Use wrapper functions where needed to preserve the current KPI to
ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup.
- Fix (related) comments (both technical or style).
- Convert to rtinit() where applicable and only use custom loops where
currently not possible otherwise.
- Multicast group, most neighbor discovery address actions and faith(4)
are locked to the default FIB. Individual IPv6 addresses will only
appear in the default FIB, however redirect information and prefixes
of connected subnets are automatically propagated to all FIBs by
default (mimicking IPv4 behavior as closely as possible).

Sponsored by: Cisco Systems, Inc.


# 56cf9dc1 16-Jan-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Drop support for SIOCSIFADDR, SIOCSIFNETMASK, SIOCSIFBRDADDR, SIOCSIFDSTADDR
ioctl commands.

PR: 163524
Reviewed by: net


# 137f91e8 05-Jan-2012 John Baldwin <jhb@FreeBSD.org>

Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by: bz
MFC after: 2 weeks


# 0f188ebb 04-Jan-2012 John Baldwin <jhb@FreeBSD.org>

Use a helper variable to wrap a long line.


# 56d6e129 04-Jan-2012 John Baldwin <jhb@FreeBSD.org>

In the handling of the SIOC[DG]LIFADDR icotls in in_lifaddr_ioctl(), add
missing interface address list locking and grab a reference on the
matching interface address after dropping the lock while it is used to
avoid a potential use after free.

Reviewed by: bz
MFC after: 1 week


# 0823c29b 04-Jan-2012 John Baldwin <jhb@FreeBSD.org>

Fix the SIOC[DG]LIFADDR ioctls in in_lifaddr_ioctl() to work with IPv4
interface address rather than IPv6.

Submitted by: hrs
Reviewed by: bz
MFC after: 1 week


# 71212473 20-Dec-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Provide ABI compatibility shim to enable configuring of addresses
with ifconfig(8) prior to r228571.

Requested by: brooks


# 92ed4e1a 16-Dec-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Since size of struct in_aliasreq has just been changed in r228571,
and thus ifconfig(8) needs recompile, it is a good chance to make
parameter checks on SIOCAIFADDR arguments more strict.


# 08b68b0e 15-Dec-2011 Gleb Smirnoff <glebius@FreeBSD.org>

A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]


# 55174c34 12-Dec-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Belatedly catch up with r151555. in_scrubprefix() also needs this fix. We
should compare not only addresses, but their masks, too, when searching
for matching prefix.


# f769e5b0 06-Dec-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Fix a very special case when SIOCAIFADDR supplies mask of 0.0.0.0,
don't overwrite the mask with autoguessing based on classes.


# 89b93255 28-Nov-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Fix one more fallout from r227791: do not overwrite trimmed sa_len
on the ia_sockmask when doing SIOCSIFNETMASK.

Reported by: Stefan Bethke <stb lassitu.de>, gonzo
Pointy hat to: glebius


# c6e5c711 24-Nov-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Remove superfluous check: SIOCAIFADDR must have ifra_addr supplied.


# bd47ae58 24-Nov-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Fix stupid typo in r227830.

PR: 162806
Pointy hat to: glebius


# e278f44b 22-Nov-2011 Gleb Smirnoff <glebius@FreeBSD.org>

style(9) nit


# bbaa3f94 22-Nov-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Fix SIOCDIFADDR semantics: if no address is specified, then delete first one.


# cf00e5c6 21-Nov-2011 Gleb Smirnoff <glebius@FreeBSD.org>

This check isn't needed now, sanity checking done in the beginning.
Missed it in last commit.


# 6d00fd9c 21-Nov-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Historically in_control() did not check sockaddrs supplied with
structs ifreq/in_aliasreq and there've been several panics due
to that problem. All these panics were fixed just a couple of
lines above the panicing code.

Take a more general approach: sanity check sockaddrs supplied
with SIOCAIFADDR and SIOCSIF*ADDR at the beggining of the
function and drop all checks below.

One check is now disabled due to strange code in ifconfig(8)
that I've removed recently. I'm going to enable it with next
__FreeBSD_version bump.

Historically in_ifinit() was able to recover from an error
and restore old address. Nowadays this feature isn't working
for all error cases, but for some of them. I suppose no software
relies on this behavior, so I'd like to remove it, since this
simplifies code a lot.

Also, move if_scrub() earlier in the in_ifinit(). It is more
correct to wipe routes before removing address from local
address list, and interface address list.

Silence from: bz, brooks, andre, rwatson, 3 weeks


# b3664a14 24-Oct-2011 Qing Li <qingli@FreeBSD.org>

Exclude host routes when checking for prefix coverage on multiple
interfaces. A host route has a NULL mask so check for that condition.
I have also been told by developers who customize the packet output
path with direct manipulation of the route entry (or the outgoing
interface to be specific). This patch checks for the route mask
explicitly to make sure custom code will not panic.

PR: kern/161805
MFC after: 3 days


# 53883e0c 15-Oct-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Add support for IPv4 /31 prefixes, as described in RFC3021.

To run a /31 network, participating hosts MUST drop support
for directed broadcasts, and treat the first and last addresses
on subnet as unicast. The broadcast address for the prefix
should be the link local broadcast address, INADDR_BROADCAST.


# b365d954 15-Oct-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Remove last remnants of classful addressing:

- Remove ia_net, ia_netmask, ia_netbroadcast from struct in_ifaddr.
- Remove net.inet.ip.subnetsarelocal, I bet no one need it in 2011.
- fix bug when we were not forwarding to a host which matches classful
net address. For example router having 192.168.x.y/16 network attached,
would not forward traffic to 192.168.*.0, which are legal IPs in
CIDR world.
- For compatibility, leave autoguessing of mask based on class.

Reviewed by: andre, bz, rwatson


# a0b5928b 13-Oct-2011 Gleb Smirnoff <glebius@FreeBSD.org>

De-spl(9).


# 15d25219 10-Oct-2011 Qing Li <qingli@FreeBSD.org>

All indirect routes will fail the rtcheck, except for a special host
route where the destination IP and the gateway IP is the same. This
special case handling is only meant for backward compatibility reason.
The last commit introduced a bug in the route check logic, where a
valid special case is treated as an error. This patch fixes that bug
along with some code cleanup.

Suggested by: gleb
Reviewed by: kmacy, discussed with gleb
MFC after: 1 day


# 6703e7ea 07-Oct-2011 Qing Li <qingli@FreeBSD.org>

Do not try removing an ARP entry associated with a given interface
address if that interface does not support ARP. Otherwise the
system will generate error messages unnecessarily due to the missing
entry.

PR: kern/159602
Submitted by: pluknet
MFC after: 3 days


# 41b210c6 07-Oct-2011 Qing Li <qingli@FreeBSD.org>

Remove the reference held on the loopback route when the interface
address is being deleted. Only the last reference holder deletes the
loopback route. All other delete operations just clear the IFA_RTSELF
flag.

PR: kern/159601
Submitted by: pluknet
Reviewed by: discussed on net@
MFC after: 3 days


# db92413e 03-Oct-2011 Qing Li <qingli@FreeBSD.org>

A system may have multiple physical interfaces, all of which are on the
same prefix. Since a single route entry is installed for the prefix
(without RADIX_MPATH), incoming packets on the interfaces that are not
associated with the prefix route may trigger an error message about
unable to allocation LLE entry, and fails L2. This patch makes sure a
valid route is present in the system, and allow the aforementioned
condition to exist and treats as valid.

Reviewed by: bz
MFC after: 5 days


# 6cf8e330 03-Oct-2011 Qing Li <qingli@FreeBSD.org>

This patch allows ARP to work properly in the presence of
self-referencing routes. This patch is a rework of r223862.

Reviewed by: bz, zec
MFC after: 5 days


# 11845098 27-Aug-2011 Qing Li <qingli@FreeBSD.org>

When an interface address route is removed from the system, another
route with the same prefix is searched for as a replacement. The
current code did not bypass routes that have non-operational
interfaces. This patch fixes that bug and will find a replacement
route with an active interface.

PR: kern/159603
Submitted by: pluknet, ambrisko at ambrisko dot com
Reviewed by: discussed on net@
Approved by: re (bz)
MFC after: 3 days


# 72366606 10-Aug-2011 Kevin Lo <kevlo@FreeBSD.org>

If RTF_HOST flag is specified, then we are interested in destination
address.

PR: kern/159600
Submitted by: Svatopluk Kraus <onwahe at gmail dot com>
Approved by: re (hrs)


# 13e255fa 08-Jul-2011 Marko Zec <zec@FreeBSD.org>

Permit ARP to proceed for IPv4 host routes for which the gateway is the
same as the host address. This already works fine for INET6 and ND6.

While here, remove two function pointers from struct lltable which are
only initialized but never used.

MFC after: 3 days


# 92322284 28-May-2011 Qing Li <qingli@FreeBSD.org>

Supply the LLE_STATIC flag bit to in_ifscurb() when scrubbing interface
address so that proper clean up will take place in the routing code.
This patch fixes the bootp panic on startup problem. Also, added more
error handling and logging code in function in_scrubprefix().

MFC after: 5 days


# 5b84dc78 20-May-2011 Qing Li <qingli@FreeBSD.org>

The statically configured (permanent) ARP entries are removed when an
interface is brought down, even though the interface address is still
valid. This patch maintains the permanent ARP entries as long as the
interface address (having the same prefix as that of the ARP entries)
is valid.

Reviewed by: delphij
MFC after: 5 days


# 79d51435 21-Mar-2011 Sergey Kandaurov <pluknet@FreeBSD.org>

Reference ifaddr object before unlocking as it can be freed
from another context at the moment of later access.

PR: kern/155555
Submitted by: Andrew Boyer <aboyer att averesystems.com>
Approved by: avg (mentor)
MFC after: 2 weeks


# a98c06f1 30-Nov-2010 Gleb Smirnoff <glebius@FreeBSD.org>

Use time_uptime instead of non-monotonic time_second to drive ARP
timeouts.

Suggested by: bde


# 3e288e62 22-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


# 31c6a003 14-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.


# e162ea60 12-Nov-2010 George V. Neville-Neil <gnn@FreeBSD.org>

Add a queue to hold packets while we await an ARP reply.

When a fast machine first brings up some non TCP networking program
it is quite possible that we will drop packets due to the fact that
only one packet can be held per ARP entry. This leads to packets
being missed when a program starts or restarts if the ARP data is
not currently in the ARP cache.

This code adds a new sysctl, net.link.ether.inet.maxhold, which defines
a system wide maximum number of packets to be held in each ARP entry.
Up to maxhold packets are queued until an ARP reply is received or
the ARP times out. The default setting is the old value of 1
which has been part of the BSD networking code since time
immemorial.

Expose the time we hold an incomplete ARP entry by adding
the sysctl net.link.ether.inet.wait, which defaults to 20
seconds, the value used when the new ARP code was added..

Reviewed by: bz, rpaulo
MFC after: 3 weeks


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 12112cf6 16-Oct-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MfP4 CH182763 (original version):

Make it harder to exploit certain in_control() related races between the
intiial lookup at the beginning and the time we will remove the entry
from the lists by re-checking that entry is still in the list before
trying to remove it.

(*) It is believed that with the current code and locking strategy we
cannot completely fix all race.

Reported by: Nima Misaghian (nima_misa hotmail.com) on net@ 20100817
Tested by: Nima Misaghian (nima_misa hotmail.com) (original version)
PR: kern/146250
Submitted by: Mikolaj Golub (to.my.trociny gmail.com) (different version)
MFC after: 1 week


# 42db1b87 04-Sep-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

In case of RADIX_MPATH do not leak the IN_IFADDR read lock on
early return.

MFC after: 3 days


# 54bfbd51 10-Aug-2010 Will Andrews <will@FreeBSD.org>

Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default. Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by: bz, simon
Approved by: ken (mentor)
MFC after: 2 weeks


# dd62f5c0 25-Jun-2010 Qing Li <qingli@FreeBSD.org>

MFC r208553

This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

Approved by: re (bz)


# 0ed6142b 25-May-2010 Qing Li <qingli@FreeBSD.org>

This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after: 3 days


# 480d7c6c 06-May-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r207369:
MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH


# 82cea7e6 29-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days


# feb3a5f7 21-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r206481:

Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing the entire tables make sure that in case we
cancel a pending callout to remove the reference as well.

Reviewed by: qingli (earlier version)
MFC after: 10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
Christian Kratzer (ck cksoft.de),
Evgenii Davidov (dado korolev-net.ru)
PR: kern/144564
Configurations still affected: with options FLOWTABLE


# becba438 11-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.

Reviewed by: qingli (earlier version)
MFC after: 10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
Christian Kratzer (ck cksoft.de),
Evgenii Davidov (dado korolev-net.ru)
PR: kern/144564
Configurations still affected: with options FLOWTABLE


# c951da56 01-Apr-2010 Qing Li <qingli@FreeBSD.org>

MFC 204902

One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code, is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix, the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.


# c7ea0aa6 08-Mar-2010 Qing Li <qingli@FreeBSD.org>

One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code, is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix, the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.

MFC after: 5 days


# 613e96b8 09-Feb-2010 Qing Li <qingli@FreeBSD.org>

MFC r203401

Some of the existing ppp and vpn related scripts create and set
the IP addresses of the tunnel end points to the same value. In
these cases the loopback route is not installed for the local
end.


# d577d18a 02-Feb-2010 Qing Li <qingli@FreeBSD.org>

Some of the existing ppp and vpn related scripts create and set
the IP addresses of the tunnel end points to the same value. In
these cases the loopback route is not installed for the local
end.

Verified by: avg
MFC after: 5 days


# 646c8005 08-Jan-2010 Qing Li <qingli@FreeBSD.org>

Ensure an address is removed from the interface address
list when the installation of that address fails.

PR: 139559


# a17a2dca 05-Jan-2010 Qing Li <qingli@FreeBSD.org>

MFC r201285

Consolidate the route message generation code for when address
aliases were added or deleted. The announced route entry for
an address alias is no longer empty because this empty route
entry was causing some route daemon to fail and exit abnormally.


# 32c53401 05-Jan-2010 Qing Li <qingli@FreeBSD.org>

MFC r201282, r201543

r201282
-------
The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations due to multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

r201543
-------
The IFA_RTSELF address flag marks a loopback route has been installed
for the interface address. This marker is necessary to properly support
PPP types of links where multiple links can have the same local end
IP address. The IFA_RTSELF flag bit maps to the RTF_HOST value, which
was combined into the route flag bits during prefix installation in
IPv6. This inclusion causing the prefix route to be unusable. This
patch fixes this bug by excluding the IFA_RTSELF flag during route
installation.

PR: ports/141342, kern/141134


# ccbb9c35 30-Dec-2009 Qing Li <qingli@FreeBSD.org>

Consolidate the route message generation code for when address
aliases were added or deleted. The announced route entry for
an address alias is no longer empty because this empty route
entry was causing some route daemon to fail and exit abnormally.

MFC after: 5 days


# c7ab6602 30-Dec-2009 Qing Li <qingli@FreeBSD.org>

The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations due to multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after: 5 days


# f60909e3 28-Oct-2009 Qing Li <qingli@FreeBSD.org>

MFC r198418

Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by: ru


# 6cb2b4e7 23-Oct-2009 Qing Li <qingli@FreeBSD.org>

Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by: ru
MFC after: 3 days


# 6f99a646 20-Oct-2009 Qing Li <qingli@FreeBSD.org>

MFC r198111

This patch fixes the following issues in the ARP operation:

1. There is a regression issue in the ARP code. The incomplete
ARP entry was timing out too quickly (1 second timeout), as
such, a new entry is created each time arpresolve() is called.
Therefore the maximum attempts made is always 1. Consequently
the error code returned to the application is always 0.
2. Set the expiration of each incomplete entry to a 20-second
lifetime.
3. Return "incomplete" entries to the application.
4. The return error code was incorrect.

Reviewed by: kmacy
Approved by: re


# 93704ac5 15-Oct-2009 Qing Li <qingli@FreeBSD.org>

This patch fixes the following issues in the ARP operation:

1. There is a regression issue in the ARP code. The incomplete
ARP entry was timing out too quickly (1 second timeout), as
such, a new entry is created each time arpresolve() is called.
Therefore the maximum attempts made is always 1. Consequently
the error code returned to the application is always 0.
2. Set the expiration of each incomplete entry to a 20-second
lifetime.
3. Return "incomplete" entries to the application.

Reviewed by: kmacy
MFC after: 3 days


# c8c92b54 06-Oct-2009 Qing Li <qingli@FreeBSD.org>

MFC r197696

Remove a log message from production code. This log message can be
triggered by a misconfigured host that is sending out gratuious ARPs.
This log message can also be triggered during a network renumbering
event when multiple prefixes co-exist on a single network segment.

Approved by: re


# 7ec99f71 06-Oct-2009 Qing Li <qingli@FreeBSD.org>

MFC 197695

Previously, if an address alias is configured on an interface, and
this address alias has a prefix matching that of another address
configured on the same interface, then the ARP entry for the alias
is not deleted from the ARP table when that address alias is removed.
This patch fixes the aforementioned issue.

PR: kern/139113
Reviewed by: bz
Approved by: re


# b4a22c36 01-Oct-2009 Qing Li <qingli@FreeBSD.org>

Remove a log message from production code. This log message can be
triggered by a misconfigured host that is sending out gratuious ARPs.
This log message can also be triggered during a network renumbering
event when multiple prefixes co-exist on a single network segment.

MFC after: immediately


# fa3cfd39 01-Oct-2009 Qing Li <qingli@FreeBSD.org>

Previously, if an address alias is configured on an interface, and
this address alias has a prefix matching that of another address
configured on the same interface, then the ARP entry for the alias
is not deleted from the ARP table when that address alias is removed.
This patch fixes the aforementioned issue.

PR: kern/139113
MFC after: 3 days


# 553a7dec 15-Sep-2009 Qing Li <qingli@FreeBSD.org>

MFC r197227

Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by: bz
Approved by: re


# 77eb2069 15-Sep-2009 Qing Li <qingli@FreeBSD.org>

MFC r197210, 197212, 197235

The bootp code installs an interface address and the nfs client
module tries to install the same address again. This extra code
is removed, which was discovered by the removal of a call to
in_ifscrub() in r196714. This call to in_ifscrub is put back here
because the SIOCAIFADDR command can be used to change the prefix
length of an existing alias.

r197235 reverts file nfs_vfsops.c

Reviewed by: kmacy
Approved by: re


# 6d8337ba 15-Sep-2009 Qing Li <qingli@FreeBSD.org>

MFC r196714

This patch fixes the following issues:

- Routing messages are not generated when adding and removing
interface address aliases.
- Loopback route installed for an interface address alias is
not deleted from the routing table when that address alias
is removed from the associated interface.
- Function in_ifscrub() is called extraneously.

Reviewed by: gnn, kmacy, sam
Approved by: re


# 9bb7d0f4 15-Sep-2009 Qing Li <qingli@FreeBSD.org>

Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by: bz
MFC after: immediately


# 96ed1732 14-Sep-2009 Qing Li <qingli@FreeBSD.org>

The bootp code installs an interface address and the nfs client
module tries to install the same address again. This extra code
is removed, which was discovered by the removal of a call to
in_ifscrub() in r196714. This call to in_ifscrub is put back here
because the SIOCAIFADDR command can be used to change the prefix
length of an existing alias.

Reviewed by: kmacy


# 9a311445 08-Sep-2009 Navdeep Parhar <np@FreeBSD.org>

Add arp_update_event. This replaces route_arp_update_event, which
has not worked since the arp-v2 rewrite.

The event handler will be called with the llentry write-locked and
can examine la_flags to determine whether the entry is being added
or removed.

Reviewed by: gnn, kmacy
Approved by: gnn (mentor)
MFC after: 1 month


# 1bf38b12 31-Aug-2009 Qing Li <qingli@FreeBSD.org>

This patch fixes the following issues:

- Routing messages are not generated when adding and removing
interface address aliases.
- Loopback route installed for an interface address alias is
not deleted from the routing table when that address alias
is removed from the associated interface.
- Function in_ifscrub() is called extraneously.

Reviewed by: gnn, kmacy, sam
MFC after: 3 days


# a0021692 28-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge r196535 from head to stable/8:

Use locks specific to the lltable code, rather than borrow the ifnet
list/index locks, to protect link layer address tables. This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.

Reviewed by: bz, kmacy, qingli

Approved by: re (kib)


# 3ef94f2b 28-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge r196481 from head to stable/8:

Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian

Approved by: re (kib)


# dc56e98f 25-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Use locks specific to the lltable code, rather than borrow the ifnet
list/index locks, to protect link layer address tables. This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.

Reviewed by: bz, kmacy
MFC after: 3 days


# 77dfcdc4 23-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian
MFC after: 3 days


# 530c0060 01-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# df813b7e 27-Jul-2009 Qing Li <qingli@FreeBSD.org>

This patch does the following:

- Allow loopback route to be installed for address assigned to
interface of IFF_POINTOPOINT type.
- Install loopback route for an IPv4 interface addreess when the
"useloopback" sysctl variable is enabled. Similarly, install
loopback route for an IPv6 interface address when the sysctl variable
"nd6_useloopback" is enabled. Deleting loopback routes for interface
addresses is unconditional in case these sysctl variables were
disabled after an interface address has been assigned.

Reviewed by: bz
Approved by: re


# 1e77c105 16-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


# eddfbb76 14-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 2d9cfaba 25-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support. As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done. This means that one
class of reader-writer races still exists.

MFC after: 6 weeks
Reviewed by: bz


# 8c0fec80 23-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references. The following routines now return references:

ifaddr_byindex
ifa_ifwithaddr
ifa_ifwithbroadaddr
ifa_ifwithdstaddr
ifa_ifwithnet
ifaof_ifpforaddr
ifa_ifwithroute
ifa_ifwithroute_fib
rt_getifa
rt_getifa_fib
IFP_TO_IA
ip_rtaddr
in6_ifawithifp
in6ifa_ifpforlinklocal
in6ifa_ifpwithaddr
in6_ifadd
carp_iamatch6
ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc). In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit. Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by: bz
Obtained from: Apple, Inc. (portions)
MFC after: 6 weeks (portions)


# 1099f828 21-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Clean up common ifaddr management:

- Unify reference count and lock initialization in a single function,
ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Instead of using a u_int protected by a mutex to refcount(9) for
reference count management.

The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.

MFC after: 3 weeks


# 8d8bc018 08-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.


# f81a8a32 22-May-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

If including vnet.h one has to include opt_route.h as well. This is
because struct vnet_net holds the rt_tables[][] for MRT and array size
is compile time dependent. If you had ROUTETABLES set to >1 after
r192011 V_loif was pointing into nonsense leading to strange results
or even panics for some people.

Reviewed by: mz


# c9d763bf 20-May-2009 Qing Li <qingli@FreeBSD.org>

When an interface address is removed and the last prefix
route is also being deleted, the link-layer address table
(arp or nd6) will flush those L2 llinfo entries that match
the removed prefix.

Reviewed by: kmacy


# 1600c117 17-May-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Unbreak options VIMAGE builds, in a followup to r192011 which did not
introduce INIT_VNET_NET() initializers necessary for accessing V_loif.

Submitted by: zec
Reviewed by: julian


# 92fac994 13-May-2009 Qing Li <qingli@FreeBSD.org>

Ignore the INADDR_ANY address inserted/deleted by DHCP when installing a loopback route
to the interface address.


# ebc90701 12-May-2009 Qing Li <qingli@FreeBSD.org>

This patch adds a host route to an interface address (that is assigned
to a non loopback/ppp link types) through the loopback interface. Prior
to the new L2/L3 rewrite, this host route is implicitly added by the L2
code during RTM_RESOLVE of that interface address. This host route is
deleted when that interface is removed.

Reviewed by: kmacy


# 093f25f8 26-Apr-2009 Marko Zec <zec@FreeBSD.org>

In preparation for turning on options VIMAGE in next commits,
rearrange / replace / adjust several INIT_VNET_* initializer
macros, all of which currently resolve to whitespace.

Reviewed by: bz (an older version of the patch)
Approved by: julian (mentor)


# 588885f2 25-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Expand coverage of IF_ADDR_LOCK() in in_control() from point of initial
lookup of 'ia' from if_addrhead through most use. Note that we
currently have to drop it prematurely in some cases due to calls out to
the routing and interface code while using 'ia', but this closes many
races. Annotate several potential races that persist after this change.
Move to using M_NOWAIT for allocating new interface addresses due to
lock(s) being held.

MFC after: 3 weeks


# 07cde5e9 24-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

In in_purgemaddrs(), remove the inm being freed from the address list
before freeing it, rather than vice version, to avoid potential use
after free.

Reviewed by: bms


# cf7b18f1 24-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Relocate permissions checking code in in_control() to before the body
of the implementation of ioctls. This makes the mapping of ioctls to
specific privileges more explicit, and also simplifies the
implementation by reducing the use of FALLTHROUGH handling in switch.

While this is not intended to be a functional change, it does mean
that certain privilege checks are now performed earlier, so EPERM
might be returned in preference to EADDRNOTAVAIL for management
ioctls that could have failed for both reasons.

MFC after: 3 weeks


# bbb3fb61 23-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Reorganize in_control() so that invariants are more obvious, and so
that it is easier to lock:

- Handle the unsupported ioctl case at the beginning of in_control(),
handing off to ifp->if_ioctl, rather than looking up interfaces and
addresses unnecessarily in this case.

- Make it an invariant that ifp is always non-NULL when running
in_control()-implemented ioctls, simplifying the code structure.

MFC after: 3 weeks


# 8021456a 19-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Protect against some writer-writer races in in_control() by acquiring
the interface address list lock around interface address list
modifications. More to do here.

MFC after: 2 weeks


# 56663a40 17-Mar-2009 Bruce M Simpson <bms@FreeBSD.org>

Deal with the case where ifma_protospec may be NULL, during
any IPv4 multicast operations which reference it.

There is a potential race because ifma_protospec is set to NULL
when we discover the underlying ifnet has gone away. This write
is not covered by the IF_ADDR_LOCK, and it's difficult to widen
its scope without making it a recursive lock. It isn't clear why
this manifests more quickly with 802.11 interfaces, but does not
seem to manifest at all with wired interfaces.

With this change, the 802.11 related panics reported by sam@
and cokane@ should go away. It is not the right fix, that requires
more thought before 8.0.

Idea from: sam
Tested by: cokane


# e5adda3d 15-Mar-2009 Robert Watson <rwatson@FreeBSD.org>

Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free. This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile. They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

if_ar
if_axe
if_aue
if_cdce
if_cue
if_kue
if_ray
if_rue
if_rum
if_sr
if_udav
if_ural
if_zyd

Drivers that were already disabled because of tty changes:

if_ppp
if_sl

Discussed on: arch@


# c75aa354 09-Mar-2009 Bruce M Simpson <bms@FreeBSD.org>

Fix uninitialized use of ifp for ii.

Found by: Peter Holm


# d10910e6 09-Mar-2009 Bruce M Simpson <bms@FreeBSD.org>

Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD
IPv4 stack.

Diffs are minimized against p4.
PCS has been used for some protocol verification, more widespread
testing of recorded sources in Group-and-Source queries is needed.
sizeof(struct igmpstat) has changed.

__FreeBSD_version is bumped to 800070.


# b89e82dd 05-Feb-2009 Jamie Gritton <jamie@FreeBSD.org>

Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise. The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family. For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes). Remove the explicit
jailed() checks that preceded many of the function calls.

Approved by: bz (mentor)


# cbd18445 18-Jan-2009 Sam Leffler <sam@FreeBSD.org>

remove too noisy DIAGNOSTIC code

Reviewed by: qingli


# 813dd6ae 09-Jan-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Restrict arp, ndp and theoretically the FIB listing (if not
read with libkvm) to the addresses of a prison, when inside a
jail. [1]
As the patch from the PR was pre-'new-arp', add checks to the
llt_dump handlers as well.

While touching RTM_GET in route_output(), consistently use
curthread credentials rather than the creds from the socket
there. [2]

PR: kern/68189
Submitted by: Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1]
Discussed with: rwatson [2]
Reviewed by: rwatson
MFC after: 4 weeks


# 5ce0eb7f 09-Jan-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Make SIOCGIFADDR and related, as well as SIOCGIFADDR_IN6 and related
jail-aware. Up to now we returned the first address of the interface
for SIOCGIFADDR w/o an ifr_addr in the query. This caused problems for
programs querying for an address but running inside a jail, as the
address returned usually did not belong to the jail.
Like for v6, if there was an ifr_addr given on v4, you could probe
for more addresses on the interfaces that you were not allowed to see
from inside a jail. Return an error (EADDRNOTAVAIL) in that case
now unless the address is on the given interface and valid for the
jail.

PR: kern/114325
Reviewed by: rwatson
MFC after: 4 weeks


# c0e9a8a1 09-Jan-2009 Hartmut Brandt <harti@FreeBSD.org>

Set a minimum of information in the routing message (like version and type)
so that generic routing message parsing code can parse the messages for
L2 info that are retrieved via the sysctl interface.


# dc495497 02-Jan-2009 Qing Li <qingli@FreeBSD.org>

Some modules such as SCTP supplies a valid route entry as an input argument
to ip_output(). The destionation is represented in a sockaddr{} object
that may contain other pieces of information, e.g., port number. This
same destination sockaddr{} object may be passed into L2 code, which
could be used to create a L2 entry. Since there exists a L2 table per
address family, the L2 lookup function can make address family specific
comparison instead of the generic bcmp() operation over the entire
sockaddr{} structure.

Note in the IPv6 case the sin6_scope_id is not compared because the
address is currently stored in the embedded form inside the kernel.
The in6_lltable_lookup() has to account for the scope-id if this
storage format were to change in the future.


# 42d866dd 28-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

For consistency use LLE_IS_VALID() in this 4th place that is actually
interested in the (void *)-1 return value hack.
This way we can easily identify those special parts of the code.


# 8eca593c 26-Dec-2008 Qing Li <qingli@FreeBSD.org>

This checkin addresses a couple of issues:
1. The "route" command allows route insertion through the interface-direct
option "-iface". During if_attach(), an sockaddr_dl{} entry is created
for the interface and is part of the interface address list. This
sockaddr_dl{} entry describes the interface in detail. The "route"
command selects this entry as the "gateway" object when the "-iface"
option is present. The "arp" and "ndp" commands also interact with the
kernel through the routing socket when adding and removing static L2
entries. The static L2 information is also provided through the
"gateway" object with an AF_LINK family type, similar to what is
provided by the "route" command. In order to differentiate between
these two types of operations, a RTF_LLDATA flag is introduced. This
flag is set by the "arp" and "ndp" commands when issuing the add and
delete commands. This flag is also set in each L2 entry returned by the
kernel. The "arp" and "ndp" command follows a convention where a RTM_GET
is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills
in the fields for a "rtm" object, which is reinjected into the kernel by
a subsequent RTM_ADD/DELETE command. The entry returend from RTM_GET
is a prefix route, so the RTF_LLDATA flag must be specified when issuing
the RTM_ADD/DELETE messages.

2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the
specification for retrieving L2 information. Also optimized the
code logic.

Reviewed by: julian


# fbc2ca1b 15-Dec-2008 Kip Macy <kmacy@FreeBSD.org>

unlock and destroy an llentry's lock before freeing

Found by: sam


# 6e6b3f7c 14-Dec-2008 Qing Li <qingli@FreeBSD.org>

This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
me maintaining that branch before the svn conversion


# 4b79449e 02-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# f02493cb 28-Nov-2008 Marko Zec <zec@FreeBSD.org>

Unhide declarations of network stack virtualization structs from
underneath #ifdef VIMAGE blocks.

This change introduces some churn in #include ordering and nesting
throughout the network stack and drivers but is not expected to cause
any additional issues.

In the next step this will allow us to instantiate the virtualization
container structures and switch from using global variables to their
"containerized" counterparts.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 44e33a07 19-Nov-2008 Marko Zec <zec@FreeBSD.org>

Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions. As a rule,
initialization at instatiation for such variables should never be
introduced again from now on. Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact. In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 460473a0 26-Oct-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Style changes only:
- Consistently add parentheses to return statements.
- Use NULL instead of 0 when comparing pointers, also avoiding
unnecessary casts.
- Do not use pointers as booleans.

Reviewed by: rwatson (earlier version)
MFC after: 2 months


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 8b615593 02-Oct-2008 Marko Zec <zec@FreeBSD.org>

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 603724d3 17-Aug-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# cf77b848 24-Jun-2008 Oleksandr Tymoshenko <gonzo@FreeBSD.org>

In case of interface initialization failure remove struct in_ifaddr* from
in_ifaddrhashtbl in in_ifinit because error handler in in_control removes
entries only for AF_INET addresses. If in_ifinit is called for the cloned
inteface that has just been created its address family is not AF_INET and
therefor LIST_REMOVE is not called for respective LIST_INSERT_HEAD and
freed entries remain in in_ifaddrhashtbl and lead to memory corruption.

PR: kern/124384


# 107d1244 24-Jan-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Differentiate between addifaddr and delifaddr for the privilege check.

Reviewed by: rwatson
MFC after: 2 weeks


# 4b421e2d 07-Oct-2007 Mike Silbersack <silby@FreeBSD.org>

Add FBSDID to all files in netinet so that people can more
easily include file version information in bug reports.

Approved by: re (kensmith)


# fbdd20a1 16-Jun-2007 Matt Jacob <mjacob@FreeBSD.org>

Simplification to quiet a gcc4.2 warning. Just by setting match.s_addr
to nonzero you fulfill the same function as the variable 'cmp'. so you
might as well zero match and test against it later.

Reviewed by: timeout on review request


# 71498f30 12-Jun-2007 Bruce M Simpson <bms@FreeBSD.org>

Import rewrite of IPv4 socket multicast layer to support source-specific
and protocol-independent host mode multicast. The code is written to
accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.

This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.

The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html

Summary
* IPv4 multicast socket processing is now moved out of ip_output.c
into a new module, in_mcast.c.
* The in_mcast.c module implements the IPv4 legacy any-source API in
terms of the protocol-independent source-specific API.
* Source filters are lazy allocated as the common case does not use them.
They are part of per inpcb state and are covered by the inpcb lock.
* struct ip_mreqn is now supported to allow applications to specify
multicast joins by interface index in the legacy IPv4 any-source API.
* In UDP, an incoming multicast datagram only requires that the source
port matches the 4-tuple if the socket was already bound by source port.
An unbound socket SHOULD be able to receive multicasts sent from an
ephemeral source port.
* The UDP socket multicast filter mode defaults to exclusive, that is,
sources present in the per-socket list will be blocked from delivery.
* The RFC 3678 userland functions have been added to libc: setsourcefilter,
getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
* Definitions for IGMPv3 are merged but not yet used.
* struct sockaddr_storage is now referenced from <netinet/in.h>. It
is therefore defined there if not already declared in the same way
as for the C99 types.
* The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
which are then interpreted as interface indexes) is now deprecated.
* A patch for the Rhyolite.com routed in the FreeBSD base system
is available in the -net archives. This only affects individuals
running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
* Make IPv6 detach path similar to IPv4's in code flow; functionally same.
* Bump __FreeBSD_version to 700048; see UPDATING.

This work was financially supported by another FreeBSD committer.

Obtained from: p4://bms_netdev
Submitted by: Wilbert de Graaf (original work)
Reviewed by: rwatson (locking), silence from fenner,
net@ (but with encouragement)


# f2565d68 10-May-2007 Robert Watson <rwatson@FreeBSD.org>

Move universally to ANSI C function declarations, with relatively
consistent style(9)-ish layout.


# f7e083af 29-Mar-2007 Bruce M Simpson <bms@FreeBSD.org>

Fix a bug in IPv4 address configuration exposed by refcounting.
* Join the IPv4 all-hosts multicast group 224.0.0.1 once only;
that is, when an IPv4 address is first configured on an interface.
* Do not join it for subsequent IPv4 addresses as this violates IGMP.
* Be sure to leave the group when all IPv4 addresses have been removed
from the interface.
* Add two DIAGNOSTIC printfs related to the issue.

Further care and attention is needed in this area; it is suggested that
netinet's attachment to the ifnet structure be compartmentalized and
non-implicit.

Bug found by: andre
MFC after: 1 month


# ec002fee 19-Mar-2007 Bruce M Simpson <bms@FreeBSD.org>

Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from: p4 branch bms_netdev
Reviewed by: andre
Sponsored by: Garance A Drosehn
Idea from: NetBSD
MFC after: 1 month


# f8429ca2 02-Feb-2007 Bruce M Simpson <bms@FreeBSD.org>

In regular forwarding path, reject packets destined for 169.254.0.0/16
link-local addresses. See RFC 3927 section 2.7.


# acd3428b 06-Nov-2006 Robert Watson <rwatson@FreeBSD.org>

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# d9668414 28-Sep-2006 Bruce M Simpson <bms@FreeBSD.org>

The IPv4 code should clean up multicast group state when an interface
goes away. Without this change, it leaks in_multi (and often ether_multi
state) if many clonable interfaces are created and destroyed in quick
succession.

The concept of this fix is borrowed from KAME. Detailed information about
this behaviour, as well as test cases, are available in the PR.

PR: kern/78227
MFC after: 1 week


# 31343a3d 24-Jan-2006 Andre Oppermann <andre@FreeBSD.org>

In in_control() remove the temporary in_ifaddr structure from the
ia_hash only if it actually is an AF_INET address. All other places
test for sa_family == AF_INET but this one.

PR: kern/92091
Submitted by: Seth Kingsley <sethk-at-meowfishies.com>
MFC after: 3 days


# f3d30eb2 28-Oct-2005 Gleb Smirnoff <glebius@FreeBSD.org>

First fill in structure with valid values, and only then attach it
to the global list.

Reviewed by: rwatson


# bfb26eec 22-Oct-2005 Gleb Smirnoff <glebius@FreeBSD.org>

In in_addprefix() compare not only route addresses, but their masks,
too. This fixes problem when connected prefixes overlap.

Obtained from: OpenBSD (rev. 1.40 by claudio);
[ I came to this fix myself, and then found out that
OpenBSD had already fixed it the same way.]


# c48b03fb 03-Oct-2005 Robert Watson <rwatson@FreeBSD.org>

Unlock Giant symmetrically with respect to lock acquire order as that's
generally nicer.

Spotted by: johan
MFC after: 1 week


# 1fa9efef 03-Oct-2005 Robert Watson <rwatson@FreeBSD.org>

Acquire Giant conditionally in in_addmulti() and in_delmulti() based on
whether the interface being accessed is IFF_NEEDSGIANT or not. This
avoids lock order reversals when calling into the interface ioctl
handler, which could potentially lead to deadlock.

The long term solution is to eliminate non-MPSAFE network drivers.

Discussed with: jhb
MFC after: 1 week


# b1c53bc9 18-Sep-2005 Robert Watson <rwatson@FreeBSD.org>

Take a first cut at cleaning up ifnet removal and multicast socket
panics, which occur when stale ifnet pointers are left in struct
moptions hung off of inpcbs:

- Add in_ifdetach(), which matches in6_ifdetach(), and allows the
protocol to perform early tear-down on the interface early in
if_detach().

- Annotate that if_detach() needs careful consideration.

- Remove calls to in_pcbpurgeif0() in the handling of SIOCDIFADDR --
this is not the place to detect interface removal! This also
removes what is basically a nasty (and now unnecessary) hack.

- Invoke in_pcbpurgeif0() from in_ifdetach(), in both raw and UDP
IPv4 sockets.

It is now possible to run the msocket_ifnet_remove regression test
using HEAD without panicking.

MFC after: 3 days


# 1ae95409 18-Aug-2005 Gleb Smirnoff <glebius@FreeBSD.org>

In order to support CARP interfaces kernel was taught to handle more
than one interface in one subnet. However, some userland apps rely on
the believe that this configuration is impossible.

Add a sysctl switch net.inet.ip.same_prefix_carp_only. If the switch
is on, then kernel will refuse to add an additional interface to
already connected subnet unless the interface is CARP. Default
value is off.

PR: bin/82306
In collaboration with: mlaier


# dd5a318b 03-Aug-2005 Robert Watson <rwatson@FreeBSD.org>

Introduce in_multi_mtx, which will protect IPv4-layer multicast address
lists, as well as accessor macros. For now, this is a recursive mutex
due code sequences where IPv4 multicast calls into IGMP calls into
ip_output(), which then tests for a multicast forwarding case.

For support macros in in_var.h to check multicast address lists, assert
that in_multi_mtx is held.

Acquire in_multi_mtx around iteration over the IPv4 multicast address
lists, such as in ip_input() and ip_output().

Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses,
as well as over the manipulation of ifnet multicast address lists in order
to keep the two layers in sync.

Lock down accesses to IPv4 multicast addresses in IGMP, or assert the
lock when performing IGMP join/leave events.

Eliminate spl's associated with IPv4 multicast addresses, portions of
IGMP that weren't previously expunged by IGMP locking.

Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded
lock order in WITNESS, in that order.

Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after: 10 days


# ba5da2a0 01-Jun-2005 Ian Dowse <iedowse@FreeBSD.org>

Use IFF_LOCKGIANT/IFF_UNLOCKGIANT around calls to the interface
if_ioctl routine. This should fix a number of code paths through
soo_ioctl() that could call into Giant-locked network drivers without
first acquiring Giant.


# d4d22970 20-Mar-2005 Gleb Smirnoff <glebius@FreeBSD.org>

ifma_protospec is a pointer. Use NULL when assigning or compating it.


# 50bb1704 20-Mar-2005 Gleb Smirnoff <glebius@FreeBSD.org>

Remove a workaround from previos revision. It proved to be incorrect.
Add two another workarounds for carp(4) interfaces:
- do not add connected route when address is assigned to carp(4) interface
- do not add connected route when other interface goes down

Embrace workarounds with #ifdef DEV_CARP


# 0504a89f 10-Mar-2005 Gleb Smirnoff <glebius@FreeBSD.org>

Add antifootshooting workaround, which will make all routes "connected"
to carp(4) interfaces host routes. This prevents a problem, when connected
network is routed to carp(4) interface.


# c398230b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# 9a6a6eeb 17-Nov-2004 Max Laier <mlaier@FreeBSD.org>

Fix host route addition for more than one address to a loopback interface
after allowing more than one address with the same prefix.

Reported by: Vladimir Grebenschikov <vova NO fbsd SPAM ru>
Submitted by: ru (also NetBSD rev. 1.83)
Pointyhat to: mlaier


# 81d96ce8 13-Nov-2004 Max Laier <mlaier@FreeBSD.org>

Merge copyright notices.

Requested by: njl


# 48321abe 12-Nov-2004 Max Laier <mlaier@FreeBSD.org>

Change the way we automatically add prefix routes when adding a new address.
This makes it possible to have more than one address with the same prefix.
The first address added is used for the route. On deletion of an address
with IFA_ROUTE set, we try to find a "fallback" address and hand over the
route if possible.
I plan to MFC this in 4 weeks, hence I keep the - now obsolete - argument to
in_ifscrub as it must be considered KAPI as it is not static in in.c. I will
clean this after the MFC.

Discussed on: arch, net
Tested by: many testers of the CARP patches
Nits from: ru, Andrea Campi <andrea+freebsd_arch webcom it>
Obtained from: WIDE via OpenBSD
MFC after: 1 month


# a4f757cd 16-Aug-2004 Robert Watson <rwatson@FreeBSD.org>

White space cleanup for netinet before branch:

- Trailing tab/space cleanup
- Remove spurious spaces between or before tabs

This change avoids touching files that Andre likely has in his working
set for PFIL hooks changes for IPFW/DUMMYNET.

Approved by: re (scottl)
Submitted by: Xin LI <delphij@frontfree.net>


# 2eccc90b 11-Aug-2004 Andre Oppermann <andre@FreeBSD.org>

Add the function in_localip() which returns 1 if an internet address is for
the local host and configured on one of its interfaces.


# f36cfd49 07-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# 25a4adce 25-Feb-2004 Max Laier <mlaier@FreeBSD.org>

Bring eventhandler callbacks for pf.
This enables pf to track dynamic address changes on interfaces (dailup) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.

Approved by: bms(mentor)


# 3b95e134 30-Dec-2003 Ruslan Ermilov <ru@FreeBSD.org>

Document the net.inet.ip.subnets_are_local sysctl.


# 9ce78778 02-Nov-2003 Sam Leffler <sam@FreeBSD.org>

Correct rev 1.56 which (incorrectly) reversed the test used to
decide if in_pcbpurgeif0 should be invoked.

Supported by: FreeBSD Foundation


# a163d034 18-Feb-2003 Warner Losh <imp@FreeBSD.org>

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 19fc74fb 18-Dec-2002 Jeffrey Hsu <hsu@FreeBSD.org>

Lock up ifaddr reference counts.


# 11aee0b4 17-Dec-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused and incorrectly maintained variable "in_interfaces"


# 2754d95d 22-Oct-2002 SUZUKI Shinsuke <suz@FreeBSD.org>

fixed a kernel crash by "ifconfig stf0 inet 1.2.3.4"
MFC after: 1 week


# f76fcf6d 10-Jun-2002 Jeffrey Hsu <hsu@FreeBSD.org>

Lock up inpcb.

Submitted by: Jennifer Yang <yangjihui@yahoo.com>


# 6ce6e2be 09-Apr-2002 Brian Somers <brian@FreeBSD.org>

Remove the code that masks an EEXIST returned from rtinit() when
calling ioctl(SIOC[AS]IFADDR).

This allows the following:

ifconfig xx0 inet 1.2.3.1 netmask 0xffffff00
ifconfig xx0 inet 1.2.3.17 netmask 0xfffffff0 alias
ifconfig xx0 inet 1.2.3.25 netmask 0xfffffff8 alias
ifconfig xx0 inet 1.2.3.26 netmask 0xffffffff alias

but would (given the above) reject this:

ifconfig xx0 inet 1.2.3.27 netmask 0xfffffff8 alias

due to the conflicting netmasks. I would assert that it's wrong
to mask the EEXIST returned from rtinit() as in the above scenario, the
deletion of the 1.2.3.25 address will leave the 1.2.3.27 address
as unroutable as it was in the first place.

Offered for review on: -arch, -net
Discussed with: stephen macmanus <stephenm@bayarea.net>
MFC after: 3 weeks


# 5a43847d 09-Apr-2002 Brian Somers <brian@FreeBSD.org>

Don't add host routes for interface addresses of 0.0.0.0/8 -> 0.255.255.255.

This change allows bootp to work with more than one interface, at the
expense of some rather ``wrong'' looking code. I plan to MFC this in
place of luigi's recent #ifdef BOOTP stuff that was committed to this
file in -stable, as that's slightly more wrong that this is.

Offered for review on: -arch, -net
MFC after: 2 weeks


# 44731cab 01-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# 4d77a549 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# 0f02fdac 30-Nov-2001 Brian Somers <brian@FreeBSD.org>

During SIOCAIFADDR, if in_ifinit() fails and we've already added an
interface address, blow the address away again before returning the
error.

In in_ifinit(), if we get an error from rtinit() and we've also got
a destination address, return the error rather than masking EEXISTS.
Failing to create a host route when configuring an interface should
be treated as an error.


# bc183b3f 30-Oct-2001 Dag-Erling Smørgrav <des@FreeBSD.org>

Make sure the netmask always has an address family. This fixes Linux
ifconfig, which expects the address returned by the SIOCGIFNETMASK ioctl
to have a valid sa_family. Similar changes may be necessary for IPv6.

While we're here, get rid of an unnecessary temp variable.

MFC after: 2 weeks


# 22c819a7 01-Oct-2001 Jonathan Lemon <jlemon@FreeBSD.org>

in_ifinit apparently can be used to rewrite an ip address; recalculate
the correct hash bucket for the entry.

Submitted by: iedowse (with some munging by me)


# ca925d9c 28-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Add a hash table that contains the list of internet addresses, and use
this in place of the in_ifaddr list when appropriate. This improves
performance on hosts which have a large number of IP aliases.


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# f9132ceb 05-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Wrap array accesses in macros, which also happen to be lvalues:

ifnet_addrs[i - 1] -> ifaddr_byindex(i)
ifindex2ifnet[i] -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.


# e43cc4ae 04-Aug-2001 Hajimu UMEMOTO <ume@FreeBSD.org>

When running aplication joined multicast address,
removing network card, and kill aplication.
imo_membership[].inm_ifp refer interface pointer
after removing interface.
When kill aplication, release socket,and imo_membership.
imo_membership use already not exist interface pointer.
Then, kernel panic.

PR: 29345
Submitted by: Inoue Yuichi <inoue@nd.net.fujitsu.co.jp>
Obtained from: KAME
MFC after: 3 days


# 33841545 10-Jun-2001 Hajimu UMEMOTO <ume@FreeBSD.org>

Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks


# 91854268 11-May-2001 Ruslan Ermilov <ru@FreeBSD.org>

In in_ifadown(), differentiate between whether the interface goes
down or interface address is deleted. Only delete static routes
in the latter case.

Reported by: Alexander Leidinger <Alexander@leidinger.net>


# 462b86fe 16-Mar-2001 Poul-Henning Kamp <phk@FreeBSD.org>

<sys/queue.h> makeover.


# 089cdfad 15-Mar-2001 Ruslan Ermilov <ru@FreeBSD.org>

net/route.c:

A route generated from an RTF_CLONING route had the RTF_WASCLONED flag
set but did not have a reference to the parent route, as documented in
the rtentry(9) manpage. This prevented such routes from being deleted
when their parent route is deleted.

Now, for example, if you delete an IP address from a network interface,
all ARP entries that were cloned from this interface route are flushed.

This also has an impact on netstat(1) output. Previously, dynamically
created ARP cache entries (RTF_STATIC flag is unset) were displayed as
part of the routing table display (-r). Now, they are only printed if
the -a option is given.

netinet/in.c, netinet/in_rmx.c:

When address is removed from an interface, also delete all routes that
point to this interface and address. Previously, for example, if you
changed the address on an interface, outgoing IP datagrams might still
use the old address. The only solution was to delete and re-add some
routes. (The problem is easily observed with the route(8) command.)

Note, that if the socket was already bound to the local address before
this address is removed, new datagrams generated from this socket will
still be sent from the old address.

PR: kern/20785, kern/21914
Reviewed by: wollman (the idea)


# 37d40066 04-Feb-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Another round of the <sys/queue.h> FOREACH transmogriffer.

Created with: sed(1)
Reviewed by: md5(1)


# fc2ffbe6 04-Feb-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)


# 7cc0979f 08-Dec-2000 David Malone <dwmalone@FreeBSD.org>

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


# cf9fa8e7 29-Oct-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.


# 4153a3a3 19-Aug-2000 Bruce Evans <bde@FreeBSD.org>

Fixed a missing splx() in if_addmulti(). Was broken in rev.1.28.


# 686cdd19 04-Jul-2000 Jun-ichiro itojun Hagino <itojun@FreeBSD.org>

sync with kame tree as of july00. tons of bug fixes/improvements.

API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
(also syntax change)


# 5d60ed0e 13-Jan-2000 Yoshinobu Inoue <shin@FreeBSD.org>

Change struct sockaddr_storage member name, because following change
is very likely to become consensus as recent ietf/ipng mailing list
discussion. Also recent KAME repository and other KAME patched BSDs
also applied it.

s/__ss_family/ss_family/
s/__ss_len/ss_len/

Makeworld is confirmed, and no application should be affected by this change
yet.


# 6a800098 22-Dec-1999 Yoshinobu Inoue <shin@FreeBSD.org>

IPSEC support in the kernel.
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# f711d546 27-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


# 88a5354e 23-Apr-1999 Luigi Rizzo <luigi@FreeBSD.org>

postpone the sending of IGMP LEAVE msg to after deleting the
mc address from the address list. The latter operation on some
hardware resets the card, potentially canceling the pending LEAVE
pkt.


# 6572231d 06-Dec-1998 Eivind Eklund <eivind@FreeBSD.org>

Clean up some pointer usage.


# ecbb00a2 07-Jun-1998 Doug Rabson <dfr@FreeBSD.org>

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# a1c995b6 12-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 55166637 11-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Distribute and statizice a lot of the malloc M_* types.

Substantial input from: bde


# 1fd0b058 02-Aug-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# a29f300e 27-Apr-1997 Garrett Wollman <wollman@FreeBSD.org>

The long-awaited mega-massive-network-code- cleanup. Part I.

This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.

2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.

3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call. Protocols should use the process pointer they are now passed.

4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.

5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.

As a result, LINT is now broken. I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.


# 51a53488 24-Mar-1997 Bruce Evans <bde@FreeBSD.org>

Don't include <sys/ioctl.h> in the kernel. Stage 2: include
<sys/sockio.h> instead of <sys/ioctl.h> in network files.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 117bcae7 18-Feb-1997 Garrett Wollman <wollman@FreeBSD.org>

Convert raw IP from mondo-switch-statement-from-Hell to
pr_usrreqs. Collapse duplicates with udp_usrreq.c and
tcp_usrreq.c (calling the generic routines in uipc_socket2.c and
in_pcb.c). Calling sockaddr()_ or peeraddr() on a detached
socket now traps, rather than harmlessly returning an error; this
should never happen. Allow the raw IP buffer sizes to be
controlled via sysctl.


# 39191c8e 13-Feb-1997 Garrett Wollman <wollman@FreeBSD.org>

Provide PRC_IFDOWN and PRC_IFUP support for IP. Now, when an interface
is administratively downed, all routes to that interface (including the
interface route itself) which are not static will be deleted. When
it comes back up, and addresses remaining will have their interface routes
re-added. This solves the problem where, for example, an Ethernet interface
is downed by traffic continues to flow by way of ARP entries.


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 477180fb 13-Jan-1997 Garrett Wollman <wollman@FreeBSD.org>

Use the new if_multiaddrs list for multicast addresses rather than the
previous hackery involving struct in_ifaddr and arpcom. Get rid of the
abominable multi_kludge. Update all network interfaces to use the
new machanism. Distressingly few Ethernet drivers program the multicast
filter properly (assuming the hardware has one, which it usually does).


# ee3c980d 15-Dec-1996 Garrett Wollman <wollman@FreeBSD.org>

Some days, it just doesn't pay to get out of bed. Fix another broken
reference to the now-dead-for-real-this-time ia_next field.

Reminded by: Russell Vincent


# 59562606 13-Dec-1996 Garrett Wollman <wollman@FreeBSD.org>

Convert the interface address and IP interface address structures
to TAILQs. Fix places which referenced these for no good reason
that I can see (the references remain, but were fixed to compile
again; they are still questionable).


# f8731310 09-Sep-1996 Garrett Wollman <wollman@FreeBSD.org>

Set subnetsarelocal to false. In a classless world, the other case
is almost never useful. (This is only a quick hack; someone should
go back and delete the entire subnetsarelocal==1 code path.)


# c655b7c4 06-Apr-1996 David Greenman <dg@FreeBSD.org>

Added proper splnet protection while modifying the interface address list.
This fixes a panic that occurs when ifconfig ioctl(s) were interrupted
by IP traffic at the wrong time - resulting in a NULL pointer dereference.
This was originally noticed on a FreeBSD 1.0 system, but the problem still
exists in current sources.


# ac0aa473 15-Mar-1996 Bill Fenner <fenner@FreeBSD.org>

Allow SIOCGIFBRDADDR and SIOCGIFNETMASK to return information about
aliases, if the alias address was passed in the struct ifreq.
Default to first address on the list, for backwards compatibility.


# 2ee45d7d 11-Mar-1996 David Greenman <dg@FreeBSD.org>

Move or add #include <queue.h> in preparation for upcoming struct socket
changes.


# 8dd27fd6 08-Jan-1996 Guido van Rooij <guido@FreeBSD.org>

Fix a bug where having a process listening to both a INADDR_ANY and a
local address, that was assigned with ifconfig alias and netmask
0xffffffff, would receive duplictae udp packets.
This behaviour can easily be seen by having named run, and using the alias
address as the name server.
This solution is not the pretiest one, but after talk with Garreth, it
is seen as the most easy one.


# f6d24a78 09-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Staticize.


# dcc3cb75 19-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

fix #includes & warnings.


# 0312fbe9 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

New style sysctl & staticize alot of stuff.


# a98ca469 29-Oct-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.


# 2180b925 21-Sep-1995 Garrett Wollman <wollman@FreeBSD.org>

Merge with 4.4-Lite-2. This is actually a 64-bit fix; the second parameter
to in_control() is sometimes a pointer, and sometimes an integer, so use
u_long rather than int.

Obtained from: 4.4BSD-Lite-2


# efe4b0eb 21-Sep-1995 Garrett Wollman <wollman@FreeBSD.org>

Second try: get 4.4-Lite-2 into the source tree. The conflicts don't
matter because none of our working source files are on the CSRG branch
any more.

Obtained from: 4.4BSD-Lite-2


# 357b78a9 17-Jul-1995 Garrett Wollman <wollman@FreeBSD.org>

Return EDESTADDRREQ rather than EADDRNOTAVAIL if the user attempts to
half-configure a point-to-point interface.

Submitted by: Jonathan M. Bresler <jmb@kryten.atinc.com>


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# f5fea3dd 26-Apr-1995 Paul Traina <pst@FreeBSD.org>

Cleanup loopback interface support.
Reviewed by: wollman


# 1067217d 25-Apr-1995 Garrett Wollman <wollman@FreeBSD.org>

Disallow half-configured point-to-point interfaces. It's still possible to
get into a half-configured state by using the old-style ioctls;this
may be a feature.


# ffa5b11a 23-Mar-1995 Garrett Wollman <wollman@FreeBSD.org>

in_var.h: in_multi structures now form a queue(3)-style LIST structure
in.c: when an interface address is deleted, keep its multicast membership
. records (attached to a struct multi_kludge) for attachment to the
. next address on the same interface. Also, in_multi structures now
. gain a reference to the ifaddr so that they won't point off into
. freed memory if an interface goes away and doesn't come back before
. the last socket reference drops. This is analogous to how it is
. done for routes, and seems to make the most sense.


# b5e8ce9f 16-Mar-1995 Bruce Evans <bde@FreeBSD.org>

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# c70f4510 13-Feb-1995 Poul-Henning Kamp <phk@FreeBSD.org>

YFfix.


# dd2e4102 22-Dec-1994 Garrett Wollman <wollman@FreeBSD.org>

Move ARP interface initialization into if_ether.c:arp_ifinit().


# df00058d 03-Nov-1994 Garrett Wollman <wollman@FreeBSD.org>

Fix off-by-one error reported to NetBSD by Karl Fox in
<9411031449.AA11102@gefilte.MorningStar.Com>.


# 623ae52e 02-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

GCC cleanup.
Reviewed by:
Submitted by:
Obtained from:


# fe95e21f 15-Sep-1994 Poul-Henning Kamp <phk@FreeBSD.org>

Made the kernel compile even without "ether".


# f23b4c91 18-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# 26f9a767 25-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources