History log of /freebsd-10.1-release/sys/netinet/in.c
Revision Date Author Comments
# 272461 02-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 267195 06-Jun-2014 asomers

MFC changes related to PR kern/189089. Unlike CURRENT, stable/10 does not
panic when you attempt to remove the IP address. But it still fails to
remove the address.

MFC r265094

Add regression test for PR kern/189088.

MFC r265092

Fix a panic when removing an IP address from an interface, if the same
address exists on another interface. The panic was introduced by change
264887, which changed the fibnum parameter in the call to rtalloc1_fib() in
ifa_switch_loopback_route() from RT_DEFAULT_FIB to RT_ALL_FIBS. The
solution is to use the interface fib in that call. For the majority of
users, that will be equivalent to the legacy behavior.


# 267193 06-Jun-2014 asomers

MFC r264887

Fix host and network routes for new interfaces when net.add_addr_allfibs=0

sys/net/route.c
In rtinit1, use the interface fib instead of the process fib. The
latter wasn't very useful because ifconfig(8) is usually invoked
with the default process fib. Changing ifconfig(8) to use setfib(2)
would be redundant, because it already sets the interface fib.

tests/sys/netinet/fibs_test.sh
Clear the expected ATF failure

sys/net/if.c
Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib

sys/netinet/in.c
sys/net/if_var.h
Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
in_scrubprefix. Pass it the interface fib.


# 267186 06-Jun-2014 asomers

MFC changes relating to running multiple interfaces on different fibs but
with addresses on the same subnet.

MFC r266860

Fix unintended KBI change from r264905. Add _fib versions of
ifa_ifwithnet() and ifa_ifwithdstaddr(). The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.

sys/net/if_var.h
sys/net/if.c
Add legacy-compatible functions as described above. Ensure legacy
behavior when RT_ALL_FIBS is passed as fibnum.

sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
Call with _fib() functions if we must use a specific fib, or the
legacy functions otherwise.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
Improve the udp_dontroute test. The bug that this test exercises is
that ifa_ifwithnet() will return the wrong address, if multiple
interfaces have addresses on the same subnet but with different
fibs. The previous version of the test only considered one possible
failure mode: that ifa_ifwithnet_fib() might fail to find any
suitable address at all. The new version also checks whether
ifa_ifwithnet_fib() finds the correct address by checking where the
ARP request goes.
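
The wrapper arrangement described for r266860 can be sketched in userland C
as follows. This is only an illustrative sketch: the sk_-prefixed types, the
table layout, and SK_ALL_FIBS are hypothetical stand-ins, not the kernel's
struct ifaddr, ifa_ifwithnet_fib(), or RT_ALL_FIBS definitions. Addresses are
assumed to be in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wildcard meaning "search every FIB". */
#define SK_ALL_FIBS (-1)

/* Hypothetical, simplified interface address record (host byte order). */
struct sk_ifaddr {
    uint32_t addr;
    uint32_t mask;
    int fib;        /* FIB of the owning interface */
};

/*
 * Return the most specific address whose subnet contains dst, restricted
 * to interfaces in fibnum unless fibnum is SK_ALL_FIBS.
 */
static const struct sk_ifaddr *
sk_ifwithnet_fib(const struct sk_ifaddr *tbl, size_t n, uint32_t dst,
    int fibnum)
{
    const struct sk_ifaddr *best = NULL;

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (fibnum != SK_ALL_FIBS && tbl[i].fib != fibnum)
            continue;
        if ((dst & tbl[i].mask) != (tbl[i].addr & tbl[i].mask))
            continue;
        /* For contiguous masks, a larger value is a longer prefix. */
        if (best == NULL || tbl[i].mask > best->mask)
            best = &tbl[i];
    }
    return (best);
}

/* Legacy entry point: unchanged signature, searches all FIBs. */
static const struct sk_ifaddr *
sk_ifwithnet(const struct sk_ifaddr *tbl, size_t n, uint32_t dst)
{
    return (sk_ifwithnet_fib(tbl, n, dst, SK_ALL_FIBS));
}
```

With two interfaces on the same /24 but in different FIBs, the legacy lookup
returns the first matching entry regardless of FIB, while the fib-aware
lookup returns only an address from the requested FIB (or NULL).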

MFC r264917

Style fixes, mostly trailing whitespace elimination. No functional change.

MFC r264905

Fix subnet and default routes on different FIBs on the same subnet.

These two bugs are closely related. The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
Add a fib argument to ifa_ifwithnet and ifa_ifwithdstaddr. Those
functions will only return an address whose interface fib equals the
argument.

sys/net/route.c
Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
arguments.

sys/netinet/in.c
Update in_addprefix to consider the interface fib when adding
prefixes. This will prevent it from not adding a subnet route when
one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
In some cases there wasn't a clear specific fib number to use.
In others, I was unable to test those functions so I chose
RT_DEFAULT_FIB to minimize divergence from current behavior. I will
fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
Revert r263738. The udp_dontroute test was right all along.
However, bugs kern/187550 and kern/187553 cancelled each other out
when it came to this test. Because of kern/187553, ifa_ifwithnet
searched the default fib instead of the requested one, but because
of kern/187550, there was an applicable subnet route on the default
fib. The new test added in r263738 doesn't work right, however. I
can verify with dtrace that ifa_ifwithnet returned the wrong address
before I applied this commit, but route(8) miraculously found the
correct interface to use anyway. I don't know how.

Clear expected failure messages for kern/187550 and kern/187552.

MFC r263738

tests/sys/netinet/Makefile
tests/sys/netinet/fibs.sh
Replace fibs:udp_dontroute with fibs:src_addr_selection_by_subnet.
The original test was poorly written; it was actually testing
kern/167947 instead of the desired kern/187553. The root cause of the
bug is that ifa_ifwithnet did not have a fib argument. The new test
more directly targets that behavior.

tests/sys/netinet/udp_dontroute.c
Delete the auxiliary binary used by the old test.


# 267175 06-Jun-2014 asomers

MFC r263779

Correct ARP update handling when the routes for network interfaces are
restricted to a single FIB in a multifib system.

Restricting an interface's routes to the FIB to which it is assigned (by
setting net.add_addr_allfibs=0) causes ARP updates to fail with "arpresolve:
can't allocate llinfo for x.x.x.x". This is due to the ARP update code
hard-coding its lookup for existing routing entries to FIB 0.

sys/netinet/in.c:
When dealing with RTM_ADD (add route) requests for an interface, use
the interface's assigned FIB instead of the default (FIB 0).

sys/netinet/if_ether.c:
In arpresolve(), enhance error message generated when an
lla_lookup() fails so that the interface causing the error is
visible in logs.

tests/sys/netinet/fibs_test.sh
Clear ATF expected error.


# 265946 13-May-2014 kevlo

MFC r264212,r264213,r264248,r265776,r265811,r265909:

- Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks.
Tested with vlc and a test suite [1].
[1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz

Reviewed by: jhb, glebius, adrian

- Fix a logic bug which prevented the sending of UDP packet with 0 checksum.

- Disable TX checksum offload for UDP-Lite completely. It wasn't used for
partial checksum coverage, but even for full checksum coverage it doesn't
work.


# 265717 08-May-2014 melifaro

Merge r260488, r260508.

r260488:
Split rt_newaddrmsg_fib() into two different functions.
Adding/deleting interface addresses involves access to 3 different subsystems,
in different parts of code. Each call can fail, so reporting successful
operation by rtsock in the middle of the process is error-prone.

Further split routing notification API and actual rtsock calls via creating
public-available rt_addrmsg() / rt_routemsg() functions with "private"
rtsock_* backend.

r260508:
Simplify inet alias handling code: if we're adding/removing alias which
has the same prefix as some other alias on the same interface, use
newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg().

This eliminates the following rtsock messages:

Pinned RTM_ADD for prefix (for alias addition).
Pinned RTM_DELETE for prefix (for alias withdrawal).

Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24):

before commit, addition:

got message of size 116 on Fri Jan 10 14:13:15 2014
RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

got message of size 192 on Fri Jan 10 14:13:15 2014
RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
10.0.0.0 10.0.0.2 (255) ffff ffff ff

after commit, addition:

got message of size 116 on Fri Jan 10 13:56:26 2014
RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255

before commit, withdrawal:

got message of size 192 on Fri Jan 10 13:58:59 2014
RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
10.0.0.0 10.0.0.2 (255) ffff ffff ff

got message of size 116 on Fri Jan 10 13:58:59 2014
RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

after commit, withdrawal:

got message of size 116 on Fri Jan 10 14:14:11 2014
RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong
(and requires some hacks to keep prefix in route table on RTM_DELETE).

I've tested this change with quagga (no change) and bird (*).

bird alias handling is already broken in *BSD sysdep code, so nothing
changes here, too.

I'm going to MFC this change if there are no complaints about the behavior
change.

While here, fix some style(9) bugs introduced by r260488
(pointed out by glebius and bde).


# 260504 10-Jan-2014 ae

MFC r260151 (by adrian):
Use an RLOCK here instead of an RWLOCK - matching all the other calls
to lla_lookup().

This drastically reduces the very high lock contention when doing parallel
TCP throughput tests (> 1024 sockets) with IPv6.

MFC r260187:
lla_lookup() does modification only when LLE_CREATE is specified.
Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing
lla_lookup() without LLE_CREATE flag.

MFC r260217:
Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with
LLE_CREATE flag.


# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 253084 09-Jul-2013 ae

Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.


# 249742 21-Apr-2013 oleg

Plug static llentry leak (ipv4 & ipv6 were affected).

PR: kern/172985
MFC after: 1 month


# 246143 31-Jan-2013 glebius

Retire struct sockaddr_inarp.

Since ARP and routing are separated, "proxy only" entries
don't have any meaning, thus we don't need additional field
in sockaddr to pass SIN_PROXY flag.

New kernel is binary compatible with old tools, since sizes
of sockaddr_inarp and sockaddr_in match, and sa_family are
filled with same value.

The structure declaration is left for compatibility with
third party software, but in tree code no longer use it.

Reviewed by: ru, andre, net@


# 244989 03-Jan-2013 peter

Temporarily revert rev 244678. This is causing loopback problems with
the lo (loopback) interfaces.


# 244678 25-Dec-2012 glebius

The SIOCSIFFLAGS ioctl handler runs if_up()/if_down() that notify
all interested parties if the interface flag IFF_UP has changed.

However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR
and SIOCAIFADDR_IN6 can, too. The actual |= is done not in the protocol
code, but in code of interface drivers. To fix this historical layering
violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the
IFF_UP flag, and if it did, run the if_up() handler.

This fixes configuring an address under CARP control on an interface
that was initially !IFF_UP.

P.S. I intentionally omitted handling the IFF_SMART flag. This flag was
never ever used in any driver since it was introduced, and since it
means another layering violation, it should be garbage collected instead
of being pretended to be supported.


# 244665 24-Dec-2012 glebius

Minor style(9) changes:
- Remove declaration in initializer.
- Add empty line between logical blocks.


# 239395 19-Aug-2012 rrs

Though I disagree, I concede to jhb & Rui. Note
that we still have a problem with this whole structure of
locks and in_input.c [it does not lock which it should not, but
this *can* lead to crashes]. (I have seen it in our SQA
testbed.. besides the one with a refcnt issue that I will
have SQA work on next week ;-)


# 239353 17-Aug-2012 rrs

Ok jhb, let's move the ifa_free() down to the bottom to
assure that *all* tables and such are removed before
we start to free. This won't protect the Hash in ip_input.c
but in theory should protect any other uses that *do* use locks.

MFC after: 1 week (or more)


# 239334 16-Aug-2012 rrs

It's never a good idea to double free the same
address.

MFC after: 1 week (after the other commits ahead of this gets MFC'd)


# 238990 02-Aug-2012 glebius

Fix races between in_lltable_prefix_free(), lla_lookup(),
llentry_free() and arptimer():

o Use callout_init_rw() for lle timeout, this allows us safely
disestablish them.
- This allows us to simplify the arptimer() and make it
race safe.
o Consistently use ifp->if_afdata_lock to lock access to
linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
is attached to the hash.
- Use LLE_LINKED to avoid double unlinking via subsequent
calls to llentry_free().
- Mark lle with LLE_DELETED via |= operation instead of =,
so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
consistent and provide more informative KASSERTs.
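
The LLE_LINKED and LLE_DELETED handling listed above can be illustrated with
a small userland sketch. The flag values and the sk_lle structure are
hypothetical; only the pattern (test a "linked" bit before unlinking, and OR
in the "deleted" bit rather than assigning it) is taken from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's LLE_* constants differ. */
#define SK_LLE_STATIC   0x01u
#define SK_LLE_LINKED   0x02u
#define SK_LLE_DELETED  0x04u

struct sk_lle {
    unsigned flags;
};

/*
 * Unlink an entry at most once. Returns 1 if this call did the unlink,
 * 0 if the entry was already unlinked by an earlier call.
 */
static int
sk_lle_unlink(struct sk_lle *lle)
{
    assert(lle != NULL);
    if ((lle->flags & SK_LLE_LINKED) == 0)
        return (0);
    lle->flags &= ~SK_LLE_LINKED;
    /* Use |= so that unrelated bits such as SK_LLE_STATIC survive. */
    lle->flags |= SK_LLE_DELETED;
    return (1);
}
```

Had the last statement been a plain assignment, the SK_LLE_STATIC bit in the
example would have been lost when the entry was marked deleted.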

The patch is a collaborative work of all submitters and myself.

PR: kern/165863
Submitted by: Andrey Zonov <andrey zonov.org>
Submitted by: Ryan Stone <rysto32 gmail.com>
Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>


# 238967 01-Aug-2012 glebius

Some more whitespace cleanup.


# 238945 31-Jul-2012 glebius

Some style(9) and whitespace changes.

Together with: Andrey Zonov <andrey zonov.org>


# 237263 19-Jun-2012 np

- Updated TOE support in the kernel.

- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the
works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)


# 234087 10-Apr-2012 glebius

M_DONTWAIT is a flag from historical mbuf(9)
allocator, not malloc(9) or uma(9) flag.


# 232054 23-Feb-2012 kmacy

When using flowtable, llentries can outlive the interface with which they are
associated, at which point the lle_tbl pointer points to freed memory and the
llt_free pointer is no longer valid.

Move the free pointer into the llentry itself and update the initialization sites.

MFC after: 2 weeks


# 231852 17-Feb-2012 bz

Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:

Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by: Cisco Systems, Inc.
Reviewed by: melifaro (basically)
MFC after: 10 days


# 230207 16-Jan-2012 glebius

Drop support for SIOCSIFADDR, SIOCSIFNETMASK, SIOCSIFBRDADDR, SIOCSIFDSTADDR
ioctl commands.

PR: 163524
Reviewed by: net


# 229621 05-Jan-2012 jhb

Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by: bz
MFC after: 2 weeks


# 229478 04-Jan-2012 jhb

Use a helper variable to wrap a long line.


# 229477 04-Jan-2012 jhb

In the handling of the SIOC[DG]LIFADDR ioctls in in_lifaddr_ioctl(), add
missing interface address list locking and grab a reference on the
matching interface address after dropping the lock while it is used to
avoid a potential use after free.

Reviewed by: bz
MFC after: 1 week


# 229476 04-Jan-2012 jhb

Fix the SIOC[DG]LIFADDR ioctls in in_lifaddr_ioctl() to work with IPv4
interface address rather than IPv6.

Submitted by: hrs
Reviewed by: bz
MFC after: 1 week


# 228768 21-Dec-2011 glebius

Provide ABI compatibility shim to enable configuring of addresses
with ifconfig(8) prior to r228571.

Requested by: brooks


# 228574 16-Dec-2011 glebius

Since size of struct in_aliasreq has just been changed in r228571,
and thus ifconfig(8) needs recompile, it is a good chance to make
parameter checks on SIOCAIFADDR arguments more strict.


# 228571 16-Dec-2011 glebius

A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implementation
on demand, with a thorough review of all code. The main change is that
the interface layer has been removed from CARP. Now redundant addresses
are configured directly on the interfaces they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics have been changed too: now one doesn't need
to clone a carpXX interface; one should directly configure a vhid
on an Ethernet interface.

To supply vhid data from the kernel to an application, the getifaddrs(3)
function has been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]


# 228454 13-Dec-2011 glebius

Belatedly catch up with r151555. in_scrubprefix() also needs this fix. We
should compare not only addresses, but their masks, too, when searching
for matching prefix.
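
A minimal sketch of the comparison this commit describes, assuming a
simplified prefix record in host byte order (sk_prefix and sk_same_prefix
are hypothetical names, not functions from in.c):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified prefix: address plus netmask, host byte order. */
struct sk_prefix {
    uint32_t addr;
    uint32_t mask;
};

/*
 * Two prefixes match only if both the masks and the masked addresses are
 * equal. Comparing the masked addresses alone would treat 10.0.0.1/24 and
 * 10.0.0.1/16 as the same prefix.
 */
static bool
sk_same_prefix(const struct sk_prefix *a, const struct sk_prefix *b)
{
    assert(a != NULL && b != NULL);
    return (a->mask == b->mask &&
        (a->addr & a->mask) == (b->addr & b->mask));
}
```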


# 228313 06-Dec-2011 glebius

Fix a very special case when SIOCAIFADDR supplies mask of 0.0.0.0,
don't overwrite the mask with autoguessing based on classes.


# 228062 28-Nov-2011 glebius

Fix one more fallout from r227791: do not overwrite trimmed sa_len
on the ia_sockmask when doing SIOCSIFNETMASK.

Reported by: Stefan Bethke <stb lassitu.de>, gonzo
Pointy hat to: glebius


# 227959 24-Nov-2011 glebius

Remove superfluous check: SIOCAIFADDR must have ifra_addr supplied.


# 227958 24-Nov-2011 glebius

Fix stupid typo in r227830.

PR: 162806
Pointy hat to: glebius


# 227831 22-Nov-2011 glebius

style(9) nit


# 227830 22-Nov-2011 glebius

Fix SIOCDIFADDR semantics: if no address is specified, then delete first one.


# 227801 21-Nov-2011 glebius

This check isn't needed now, sanity checking done in the beginning.
Missed it in last commit.


# 227791 21-Nov-2011 glebius

Historically in_control() did not check sockaddrs supplied with
structs ifreq/in_aliasreq and there've been several panics due
to that problem. All these panics were fixed just a couple of
lines above the panicking code.

Take a more general approach: sanity check sockaddrs supplied
with SIOCAIFADDR and SIOCSIF*ADDR at the beginning of the
function and drop all checks below.

One check is now disabled due to strange code in ifconfig(8)
that I've removed recently. I'm going to enable it with next
__FreeBSD_version bump.

Historically in_ifinit() was able to recover from an error
and restore old address. Nowadays this feature isn't working
for all error cases, but for some of them. I suppose no software
relies on this behavior, so I'd like to remove it, since this
simplifies code a lot.

Also, move if_scrub() earlier in the in_ifinit(). It is more
correct to wipe routes before removing address from local
address list, and interface address list.

Silence from: bz, brooks, andre, rwatson, 3 weeks


# 226713 25-Oct-2011 qingli

Exclude host routes when checking for prefix coverage on multiple
interfaces. A host route has a NULL mask so check for that condition.
I have also been told by developers who customize the packet output
path with direct manipulation of the route entry (or the outgoing
interface to be specific). This patch checks for the route mask
explicitly to make sure custom code will not panic.

PR: kern/161805
MFC after: 3 days


# 226402 15-Oct-2011 glebius

Add support for IPv4 /31 prefixes, as described in RFC3021.

To run a /31 network, participating hosts MUST drop support
for directed broadcasts, and treat the first and last addresses
on subnet as unicast. The broadcast address for the prefix
should be the link local broadcast address, INADDR_BROADCAST.
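
The broadcast rule for /31 prefixes can be sketched as below. This is an
illustrative userland helper, not the code from in.c; sk_broadcast is a
hypothetical name and all values are in host byte order.

```c
#include <assert.h>
#include <stdint.h>

#define SK_RFC3021_MASK     0xFFFFFFFEu /* /31 netmask */
#define SK_INADDR_BROADCAST 0xFFFFFFFFu /* 255.255.255.255 */

/*
 * Compute the broadcast address to associate with addr/mask. On a /31
 * both addresses are hosts, so there is no directed broadcast and the
 * link local broadcast address is used instead.
 */
static uint32_t
sk_broadcast(uint32_t addr, uint32_t mask)
{
    uint32_t inv = ~mask;

    /* Precondition: mask is contiguous (ones followed by zeros). */
    assert((inv & (inv + 1u)) == 0);
    if (mask == SK_RFC3021_MASK)
        return (SK_INADDR_BROADCAST);
    return (addr | inv);
}
```

For 192.168.1.5/24 the helper yields 192.168.1.255, while for either address
of 192.0.2.0/31 it yields 255.255.255.255.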


# 226401 15-Oct-2011 glebius

Remove last remnants of classful addressing:

- Remove ia_net, ia_netmask, ia_netbroadcast from struct in_ifaddr.
- Remove net.inet.ip.subnetsarelocal, I bet no one needs it in 2011.
- Fix bug when we were not forwarding to a host which matches classful
net address. For example router having 192.168.x.y/16 network attached,
would not forward traffic to 192.168.*.0, which are legal IPs in
CIDR world.
- For compatibility, leave autoguessing of mask based on class.
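
To make the forwarding bug and the retained mask autoguessing concrete, here
is a hedged userland sketch. The helper names are hypothetical and the class
boundaries are the traditional A/B/C ones; anything above class B is treated
as /24 in this sketch. Values are in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Guess a netmask from the address class, as done when none is supplied. */
static uint32_t
sk_classful_mask(uint32_t addr)
{
    if ((addr & 0x80000000u) == 0)
        return (0xFF000000u);   /* class A: /8 */
    if ((addr & 0xC0000000u) == 0x80000000u)
        return (0xFFFF0000u);   /* class B: /16 */
    return (0xFFFFFF00u);       /* class C and above: /24 here */
}

/* True if dst is the all-zeros host address of the network given by mask. */
static bool
sk_is_network_address(uint32_t dst, uint32_t mask)
{
    return ((dst & ~mask) == 0);
}

/* Usage example: 192.168.5.0 seen through classful and CIDR masks. */
static void
sk_classful_example(void)
{
    uint32_t dst = 0xC0A80500u; /* 192.168.5.0 */

    /* Classful view: looks like a network address, so not a host. */
    assert(sk_is_network_address(dst, sk_classful_mask(dst)));
    /* CIDR view with the configured /16: an ordinary host address. */
    assert(!sk_is_network_address(dst, 0xFFFF0000u));
}
```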

Reviewed by: andre, bz, rwatson


# 226339 13-Oct-2011 glebius

De-spl(9).


# 226224 10-Oct-2011 qingli

All indirect routes will fail the rtcheck, except for a special host
route where the destination IP and the gateway IP are the same. This
special case handling is only meant for backward compatibility reasons.
The last commit introduced a bug in the route check logic, where a
valid special case is treated as an error. This patch fixes that bug
along with some code cleanup.

Suggested by: gleb
Reviewed by: kmacy, discussed with gleb
MFC after: 1 day


# 226120 07-Oct-2011 qingli

Do not try removing an ARP entry associated with a given interface
address if that interface does not support ARP. Otherwise the
system will generate error messages unnecessarily due to the missing
entry.

PR: kern/159602
Submitted by: pluknet
MFC after: 3 days


# 226114 07-Oct-2011 qingli

Remove the reference held on the loopback route when the interface
address is being deleted. Only the last reference holder deletes the
loopback route. All other delete operations just clear the IFA_RTSELF
flag.
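
The reference handling described here can be sketched as follows. The
structures and SK_IFA_RTSELF are hypothetical simplifications; the sketch
only shows the idea that every address holding the flag owns one reference
and that the route goes away with the last reference.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_IFA_RTSELF 0x1u  /* hypothetical flag value */

struct sk_loroute {
    int refcnt;     /* number of addresses referencing the route */
    bool installed; /* whether the loopback route is in the table */
};

struct sk_ifa {
    unsigned flags;
};

/* Take a reference; the first holder installs the route. */
static void
sk_add_loopback(struct sk_loroute *rt, struct sk_ifa *ifa)
{
    if (rt->refcnt++ == 0)
        rt->installed = true;
    ifa->flags |= SK_IFA_RTSELF;
}

/* Drop a reference; only the last holder removes the route. */
static void
sk_del_loopback(struct sk_loroute *rt, struct sk_ifa *ifa)
{
    if ((ifa->flags & SK_IFA_RTSELF) == 0)
        return;     /* this address holds no reference */
    assert(rt->refcnt > 0);
    ifa->flags &= ~SK_IFA_RTSELF;
    if (--rt->refcnt == 0)
        rt->installed = false;
}
```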

PR: kern/159601
Submitted by: pluknet
Reviewed by: discussed on net@
MFC after: 3 days


# 225947 03-Oct-2011 qingli

A system may have multiple physical interfaces, all of which are on the
same prefix. Since a single route entry is installed for the prefix
(without RADIX_MPATH), incoming packets on the interfaces that are not
associated with the prefix route may trigger an error message about being
unable to allocate an LLE entry, and fail at L2. This patch makes sure a
valid route is present in the system, allows the aforementioned
condition to exist, and treats it as valid.

Reviewed by: bz
MFC after: 5 days


# 225946 03-Oct-2011 qingli

This patch allows ARP to work properly in the presence of
self-referencing routes. This patch is a rework of r223862.

Reviewed by: bz, zec
MFC after: 5 days


# 225223 27-Aug-2011 qingli

When an interface address route is removed from the system, another
route with the same prefix is searched for as a replacement. The
current code did not bypass routes that have non-operational
interfaces. This patch fixes that bug and will find a replacement
route with an active interface.

PR: kern/159603
Submitted by: pluknet, ambrisko at ambrisko dot com
Reviewed by: discussed on net@
Approved by: re (bz)
MFC after: 3 days


# 224747 10-Aug-2011 kevlo

If RTF_HOST flag is specified, then we are interested in destination
address.

PR: kern/159600
Submitted by: Svatopluk Kraus <onwahe at gmail dot com>
Approved by: re (hrs)


# 223862 08-Jul-2011 zec

Permit ARP to proceed for IPv4 host routes for which the gateway is the
same as the host address. This already works fine for INET6 and ND6.

While here, remove two function pointers from struct lltable which are
only initialized but never used.

MFC after: 3 days


# 222438 29-May-2011 qingli

Supply the LLE_STATIC flag bit to in_ifscrub() when scrubbing interface
address so that proper clean up will take place in the routing code.
This patch fixes the bootp panic on startup problem. Also, added more
error handling and logging code in function in_scrubprefix().

MFC after: 5 days


# 222143 20-May-2011 qingli

The statically configured (permanent) ARP entries are removed when an
interface is brought down, even though the interface address is still
valid. This patch maintains the permanent ARP entries as long as the
interface address (having the same prefix as that of the ARP entries)
is valid.

Reviewed by: delphij
MFC after: 5 days


# 219828 21-Mar-2011 pluknet

Reference ifaddr object before unlocking as it can be freed
from another context at the moment of later access.

PR: kern/155555
Submitted by: Andrew Boyer <aboyer att averesystems.com>
Approved by: avg (mentor)
MFC after: 2 weeks


# 216075 30-Nov-2010 glebius

Use time_uptime instead of non-monotonic time_second to drive ARP
timeouts.

Suggested by: bde


# 215701 22-Nov-2010 dim

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


# 215317 14-Nov-2010 dim

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.


# 215207 12-Nov-2010 gnn

Add a queue to hold packets while we await an ARP reply.

When a fast machine first brings up some non TCP networking program
it is quite possible that we will drop packets due to the fact that
only one packet can be held per ARP entry. This leads to packets
being missed when a program starts or restarts if the ARP data is
not currently in the ARP cache.

This code adds a new sysctl, net.link.ether.inet.maxhold, which defines
a system wide maximum number of packets to be held in each ARP entry.
Up to maxhold packets are queued until an ARP reply is received or
the ARP times out. The default setting is the old value of 1
which has been part of the BSD networking code since time
immemorial.

Expose the time we hold an incomplete ARP entry by adding
the sysctl net.link.ether.inet.wait, which defaults to 20
seconds, the value used when the new ARP code was added.
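
A bounded hold queue of this kind might look like the sketch below. It is a
userland illustration under stated assumptions: packets are represented by
plain integers, the names are hypothetical, and the policy of dropping the
oldest held packet when the queue is full is an assumption of this sketch
rather than something stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define SK_HOLD_CAP 16  /* hypothetical upper bound for maxhold */

struct sk_holdq {
    int pkts[SK_HOLD_CAP];  /* ring buffer of held "packets" */
    size_t head;            /* index of the oldest held packet */
    size_t count;           /* number of packets currently held */
    size_t maxhold;         /* analogue of the maxhold sysctl */
    unsigned dropped;       /* packets dropped because queue was full */
};

/* Hold a packet while resolution is pending, evicting the oldest if full. */
static void
sk_hold(struct sk_holdq *q, int pkt)
{
    assert(q->maxhold >= 1 && q->maxhold <= SK_HOLD_CAP);
    if (q->count >= q->maxhold) {
        q->head = (q->head + 1) % SK_HOLD_CAP;
        q->count--;
        q->dropped++;
    }
    q->pkts[(q->head + q->count) % SK_HOLD_CAP] = pkt;
    q->count++;
}

/* On an ARP reply, hand the held packets back in arrival order. */
static size_t
sk_flush(struct sk_holdq *q, int *out, size_t outlen)
{
    size_t n = 0;

    while (q->count > 0 && n < outlen) {
        out[n++] = q->pkts[q->head];
        q->head = (q->head + 1) % SK_HOLD_CAP;
        q->count--;
    }
    return (n);
}
```

With maxhold set to 1 the queue holds a single packet per entry, matching
the old default; larger values let a burst of packets survive until the
reply arrives.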

Reviewed by: bz, rpaulo
MFC after: 3 weeks


# 213932 16-Oct-2010 bz

MfP4 CH182763 (original version):

Make it harder to exploit certain in_control() related races between the
initial lookup at the beginning and the time we will remove the entry
from the lists by re-checking that entry is still in the list before
trying to remove it.

(*) It is believed that with the current code and locking strategy we
cannot completely fix all races.
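
The re-check described above can be illustrated with a singly linked list in
userland C. The node type and function name are hypothetical, and the sketch
has no locking at all, so it shows only the "verify membership before
unlinking" step, not a race-free implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_node {
    struct sk_node *next;
    int id;
};

/*
 * Unlink target only if it is still reachable from *head. Returns false
 * when some other path has already removed it, so the caller can skip
 * the rest of the teardown instead of unlinking twice.
 */
static bool
sk_remove_if_present(struct sk_node **head, struct sk_node *target)
{
    assert(head != NULL && target != NULL);
    for (struct sk_node **pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == target) {
            *pp = target->next;
            target->next = NULL;
            return (true);
        }
    }
    return (false);
}
```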

Reported by: Nima Misaghian (nima_misa hotmail.com) on net@ 20100817
Tested by: Nima Misaghian (nima_misa hotmail.com) (original version)
PR: kern/146250
Submitted by: Mikolaj Golub (to.my.trociny gmail.com) (different version)
MFC after: 1 week


# 212209 04-Sep-2010 bz

In case of RADIX_MPATH do not leak the IN_IFADDR read lock on
early return.

MFC after: 3 days


# 211157 10-Aug-2010 will

Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default. Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by: bz, simon
Approved by: ken (mentor)
MFC after: 2 weeks


# 208553 25-May-2010 qingli

This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after: 3 days


# 207369 29-Apr-2010 bz

MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitespace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOBALS (before r185088) and try to
reduce the diff to stable/7 and earlier as much as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days


# 206481 11-Apr-2010 bz

Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.

Reviewed by: qingli (earlier version)
MFC after: 10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
Christian Kratzer (ck cksoft.de),
Evgenii Davidov (dado korolev-net.ru)
PR: kern/144564
Configurations still affected: with options FLOWTABLE


# 204902 08-Mar-2010 qingli

One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix and the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.

MFC after: 5 days


# 203401 02-Feb-2010 qingli

Some of the existing ppp and vpn related scripts create and set
the IP addresses of the tunnel end points to the same value. In
these cases the loopback route is not installed for the local
end.

Verified by: avg
MFC after: 5 days


# 201811 08-Jan-2010 qingli

Ensure an address is removed from the interface address
list when the installation of that address fails.

PR: 139559


# 201285 30-Dec-2009 qingli

Consolidate the route message generation code for when address
aliases were added or deleted. The announced route entry for
an address alias is no longer empty because this empty route
entry was causing some route daemon to fail and exit abnormally.

MFC after: 5 days


# 201282 30-Dec-2009 qingli

The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was that the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations because multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after: 5 days


# 198418 23-Oct-2009 qingli

Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by: ru
MFC after: 3 days


# 198111 15-Oct-2009 qingli

This patch fixes the following issues in the ARP operation:

1. There is a regression issue in the ARP code. The incomplete
ARP entry was timing out too quickly (1 second timeout), as
such, a new entry is created each time arpresolve() is called.
Therefore the maximum attempts made is always 1. Consequently
the error code returned to the application is always 0.
2. Set the expiration of each incomplete entry to a 20-second
lifetime.
3. Return "incomplete" entries to the application.

Reviewed by: kmacy
MFC after: 3 days


# 197696 01-Oct-2009 qingli

Remove a log message from production code. This log message can be
triggered by a misconfigured host that is sending out gratuitous ARPs.
This log message can also be triggered during a network renumbering
event when multiple prefixes co-exist on a single network segment.

MFC after: immediately


# 197695 01-Oct-2009 qingli

Previously, if an address alias is configured on an interface, and
this address alias has a prefix matching that of another address
configured on the same interface, then the ARP entry for the alias
is not deleted from the ARP table when that address alias is removed.
This patch fixes the aforementioned issue.

PR: kern/139113
MFC after: 3 days


# 197227 15-Sep-2009 qingli

Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by: bz
MFC after: immediately


# 197210 14-Sep-2009 qingli

The bootp code installs an interface address and the nfs client
module tries to install the same address again. This extra code
is removed, which was discovered by the removal of a call to
in_ifscrub() in r196714. This call to in_ifscrub is put back here
because the SIOCAIFADDR command can be used to change the prefix
length of an existing alias.

Reviewed by: kmacy


# 196995 08-Sep-2009 np

Add arp_update_event. This replaces route_arp_update_event, which
has not worked since the arp-v2 rewrite.

The event handler will be called with the llentry write-locked and
can examine la_flags to determine whether the entry is being added
or removed.

Reviewed by: gnn, kmacy
Approved by: gnn (mentor)
MFC after: 1 month


# 196714 31-Aug-2009 qingli

This patch fixes the following issues:

- Routing messages are not generated when adding and removing
interface address aliases.
- Loopback route installed for an interface address alias is
not deleted from the routing table when that address alias
is removed from the associated interface.
- Function in_ifscrub() is called extraneously.

Reviewed by: gnn, kmacy, sam
MFC after: 3 days


# 196535 25-Aug-2009 rwatson

Use locks specific to the lltable code, rather than borrow the ifnet
list/index locks, to protect link layer address tables. This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.

Reviewed by: bz, kmacy
MFC after: 3 days


# 196481 23-Aug-2009 rwatson

Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stabilize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian
MFC after: 3 days


# 196019 01-Aug-2009 rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, as we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# 195914 27-Jul-2009 qingli

This patch does the following:

- Allow loopback route to be installed for address assigned to
interface of IFF_POINTOPOINT type.
- Install loopback route for an IPv4 interface address when the
"useloopback" sysctl variable is enabled. Similarly, install
loopback route for an IPv6 interface address when the sysctl variable
"nd6_useloopback" is enabled. Deleting loopback routes for interface
addresses is unconditional in case these sysctl variables were
disabled after an interface address has been assigned.

Reviewed by: bz
Approved by: re


# 195727 16-Jul-2009 rwatson

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


# 195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing them in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)
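
The reference-copy and accessor scheme described in r195699 can be illustrated with a small userland sketch. This is not the kernel's implementation: every demo_* and DEMO_* name is hypothetical, and the sketch assumes an ELF target with a GNU-compatible toolchain (section attributes, __typeof__, and linker-provided __start_/__stop_ symbols for the "demo_vnet" section).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: put a variable's reference copy into the "demo_vnet" section. */
#define DEMO_VNET_DEFINE(type, name) \
	type demo_vnet_entry_##name __attribute__((section("demo_vnet"), used))

/* Section bounds; GNU-compatible linkers provide these for such sections. */
extern char __start_demo_vnet[];
extern char __stop_demo_vnet[];

/* Hypothetical per-stack instance holding a private copy of the section. */
struct demo_vnet {
	char *data;
};

static struct demo_vnet *demo_curvnet;

/* Accessor: current instance's base plus the symbol's offset in the section. */
#define DEMO_VNET(name) \
	(*(__typeof__(demo_vnet_entry_##name) *)(demo_curvnet->data + \
	    ((uintptr_t)&demo_vnet_entry_##name - (uintptr_t)__start_demo_vnet)))

/* Statically initialized "virtualized globals" (reference copies). */
static DEMO_VNET_DEFINE(int, ip_defttl) = 64;
static DEMO_VNET_DEFINE(int, ipforwarding) = 0;

/* Create an instance whose variables start out equal to the reference copy. */
static struct demo_vnet *
demo_vnet_alloc(void)
{
	size_t len;
	struct demo_vnet *vnet;

	assert(__stop_demo_vnet > __start_demo_vnet);
	len = (size_t)(__stop_demo_vnet - __start_demo_vnet);
	vnet = malloc(sizeof(*vnet));
	if (vnet == NULL)
		return (NULL);
	vnet->data = malloc(len);
	if (vnet->data == NULL) {
		free(vnet);
		return (NULL);
	}
	memcpy(vnet->data, __start_demo_vnet, len);
	return (vnet);
}
```

Writing through DEMO_VNET(ip_defttl) changes only the instance that demo_curvnet points at; the reference copy and any other instance keep their values.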


# 194951 25-Jun-2009 rwatson

Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support. As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done. This means that one
class of reader-writer races still exists.

MFC after: 6 weeks
Reviewed by: bz


# 194760 23-Jun-2009 rwatson

Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references. The following routines now return references:

ifaddr_byindex
ifa_ifwithaddr
ifa_ifwithbroadaddr
ifa_ifwithdstaddr
ifa_ifwithnet
ifaof_ifpforaddr
ifa_ifwithroute
ifa_ifwithroute_fib
rt_getifa
rt_getifa_fib
IFP_TO_IA
ip_rtaddr
in6_ifawithifp
in6ifa_ifpforlinklocal
in6ifa_ifpwithaddr
in6_ifadd
carp_iamatch6
ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc). In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit. Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by: bz
Obtained from: Apple, Inc. (portions)
MFC after: 6 weeks (portions)


# 194602 21-Jun-2009 rwatson

Clean up common ifaddr management:

- Unify reference count and lock initialization in a single function,
ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Move from using a u_int protected by a mutex to refcount(9) for
reference count management.

The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.

MFC after: 3 weeks
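
As a rough userland illustration of the reference-counting pattern in r194602 (initialize the count in one place, bump it with a function, tear down when the last reference is dropped), the sketch below uses C11 atomics in place of the kernel's refcount(9). The demo_* names are hypothetical and are not the kernel's ifa_init(), ifa_ref() or ifa_free().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ifaddr; only the counter matters here. */
struct demo_ifaddr {
	atomic_uint refcnt;
};

/* Number of objects actually freed; kept only so the demo is observable. */
static unsigned demo_freed;

/* Allocate an object holding one reference, owned by the caller. */
static struct demo_ifaddr *
demo_ifa_alloc(void)
{
	struct demo_ifaddr *ifa = malloc(sizeof(*ifa));

	if (ifa != NULL)
		atomic_init(&ifa->refcnt, 1);
	return (ifa);
}

/* Take an additional reference; the caller must already hold one. */
static void
demo_ifa_ref(struct demo_ifaddr *ifa)
{
	unsigned old = atomic_fetch_add(&ifa->refcnt, 1);

	assert(old > 0);
	(void)old;
}

/* Drop one reference; returns true if this call freed the object. */
static bool
demo_ifa_free(struct demo_ifaddr *ifa)
{
	unsigned old = atomic_fetch_sub(&ifa->refcnt, 1);

	assert(old > 0);
	if (old == 1) {
		free(ifa);
		demo_freed++;
		return (true);
	}
	return (false);
}
```

No mutex is taken around the counter in this sketch; the atomic read-modify-write operations carry that burden instead.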


# 193744 08-Jun-2009 bz

After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.


# 192612 22-May-2009 bz

If including vnet.h one has to include opt_route.h as well. This is
because struct vnet_net holds the rt_tables[][] for MRT and array size
is compile time dependent. If you had ROUTETABLES set to >1 after
r192011 V_loif was pointing into nonsense leading to strange results
or even panics for some people.

Reviewed by: mz


# 192476 20-May-2009 qingli

When an interface address is removed and the last prefix
route is also being deleted, the link-layer address table
(arp or nd6) will flush those L2 llinfo entries that match
the removed prefix.

Reviewed by: kmacy


# 192262 17-May-2009 bz

Unbreak options VIMAGE builds, in a followup to r192011 which did not
introduce INIT_VNET_NET() initializers necessary for accessing V_loif.

Submitted by: zec
Reviewed by: julian


# 192085 14-May-2009 qingli

Ignore the INADDR_ANY address inserted/deleted by DHCP when installing a loopback route
to the interface address.


# 192011 12-May-2009 qingli

This patch adds a host route to an interface address (that is assigned
to a non loopback/ppp link types) through the loopback interface. Prior
to the new L2/L3 rewrite, this host route is implicitly added by the L2
code during RTM_RESOLVE of that interface address. This host route is
deleted when that interface is removed.

Reviewed by: kmacy


# 191548 26-Apr-2009 zec

In preparation for turning on options VIMAGE in next commits,
rearrange / replace / adjust several INIT_VNET_* initializer
macros, all of which currently resolve to whitespace.

Reviewed by: bz (an older version of the patch)
Approved by: julian (mentor)


# 191500 25-Apr-2009 rwatson

Expand coverage of IF_ADDR_LOCK() in in_control() from point of initial
lookup of 'ia' from if_addrhead through most use. Note that we
currently have to drop it prematurely in some cases due to calls out to
the routing and interface code while using 'ia', but this closes many
races. Annotate several potential races that persist after this change.
Move to using M_NOWAIT for allocating new interface addresses due to
lock(s) being held.

MFC after: 3 weeks


# 191476 24-Apr-2009 rwatson

In in_purgemaddrs(), remove the inm being freed from the address list
before freeing it, rather than vice versa, to avoid potential use
after free.

Reviewed by: bms


# 191456 24-Apr-2009 rwatson

Relocate permissions checking code in in_control() to before the body
of the implementation of ioctls. This makes the mapping of ioctls to
specific privileges more explicit, and also simplifies the
implementation by reducing the use of FALLTHROUGH handling in switch.

While this is not intended to be a functional change, it does mean
that certain privilege checks are now performed earlier, so EPERM
might be returned in preference to EADDRNOTAVAIL for management
ioctls that could have failed for both reasons.

MFC after: 3 weeks


# 191443 23-Apr-2009 rwatson

Reorganize in_control() so that invariants are more obvious, and so
that it is easier to lock:

- Handle the unsupported ioctl case at the beginning of in_control(),
handing off to ifp->if_ioctl, rather than looking up interfaces and
addresses unnecessarily in this case.

- Make it an invariant that ifp is always non-NULL when running
in_control()-implemented ioctls, simplifying the code structure.

MFC after: 3 weeks


# 191285 19-Apr-2009 rwatson

Protect against some writer-writer races in in_control() by acquiring
the interface address list lock around interface address list
modifications. More to do here.

MFC after: 2 weeks


# 189931 17-Mar-2009 bms

Deal with the case where ifma_protospec may be NULL, during
any IPv4 multicast operations which reference it.

There is a potential race because ifma_protospec is set to NULL
when we discover the underlying ifnet has gone away. This write
is not covered by the IF_ADDR_LOCK, and it's difficult to widen
its scope without making it a recursive lock. It isn't clear why
this manifests more quickly with 802.11 interfaces, but does not
seem to manifest at all with wired interfaces.

With this change, the 802.11 related panics reported by sam@
and cokane@ should go away. It is not the right fix, that requires
more thought before 8.0.

Idea from: sam
Tested by: cokane


# 189851 15-Mar-2009 rwatson

Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free. This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile. They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

if_ar
if_axe
if_aue
if_cdce
if_cue
if_kue
if_ray
if_rue
if_rum
if_sr
if_udav
if_ural
if_zyd

Drivers that were already disabled because of tty changes:

if_ppp
if_sl

Discussed on: arch@


# 189603 09-Mar-2009 bms

Fix uninitialized use of ifp for ii.

Found by: Peter Holm


# 189592 09-Mar-2009 bms

Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD
IPv4 stack.

Diffs are minimized against p4.
PCS has been used for some protocol verification, more widespread
testing of recorded sources in Group-and-Source queries is needed.
sizeof(struct igmpstat) has changed.

__FreeBSD_version is bumped to 800070.


# 188144 05-Feb-2009 jamie

Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise. The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family. For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes). Remove the explicit
jailed() checks that preceded many of the function calls.

Approved by: bz (mentor)
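
The return convention described in r188144 can be sketched in plain C as follows. This is an illustration only: the demo_* types and function are hypothetical, addresses are plain 32-bit integers, and the real prison_*_ip4 functions take different arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical jail description: the IPv4 addresses it may use. */
struct demo_prison {
	const uint32_t *ip4;
	size_t ip4_count;
};

/* Hypothetical credential; prison == NULL means "not jailed". */
struct demo_cred {
	const struct demo_prison *prison;
};

/*
 * Returns 0 on success, EAFNOSUPPORT if the jail has no IPv4 addresses
 * at all, and EADDRNOTAVAIL if the address does not belong to the jail.
 * A non-jailed credential always succeeds.
 */
static int
demo_prison_check_ip4(const struct demo_cred *cred, uint32_t addr)
{
	const struct demo_prison *pr;
	size_t i;

	assert(cred != NULL);
	pr = cred->prison;
	if (pr == NULL)
		return (0);
	if (pr->ip4_count == 0)
		return (EAFNOSUPPORT);
	for (i = 0; i < pr->ip4_count; i++)
		if (pr->ip4[i] == addr)
			return (0);
	return (EADDRNOTAVAIL);
}
```

A caller can then propagate the returned value directly instead of substituting a hard-coded EADDRNOTAVAIL or EINVAL.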


# 187380 18-Jan-2009 sam

remove too noisy DIAGNOSTIC code

Reviewed by: qingli


# 186980 09-Jan-2009 bz

Restrict arp, ndp and theoretically the FIB listing (if not
read with libkvm) to the addresses of a prison, when inside a
jail. [1]
As the patch from the PR was pre-'new-arp', add checks to the
llt_dump handlers as well.

While touching RTM_GET in route_output(), consistently use
curthread credentials rather than the creds from the socket
there. [2]

PR: kern/68189
Submitted by: Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1]
Discussed with: rwatson [2]
Reviewed by: rwatson
MFC after: 4 weeks


# 186948 09-Jan-2009 bz

Make SIOCGIFADDR and related, as well as SIOCGIFADDR_IN6 and related
jail-aware. Up to now we returned the first address of the interface
for SIOCGIFADDR w/o an ifr_addr in the query. This caused problems for
programs querying for an address but running inside a jail, as the
address returned usually did not belong to the jail.
Like for v6, if there was an ifr_addr given on v4, you could probe
for more addresses on the interfaces that you were not allowed to see
from inside a jail. Return an error (EADDRNOTAVAIL) in that case
now unless the address is on the given interface and valid for the
jail.

PR: kern/114325
Reviewed by: rwatson
MFC after: 4 weeks


# 186935 09-Jan-2009 harti

Set a minimum of information in the routing message (like version and type)
so that generic routing message parsing code can parse the messages for
L2 info that are retrieved via the sysctl interface.


# 186708 02-Jan-2009 qingli

Some modules such as SCTP supply a valid route entry as an input argument
to ip_output(). The destination is represented in a sockaddr{} object
that may contain other pieces of information, e.g., port number. This
same destination sockaddr{} object may be passed into L2 code, which
could be used to create a L2 entry. Since there exists a L2 table per
address family, the L2 lookup function can make address family specific
comparison instead of the generic bcmp() operation over the entire
sockaddr{} structure.

Note in the IPv6 case the sin6_scope_id is not compared because the
address is currently stored in the embedded form inside the kernel.
The in6_lltable_lookup() has to account for the scope-id if this
storage format were to change in the future.
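
The difference between a whole-structure comparison and an address-family-specific one, as discussed in r186708, can be shown with a short userland sketch. The demo_* helpers are hypothetical and are not the kernel's lltable lookup code; they only contrast the two comparison strategies for IPv4.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Family-specific comparison: only the IPv4 address is significant. */
static bool
demo_sin_same_host(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
	assert(a != NULL && b != NULL);
	return (a->sin_family == AF_INET && b->sin_family == AF_INET &&
	    a->sin_addr.s_addr == b->sin_addr.s_addr);
}

/* Generic comparison: every byte counts, including the port. */
static bool
demo_sin_same_bytes(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
	assert(a != NULL && b != NULL);
	return (memcmp(a, b, sizeof(*a)) == 0);
}
```

Two sockaddr_in objects that name the same host but carry different port numbers compare equal under the first helper and unequal under the second, which is the situation the commit message describes for destinations handed down by modules such as SCTP.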


# 186544 28-Dec-2008 bz

For consistency use LLE_IS_VALID() in this 4th place that is actually
interested in the (void *)-1 return value hack.
This way we can easily identify those special parts of the code.


# 186500 26-Dec-2008 qingli

This checkin addresses a couple of issues:
1. The "route" command allows route insertion through the interface-direct
option "-iface". During if_attach(), a sockaddr_dl{} entry is created
for the interface and is part of the interface address list. This
sockaddr_dl{} entry describes the interface in detail. The "route"
command selects this entry as the "gateway" object when the "-iface"
option is present. The "arp" and "ndp" commands also interact with the
kernel through the routing socket when adding and removing static L2
entries. The static L2 information is also provided through the
"gateway" object with an AF_LINK family type, similar to what is
provided by the "route" command. In order to differentiate between
these two types of operations, a RTF_LLDATA flag is introduced. This
flag is set by the "arp" and "ndp" commands when issuing the add and
delete commands. This flag is also set in each L2 entry returned by the
kernel. The "arp" and "ndp" commands follow a convention where a RTM_GET
is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills
in the fields for a "rtm" object, which is reinjected into the kernel by
a subsequent RTM_ADD/DELETE command. The entry returned from RTM_GET
is a prefix route, so the RTF_LLDATA flag must be specified when issuing
the RTM_ADD/DELETE messages.

2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the
specification for retrieving L2 information. Also optimized the
code logic.

Reviewed by: julian


# 186150 15-Dec-2008 kmacy

unlock and destroy an llentry's lock before freeing

Found by: sam


# 186119 15-Dec-2008 qingli

The main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as many locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplifying the logic in the routing code

The most notable end result is the obsolescence of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle. Kip has also been conducting
active functional testing
- Sam Leffler has helped me improve/refactor the code, and
provided valuable reviews
- Julian Elischer set up the perforce tree for me and has helped
me maintain that branch before the svn conversion


# 185571 02-Dec-2008 bz

Rather than using hidden includes (with circular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# 185419 28-Nov-2008 zec

Unhide declarations of network stack virtualization structs from
underneath #ifdef VIMAGE blocks.

This change introduces some churn in #include ordering and nesting
throughout the network stack and drivers but is not expected to cause
any additional issues.

In the next step this will allow us to instantiate the virtualization
container structures and switch from using global variables to their
"containerized" counterparts.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 185088 19-Nov-2008 zec

Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instantiation,
assign initial values to them in initializer functions. As a rule,
initialization at instantiation for such variables should never be
introduced again from now on. Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentially, this change should have zero functional impact. In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methodology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to initialize multiple instances of such container structures.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 184295 26-Oct-2008 bz

Style changes only:
- Consistently add parentheses to return statements.
- Use NULL instead of 0 when comparing pointers, also avoiding
unnecessary casts.
- Do not use pointers as booleans.

Reviewed by: rwatson (earlier version)
MFC after: 2 months


# 183550 02-Oct-2008 zec

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparison between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# 179971 24-Jun-2008 gonzo

In case of interface initialization failure remove struct in_ifaddr* from
in_ifaddrhashtbl in in_ifinit because error handler in in_control removes
entries only for AF_INET addresses. If in_ifinit is called for the cloned
interface that has just been created its address family is not AF_INET and
therefore LIST_REMOVE is not called for respective LIST_INSERT_HEAD and
freed entries remain in in_ifaddrhashtbl and lead to memory corruption.

PR: kern/124384


# 175626 24-Jan-2008 bz

Differentiate between addifaddr and delifaddr for the privilege check.

Reviewed by: rwatson
MFC after: 2 weeks


# 172467 07-Oct-2007 silby

Add FBSDID to all files in netinet so that people can more
easily include file version information in bug reports.

Approved by: re (kensmith)


# 170855 16-Jun-2007 mjacob

Simplification to quiet a gcc4.2 warning. Just by setting match.s_addr
to nonzero you fulfill the same function as the variable 'cmp', so you
might as well zero match and test against it later.

Reviewed by: timeout on review request


# 170613 12-Jun-2007 bms

Import rewrite of IPv4 socket multicast layer to support source-specific
and protocol-independent host mode multicast. The code is written to
accommodate IPv6, IGMPv3 and MLDv2 with only a little additional work.

This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.

The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html

Summary
* IPv4 multicast socket processing is now moved out of ip_output.c
into a new module, in_mcast.c.
* The in_mcast.c module implements the IPv4 legacy any-source API in
terms of the protocol-independent source-specific API.
* Source filters are lazy allocated as the common case does not use them.
They are part of per inpcb state and are covered by the inpcb lock.
* struct ip_mreqn is now supported to allow applications to specify
multicast joins by interface index in the legacy IPv4 any-source API.
* In UDP, an incoming multicast datagram only requires that the source
port matches the 4-tuple if the socket was already bound by source port.
An unbound socket SHOULD be able to receive multicasts sent from an
ephemeral source port.
* The UDP socket multicast filter mode defaults to exclusive, that is,
sources present in the per-socket list will be blocked from delivery.
* The RFC 3678 userland functions have been added to libc: setsourcefilter,
getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
* Definitions for IGMPv3 are merged but not yet used.
* struct sockaddr_storage is now referenced from <netinet/in.h>. It
is therefore defined there if not already declared in the same way
as for the C99 types.
* The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
which are then interpreted as interface indexes) is now deprecated.
* A patch for the Rhyolite.com routed in the FreeBSD base system
is available in the -net archives. This only affects individuals
running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
* Make IPv6 detach path similar to IPv4's in code flow; functionally same.
* Bump __FreeBSD_version to 700048; see UPDATING.

This work was financially supported by another FreeBSD committer.

Obtained from: p4://bms_netdev
Submitted by: Wilbert de Graaf (original work)
Reviewed by: rwatson (locking), silence from fenner,
net@ (but with encouragement)


# 169454 10-May-2007 rwatson

Move universally to ANSI C function declarations, with relatively
consistent style(9)-ish layout.


# 168032 29-Mar-2007 bms

Fix a bug in IPv4 address configuration exposed by refcounting.
* Join the IPv4 all-hosts multicast group 224.0.0.1 once only;
that is, when an IPv4 address is first configured on an interface.
* Do not join it for subsequent IPv4 addresses as this violates IGMP.
* Be sure to leave the group when all IPv4 addresses have been removed
from the interface.
* Add two DIAGNOSTIC printfs related to the issue.

Further care and attention is needed in this area; it is suggested that
netinet's attachment to the ifnet structure be compartmentalized and
non-implicit.

Bug found by: andre
MFC after: 1 month
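
The join-once rule above can be sketched as a small state machine. This is an illustration only, not the code from this revision: struct ifstate, addr_added() and addr_removed() are hypothetical names, and the real kernel tracks membership through its in_multi structures.

```c
#include <stdbool.h>

/* Hypothetical per-interface state; a stand-in for the kernel's own. */
struct ifstate {
	int  inet_addrs;   /* number of IPv4 addresses configured */
	bool in_allhosts;  /* currently a member of 224.0.0.1? */
};

/* Call after adding an IPv4 address; true means "join 224.0.0.1 now". */
static bool
addr_added(struct ifstate *ifs)
{
	ifs->inet_addrs++;
	if (ifs->inet_addrs == 1 && !ifs->in_allhosts) {
		ifs->in_allhosts = true;
		return true;
	}
	return false;	/* later addresses must not join again */
}

/* Call after removing an IPv4 address; true means "leave 224.0.0.1 now". */
static bool
addr_removed(struct ifstate *ifs)
{
	if (ifs->inet_addrs > 0)
		ifs->inet_addrs--;
	if (ifs->inet_addrs == 0 && ifs->in_allhosts) {
		ifs->in_allhosts = false;
		return true;
	}
	return false;
}
```

Only the 0 -> 1 and 1 -> 0 transitions of the address count trigger group membership changes; every other add or remove is a no-op for IGMP.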


# 167729 19-Mar-2007 bms

Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from: p4 branch bms_netdev
Reviewed by: andre
Sponsored by: Garance A Drosehn
Idea from: NetBSD
MFC after: 1 month
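
A minimal sketch of the reference-counting idea, under stated assumptions: struct mc_record and the mc_*() helpers are hypothetical and do not reproduce the ifmultiaddr or in_multi layout. It shows only that a record can outlive its interface and is freed when the last reference drops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical multicast record; not the kernel's struct ifmultiaddr. */
struct mc_record {
	unsigned refcount;  /* number of holders of this record */
	void    *ifp;       /* owning interface, NULL once it is detached */
};

/* Take an additional reference. */
static void
mc_acquire(struct mc_record *m)
{
	m->refcount++;
}

/* Interface went away: drop the pointer, keep the record for holders. */
static void
mc_detach(struct mc_record *m)
{
	m->ifp = NULL;
}

/* Drop a reference; true means the caller held the last one and may free. */
static bool
mc_release(struct mc_record *m)
{
	assert(m->refcount > 0);
	return --m->refcount == 0;
}
```

Holders that still reference a detached record see ifp == NULL instead of a stale pointer, which is the class of failure the commit message attributes to the carp and pfsync panics.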


# 166450 03-Feb-2007 bms

In regular forwarding path, reject packets destined for 169.254.0.0/16
link-local addresses. See RFC 3927 section 2.7.
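
As an illustration of the test involved, a destination is link-local when its top 16 bits equal 169.254. The helper below is a hypothetical sketch that assumes the address is already in host byte order; it is not the code added by this revision.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr (host byte order) lies in 169.254.0.0/16. */
static bool
is_ipv4_linklocal(uint32_t addr)
{
	/* 169 == 0xa9, 254 == 0xfe */
	return (addr & 0xffff0000u) == 0xa9fe0000u;
}
```

A forwarding path would drop the packet when this returns true for the destination address; locally originated and locally delivered traffic is a separate case.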


# 164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 162718 28-Sep-2006 bms

The IPv4 code should clean up multicast group state when an interface
goes away. Without this change, it leaks in_multi (and often ether_multi
state) if many clonable interfaces are created and destroyed in quick
succession.

The concept of this fix is borrowed from KAME. Detailed information about
this behaviour, as well as test cases, are available in the PR.

PR: kern/78227
MFC after: 1 week


# 154777 24-Jan-2006 andre

In in_control() remove the temporary in_ifaddr structure from the
ia_hash only if it actually is an AF_INET address. All other places
test for sa_family == AF_INET but this one.

PR: kern/92091
Submitted by: Seth Kingsley <sethk-at-meowfishies.com>
MFC after: 3 days


# 151824 28-Oct-2005 glebius

First fill in structure with valid values, and only then attach it
to the global list.

Reviewed by: rwatson


# 151555 22-Oct-2005 glebius

In in_addprefix() compare not only route addresses, but their masks,
too. This fixes problem when connected prefixes overlap.

Obtained from: OpenBSD (rev. 1.40 by claudio);
[ I came to this fix myself, and then found out that
OpenBSD had already fixed it the same way.]


# 150853 03-Oct-2005 rwatson

Unlock Giant symmetrically with respect to lock acquire order as that's
generally nicer.

Spotted by: johan
MFC after: 1 week


# 150852 03-Oct-2005 rwatson

Acquire Giant conditionally in in_addmulti() and in_delmulti() based on
whether the interface being accessed is IFF_NEEDSGIANT or not. This
avoids lock order reversals when calling into the interface ioctl
handler, which could potentially lead to deadlock.

The long term solution is to eliminate non-MPSAFE network drivers.

Discussed with: jhb
MFC after: 1 week


# 150296 18-Sep-2005 rwatson

Take a first cut at cleaning up ifnet removal and multicast socket
panics, which occur when stale ifnet pointers are left in struct
moptions hung off of inpcbs:

- Add in_ifdetach(), which matches in6_ifdetach(), and allows the
protocol to perform early tear-down on the interface early in
if_detach().

- Annotate that if_detach() needs careful consideration.

- Remove calls to in_pcbpurgeif0() in the handling of SIOCDIFADDR --
this is not the place to detect interface removal! This also
removes what is basically a nasty (and now unnecessary) hack.

- Invoke in_pcbpurgeif0() from in_ifdetach(), in both raw and UDP
IPv4 sockets.

It is now possible to run the msocket_ifnet_remove regression test
using HEAD without panicking.

MFC after: 3 days


# 149221 18-Aug-2005 glebius

In order to support CARP interfaces kernel was taught to handle more
than one interface in one subnet. However, some userland apps rely on
the belief that this configuration is impossible.

Add a sysctl switch net.inet.ip.same_prefix_carp_only. If the switch
is on, then kernel will refuse to add an additional interface to
already connected subnet unless the interface is CARP. Default
value is off.

PR: bin/82306
In collaboration with: mlaier


# 148682 03-Aug-2005 rwatson

Introduce in_multi_mtx, which will protect IPv4-layer multicast address
lists, as well as accessor macros. For now, this is a recursive mutex
due to code sequences where IPv4 multicast calls into IGMP calls into
ip_output(), which then tests for a multicast forwarding case.

For support macros in in_var.h to check multicast address lists, assert
that in_multi_mtx is held.

Acquire in_multi_mtx around iteration over the IPv4 multicast address
lists, such as in ip_input() and ip_output().

Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses,
as well as over the manipulation of ifnet multicast address lists in order
to keep the two layers in sync.

Lock down accesses to IPv4 multicast addresses in IGMP, or assert the
lock when performing IGMP join/leave events.

Eliminate spl's associated with IPv4 multicast addresses, portions of
IGMP that weren't previously expunged by IGMP locking.

Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded
lock order in WITNESS, in that order.

Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after: 10 days


# 146883 01-Jun-2005 iedowse

Use IFF_LOCKGIANT/IFF_UNLOCKGIANT around calls to the interface
if_ioctl routine. This should fix a number of code paths through
soo_ioctl() that could call into Giant-locked network drivers without
first acquiring Giant.


# 143881 20-Mar-2005 glebius

ifma_protospec is a pointer. Use NULL when assigning or comparing it.


# 143868 20-Mar-2005 glebius

Remove a workaround from previous revision. It proved to be incorrect.
Add two other workarounds for carp(4) interfaces:
- do not add connected route when address is assigned to carp(4) interface
- do not add connected route when other interface goes down

Embrace workarounds with #ifdef DEV_CARP


# 143374 10-Mar-2005 glebius

Add antifootshooting workaround, which will make all routes "connected"
to carp(4) interfaces host routes. This prevents a problem, when connected
network is routed to carp(4) interface.


# 139823 06-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 137833 17-Nov-2004 mlaier

Fix host route addition for more than one address to a loopback interface
after allowing more than one address with the same prefix.

Reported by: Vladimir Grebenschikov <vova NO fbsd SPAM ru>
Submitted by: ru (also NetBSD rev. 1.83)
Pointyhat to: mlaier


# 137668 13-Nov-2004 mlaier

Merge copyright notices.

Requested by: njl


# 137628 12-Nov-2004 mlaier

Change the way we automatically add prefix routes when adding a new address.
This makes it possible to have more than one address with the same prefix.
The first address added is used for the route. On deletion of an address
with IFA_ROUTE set, we try to find a "fallback" address and hand over the
route if possible.
I plan to MFC this in 4 weeks, hence I keep the - now obsolete - argument to
in_ifscrub as it must be considered KAPI as it is not static in in.c. I will
clean this after the MFC.

Discussed on: arch, net
Tested by: many testers of the CARP patches
Nits from: ru, Andrea Campi <andrea+freebsd_arch webcom it>
Obtained from: WIDE via OpenBSD
MFC after: 1 month


# 133874 16-Aug-2004 rwatson

White space cleanup for netinet before branch:

- Trailing tab/space cleanup
- Remove spurious spaces between or before tabs

This change avoids touching files that Andre likely has in his working
set for PFIL hooks changes for IPFW/DUMMYNET.

Approved by: re (scottl)
Submitted by: Xin LI <delphij@frontfree.net>


# 133486 11-Aug-2004 andre

Add the function in_localip() which returns 1 if an internet address is for
the local host and configured on one of its interfaces.


# 128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# 126264 26-Feb-2004 mlaier

Bring eventhandler callbacks for pf.
This enables pf to track dynamic address changes on interfaces (dialup) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.

Approved by: bms(mentor)


# 123998 30-Dec-2003 ru

Document the net.inet.ip.subnets_are_local sysctl.


# 121922 03-Nov-2003 sam

Correct rev 1.56 which (incorrectly) reversed the test used to
decide if in_pcbpurgeif0 should be invoked.

Supported by: FreeBSD Foundation


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 108033 18-Dec-2002 hsu

Lock up ifaddr reference counts.


# 107983 17-Dec-2002 phk

Remove unused and incorrectly maintained variable "in_interfaces"


# 105748 22-Oct-2002 suz

fixed a kernel crash by "ifconfig stf0 inet 1.2.3.4"
MFC after: 1 week


# 98102 10-Jun-2002 hsu

Lock up inpcb.

Submitted by: Jennifer Yang <yangjihui@yahoo.com>


# 94327 09-Apr-2002 brian

Remove the code that masks an EEXIST returned from rtinit() when
calling ioctl(SIOC[AS]IFADDR).

This allows the following:

ifconfig xx0 inet 1.2.3.1 netmask 0xffffff00
ifconfig xx0 inet 1.2.3.17 netmask 0xfffffff0 alias
ifconfig xx0 inet 1.2.3.25 netmask 0xfffffff8 alias
ifconfig xx0 inet 1.2.3.26 netmask 0xffffffff alias

but would (given the above) reject this:

ifconfig xx0 inet 1.2.3.27 netmask 0xfffffff8 alias

due to the conflicting netmasks. I would assert that it's wrong
to mask the EEXIST returned from rtinit() as in the above scenario, the
deletion of the 1.2.3.25 address will leave the 1.2.3.27 address
as unroutable as it was in the first place.

Offered for review on: -arch, -net
Discussed with: stephen macmanus <stephenm@bayarea.net>
MFC after: 3 weeks
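
The arithmetic behind the example can be checked with a short sketch. It assumes that a route collides when both the masked network and the mask are equal, which matches the scenario described above; IPV4(), struct ifaddr4 and find_conflict() are hypothetical names, not rtinit() itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from four octets. */
#define IPV4(a, b, c, d)						\
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) |		\
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical address/netmask pair, both in host byte order. */
struct ifaddr4 {
	uint32_t addr;
	uint32_t mask;
};

/*
 * Return the index of the configured address whose prefix route is
 * identical to the candidate's (same network, same mask), or -1.
 */
static int
find_conflict(const struct ifaddr4 *cur, size_t n, struct ifaddr4 cand)
{
	for (size_t i = 0; i < n; i++) {
		if (cur[i].mask == cand.mask &&
		    (cur[i].addr & cur[i].mask) == (cand.addr & cand.mask))
			return (int)i;
	}
	return -1;
}
```

With the four accepted addresses above, the prefixes are 1.2.3.0/24, 1.2.3.16/28, 1.2.3.24/29 and 1.2.3.26/32, all distinct. The rejected 1.2.3.27 with netmask 0xfffffff8 also maps to 1.2.3.24/29, so it matches the entry for 1.2.3.25.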


# 94326 09-Apr-2002 brian

Don't add host routes for interface addresses of 0.0.0.0/8 -> 0.255.255.255.

This change allows bootp to work with more than one interface, at the
expense of some rather ``wrong'' looking code. I plan to MFC this in
place of luigi's recent #ifdef BOOTP stuff that was committed to this
file in -stable, as that's slightly more wrong than this is.

Offered for review on: -arch, -net
MFC after: 2 weeks


# 93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# 92723 19-Mar-2002 alfred

Remove __P.


# 87124 30-Nov-2001 brian

During SIOCAIFADDR, if in_ifinit() fails and we've already added an
interface address, blow the address away again before returning the
error.

In in_ifinit(), if we get an error from rtinit() and we've also got
a destination address, return the error rather than masking EEXIST.
Failing to create a host route when configuring an interface should
be treated as an error.


# 85740 30-Oct-2001 des

Make sure the netmask always has an address family. This fixes Linux
ifconfig, which expects the address returned by the SIOCGIFNETMASK ioctl
to have a valid sa_family. Similar changes may be necessary for IPv6.

While we're here, get rid of an unnecessary temp variable.

MFC after: 2 weeks


# 84317 01-Oct-2001 jlemon

in_ifinit apparently can be used to rewrite an ip address; recalculate
the correct hash bucket for the entry.

Submitted by: iedowse (with some munging by me)


# 84102 29-Sep-2001 jlemon

Add a hash table that contains the list of internet addresses, and use
this in place of the in_ifaddr list when appropriate. This improves
performance on hosts which have a large number of IP aliases.
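
A rough sketch of why a hash table helps here: lookup cost depends on the bucket length rather than on the total number of aliases. The bucket count, the hash function and all names below are assumptions made for illustration; they are not the kernel's in_ifaddr hash.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64	/* hypothetical size; must be a power of two */

/* Hypothetical entry; the kernel chains struct in_ifaddr instead. */
struct addr_entry {
	uint32_t           addr;  /* IPv4 address, host byte order */
	struct addr_entry *next;  /* next entry in the same bucket */
};

static struct addr_entry *buckets[NBUCKETS];

/* Fold the high bits into the low bits, then mask to a bucket index. */
static unsigned
addr_hash(uint32_t addr)
{
	addr ^= addr >> 16;
	addr ^= addr >> 8;
	return addr & (NBUCKETS - 1);
}

/* Link an entry at the head of its bucket. */
static void
addr_insert(struct addr_entry *e)
{
	unsigned h = addr_hash(e->addr);

	e->next = buckets[h];
	buckets[h] = e;
}

/* Walk one bucket only, instead of the whole address list. */
static struct addr_entry *
addr_lookup(uint32_t addr)
{
	struct addr_entry *e;

	for (e = buckets[addr_hash(addr)]; e != NULL; e = e->next)
		if (e->addr == addr)
			return e;
	return NULL;
}
```

On a host with thousands of aliases, a per-packet "is this address local?" check then touches a handful of entries instead of scanning every configured address.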


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 83130 06-Sep-2001 jlemon

Wrap array accesses in macros, which also happen to be lvalues:

ifnet_addrs[i - 1] -> ifaddr_byindex(i)
ifindex2ifnet[i] -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.
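
A small sketch of what "macros which also happen to be lvalues" means in practice. The array size and the stand-in struct are assumptions; only the macro name ifnet_byindex and the array name ifindex2ifnet come from the message above.

```c
#include <stddef.h>

/* Stand-in for the kernel structure; only here to make the sketch compile. */
struct ifnet {
	int if_index;
};

#define MAXIF 8	/* hypothetical table size */

static struct ifnet *ifindex2ifnet[MAXIF + 1];

/*
 * The macro expands to a plain array element, so it can appear on
 * either side of an assignment.
 */
#define ifnet_byindex(i)	(ifindex2ifnet[(i)])
```

Callers can then write ifnet_byindex(n) = ifp; as well as ifp = ifnet_byindex(n);, and a later change can alter the storage behind the macro without editing every call site.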


# 81127 04-Aug-2001 ume

When an application has joined a multicast group, the network card is
removed, and the application is then killed, imo_membership[].inm_ifp
still refers to the interface pointer of the removed interface.
Killing the application releases the socket and its imo_membership,
which then uses the no-longer-existing interface pointer, and the
kernel panics.

PR: 29345
Submitted by: Inoue Yuichi <inoue@nd.net.fujitsu.co.jp>
Obtained from: KAME
MFC after: 3 days


# 78064 11-Jun-2001 ume

Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz, and some
critical problems found after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks


# 76469 11-May-2001 ru

In in_ifadown(), differentiate between whether the interface goes
down or interface address is deleted. Only delete static routes
in the latter case.

Reported by: Alexander Leidinger <Alexander@leidinger.net>


# 74362 16-Mar-2001 phk

<sys/queue.h> makeover.


# 74299 15-Mar-2001 ru

net/route.c:

A route generated from an RTF_CLONING route had the RTF_WASCLONED flag
set but did not have a reference to the parent route, as documented in
the rtentry(9) manpage. This prevented such routes from being deleted
when their parent route is deleted.

Now, for example, if you delete an IP address from a network interface,
all ARP entries that were cloned from this interface route are flushed.

This also has an impact on netstat(1) output. Previously, dynamically
created ARP cache entries (RTF_STATIC flag is unset) were displayed as
part of the routing table display (-r). Now, they are only printed if
the -a option is given.

netinet/in.c, netinet/in_rmx.c:

When address is removed from an interface, also delete all routes that
point to this interface and address. Previously, for example, if you
changed the address on an interface, outgoing IP datagrams might still
use the old address. The only solution was to delete and re-add some
routes. (The problem is easily observed with the route(8) command.)

Note, that if the socket was already bound to the local address before
this address is removed, new datagrams generated from this socket will
still be sent from the old address.

PR: kern/20785, kern/21914
Reviewed by: wollman (the idea)


# 72012 04-Feb-2001 phk

Another round of the <sys/queue.h> FOREACH transmogriffer.

Created with: sed(1)
Reviewed by: md5(1)


# 71999 04-Feb-2001 phk

Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)


# 69781 08-Dec-2000 dwmalone

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


# 67893 29-Oct-2000 phk

Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.


# 64853 19-Aug-2000 bde

Fixed a missing splx() in if_addmulti(). Was broken in rev.1.28.


# 62587 04-Jul-2000 itojun

sync with kame tree as of july00. tons of bug fixes/improvements.

API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
(also syntax change)


# 55917 13-Jan-2000 shin

Change the struct sockaddr_storage member names, because the following
change is very likely to become the consensus of the recent ietf/ipng
mailing list discussion. The recent KAME repository and other
KAME-patched BSDs have also applied it.

s/__ss_family/ss_family/
s/__ss_len/ss_len/

Makeworld is confirmed, and no application should be affected by this change
yet.


# 55009 22-Dec-1999 shin

IPSEC support in the kernel.
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 46112 27-Apr-1999 phk

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


# 45997 24-Apr-1999 luigi

postpone the sending of IGMP LEAVE msg to after deleting the
mc address from the address list. The latter operation on some
hardware resets the card, potentially canceling the pending LEAVE
pkt.


# 41575 07-Dec-1998 eivind

Clean up some pointer usage.


# 36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# 30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of something :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 30309 11-Oct-1997 phk

Distribute and staticize a lot of the malloc M_* types.

Substantial input from: bde


# 27845 02-Aug-1997 bde

Removed unused #includes.


# 25201 27-Apr-1997 wollman

The long-awaited mega-massive-network-code- cleanup. Part I.

This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.

2) Certain protocol entry points are modified to take a process structure,
so that they can easily tell whether or not it is possible to sleep, and
also to access credentials.

3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call. Protocols should use the process pointer they are now passed.

4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.

5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.

As a result, LINT is now broken. I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.


# 24204 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 2: include
<sys/sockio.h> instead of <sys/ioctl.h> in network files.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 22900 18-Feb-1997 wollman

Convert raw IP from mondo-switch-statement-from-Hell to
pr_usrreqs. Collapse duplicates with udp_usrreq.c and
tcp_usrreq.c (calling the generic routines in uipc_socket2.c and
in_pcb.c). Calling sockaddr() or peeraddr() on a detached
socket now traps, rather than harmlessly returning an error; this
should never happen. Allow the raw IP buffer sizes to be
controlled via sysctl.


# 22672 13-Feb-1997 wollman

Provide PRC_IFDOWN and PRC_IFUP support for IP. Now, when an interface
is administratively downed, all routes to that interface (including the
interface route itself) which are not static will be deleted. When
it comes back up, any addresses remaining will have their interface routes
re-added. This solves the problem where, for example, an Ethernet interface
is downed but traffic continues to flow by way of ARP entries.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 21666 13-Jan-1997 wollman

Use the new if_multiaddrs list for multicast addresses rather than the
previous hackery involving struct in_ifaddr and arpcom. Get rid of the
abominable multi_kludge. Update all network interfaces to use the
new mechanism. Distressingly few Ethernet drivers program the multicast
filter properly (assuming the hardware has one, which it usually does).


# 20532 15-Dec-1996 wollman

Some days, it just doesn't pay to get out of bed. Fix another broken
reference to the now-dead-for-real-this-time ia_next field.

Reminded by: Russell Vincent


# 20407 13-Dec-1996 wollman

Convert the interface address and IP interface address structures
to TAILQs. Fix places which referenced these for no good reason
that I can see (the references remain, but were fixed to compile
again; they are still questionable).


# 18193 09-Sep-1996 wollman

Set subnetsarelocal to false. In a classless world, the other case
is almost never useful. (This is only a quick hack; someone should
go back and delete the entire subnetsarelocal==1 code path.)


# 15092 07-Apr-1996 dg

Added proper splnet protection while modifying the interface address list.
This fixes a panic that occurs when ifconfig ioctl(s) were interrupted
by IP traffic at the wrong time - resulting in a NULL pointer dereference.
This was originally noticed on a FreeBSD 1.0 system, but the problem still
exists in current sources.


# 14632 15-Mar-1996 fenner

Allow SIOCGIFBRDADDR and SIOCGIFNETMASK to return information about
aliases, if the alias address was passed in the struct ifreq.
Default to first address on the list, for backwards compatibility.


# 14546 11-Mar-1996 dg

Move or add #include <queue.h> in preparation for upcoming struct socket
changes.


# 13351 08-Jan-1996 guido

Fix a bug where a process listening on both an INADDR_ANY and a
local address, that was assigned with ifconfig alias and netmask
0xffffffff, would receive duplicate udp packets.
This behaviour can easily be seen by having named run, and using the alias
address as the name server.
This solution is not the prettiest one, but after talking with Garreth, it
is seen as the easiest one.


# 12704 09-Dec-1995 phk

Staticize.


# 12426 20-Nov-1995 phk

fix #includes & warnings.


# 12296 14-Nov-1995 phk

New style sysctl & staticize a lot of stuff.


# 11921 29-Oct-1995 phk

Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.


# 10939 21-Sep-1995 wollman

Merge with 4.4-Lite-2. This is actually a 64-bit fix; the second parameter
to in_control() is sometimes a pointer, and sometimes an integer, so use
u_long rather than int.

Obtained from: 4.4BSD-Lite-2


# 9563 17-Jul-1995 wollman

Return EDESTADDRREQ rather than EADDRNOTAVAIL if the user attempts to
half-configure a point-to-point interface.

Submitted by: Jonathan M. Bresler <jmb@kryten.atinc.com>


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 8090 26-Apr-1995 pst

Cleanup loopback interface support.
Reviewed by: wollman


# 8071 25-Apr-1995 wollman

Disallow half-configured point-to-point interfaces. It's still possible to
get into a half-configured state by using the old-style ioctls; this
may be a feature.


# 7280 23-Mar-1995 wollman

in_var.h: in_multi structures now form a queue(3)-style LIST structure
in.c: when an interface address is deleted, keep its multicast membership
. records (attached to a struct multi_kludge) for attachment to the
. next address on the same interface. Also, in_multi structures now
. gain a reference to the ifaddr so that they won't point off into
. freed memory if an interface goes away and doesn't come back before
. the last socket reference drops. This is analogous to how it is
. done for routes, and seems to make the most sense.


# 7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# 6363 14-Feb-1995 phk

YFfix.


# 5195 22-Dec-1994 wollman

Move ARP interface initialization into if_ether.c:arp_ifinit().


# 4127 03-Nov-1994 wollman

Fix off-by-one error reported to NetBSD by Karl Fox in
<9411031449.AA11102@gefilte.MorningStar.Com>.


# 3311 02-Oct-1994 phk

GCC cleanup.


# 2822 16-Sep-1994 phk

Made the kernel compile even without "ether".


# 2112 18-Aug-1994 wollman

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 1817 02-Aug-1994 dg

Added $Id$


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources