History log of /freebsd-10.1-release/sys/net/bpf.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 272461 02-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 250945 23-May-2013 ghelmer

While waiting for the bpf hold buffer to become idle, check
the return value from mtx_sleep() and exit bpfread() on
errors such as EINTR.

Reviewed by: jhb


# 248207 12-Mar-2013 glebius

Functions m_getm2() and m_get2() have different order of arguments,
and that can drive someone crazy. While m_get2() is young and not
documented yet, change its order of arguments to match m_getm2().

Sorry for churn, but better now than later.


# 245878 24-Jan-2013 glebius

- Utilize m_get2(), accidentially fixing some signedness bugs.
- Return EMSGSIZE in both cases if uio_resid is oversized or undersized.
- No need to clear rcvif.


# 244090 10-Dec-2012 ghelmer

Changes to resolve races in bpfread() and catchpacket() that, at worst,
cause kernel panics.

Add a flag to the bpf descriptor to indicate whether the hold buffer
is in use. In bpfread(), set the "hold buffer in use" flag before
dropping the descriptor lock during the call to bpf_uiomove().
Everywhere else the hold buffer is used or changed, wait while
the hold buffer is in use by bpfread(). Add a KASSERT in bpfread()
after re-acquiring the descriptor lock to assist uncovering any
additional hold buffer races.


# 243882 05-Dec-2012 glebius

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


# 243799 02-Dec-2012 melifaro

Fix bpf_if structure leak introduced in r235745.
Move all such structures to delayed-free lists and
delete all matching on interface departure event.

MFC after: 1 week


# 242673 06-Nov-2012 ghelmer

Work around a race in bpfread() by validating the hold buffer pointer
before freeing it. Otherwise, we can lose a buffer and cause a panic
in catchpacket().


# 236806 09-Jun-2012 melifaro

Fix typo introduced in r236559.

Pointed by: bcr
Approved by: kib(mentor)


# 236559 04-Jun-2012 melifaro

Fix panic introduced by r235745. Panic occurs after first packet traverse renamed interface.
Add several comments on locking

Found by: avg
Approved by: ae(mentor)
Tested by: avg
MFC after: 1 week


# 236262 29-May-2012 jkim

Fix style(9) nits, reduce unnecessary type castings, etc., for bpf_setf().


# 236261 29-May-2012 jkim

- Save the previous filter right before we set new one.
- Reduce duplicate code and make it little easier to read.

MFC after: 2 weeks


# 236251 29-May-2012 jkim

Fix 32-bit shim for BIOCSETF to drop all packets buffered on the descriptor
and reset statistics as it should.

MFC after: 3 days


# 236231 29-May-2012 melifaro

Fix BPF_JITTER code broken by r235746.

Pointed by: jkim
Reviewed by: jkim (except locking changes)
Approved by: (mentor)
MFC after: 2 weeks


# 235747 21-May-2012 melifaro

Make most BPF ioctls() SMP-safe.

Approved by: kib(mentor)
MFC in: 4 weeks


# 235746 21-May-2012 melifaro

Call bpf_jitter() before acquiring BPF global lock due to malloc() being used inside bpf_jitter.

Eliminate bpf_buffer_alloc() and allocate BPF buffers on descriptor creation and BIOCSBLEN ioctl.
This permits us not to allocate buffers inside bpf_attachd() which is protected by global lock.

Approved by: kib(mentor)
MFC in: 4 weeks


# 235745 21-May-2012 melifaro

Fix old panic when BPF consumer attaches to destroying interface.
'flags' field is added to the end of bpf_if structure. Currently the only
flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd()
Problem can be easily triggered on SMP stable/[89] by the following command (sort of):
'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done'

Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory,
while interface is still UP and there can be routes via this interface.
Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api.

Convert bpfd rwlock back to mutex due lack of performance gain (currently checking if packet
matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches)

Approved by: kib(mentor)
MFC in: 4 weeks


# 235744 21-May-2012 melifaro

Fix panic on attaching to non-existent interface (introduced by r233937, pointed by hrs@)
Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@)
Protect most of bpf_setf() by BPF global lock

Add several forgotten assertions (thanks to adrian@)

Document current locking model inside bpf.c
Document EVENTHANDLER(9) usage inside BPF.

Approved by: kib(mentor)
Tested by: gnn
MFC in: 4 weeks


# 233946 06-Apr-2012 melifaro

Fix build broken by r233938.

Pointed by: David Wolfskill <david@catwhisker.org>
Approved by: kib (mentor)
Pointy hat to: melifaro


# 233938 06-Apr-2012 melifaro

- Improve performace for writer-only BPF users.

Linux and Solaris (at least OpenSolaris) has PF_PACKET socket families to send
raw ethernet frames. The only FreeBSD interface that can be used to send raw frames
is BPF. As a result, many programs like cdpd, lldpd, various dhcp stuff uses
BPF only to send data. This leads us to the situation when software like cdpd,
being run on high-traffic-volume interface significantly reduces overall performance
since we have to acquire additional locks for every packet.

Here we add sysctl that changes BPF behavior in the following way:
If program came and opens BPF socket without explicitly specifyin read filter we
assume it to be write-only and add it to special writer-only per-interface list.
This makes bpf_peers_present() return 0, so no additional overhead is introduced.
After filter is supplied, descriptor is added to original per-interface list permitting
packets to be captured.

Unfortunately, pcap_open_live() sets catch-all filter itself for the purpose of
setting snap length.

Fortunately, most programs explicitly sets (event catch-all) filter after that.
tcpdump(1) is a good example.

So a bit hackis approach is taken: we upgrade description only after second
BIOCSETF is received.

Sysctl is named net.bpf.optimize_writers and is turned off by default.

- While here, document all sysctl variables in bpf.4

Sponsored by Yandex LLC

Reviewed by: glebius (previous version)
Reviewed by: silence on -net@
Approved by: (mentor)

MFC after: 4 weeks


# 233937 06-Apr-2012 melifaro

- Improve BPF locking model.

Interface locks and descriptor locks are converted from mutex(9) to rwlock(9).
This greately improves performance: in most common case we need to acquire 1
reader lock instead of 2 mutexes.

- Remove filter(descriptor) (reader) lock in bpf_mtap[2]
This was suggested by glebius@. We protect filter by requesting interface
writer lock on filter change.

- Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h
without including rwlock stuff. However, this is is temporary solution,
struct bpf_if should be made opaque for any external caller.

Found by: Dmitrij Tejblum <tejblum@yandex-team.ru>
Sponsored by: Yandex LLC

Reviewed by: glebius (previous version)
Reviewed by: silence on -net@
Approved by: (mentor)

MFC after: 3 weeks


# 232449 03-Mar-2012 jmallett

o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands
using the o32 ABI. This mostly follows nwhitehorn's lead in implementing
COMPAT_FREEBSD32 on powerpc64.
o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the
32-bit ABI being used. Since the MIPS port is relatively-new, even the 32-bit
ABIs use a 64-bit time_t.
o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS
with 32-bit compatibility, then, disable some code which assumes otherwise
wrongly when built for MIPS. A more general macro to check in this case would
seem like a good idea eventually. If someone adds support for using n32
userland with n64 kernels on MIPS, then they will have to add a variety of
flags related to each piece of the ABI that can vary. That's probably the
right time to generalize further.
o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the
freebsd32 compat code. Probably this should be generalized at some point.

Reviewed by: gonzo


# 229898 09-Jan-2012 lstewart

Consumers of bpfdetach() expect it to remove all bpf_if structs from the
bpf_iflist list which reference the specified ifnet. The existing implementation
only removes the first matching bpf_if found in the list, effectively leaking
list entries if an ifnet has been bpfattach()ed multiple times with different
DLTs.

Fix the leak by performing the detach logic in a loop, stopping when all bpf_if
structs referencing the specified ifnet have been detached and removed from the
bpf_iflist list.

Whilst here, also:

- Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never
exist in the list with a NULL ifnet pointer.

- Except when INVARIANTS is in the kernel config, silently ignore the case where
no bpf_if referencing the specified ifnet is found, as it is harmless and does
not require user attention.

Reviewed by: csjp
MFC after: 1 week


# 229073 31-Dec-2011 lstewart

Revert r228986 until it can be reworked to avoid panicing the kernel when the
same interface is attached multiple times with different DLTs, as is done in
net80211 for example.

Reported by: adrian


# 228986 30-Dec-2011 lstewart

- Introduce the net.bpf.tscfg sysctl tree and associated code so as to make one
aspect of time stamp configuration per interface rather than per BPF
descriptor. Prior to this, the order in which BPF devices were opened and the
per descriptor time stamp configuration settings could cause non-deterministic
and unintended behaviour with respect to time stamping. With the new scheme, a
BPF attached interface's tscfg sysctl entry can be set to "default", "none",
"fast", "normal" or "external". Setting "default" means use the system default
option (set with the net.bpf.tscfg.default sysctl), "none" means do not
generate time stamps for tapped packets, "fast" means generate time stamps for
tapped packets using a hz granularity system clock read, "normal" means
generate time stamps for tapped packets using a full timecounter granularity
system clock read and "external" (currently unimplemented) means use the time
stamp provided with the packet from an underlying source.

- Utilise the recently introduced sysclock_getsnapshot() and
sysclock_snap2bintime() KPIs to ensure the system clock is only read once per
packet, regardless of the number of BPF descriptors and time stamp formats
requested. Use the per BPF attached interface time stamp configuration to
control if sysclock_getsnapshot() is called and whether the system clock read
is fast or normal. The per BPF descriptor time stamp configuration is then
used to control how the system clock snapshot is converted to a bintime by
sysclock_snap2bintime().

- Remove all FAST related BPF descriptor flag variants. Performing a "fast"
read of the system clock is now controlled per BPF attached interface using
the net.bpf.tscfg sysctl tree.

- Update the bpf.4 man page.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

In collaboration with: Julien Ridoux (jridoux at unimelb edu au)


# 228132 29-Nov-2011 lstewart

Revert r227778 in preparation for committing reworked patches in its place.


# 227778 21-Nov-2011 lstewart

- When feed-forward clock support is compiled in, change the BPF header to
contain both a regular timestamp obtained from the system clock and the
current feed-forward ffcounter value. This enables new possibilities including
comparison of timekeeping performance and timestamp correction during post
processing.

- Add the net.bpf.ffclock_tstamp sysctl to provide a choice between timestamping
packets using the feedback or feed-forward system clock.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by: Julien Ridoux (jridoux at unimelb edu au)


# 227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 225177 25-Aug-2011 attilio

Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.

That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.

Sponsored by: Sandvine Incorporated
Reported by: rstone
Tested by: pluknet
Reviewed by: jhb, kib
Approved by: re (bz)
MFC after: 3 weeks


# 212757 16-Sep-2010 jkim

Fix a typo in a comment.

Submitted by: afiveg


# 209216 15-Jun-2010 jkim

Implement flexible BPF timestamping framework.

- Allow setting format, resolution and accuracy of BPF time stamps per
listener. Previously, we were only able to use microtime(9). Now we can
set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command.
Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP
command. Document all supported options in bpf(4) and their uses.

- Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'.
The new time stamp has both 64-bit second and fractional parts. bpf_xhdr
has this time stamp instead of 'struct timeval' for bh_tstamp. The new
structures let us use bh_tstamp of same size on both 32-bit and 64-bit
platforms without adding additional shims for 32-bit binaries. On 64-bit
platforms, size of BPF header does not change compared to bpf_hdr as its
members are already all 64-bit long. On 32-bit platforms, the size may
increase by 8 bytes. For backward compatibility, struct bpf_hdr with
struct timeval is still the default header unless new time stamp format is
explicitly requested. However, the behaviour may change in the future and
all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now.

- Add experimental support for tagging mbufs with time stamps from a lower
layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs.
The time stamps must be uptime in 'struct bintime' format as binuptime(9)
and getbinuptime(9) do.

Reviewed by: net@


# 207278 27-Apr-2010 bz

MFP4: @177254

Add missing CURVNET_RESTORE() calls for multiple code paths, to stop
leaking the currently cached vnet into callers and to the process.

Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 4 days


# 207195 25-Apr-2010 kib

Provide compat32 shims for bpf(4), except zero-copy facilities.

bd_compat32 field of struct bpf_d is kept unconditionally to not
impose the requirement of including "opt_compat.h" on all numerous
users of bpfdesc.h.

Submitted by: jhb (version for 6.x)
Reviewed and tested by: emaste
MFC after: 2 weeks


# 205858 29-Mar-2010 jkim

Check the pointer to JIT binary filter before its de-allocation.

Submitted by: Alexander Sack (asack at niksun dot com)
MFC after: 3 days


# 205095 12-Mar-2010 jkim

Fix a style(9) nit.


# 205092 12-Mar-2010 jkim

Tidy up callout for select(2) and read timeout.

- Add a missing callout_drain(9) before the descriptor deallocation.[1]
- Prefer callout_init_mtx(9) over callout_init(9) and let the callout
subsystem handle the mutex for callout function.

PR: kern/144453
Submitted by: Alexander Sack (asack at niksun dot com)[1]
MFC after: 1 week


# 204105 19-Feb-2010 jkim

Return partially filled buffer for non-blocking read(2)
in non-immediate mode.

PR: kern/143855


# 198417 23-Oct-2009 rwatson

Remove unneeded blank line from bpf_drvinit().

MFC after: 3 days


# 197134 12-Sep-2009 rwatson

Use C99 initialization for struct filterops.

Obtained from: Mac OS X
Sponsored by: Apple Inc.
MFC after: 3 weeks


# 196150 12-Aug-2009 jkim

Always embed pointer to BPF JIT function in BPF descriptor
to avoid inconsistency when opt_bpf.h is not included.

Reviewed by: rwatson
Approved by: re (rwatson)


# 196019 01-Aug-2009 rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# 194512 19-Jun-2009 csjp

Implement the -z (zero counters) option for the various bpf counters.
Add necessary changes to the kernel for this (basically introduce a
bpf_zero_counters() function). As well, update the man page.

MFC after: 1 month
Discussed with: rwatson


# 194368 17-Jun-2009 bz

Add explicit includes for jail.h to the files that need them and
remove the "hidden" one from vimage.h.


# 193951 10-Jun-2009 kib

Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by: jhb [1]
Noted by: jhb [2]
Reviewed by: jhb
Tested by: rnoland


# 193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# 192763 25-May-2009 sam

rev bpf attach/detach event api to include the dlt


# 192313 18-May-2009 sam

add bpf_track eventhandler for monitoring bpf taps attached/detached

Reviewed by: csjp


# 191816 05-May-2009 zec

Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one. The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE(). Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by: julian (mentor)


# 189620 10-Mar-2009 csjp

Disable zerocopy by default for now. It's causing some problems in pcap
consumers which fork after the shared pages have been setup. pflogd(8)
is an example. The problem is understood and there is a fix coming in
shortly.

Folks who want to continue using it can do so by setting

net.bpf.zerocopy_enable

to 1.

Discussed with: rwatson


# 189501 07-Mar-2009 rwatson

When resetting a BPF descriptor, properly check that zero-copy buffers
are not currently owned by userspace before clearing or rotating them.

Otherwise we may not play by the rules of the shared memory protocol,
potentially corrupting packet data or causing userspace applications
that are playing by the rules to spin due to being notified that a
buffer is complete but the shared memory header not reflecting that.

This behavior was seen with pflogd by a number of reporters; note that
this fix is not sufficient to get pflogd properly working with
zero-copy BPF, due to pflogd opening the BPF device before forking,
leading to the shared memory buffer not being propery inherited in the
privilege-separated child. We're still deciding how to fix that
problem.

This change exposes buffer-model specific strategy information in
reset_d(), which will be fixed at a later date once we've decided how
best to improve the BPF buffer abstraction.

Reviewed by: csjp
Reported by: keramida


# 189490 07-Mar-2009 csjp

Mark the bpf stats sysctl as being mpsafe. We do not require
Giant here.


# 189286 02-Mar-2009 csjp

Switch the default buffer mode in bpf(4) to zero-copy buffers.

Discussed with: rwatson


# 185348 26-Nov-2008 zec

Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


# 182462 29-Aug-2008 jkim

Make bpf_maxinsns visible from ng_bpf.c.
Pass me the pointyhat, please.


# 181690 13-Aug-2008 ed

Change bpf(4) to use the cdevpriv API.

Right now the bpf(4) driver uses the cloning API to generate /dev/bpf%u.
When an application such as tcpdump needs a BPF, it opens /dev/bpf0,
/dev/bpf1, etc. until it opens the first available device node. We used
this approach, because our devfs implementation didn't allow
per-descriptor data.

Now that we can, make it use devfs_get_cdevpriv() to obtain the private
data. To remain compatible with the existing implementation, add a
symlink from /dev/bpf0 to /dev/bpf. I've already changed libpcap to
compile with HAVE_CLONING_BPF, which makes it use /dev/bpf. There may be
other applications in the base system (dhclient) that use the loop to
obtain a valid bpf.

Discussed on: src-committers
Approved by: csjp


# 181135 01-Aug-2008 csjp

Annotate why we do not call BPF_CHECK_DIRECTION() in this tapping routine.
There is no way for the caller to tell us which direction this packet is
going. With the bpf_mtap{2} routines, we can check the interface pointer.

MFC after: 2 weeks


# 180515 14-Jul-2008 jkim

Allow injecting big packets via bpf(4) up to min(MTU, 16K-byte).

MFC after: 1 week


# 180337 07-Jul-2008 dwmalone

Add a new ioctl for changing the read filter (BIOCSETFNR). This is
just like BIOCSETF but it doesn't drop all the packets buffered on
the discriptor and reset the statistics.

Also, when setting the write filter, don't drop packets waiting to
be read or reset the statistics.

PR: 118486
Submitted by: Matthew Luckie <mluckie@cs.waikato.ac.nz>
MFC after: 1 month


# 180310 05-Jul-2008 csjp

Make sure we are clearing the ZBUF_FLAG_IMMUTABLE any time a free buffer
is reclaimed by the kernel. This fixes a bug resulted in the kernel
over writing packet data while user-space was still processing it when
zerocopy is enabled. (Or a panic if invariants was enabled).

Discussed with: rwatson


# 178882 09-May-2008 jhb

Set D_TRACKCLOSE to avoid a race in devfs that could lead to orphaned bpf
devices never getting fully closed.

MFC after: 3 days


# 178639 28-Apr-2008 jkim

Check packet directions more properly instead of just checking received
interface is null.

PR: kern/123138
Submitted by: Dmitry (hanabana at mail dot ru)
MFC after: 1 week


# 178223 15-Apr-2008 jkim

Revert the previous commit and use M_PROMISC flag instead.
It is safer because it will never be used for outgoing packets.


# 178208 14-Apr-2008 jkim

Remove M_SKIP_FIREWALL abuse and add more appropriate check.

Pointyhat to: jkim
Reported by: Eugene Grosbein (eugen at kuzbass dot ru)
MFC after: 3 days


# 177966 07-Apr-2008 rwatson

Maintain and observe a ZBUF_FLAG_IMMUTABLE flag on zero-copy BPF
buffer kernel descriptors, which is used to allow the buffer
currently in the BPF "store" position to be assigned to userspace
when it fills, even if userspace hasn't acknowledged the buffer
in the "hold" position yet. To implement this, notify the buffer
model when a buffer becomes full, and check that the store buffer
is writable, not just for it being full, before trying to append
new packet data. Shared memory buffers will be assigned to
userspace at most once per fill, be it in the store or in the
hold position.

This removes the restriction that at most one shared memory can
by owned by userspace, reducing the chances that userspace will
need to call select() after acknowledging one buffer in order to
wait for the next buffer when under high load. This more fully
realizes the goal of zero system calls in order to process a
high-speed packet stream from BPF.

Update bpf.4 to reflect that both buffers may be owned by userspace
at once; caution against assuming this.


# 177599 25-Mar-2008 ru

Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.

Reviewed by: arch

There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.


# 177596 25-Mar-2008 rwatson

Check for a NULL free buffer pointer in BPF before invoking
bpf_canfreebuf() in order to avoid potentially calling a non-inlinable
but trivial function in zero-copy buffer mode for every packet
received when we couldn't free the buffer anyway.

MFC after: 4 months


# 177548 24-Mar-2008 csjp

Introduce support for zero-copy BPF buffering, which reduces the
overhead of packet capture by allowing a user process to directly "loan"
buffer memory to the kernel rather than using read(2) to explicitly copy
data from kernel address space.

The user process will issue new BPF ioctls to set the shared memory
buffer mode and provide pointers to buffers and their size. The kernel
then wires and maps the pages into kernel address space using sf_buf(9),
which on supporting architectures will use the direct map region. The
current "buffered" access mode remains the default, and support for
zero-copy buffers must, for the time being, be explicitly enabled using
a sysctl for the kernel to accept requests to use it.

The kernel and user process synchronize use of the buffers with atomic
operations, avoiding the need for system calls under load; the user
process may use select()/poll()/kqueue() to manage blocking while
waiting for network data if the user process is able to consume data
faster than the kernel generates it. Patchs to libpcap are available
to allow libpcap applications to transparently take advantage of this
support. Detailed information on the new API may be found in bpf(4),
including specific atomic operations and memory barriers required to
synchronize buffer use safely.

These changes modify the base BPF implementation to (roughly) abstrac
the current buffer model, allowing the new shared memory model to be
added, and add new monitoring statistics for netstat to print. The
implementation, with the exception of some monitoring hanges that break
the netstat monitoring ABI for BPF, will be MFC'd.

Zerocopy bpf buffers are still considered experimental are disabled
by default. To experiment with this new facility, adjust the
net.bpf.zerocopy_enable sysctl variable to 1.

Changes to libpcap will be made available as a patch for the time being,
and further refinements to the implementation are expected.

Sponsored by: Seccuris Inc.
In collaboration with: rwatson
Tested by: pwood, gallatin
MFC after: 4 months [1]

[1] Certain portions will probably not be MFCed, specifically things
that can break the monitoring ABI.


# 177253 16-Mar-2008 rwatson

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink


# 175903 02-Feb-2008 rwatson

Add comment that bpfread() has multi-threading issues.

Fix minor white space nit.


# 174895 25-Dec-2007 rwatson

Use __FBSDID() in the kernel BPF implementation.

MFC after: 3 days


# 174876 23-Dec-2007 rwatson

Remove trailing whitespace from lines in BPF.

MFC after: 3 days


# 172930 24-Oct-2007 rwatson

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# 172582 12-Oct-2007 csjp

Make sure that we refresh the PID on read(2) and write(2) operations.
This fixes the process portion of the bpf(4) stats if the peer forks
into the background after it's opened the descriptor. This bug
results in the following behavior for netstat -B:

# netstat -B
Pid Netif Flags Recv Drop Match Sblen Hblen Command
netstat: kern.proc.pid failed: No such process
78023 em0 p--s-- 2237404 43119 2237404 13986 0 ??????

MFC after: 1 week


# 172108 09-Sep-2007 thompsa

Check for multicast destination on bpf injected packets and update the M_*CAST
flags, the absense of these flags causes problems in other areas such as
bridging which expect them to be correct.

At the moment only Ethernet DLTs are checked.

Reviewed by: bms, csjp, sam
Approved by: re (bmah)


# 171744 06-Aug-2007 rwatson

Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet. As that
has now been removed, they are no longer required. Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option. Clean up some related gotos for
consistency.

Reviewed by: bz, csjp
Tested by: kris
Approved by: re (kensmith)


# 171637 28-Jul-2007 rwatson

Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove
definition of NET_CALLOUT_MPSAFE, which is no longer required now that
debug.mpsafenet has been removed.

The once over: bz
Approved by: re (kensmith)


# 170896 17-Jun-2007 csjp

Silence some gcc 4 warnings. It is expected that the bpf_movein() routine
will intialize the the header length and re-initialize the mbuf pointer
to reference the mbuf that is allocated after moving user supplied packet
data in.


# 170749 15-Jun-2007 csjp

- Conditionally pickup Giant around the network interface
ioctl routines if we are running with !mpsafenet
- Change un-conditional Giant acquisition around ifpromisc
to occur only if we are running with !mpsafenet

With these locking bits in place, we can now remove the Giant
requirement from BPF, so drop the D_NEEDGIANT device flag.
This change removes Giant acquisitions around BPF device
handlers (read, write, ioctl etc).

MFC after: 1 month
Discussed with: rwatson


# 167035 26-Feb-2007 jkim

Add three new ioctl(2) commands for bpf(4).

- BIOCGDIRECTION and BIOCSDIRECTION get or set the setting determining
whether incoming, outgoing, or all packets on the interface should be
returned by BPF. Set to BPF_D_IN to see only incoming packets on the
interface. Set to BPF_D_INOUT to see packets originating locally and
remotely on the interface. Set to BPF_D_OUT to see only outgoing
packets on the interface. This setting is initialized to BPF_D_INOUT
by default. BIOCGSEESENT and BIOCSSEESENT are obsoleted by these but
kept for backward compatibility.

- BIOCFEEDBACK sets packet feedback mode. This allows injected packets
to be fed back as input to the interface when output via the interface is
successful. When BPF_D_INOUT direction is set, injected outgoing packet
is not returned by BPF to avoid duplication. This flag is initialized to
zero by default.

Note that libpcap has been modified to support BPF_D_OUT direction for
pcap_setdirection(3) and PCAP_D_OUT direction is functional now.

Reviewed by: rwatson


# 166311 28-Jan-2007 rwatson

Remove slightly dubious comment; add descriptive strings for several
sysctls.

MFC after: 3 days


# 164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# 161124 09-Aug-2006 rwatson

Since bpf_allocbufs() uses malloc() with M_WAITOK, don't check return
values for NULL or return an error state. Assert that all three bpf
buffer pointers are NULL before starting.

MFC after: 1 week


# 160690 26-Jul-2006 sam

add support for 802.11 packet injection via bpf

Together with: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Reviewed by: arch@
MFC after: 1 month


# 160620 24-Jul-2006 dwmalone

Rather than calling mircotime() in catchpacket(), make catchpacket()
take a timeval indicating when the packet was captured. Move
microtime() to the calling functions and grab the timestamp as soon
as we know that we're going to call catchpacket at least once.

This means that we call microtime() once per matched packet, as
opposed to once per matched packet per bpf listener. It also means
that we return the same timestamp to all bpf listeners, rather than
slightly different ones.

It would be more accurate to call microtime() even earlier for all
packets, as you have to grab (1+#listener) locks before you can
determine if the packet will be logged. You could always grab a
timestamp before the locks, but microtime() can be costly, so this
didn't seem like a good idea.

(I guess most ethernet interfaces will have a bpf listener these
days because of dhclient. That means that we could be doing two bpf
locks on most packets going through the interface.)

PR: 71711


# 160087 03-Jul-2006 csjp

Adjust descriptor locking to tell the kqueue subsystem that our descriptor is
already locked. The reason to do this is to avoid two lock+unlock operations
in a row. We need the lock here to serialize access to bd_pid for stats
collection purposes.

Drop the locks all together on detach, as they will be picked up by
knlist_remove.

This should fix a failed locking assertion when kqueue is being used with bpf
descriptors.

Discussed with: jmg


# 159641 15-Jun-2006 csjp

Since we are doing some bpf(4) clean up, change a couple of function prototypes
to be consistent. Also, ANSI'fy function definitions. There is no functional
change here.


# 159595 14-Jun-2006 csjp

If bpf(4) has not been compiled into the kernel, initialize the bpf interface
pointer to a zeroed, statically allocated bpf_if structure. This way the
LIST_EMPTY() macro will always return true. This allows us to remove the
additional unconditional memory reference for each packet in the fast path.

Discussed with: sam


# 159180 02-Jun-2006 csjp

Fix the following bpf(4) race condition which can result in a panic:

(1) bpf peer attaches to interface netif0
(2) Packet is received by netif0
(3) ifp->if_bpf pointer is checked and handed off to bpf
(4) bpf peer detaches from netif0 resulting in ifp->if_bpf being
initialized to NULL.
(5) ifp->if_bpf is dereferenced by bpf machinery
(6) Kaboom

This race condition likely explains the various different kernel panics
reported around sending SIGINT to tcpdump or dhclient processes. But really
this race can result in kernel panics anywhere you have frequent bpf attach
and detach operations with high packet per second load.

Summary of changes:

- Remove the bpf interface's "driverp" member
- When we attach bpf interfaces, we now set the ifp->if_bpf member to the
bpf interface structure. Once this is done, ifp->if_bpf should never be
NULL. [1]
- Introduce bpf_peers_present function, an inline operation which will do
a lockless read bpf peer list associated with the interface. It should
be noted that the bpf code will pickup the bpf_interface lock before adding
or removing bpf peers. This should serialize the access to the bpf descriptor
list, removing the race.
- Expose the bpf_if structure in bpf.h so that the bpf_peers_present function
can use it. This also removes the struct bpf_if; hack that was there.
- Adjust all consumers of the raw if_bpf structure to use bpf_peers_present

Now what happens is:

(1) Packet is received by netif0
(2) Check to see if bpf descriptor list is empty
(3) Pickup the bpf interface lock
(4) Hand packet off to process

From the attach/detach side:

(1) Pickup the bpf interface lock
(2) Add/remove from bpf descriptor list

Now that we are storing the bpf interface structure with the ifnet, there is
is no need to walk the bpf interface list to locate the correct bpf interface.
We now simply look up the interface, and initialize the pointer. This has a
nice side effect of changing a bpf interface attach operation from O(N) (where
N is the number of bpf interfaces), to O(1).

[1] From now on, we can no longer check ifp->if_bpf to tell us whether or
not we have any bpf peers that might be interested in receiving packets.

In collaboration with: sam@
MFC after: 1 month


# 159078 30-May-2006 ru

Fix -Wundef warnings.


# 158345 07-May-2006 csjp

Pickup locks for the BPF interface structure. It's quite possible that
bpf(4) descriptors can be added and removed on this interface while we
are processing stats.

MFC after: 2 weeks


# 153213 07-Dec-2005 jkim

Add BPF Just-In-Time compiler support for ng_bpf(4).

The sysctl is changed from net.bpf.jitter.enable to net.bpf_jitter.enable
and this controls both bpf(4) and ng_bpf(4) now.


# 153151 06-Dec-2005 jkim

Add experimental BPF Just-In-Time compiler for amd64 and i386.

Use the following kernel configuration option to enable:

options BPF_JITTER

If you want to use bpf_filter() instead (e. g., debugging), do:

sysctl net.bpf.jitter.enable=0

to turn it off.

Currently BIOCSETWF and bpf_mtap2() are unsupported, and bpf_mtap() is
partially supported because 1) no need, 2) avoid expensive m_copydata(9).

Obtained from: WinPcap 3.1 (for i386)


# 150929 04-Oct-2005 csjp

Protect PID initializations for statistics by the bpf descriptor
locks. Also while we are here, protect the bpf descriptor during
knlist_remove{add} operations.

Discussed with: rwatson


# 150135 14-Sep-2005 andre

Undo a tad little optimization to bpf_mtap() introduced in rev. 1.95
which broke the correct handling of the BIOCGSEESENT flag in the bpf
listener.

PR: kern/56441
Submitted by: <vys at renet.ru>
MFC after: 3 days


# 149809 05-Sep-2005 csjp

Instead of caching the PID which opened the bpf descriptor, continuously
refresh the PID which has the descriptor open. The PID is refreshed in various
operations like ioctl(2), kevent(2) or poll(2). This produces more accurate
information about current bpf consumers. While we are here remove the bd_pcomm
member of the bpf stats structure because now that we have an accurate PID we
can lookup the via the kern.proc.pid sysctl variable. This is the trick that
NetBSD decided to use to deal with this issue.

Special care needs to be taken when MFC'ing this change, as we have made a
change to the bpf stats structure. What will end up happening is we will leave
the pcomm structure but just mark it as being un-used. This way we keep the ABI
in tact.

MFC after: 1 month
Discussed with: Rui Paulo < rpaulo at NetBSD dot org >


# 149376 22-Aug-2005 csjp

Introduce two new ioctl(2) commands, BIOCLOCK and BIOCSETWF. These commands
enhance the security of bpf(4) by further relinquishing the privilege of
the bpf(4) consumer (assuming the ioctl commands are being implemented).

Once BIOCLOCK is executed, the device becomes locked which prevents the
execution of ioctl(2) commands which can change the underly parameters of the
bpf(4) device. An example might be the setting of bpf(4) filter programs or
attaching to different network interfaces.

BIOCSETWF can be used to set write filters for outgoing packets. Currently if
a bpf(4) consumer is compromised, the bpf(4) descriptor can essentially be used
as a raw socket, regardless of consumer's UID. Write filters give users the
ability to constrain which packets can be sent through the bpf(4) descriptor.

These features are currently implemented by a couple programs which came from
OpenBSD, such as the new dhclient and pflogd.

-Modify bpf_setf(9) to accept a "cmd" parameter. This will be used to specify
whether a read or write filter is to be set.
-Add a bpf(4) filter program as a parameter to bpf_movein(9) as we will run the
filter program on the mbuf data once we move the packet in from user-space.
-Rather than execute two uiomove operations, (one for the link header and the
other for the packet data), execute one and manually copy the linker header
into the sockaddr structure via bcopy.
-Restructure bpf_setf to compensate for write filters, as well as read.
-Adjust bpf(4) stats structures to include a bd_locked member.

It should be noted that the FreeBSD and OpenBSD implementations differ a bit in
the sense that we unconditionally enforce the lock, where OpenBSD enforces it
only if the calling credential is not root.

Idea from: OpenBSD
Reviewed by: mlaier


# 149255 18-Aug-2005 csjp

Add missing braces around bpf_filter which were missed when I
merged the bpfstat code.

Pointed out by: iedowse
Pointy hat to: csjp
MFC after: 3 days


# 148868 08-Aug-2005 rwatson

Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do. This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containg cloning kernel modules.

Requested by: phk
MFC after: 3 days


# 148418 26-Jul-2005 csjp

Rather than hold a mutex over calls to SYSCTL_OUT allocate a
temporary buffer then pass the array to user-space once we have
dropped the lock.

While we are here, drop an assertion which could result in a
kernel panic under certain race conditions.

Pointed out by: rwatson


# 148366 24-Jul-2005 csjp

Introduce new sysctl variable: net.bpf.stats. This sysctl variable can
be used to pass statistics regarding dropped, matched and received
packet counts from the kernel to user-space. While we are here
introduce a new counter for filtered or matched packets. We currently
keep track of packets received or dropped by the bpf device, but not
how many packets actually matched the bpf filter.

-Introduce net.bpf.stats sysctl OID
-Move sysctl variables after the function prototypes so we can
reference bpf_stats_sysctl(9) without build errors.
-Introduce bpf descriptor counter which is used mainly for sizing
of the xbpf_d array.
-Introduce a xbpf_d structure which will act as an external
representation of the bpf_d structure.
-Add a the following members to the bpfd structure:

bd_fcount - Number of packets which matched bpf filter
bd_pid - PID which opened the bpf device
bd_pcomm - Process name which opened the device.

It should be noted that it's possible that the process which opened
the device could be long gone at the time of stats collection. An
example might be a process that opens the bpf device forks then exits
leaving the child process with the bpf fd.

Reviewed by: mdodd


# 147730 01-Jul-2005 ssouhlal

Fix the recent panics/LORs/hangs created by my kqueue commit by:

- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by: jmg
Approved by: re (scottl), scottl (mentor override)
Pointyhat to: ssouhlal
Will be happy: everyone


# 147611 26-Jun-2005 dwmalone

Fix some long standing bugs in writing to the BPF device attached to
a DLT_NULL interface. In particular:

1) Consistently use type u_int32_t for the header of a
DLT_NULL device - it continues to represent the address
family as always.
2) In the DLT_NULL case get bpf_movein to store the u_int32_t
in a sockaddr rather than in the mbuf, to be consistent
with all the DLT types.
3) Consequently fix a bug in bpf_movein/bpfwrite which
only permitted packets up to 4 bytes less than the MTU
to be written.
4) Fix all DLT_NULL devices to have the code required to
allow writing to their bpf devices.
5) Move the code to allow writing to if_lo from if_simloop
to looutput, because it only applies to DLT_NULL devices
but was being applied to other devices that use if_simloop
possibly incorrectly.

PR: 82157
Submitted by: Matthew Luckie <mjl@luckie.org.nz>
Approved by: re (scottl)


# 147256 10-Jun-2005 brooks

Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.

Reviewed by: sobomax, sam


# 147065 06-Jun-2005 csjp

Change the maximum bpf program instruction limitation from being hard-
coded at 512 (BPF_MAXINSNS) to being tunable. This is useful for users
who wish to use complex or large bpf programs when filtering traffic.
For now we will default it to BPF_MAXINSNS. I have tested bpf programs
with well over 21,000 instructions without any problems.

Discussed with: phk


# 145852 04-May-2005 csjp

-introduce net.bpf sysctl instead of the less intuitive debug.*

debug.bpf_bufsize is now net.bpf.bufsize
debug.bpf_maxbufsize is now net.bpf.maxbufsize

-move function prototypes for bpf_drvinit and bpf_clone up to the
top of the file with the others
-assert bpfd lock in catchpacket() and bpf_wakeup()

MFC after: 2 weeks


# 144389 31-Mar-2005 phk

Explicitly hold a reference to the cdev we have just cloned. This
closes the race where the cdev was reclaimed before it ever made it
back to devfs lookup.


# 144198 27-Mar-2005 green

You must selwakeup{,pri}() when closing a selectable object or the
td->td_sel will get trashed and crash the system. Fix BPF's mistake
in this area.

MFC after: 1 day


# 143064 02-Mar-2005 jmg

fix a bug where bpf would try to wakeup before updating the state.. This
was causing kqueue not to see the correct state and not wake up a process
that is waiting...

Submitted by: nCircle Network Security, Inc.


# 142906 01-Mar-2005 glebius

Use NET_CALLOUT_MPSAFE macro.


# 142793 28-Feb-2005 rwatson

In bpf_setf(), protect against races between multiple user threads
attempting to change the BPF filter on a BPF descriptor at the same
time: retrieve the old filter pointer under the same locked region
as setting the new pointer.

MFC after: 3 days


# 142787 28-Feb-2005 rwatson

Update a comment describing bpf_iflist to indicate that the BPF interface
structures correspond to specific link layers, so the same network
interface may appear more than once.

MFC after: 3 days


# 139823 06-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 139358 27-Dec-2004 pjd

Fix mbuf leak.

Submitted by: Johnny Eriksson <bygg@cafax.se>
MFC after: 5 days


# 139206 22-Dec-2004 phk

Include fcntl.h
Check O_NONBLOCK instead of IO_NDELAY
Include uio.h
Don't include vnode.h
Don't include filedesc.h


# 138950 17-Dec-2004 jmg

don't try to recurse on the bpf lock.. kqueue already locks the bpf lock
now...

Submitted by: Ed Maste of Sandvine Inc.
MFC after: 1 week


# 138540 08-Dec-2004 sam

Don't require a device to be marked up when issuing BIOCSETIF.


# 136185 06-Oct-2004 green

Don't recurse the BPF descriptor lock during the BIOCSDLT operation
(and panic). To try to finish making BPF safe, at the very least,
the BPF descriptor lock really needs to change into a reader/writer
lock that controls access to "settings," and a mutex that controls
access to the selinfo/knote/callout. Also, use of callout_drain()
instead of callout_stop() (which is really a much more widespread
issue).


# 134970 09-Sep-2004 rwatson

Reformulate bpf_dettachd() to acquire the BIF_LOCK() as well as
BPFD_LOCK() when removing a descriptor from an interface descriptor
list. Hold both over the operation, and do a better job at
maintaining the invariant that you can't find partially connected
descriptors on an active interface descriptor list.

This appears to close a race that resulted in the kernel performing
a NULL pointer dereference when BPF sessions are detached during
heavy network activity on SMP systems.

RELENG_5 candidate.


# 134967 08-Sep-2004 rwatson

Reformulate use of linked lists in 'struct bpf_d' and 'struct bpf_if'
to use queue(3) list macros rather than hand-crafted lists. While
here, move to doubly linked lists to eliminate iterating lists in
order to remove entries. This change simplifies and clarifies the
list logic in the BPF descriptor code as a first step towards revising
the locking strategy.

RELENG_5 candidate.

Reviewed by: fenner


# 134966 08-Sep-2004 rwatson

Compare/set pointers using NULL not 0.


# 133741 15-Aug-2004 jmg

Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks. Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by: green, rwatson (both earlier versions)


# 133148 05-Aug-2004 rwatson

Do a lockless read of the BPF interface structure descriptor list head
before grabbing BPF locks to see if there are any entries in order to
avoid the cost of locking if there aren't any. Avoids a mutex lock/
unlock for each packet received if there are no BPF listeners.


# 132602 24-Jul-2004 rwatson

Prefer NULL to '0' when checking a pointer value.


# 131630 05-Jul-2004 rwatson

In the BPF and ethernet bridging code, don't allow callouts to execute
without Giant if we're not debug.mpsafenet=1.


# 130640 17-Jun-2004 phk

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


# 130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


# 130335 11-Jun-2004 rwatson

Switch to conditionally acquiring and dropping Giant around calls into
ifp->if_output() basedd on debug.mpsafenet. That way once bpfwrite()
can be called without Giant, it will acquire Giant (if desired) before
entering the network stack.


# 130334 11-Jun-2004 rwatson

Un-staticize 'dst' sockaddr in the stack of bpfwrite() to prevent
the need to synchronize access to the structure. I believe this
should fit into the stack under the necessary circumstances, but
if not we can either add synchronization or use a thread-local
malloc for the duration.


# 128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# 126405 29-Feb-2004 rwatson

Grab Giant after MAC processing on outgoing packets being sent via
BPF. Grab the BPF descriptor lock before entering MAC since the MAC
Framework references BPF descriptor fields, including the BPF
descriptor label.

Submitted by: sam


# 126080 21-Feb-2004 phk

Device megapatch 4/6:

Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.


# 126076 21-Feb-2004 phk

Device megapatch 1/6:

Free approx 86 major numbers with a mostly automatically generated patch.

A number of strategic drivers have been left behind by caution, and a few
because they still (ab)use their major number.


# 125879 16-Feb-2004 des

Random style fixes and a comment update. No functional changes.


# 123957 29-Dec-2003 tjr

Unbreak build of bpf-free kernels.


# 123922 28-Dec-2003 sam

o eliminate widespread on-stack mbuf use for bpf by introducing
a new bpf_mtap2 routine that does the right thing for an mbuf
and a variable-length chunk of data that should be prepended.
o while we're sweeping the drivers, use u_int32_t uniformly when
when prepending the address family (several places were assuming
sizeof(int) was 4)
o return M_ASSERTVALID to BPF_MTAP* now that all stack-allocated
mbufs have been eliminated; this may better be moved to the bpf
routines

Reviewed by: arch@ and several others


# 122352 09-Nov-2003 tanimura

- Implement selwakeuppri() which allows raising the priority of a
thread being waken up. The thread waken up can run at a priority as
high as after tsleep().

- Replace selwakeup()s with selwakeuppri()s and pass appropriate
priorities.

- Add cv_broadcastpri() which raises the priority of the broadcast
threads. Used by selwakeuppri() if collision occurs.

Not objected in: -arch, -current


# 121816 31-Oct-2003 brooks

Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)


# 120725 03-Oct-2003 sam

add a stub for bpfattach2 so bpf is not required with the 802.11
module or related drivers

Spotted by: Dan Lukes <dan@obluda.cz>


# 119751 04-Sep-2003 sam

Reduce window during which a race can occur when detaching
an interface from each descriptor that references it. This
is just a bandaid; the locking here needs to be redone.


# 119137 19-Aug-2003 sam

Change instances of callout_init that specify MPSAFE behaviour to
use CALLOUT_MPSAFE instead of "1" for the second parameter. This
does not change the behaviour; it just makes the intent more clear.


# 118471 05-Aug-2003 jmg

add support for using kqueue to watch bpf sockets.

Submitted by: Brian Buchanan of nCircle, Inc.
Tested on: i386 and sparc64


# 112463 21-Mar-2003 mdodd

Assignment could be NULL, check.


# 111815 03-Mar-2003 phk

Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by: re(scottl)


# 111790 03-Mar-2003 mdodd

sizeof(struct llc) -> LLC_SNAPFRAMELEN
sizeof(struct ether_header) -> ETHER_HDR_LEN
sizeof(struct fddi_header) -> FDDI_HDR_LEN


# 111748 02-Mar-2003 des

More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).


# 111742 02-Mar-2003 des

Clean up whitespace, s/register //, refrain from strong urge to ANSIfy.


# 111741 02-Mar-2003 des

uiomove-related caddr_t -> void * (just the low-hanging fruit)


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 109580 20-Jan-2003 sam

o add BIOCGDLTLIST and BIOCSDLT ioctls to get the data link type list
and set the link type for use by libpcap and tcpdump
o move mtx unlock in bpfdetach up; it doesn't need to be held so long
o change printf in bpf_detach to distinguish it from the same one in bpfsetdlt

Note there are locking issues here related to ioctl processing; they
have not been addressed here.

Submitted by: Guy Harris <guy@alum.mit.edu>
Obtained from: NetBSD (w/ locking modifications)


# 108364 28-Dec-2002 phk

Remove cdevw_add() calls, they are deprecated.


# 107080 19-Nov-2002 sam

correct function declarations of stubs used for building w/o device bpf


# 106927 14-Nov-2002 sam

o add support for multiple link types per interface (e.g. 802.11 and Ethernet)
o introduce BPF_TAP and BPF_MTAP macros to hide implementation details and
ease code portability
o use m_getcl where appropriate

Reviewed by: many
Approved by: re
Obtained from: NetBSD (multiple link type support)


# 105598 21-Oct-2002 brooks

Use if_printf(ifp, "blah") instead of
printf("%s%d: blah", ifp->if_name, ifp->if_xname).


# 104393 03-Oct-2002 truckman

In an SMP environment post-Giant it is no longer safe to blindly
dereference the struct sigio pointer without any locking. Change
fgetown() to take a reference to the pointer instead of a copy of the
pointer and call SIGIO_LOCK() before copying the pointer and
dereferencing it.

Reviewed by: rwatson


# 104094 28-Sep-2002 phk

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


# 104090 28-Sep-2002 phk

Don't return(foo(bla)) when foo returns void.


# 103725 20-Sep-2002 rwatson

Insert a missing call to MAC protection check for delivering an
mbuf to a bpf device.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Submitted by: phk


# 103555 18-Sep-2002 phk

Use m_length() instead of home-rolled.

In bpf_mtap(), if the entire packet is in one mbuf, call bpf_tap()
instead since it is a tad faster.

Sponsored by: http://www.babeltech.dk/


# 101075 31-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke a MAC framework entry point to authorize reception of an
incoming mbuf by the BPF descriptor, permitting MAC policies to
limit the visibility of packets delivered to particular BPF
descriptors.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 101074 31-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument BPF so that MAC labels are properly maintained on BPF
descriptors. MAC framework entry points are invoked at BPF
instantiation and allocation, permitting the MAC framework to
derive the BPF descriptor label from the credential authorizing
the device open. Also enter the MAC framework to label mbufs
created using the BPF device.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 96122 06-May-2002 alfred

Make funsetown() take a 'struct sigio **' so that the locking can
be done internally.

Ensure that no one can fsetown() to a dying process/pgrp. We need
to check the process for P_WEXIT to see if it's exiting. Process
groups are already safe because there is no such thing as a pgrp
zombie, therefore the proctree lock completely protects the pgrp
from having sigio structures associated with it after it runs
funsetownlst.

Add sigio lock to witness list under proctree and allproc, but over
proc and pgrp.

Seigo Tanimura helped with this.


# 95883 01-May-2002 alfred

Redo the sigio locking.

Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner change pgsigio's
sigio * argument into a **, that way we can lock internally to the
function.


# 93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# 93752 04-Apr-2002 luigi

Replace (deprecated ?) FREE() macro with direct calls to free()


# 92725 19-Mar-2002 alfred

Remove __P.


# 92260 14-Mar-2002 alfred

Missed this file for select SMP fixes associated with rev 1.93 of
kern/sys_generic.c


# 87955 14-Dec-2001 jdp

Make bpf's read timeout feature work more correctly with
select/poll, and therefore with pthreads. I doubt there is any way
to make this 100% semantically identical to the way it behaves in
unthreaded programs with blocking reads, but the solution here
should do the right thing for all reasonable usage patterns.

The basic idea is to schedule a callout for the read timeout when a
select/poll is done. When the callout fires, it ends the select if
it is still in progress, or marks the state as "timed out" if the
select has already ended for some other reason. Additional logic in
bpfread then does the right thing in the case where the timeout has
fired.

Note, I co-opted the bd_state member of the bpf_d structure. It has
been present in the structure since the initial import of 4.4-lite,
but as far as I can tell it has never been used.

PR: kern/22063 and bin/31649
MFC after: 3 days


# 86526 18-Nov-2001 arr

- M_ZERO already sets bif_dlist to zero; there is no need to
do it again.


# 85049 17-Oct-2001 ru

Record the fact that revision 1.39 corresponded to CSRG revision 8.4,
and first hunk of revision 1.76 corresponded to CSRG revision 8.3.


# 84781 10-Oct-2001 jhb

Malloc mutexes pre-zero'd as random garbage (including 0xdeadcode) my
trigget the check to make sure we don't initalize a mutex twice.


# 83805 21-Sep-2001 jhb

Use the passed in thread to selrecord() instead of curthread.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 82239 23-Aug-2001 dd

Correct the comment about bpfattach() to match reality.

PR: 29967
Submitted by: Joseph Mallett <jmallett@xMach.org>


# 75204 04-Apr-2001 gad

Fix bpf devices so select() recognizes that they are always writable.

PR: 9355
Submitted by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Garrett Rooney <rooneg@electricjellyfish.net> (see pr :-)


# 74810 26-Mar-2001 phk

Send the remains (such as I have located) of "block major numbers" to
the bit-bucket.


# 72784 21-Feb-2001 rwatson

o Remove unnecessary jail() check in bpfopen() -- we limit device access
in jail using /dev namespace limits and mknod() limits, not by explicit
checks in the device open code.


# 72544 16-Feb-2001 jlemon

Add mutexes to the entire bpf subsystem to make it MPSAFE.

Previously reviewed by: jhb, bde


# 71802 29-Jan-2001 peter

Supply a stub bpf_validate() (always returning false - the script is not
valid) if BPF is missing.
The netgraph_bpf node forced bpf to be present, reflect that in the
options.
Stop doing a 'count bpf' - we provide stubs.
Since a handful of drivers still refer to "bpf.h", provide a more accurate
indication that the API is present always. (eg: netinet6)


# 70414 27-Dec-2000 bmilekic

Small fix for bpf compat:
Make malloc() use M_NOWAIT istead of M_DONTWAIT and in the
bpf_compat case, define M_NOWAIT to be M_DONTWAIT.


# 70254 21-Dec-2000 bmilekic

* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
forever when nothing is available for allocation, and may end up
returning NULL. Hopefully we now communicate more of the right thing
to developers and make it very clear that it's necessary to check whether
calls with M_(TRY)WAIT also resulted in a failed allocation.
M_TRYWAIT basically means "try harder, block if necessary, but don't
necessarily wait forever." The time spent blocking is tunable with
the kern.ipc.mbuf_wait sysctl.
M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
value of the M_WAIT flag, this could have became a big problem.


# 70127 17-Dec-2000 jdp

Fix bug: a read() on a bpf device which was in non-blocking mode
and had no data available returned 0. Now it returns -1 with errno
set to EWOULDBLOCK (== EAGAIN) as it should. This fix makes the bpf
device usable in threaded programs.

Reviewed by: bde


# 69781 08-Dec-2000 dwmalone

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


# 69774 08-Dec-2000 phk

Staticize some malloc M_ instances.


# 68271 02-Nov-2000 jhb

Fix an order of operations buglet. ! has higher precedence than &. This
should fix the warnings about bpf not calling make_dev().


# 66878 09-Oct-2000 phk

Don't make_dev() in bpfopen() unless we need to.


# 66067 19-Sep-2000 phk

Rename lminor() to dev2unit(). This function gives a linear unit number
which hides the 'hole' in the minor bits.

Introduce unit2minor() to do the reverse operation.

Fix some some make_dev() calls which didn't use UID_* or GID_* macros.

Kill the v_hashchain alias macro, it hides the real relationship.

Introduce experimental SI_CHEAPCLONE flag set it on cloned bpfs.


# 65922 16-Sep-2000 brian

Call bpfattach() correctly from if_ppp.c

Submitted by: Andy Adams <ala@merit.edu>
PR: 18506


# 65374 02-Sep-2000 phk

Avoid the modules madness I inadvertently introduced by making the
cloning infrastructure standard in kern_conf. Modules are now
the same with or without devfs support.

If you need to detect if devfs is present, in modules or elsewhere,
check the integer variable "devfs_present".

This happily removes an ugly hack from kern/vfs_conf.c.

This forces a rename of the eventhandler and the standard clone
helper function.

Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include
like <sys/queue.h>

Remove all #includes of opt_devfs.h they no longer matter.


# 64880 20-Aug-2000 phk

Remove all traces of Julians DEVFS (incl from kern/subr_diskslice.c)

Remove old DEVFS support fields from dev_t.

Make uid, gid & mode members of dev_t and set them in make_dev().

Use correct uid, gid & mode in make_dev in disk minilayer.

Add support for registering alias names for a dev_t using the
new function make_dev_alias(). These will show up as symlinks
in DEVFS.

Use makedev() rather than make_dev() for MFSs magic devices to prevent
DEVFS from noticing this abuse.

Add a field for DEVFS inode number in dev_t.

Add new DEVFS in fs/devfs.

Add devfs cloning to:
disk minilayer (ie: ad(4), sd(4), cd(4) etc etc)
md(4), tun(4), bpf(4), fd(4)

If DEVFS add -d flag to /sbin/inits args to make it mount devfs.

Add commented out DEVFS to GENERIC


# 61153 01-Jun-2000 phk

Don't panic if ifpromisc() returnes ENXIO, it's probably just an pccard
which have been pulled.


# 59696 27-Apr-2000 wpaul

Add a bpfdetach() stub routine to bpf.c. Without this, you'll get an
unresolved symbol error if you try to load a network driver into a kernel
which doesn't have bpf enabled.

Forgotten by: rwatson
Found by: peter


# 58273 19-Mar-2000 rwatson

The advent of if_detach, allowing interface removal at runtime, makes it
possible for a panic to occur if BPF is in use on the interface at the
time of the call to if_detach. This happens because BPF maintains pointers
to the struct ifnet describing the interface, which is freed by if_detach.

To correct this problem, a new call, bpfdetach, is introduced. bpfdetach
locates BPF descriptor references to the interface, and NULLs them. Other
BPF code is modified so that discovery of a NULL interface results in
ENXIO (already implemented for some calls). Processes blocked on a BPF
call will also be woken up so that they can receive ENXIO.

Interface drivers that invoke bpfattach and if_detach must be modified to
also call bpfattach(ifp) before calling if_detach(ifp). This is relevant
for buses that support hot removal, such as pccard and usb. Patches to
all effected devices will not be committed, only to if_wi.c, due to
testing limitations. To reproduce the crash, load up tcpdump on you
favorite pccard ethernet card, and then eject the card. As some pccard
drivers do not invoke if_detach(ifp), this bug will not manifest itself
for those drivers.

Reviewed by: wes


# 58192 18-Mar-2000 rwatson

Introduce a new bd_seesent flag to the BPF descriptor, indicating whether or
not the current BPF device should report locally generated packets or not.
This allows sniffing applications to see only packets that are not generated
locally, which can be useful for debugging bridging problems, or other
situations where MAC addresses are not sufficient to identify locally
sourced packets. Default to true for this flag, so as to provide existing
behavior by default.

Introduce two new ioctls, BIOCGSEESENT and BIOCSSEESENT, which may be used
to manipulate this flag from userland, given appropriate privilege.

Modify bpf.4 to document these two new ioctl arguments.

Reviewed by: asmodai


# 56057 15-Jan-2000 phk

|The hard limit for the BPF buffer size is 32KB, which appears too low
|for high speed networks (even at 100Mbit/s this corresponds to 1/300th
|of a second). The default buffer size is 4KB, but libpcap and ipfilter
|both override this (using the BIOCSBLEN ioctl) and allocate 32KB.
|
|The following patch adds an sysctl for bpf_maxbufsize, similar to the
|one for bpf_bufsize that you added back in December 1995. I choose to
|make the default for this limit 512KB (the value suggested by NFR).

Submitted by: se
Reviewed by: phk


# 54075 03-Dec-1999 julian

Make the stub routines have the same prototypes as the real bpf
routines.


# 52852 03-Nov-1999 archie

Fix bug in BIOCGETIF ioctl() where it would return a bogus interface
name if the interface unit number was greater than 9.


# 52248 15-Oct-1999 msmith

Implement pseudo_AF_HDRCMPLT, which controls the state of the 'header
completion' flag. If set, the interface output routine will assume that
the packet already has a valid link-level source address. This defaults
to off (the address is overwritten)

PR: kern/10680
Submitted by: "Christopher N . Harrell" <cnh@mindspring.net>
Obtained from: NetBSD


# 51658 25-Sep-1999 phk

Remove five now unused fields from struct cdevsw. They should never
have been there in the first place. A GENERIC kernel shrinks almost 1k.

Add a slightly different safetybelt under nostop for tty drivers.

Add some missing FreeBSD tags


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 49827 15-Aug-1999 phk

Give BPF the "almost-clone" update. If you need more of them, make
more entries in /dev and be happy you don't need to recompile your
kernel.


# 48645 06-Jul-1999 des

Rename bpfilter to bpf.


# 47640 31-May-1999 phk

Simplify cdevsw registration.

The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it. cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print an message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables. Most places they were used
bogusly. Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
72 bogus makedev() calls
26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed. Patches emailed to authors. LINT
probably broken until they catch up.


# 47625 30-May-1999 phk

This commit should be a extensive NO-OP:

Reformat and initialize correctly all "struct cdevsw".

Initialize the d_maj and d_bmaj fields.

The d_reset field was not removed, although it is never used.

I used a program to do most of this, so all the files now use the
same consistent format. Please keep it that way.

Vinum and i4b not modified, patches emailed to respective authors.


# 46155 28-Apr-1999 phk

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


# 46130 27-Apr-1999 msmith

Allow loadable interface drivers with BPF support to be loaded into a kernel
that doesn't have it. This is achieved by having minimal do-nothing stubs
enabled when there are no bpfilter devices configured.

Driver modules should be built with BPF enabled for maximum
convenience (but can be built without it for maximum performance).


# 43305 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


# 41591 07-Dec-1998 archie

The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.


# 41086 11-Nov-1998 truckman

Installed the second patch attached to kern/7899 with some changes suggested
by bde, a few other tweaks to get the patch to apply cleanly again and
some improvements to the comments.

This change closes some fairly minor security holes associated with
F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN
had on tty devices. For more details, see the description on the PR.

Because this patch increases the size of the proc and pgrp structures,
it is necessary to re-install the includes and recompile libkvm,
the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.

PR: kern/7899
Reviewed by: bde, elvind


# 40049 07-Oct-1998 alex

Check the timeval passed to BIOCSRTIMEOUT with itimerfix. Use tvtohz()
to convert the timeval into a tick count.

Suggested by: bde
Reviewed by: bde

Handle hz > 1000 in BIOCGRTIMEOUT.

Pointed out by: bde
Reviewed by: bde
Obtained from: OpenBSD


# 39964 04-Oct-1998 alex

The length argument for bcopy is a size_t, not u_int. Adjust
bpf_mcopy() and catchpacket() prototypes accordingly.


# 39955 04-Oct-1998 alex

Support hz > 1000 (Alpha) in BIOCSRTIMEOUT.

Obtained from: OpenBSD


# 38423 18-Aug-1998 ache

Implement DLT_RAW from libpcap


# 37939 29-Jul-1998 kjc

update ATM driver. (base version: midway.c 1.67 --> 1.68)

several new features are added:
- support vc/vp shaping
- support pvc shadow interface

code cleanup:
- remove WMAYBE related code. ENI WMAYBE DMA doen't work.
- remove updating if_lastchange for every packet.
- BPF related code is moved to midway.c as it should be.
(bpfwrite should work if atm_pseudohdr and LLC/SNAP are
prepended.)
- BPF link type is changed to DLT_ATM_RFC1483.
BPF now understands only LLC/SNAP!! (because bpf can't
handle variable link header length.)
It is recommended to use LLC/SNAP instead of NULL
encapsulation for various reasons. (BPF, IPv6,
interoperability, etc.)

the code has been used for months in ALTQ and KAME IPv6.

OKed by phk long time ago.


# 36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# 33679 20-Feb-1998 bde

Don't depend on "implicit int" or bloat the data section in the
declaration of xxx_devsw_installed.


# 32726 24-Jan-1998 eivind

Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related option new-style.

This introduce an xxxFS_BOOT for each of the rootable filesystems.
(Presently not required, but encouraged to allow a smooth move of option *FS
to opt_dontuse.h later.)

LFS is temporarily disabled, and will be re-enabled tomorrow.


# 31282 18-Nov-1997 bde

Removed unused #includes.

Fixed nonblocking mode. It was per-device instead of per-file. This
also fixes clobbering of bd_rtout by overloading it to hold a wrong
version of the blocking flag. I hope nothing depends on the bugs.


# 30090 03-Oct-1997 julian

Allow interfaces to be attached to bpf at times other than boot.
doing so without this patch leads to an infinite loop in the kernel.


# 29506 16-Sep-1997 bde

Fixed gratuitous ANSIisms.


# 29364 14-Sep-1997 peter

select -> poll

Obtained from: NetBSD (I think)


# 29024 01-Sep-1997 bde

Added used #include - don't depend on <sys/mbuf.h> including
<sys/malloc.h> (unless we only use the bogusly shared M*WAIT flags).


# 24208 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 6: include
<sys/filio.h>, <sys/sockio.h> and <sys/ttycom.h> instead of
<sys/ioctl.h> in a couple of files. This is still only 1/3
as spammish as <sys/ioctl.h> - 5 or 6 old tty ioctl headers
aren't needed.


# 24131 23-Mar-1997 bde

Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place. Most things don't depend on file.h stuff at all.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 21436 08-Jan-1997 wollman

Correctly account for header length in m_pkthdr.len when sending
packets through BPF.

Submitted by: seki@sysrap.cs.fujitsu.co.jp in PR#2415


# 16206 08-Jun-1996 bde

Changed some memcpy()'s back to bcopy()'s.

gcc only inlines memcpy()'s whose count is constant and didn't inline
these. I want memcpy() in the kernel go away so that it's obvious that
it doesn't need to be optimized. Now it is only used for one struct
copy in si.c.


# 16194 08-Jun-1996 dg

Fix bug in bpf_ifname() where the unit didn't get added correctly to the
name string. This function should be rewritten to deal with more than
10 units of a given type.

Pointed out by: jmf@free-gate.com (Jean-Marc Frailong)
(I fixed it slightly differently)


# 15116 07-Apr-1996 bde

Removed now-unused #includes of <machine/cpu.h>. They were for bootverbose
being declared in the wrong place.


# 14877 28-Mar-1996 scrappy

Using devfs_add_devswf() instead of devfs_add_devsw()

Reviewed by: julian@freebsd.org


# 13937 06-Feb-1996 wollman

Clean up Ethernet drivers:
- fill in and use ifp->if_softc
- use if_bpf rather than private cookie variables
- change bpf interface to take advantage of this
- call ether_ifattach() directly from Ethernet drivers
- delete kludge in if_attach() that did this indirectly


# 12820 14-Dec-1995 phk

Another mega commit to staticize things.


# 12678 08-Dec-1995 phk

Julian forgot to make the *devsw structures static.


# 12675 08-Dec-1995 julian

Pass 3 of the great devsw changes
most devsw referenced functions are now static, as they are
in the same file as their devsw structure. I've also added DEVFS
support for nearly every device in the system, however
many of the devices have 'incorrect' names under DEVFS
because I couldn't quickly work out the correct naming conventions.
(but devfs won't be coming on line for a month or so anyhow so that doesn't
matter)

If you "OWN" a device which would normally have an entry in /dev
then search for the devfs_add_devsw() entries and munge to make them right..
check out similar devices to see what I might have done in them in you
can't see what's going on..
for a laugh compare conf.c conf.h defore and after... :)
I have not doen DEVFS entries for any DISKSLICE devices yet as that will be
a much more complicated job.. (pass 5 :)

pass 4 will be to make the devsw tables of type (cdevsw * )
rather than (cdevsw)
seems to work here..
complaints to the usual places.. :)


# 12659 06-Dec-1995 bde

Replaced #includes of <sys/user.h> by less gross headers, usually
<sys/vm.h>. Many device drivers need only the definition of vtophys()
from vm.

Added nearby #includes of <sys/conf.h> where appropriate.


# 12579 02-Dec-1995 bde

Completed function declarations and/or added prototypes.


# 12521 29-Nov-1995 julian

If you're going to mechanically replicate something in 50 files
it's best to not have a (compiles cleanly) typo in it! (sigh)


# 12517 29-Nov-1995 julian

OK, that's it..
That's EVERY SINGLE driver that has an entry in conf.c..
my next trick will be to define cdevsw[] and bdevsw[]
as empty arrays and remove all those DAMNED defines as well..

Each of these drivers has a SYSINIT linker set entry
that comes in very early.. and asks teh driver to add it's own
entry to the two devsw[] tables.

some slight reworking of the commits from yesterday (added the SYSINIT
stuff and some usually wrong but token DEVFS entries to all these
devices.

BTW does anyone know where the 'ata' entries in conf.c actually reside?
seems we don't actually have a 'ataopen() etc...

If you want to add a new device in conf.c
please make sure I know
so I can keep it up to date too..

as before, this is all dependent on #if defined(JREMOD)
(and #ifdef DEVFS in parts)


# 12427 20-Nov-1995 phk

Fix #includes.


# 10957 22-Sep-1995 wollman

Fix BPf to generate a header mbuf for writes.
Fix loopback and discard interfaces to understand BPF writes.
(These two from Bill Fenner to fix PR 512.)

Move ifpromisc() from bpf.c to if.c as suggested by comment in BPF.
Send a notice to the log when promiscuous mode is enabled.


# 10929 20-Sep-1995 wollman

Only print `bpf: foo0 attached' if bootverbose.


# 10624 08-Sep-1995 bde

Fix benign type mismatches in devsw functions. 82 out of 299 devsw
functions were wrong.


# 9819 31-Jul-1995 peter

Fix panic("ifpromisc failed") when shutting down a bpf tap when the attached
interface is no longer IFF_UP.
The test for IFF_UP in ifpromisc is only useful while enabling IFF_PROMISC
and the higher levels of the bpf code do not allow for the possibility of
failure while shutting down. This is a trivial change.
Also, fixes PR#522.


# 9540 16-Jul-1995 bde

Don't include <sys/tty.h> in drivers that aren't tty drivers or in general
files that don't depend on the internals of <sys/tty.h>


# 9235 15-Jun-1995 pst

Give the BPF the ability to generate signals when a packet is available.

Reviewed by: pst & wollman
Submitted by: grossman@cygnus.com


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 8384 09-May-1995 dg

Replaced some bcopy()'s with memcpy()'s so that gcc while inline/optimize.


# 7055 14-Mar-1995 dg

Added support for generic FDDI and the DEC DEFEA and DEFPA FDDI adapters.

Submitted by: Matt Thomas


# 3451 09-Oct-1994 dg

Got rid of map.h. It's a leftover from the rmap code, and we use rlists.
Changed swapmap into swaplist.


# 2142 20-Aug-1994 dg

1) cleaned up after Garrett - fixed more redundant declarations, changed
use of timeout_t -> timeout_func_t in aha1542 and aha1742 drivers.
2) fix a bug in the portalfs that was uncovered by better prototyping -
specifically, the time must be converted from timeval to timespec
before storing in va_atime.
3) fixed/added some miscellaneous prototypes


# 1817 02-Aug-1994 dg

Added $Id$


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources