#
247629 |
|
02-Mar-2013 |
melifaro |
Merge * r233937 - Improve BPF locking model * r233938 - Improve performace for writer-only BPF users * r233946 - Fix build * r235744 - Fix (new) panic on attaching to non-existent interface * r235745 - Fix old panic when BPF consumer attaches to destroying interface * r235746 - Call bpf_jitter() before acquiring BPF global lock * r235747 - Make most BPF ioctls() SMP-safe. * r236231 - Fix BPF_JITTER code broken by r235746. * r236251 - Fix shim for BIOCSETF to drop all packets buffered on the descriptor. * r236261 - Save the previous filter right before we set new one. * r236262 - Fix style(9) nits, reduce unnecessary type castings. * r236559 - Fix panic introduced by r235745 * r236806 - Fix typo introduced in r236559.
r233937 - Improve BPF locking model.
Interface locks and descriptor locks are converted from mutex(9) to rwlock(9). This greately improves performance: in most common case we need to acquire 1 reader lock instead of 2 mutexes.
- Remove filter(descriptor) (reader) lock in bpf_mtap[2] This was suggested by glebius@. We protect filter by requesting interface writer lock on filter change.
- Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h without including rwlock stuff. However, this is is temporary solution, struct bpf_if should be made opaque for any external caller.
r233938 - Improve performace for writer-only BPF users.
Linux and Solaris (at least OpenSolaris) has PF_PACKET socket families to send raw ethernet frames. The only FreeBSD interface that can be used to send raw frames is BPF. As a result, many programs like cdpd, lldpd, various dhcp stuff uses BPF only to send data. This leads us to the situation when software like cdpd, being run on high-traffic-volume interface significantly reduces overall performance since we have to acquire additional locks for every packet.
Here we add sysctl that changes BPF behavior in the following way: If program came and opens BPF socket without explicitly specifyin read filter we assume it to be write-only and add it to special writer-only per-interface list. This makes bpf_peers_present() return 0, so no additional overhead is introduced. After filter is supplied, descriptor is added to original per-interface list permitting packets to be captured.
Unfortunately, pcap_open_live() sets catch-all filter itself for the purpose of setting snap length.
Fortunately, most programs explicitly sets (event catch-all) filter after that. tcpdump(1) is a good example.
So a bit hackis approach is taken: we upgrade description only after second BIOCSETF is received.
Sysctl is named net.bpf.optimize_writers and is turned off by default.
- While here, document all sysctl variables in bpf.4
r233946 Fix build broken by r233938.
r235744 Fix panic on attaching to non-existent interface (introduced by r233937, pointed by hrs@) Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@) Protect most of bpf_setf() by BPF global lock
Add several forgotten assertions (thanks to adrian@)
Document current locking model inside bpf.c Document EVENTHANDLER(9) usage inside BPF.
r235745 Fix old panic when BPF consumer attaches to destroying interface. 'flags' field is added to the end of bpf_if structure. Currently the only flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd() Problem can be easily triggered on SMP stable/[89] by the following command (sort of): 'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; \ tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done'
Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory, while interface is still UP and there can be routes via this interface. Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api.
Convert bpfd rwlock back to mutex due lack of performance gain (currently checking if packet matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches)
r235746 Call bpf_jitter() before acquiring BPF global lock due to malloc() being used inside bpf_jitter.
Eliminate bpf_buffer_alloc() and allocate BPF buffers on descriptor creation and BIOCSBLEN ioctl. This permits us not to allocate buffers inside bpf_attachd() which is protected by global lock.
r235747 Make most BPF ioctls() SMP-safe.
r236559 Fix panic introduced by r235745. Panic occurs after first packet traverse renamed interface. Add several comments on locking
r236231 Fix BPF_JITTER code broken by r235746.
r236251 Fix 32-bit shim for BIOCSETF to drop all packets buffered on the descriptor and reset statistics as it should.
r236261 - Save the previous filter right before we set new one. - Reduce duplicate code and make it little easier to read.
r236262 Fix style(9) nits, reduce unnecessary type castings, etc., for bpf_setf().
r236806 Fix typo introduced in r236559.
|
#
177548 |
|
24-Mar-2008 |
csjp |
Introduce support for zero-copy BPF buffering, which reduces the overhead of packet capture by allowing a user process to directly "loan" buffer memory to the kernel rather than using read(2) to explicitly copy data from kernel address space.
The user process will issue new BPF ioctls to set the shared memory buffer mode and provide pointers to buffers and their size. The kernel then wires and maps the pages into kernel address space using sf_buf(9), which on supporting architectures will use the direct map region. The current "buffered" access mode remains the default, and support for zero-copy buffers must, for the time being, be explicitly enabled using a sysctl for the kernel to accept requests to use it.
The kernel and user process synchronize use of the buffers with atomic operations, avoiding the need for system calls under load; the user process may use select()/poll()/kqueue() to manage blocking while waiting for network data if the user process is able to consume data faster than the kernel generates it. Patchs to libpcap are available to allow libpcap applications to transparently take advantage of this support. Detailed information on the new API may be found in bpf(4), including specific atomic operations and memory barriers required to synchronize buffer use safely.
These changes modify the base BPF implementation to (roughly) abstrac the current buffer model, allowing the new shared memory model to be added, and add new monitoring statistics for netstat to print. The implementation, with the exception of some monitoring hanges that break the netstat monitoring ABI for BPF, will be MFC'd.
Zerocopy bpf buffers are still considered experimental are disabled by default. To experiment with this new facility, adjust the net.bpf.zerocopy_enable sysctl variable to 1.
Changes to libpcap will be made available as a patch for the time being, and further refinements to the implementation are expected.
Sponsored by: Seccuris Inc. In collaboration with: rwatson Tested by: pwood, gallatin MFC after: 4 months [1]
[1] Certain portions will probably not be MFCed, specifically things that can break the monitoring ABI.
|