History log of /freebsd-10.0-release/sys/dev/netmap/netmap_kern.h
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 259065 07-Dec-2013 gjb

- Copy stable/10 (r259064) to releng/10.0 as part of the
10.0-RELEASE cycle.
- Update __FreeBSD_version [1]
- Set branch name to -RC1

[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so
start releng/10.0 at '100' so the branch is started with
a value ending in zero.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 251139 30-May-2013 luigi

Bring in a number of new features, mostly implemented by Michio Honda:

- the VALE switch now support up to 254 destinations per switch,
unicast or broadcast (multicast goes to all ports).

- we can attach hw interfaces and the host stack to a VALE switch,
which means we will be able to use it more or less as a native bridge
(minor tweaks still necessary).
A 'vale-ctl' program is supplied in tools/tools/netmap
to attach/detach ports the switch, and list current configuration.

- the lookup function in the VALE switch can be reassigned to
something else, similar to the pf hooks. This will enable
attaching the firewall, or other processing functions (e.g. in-kernel
openvswitch) directly on the netmap port.

The internal API used by device drivers does not change.

Userspace applications should be recompiled because we
bump NETMAP_API as we now use some fields in the struct nmreq
that were previously ignored -- otherwise, data structures
are the same.

Manpages will be committed separately.


# 250107 30-Apr-2013 luigi

Partial cleanup in preparation for upcoming changes:

- netmap_rx_irq()/netmap_tx_irq() can now be called by FreeBSD drivers
hiding the logic for handling NIC interrupts in netmap mode.
This also simplifies the case of NICs attached to VALE switches.
Individual drivers will be updated with separate commits.

- use the same refcount() API for FreeBSD and linux

- plus some comments, typos and formatting fixes

Portions contributed by Michio Honda


# 250052 29-Apr-2013 luigi

whitespace changes:
remove $Id$ lines, and add blank lines around some #if / #elif /#endif


# 249659 19-Apr-2013 luigi

mostly whitespace changes:
- remove vestiges of the old memory allocator
- clean up some comments


# 245835 23-Jan-2013 luigi

control some debugging messages with dev.netmap.verbose

add infrastracture to adapt to changes in number of queues
and buffers at runtime


# 245579 17-Jan-2013 luigi

add some definition and driver changes in preparation for
two upcoming features:

semi-transparent mode:
when a device is opened in this mode, the
user program will be able to mark slots that must be forwarded
to the "other" side (i.e. from NIC to host stack, or viceversa),
and the forwarding will occur automatically at the next netmap syscall.
This saves the need to open another file descriptor and do
the forwarding manually.

direct-forwarding mode:
when operating with a VALE port, the user can specify in the slot
the actual destination port, overriding the forwarding decision
made by a lookup of the destination MAC. This can be useful to
implement packet dispatchers.

No API changes will be introduced.
No new functionality in this patch yet.


# 241719 19-Oct-2012 luigi

This is an import of code, mostly from Giuseppe Lettieri,
that revises the netmap memory allocator so that the
various parameters (number and size of buffers, rings, descriptors)
can be modified at runtime through sysctl variables.
The changes become effective when no netmap clients are active.

The API is mostly unchanged, although the NIOCUNREGIF ioctl now
does not bring the interface back to normal mode: and you
need to close the file descriptor for that.
This change was necessary to track who is using the mapped region,
and since it is a simplification of the API there was no
incentive in trying to preserve NIOCUNREGIF.
We will remove the ioctl from the kernel next time we need
a real API change (and version bump).

Among other things, buffer allocation when opening devices is
now much faster: it used to take O(N^2) time, now it is linear.

Submitted by: Giuseppe Lettieri


# 239140 08-Aug-2012 emaste

Clarify comments about number of tx / rx rings


# 238985 02-Aug-2012 luigi

fix some signed/unsigned warnings in the netmap code.
Unfortunately the original drivers still have a lot of
sign conversion/comparison warnings.


# 238937 31-Jul-2012 luigi

remove a redundant MALLOC_DECLARE


# 238831 27-Jul-2012 luigi

remove unused definition, whitespace cleanup


# 238812 26-Jul-2012 luigi

Add support for VALE bridges to the netmap core, see

http://info.iet.unipi.it/~luigi/vale/

VALE lets you dynamically instantiate multiple software bridges
that talk the netmap API (and are *extremely* fast), so you can test
netmap applications without the need for high end hardware.

This is particularly useful as I am completing a netmap-aware
version of ipfw, and VALE provides an excellent testing platform.

Also, I also have netmap backends for qemu mostly ready for commit
to the port, and this too will let you interconnect virtual machines
at high speed without fiddling with bridges, tap or other slow solutions.

The API for applications is unchanged, so you can use the code
in tools/tools/netmap (which i will update soon) on the VALE ports.

This commit also syncs the code with the one in my internal repository,
so you will see some conditional code for other platforms.
The code should run mostly unmodified on stable/9 so people interested
in trying it can just copy sys/dev/netmap/ and sys/net/netmap*.h
from HEAD

VALE is joint work with my colleague Giuseppe Lettieri, and
is partly supported by the EU Projects CHANGE and OPENLAB


# 234227 13-Apr-2012 luigi

A bit of cleanup in the names of fields of netmap-related structures.
Use the name 'ring' instead of 'queue' in all fields.
Bump NETMAP_API.


# 234174 12-Apr-2012 luigi

Some code restructuring to bring the memory allocator out of netmap.c
and make it easier to replace it with a different implementation.
On passing, also fix indentation.

NOTE: I know that #include "foo.c" is ugly, but the alternative
(add another entry to sys/conf/files, add a separate header with
structs and prototypes, and expose functions that are meant to
be private) looks even worse to me.
We need a more modular way to specify dependencies and build options.


# 232238 27-Feb-2012 luigi

A bunch of netmap fixes:

USERSPACE:
1. add support for devices with different number of rx and tx queues;

2. add better support for zero-copy operation, adding an extra field
to the netmap ring to indicate how many buffers we have already processed
but not yet released (with help from Eddie Kohler);

3. The two changes above unfortunately require an API change, so while
at it add a version field and some spares to the ioctl() argument
to help detect mismatches.

4. update the manual page for the two changes above;

5. update sample applications in tools/tools/netmap

KERNEL:

1. simplify the internal structures moving the global wait queues
to the 'struct netmap_adapter';

2. simplify the functions that map kring<->nic ring indexes

3. normalize device-specific code, helps mainteinance;

4. start exploring the impact of micro-optimizations (prefetch etc.)
in the ixgbe driver.
Use 'legacy' descriptors on the tx ring and prefetch slots gives
about 20% speedup at 900 MHz. Another 7-10% would come from removing
the explict calls to bus_dmamap* in the core (they are effectively
NOPs in this case, but it takes expensive load of the per-buffer
dma maps to figure out that they are all NULL.

Rx performance not investigated.

I am postponing the MFC so i can import a few more improvements
before merging.


# 231796 15-Feb-2012 luigi

(This commit only touches code within the DEV_NETMAP blocks)

Introduce some functions to map NIC ring indexes into netmap ring
indexes and vice versa. This way we can implement the bound
checks only in one place (and hopefully in a correct way).

On passing, make the code and comments more uniform across the
various drivers.


# 231594 13-Feb-2012 luigi

- use struct ifnet as explicit type of the argument to the
txsync() and rxsync() callbacks, removing some variables made
useless by this change;

- add generic lock and irq handling routines. These can be useful
in case there are no driver locks that we can reuse;

- add a few macros to reduce differences with the Linux version.


# 231198 08-Feb-2012 luigi

- change the buffer size from a constant to a
TUNABLE variable (hw.netmap.buf_size) so we can experiment
with values different from 2048 which may give better cache performance.

- rearrange the memory allocation code so it will be easier
to replace it with a different implementation. The current code
relies on a single large contiguous chunk of memory obtained through
contigmalloc.
The new implementation (not committed yet) uses multiple
smaller chunks which are easier to fit in a fragmented address
space.


# 230572 26-Jan-2012 luigi

ixgbe changes:
- remove experimental code for disabling CRC
- use the correct constant for conversion between interrupt rate
and EITR values (the previous values were off by a factor of 2)
- make dev.ix.N.queueM.interrupt_rate a RW sysctl variable.
Changing individual values affects the queue immediately,
and propagates to all interfaces at the next reinit.
- add dev.ix.N.queueM.irqs rdonly sysctl, to export the actual
interrupt counts

Netmap-related changes for ixgbe:
- use the "new" format for TX descriptors in netmap mode.
- pass interrupt mitigation delays to the user process doing poll()
on a netmap file descriptor.
On the RX side this means we will not check the ring more than once
per interrupt. This gives the process a chance to sleep and process
packets in larger batches, thus reducing CPU usage.
On the TX side we take this even further: completed transmissions are
reclaimed every half ring even if the NIC interrupts more often.
This saves even more CPU without any additional tx delays.

Generic Netmap-related changes:
- align the netmap_kring to cache lines so that there is no false sharing
(possibly useful for multiqueue NICs and MSIX interrupts, which are
handled by different cores). It's a minor improvement but it does not
cost anything.

Reviewed by: Jack Vogel
Approved by: Jack Vogel


# 230058 13-Jan-2012 luigi

indentation and whitespace fixes


# 230052 13-Jan-2012 luigi

Two performance-related fixes:
1. as reported by Alexander Fiveg, the allocator was reporting
half of the allocated memory. Fix this by exiting from the
loop earlier (not too critical because this code is going
away soon).

2. following a discussion on freebsd-current
http://lists.freebsd.org/pipermail/freebsd-current/2012-January/031144.html
turns out that (re)loading the dmamap was expensive and not optimized.
This operation is in the critical path when doing zero-copy forwarding
between interfaces.
At least on netmap and i386/amd64, the bus_dmamap_load can be
completely bypassed if the map is NULL, so we do it.

The latter change gives an almost 3x improvement in forwarding
performance, from the previous 9.5Mpps at 2.9GHz to the current
line rate (14.2Mpps) at 1.733GHz. (this is for 64+4 byte packets,
in other configurations the PCIe bus is a bottleneck).


# 229939 10-Jan-2012 luigi

small code cleanup in preparation for future modifications in
the memory allocator used by netmap. No functional change,
two small bug fixes:
- in if_re.c add a missing bus_dmamap_sync()
- in netmap.c comment out a spurious free() in an error handling block


# 228845 23-Dec-2011 luigi

1. don't use if_pspare directly, but through a macro WMA()

2. move a variable declaration at the beginning of a block


# 228276 05-Dec-2011 luigi

1. Fix the handling of link reset while in netmap more.
A link reset now is completely transparent for the netmap client:
even if the NIC resets its own ring (e.g. restarting from 0),
the client will not see any change in the current rx/tx positions,
because the driver will keep track of the offset between the two.

2. make the device-specific code more uniform across different drivers
There were some inconsistencies in the implementation of the netmap
support routines, now drivers have been aligned to a common
code structure.

3. import netmap support for ixgbe . This is implemented as a very
small patch for ixgbe.c (233 lines, 11 chunks, mostly comments:
in total the patch has only 54 lines of new code) , as most of
the code is in an external file sys/dev/netmap/ixgbe_netmap.h ,
following some initial comments from Jack Vogel about making
changes less intrusive.
(Note, i have emailed Jack multiple times asking if he had
comments on this structure of the code; i got no reply so
i assume he is fine with it).

Support for other drivers (em, lem, re, igb) will come later.

"ixgbe" is now the reference driver for netmap support. Both the
external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific
patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should
serve as a reference for other device drivers.

Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap,
the sender does 14.88 Mpps at 1050 Mhz and 14.2 Mpps at 900 MHz
on an i7-860 with 4 cores and 82599 card. Haven't tried yet more
aggressive optimizations such as adding 'prefetch' instructions
in the time-critical parts of the code.


# 227614 17-Nov-2011 luigi

Bring in support for netmap, a framework for very efficient packet
I/O from userspace, capable of line rate at 10G, see

http://info.iet.unipi.it/~luigi/netmap/

At this time I am bringing in only the generic code (sys/dev/netmap/
plus two headers under sys/net/), and some sample applications in
tools/tools/netmap. There is also a manpage in share/man/man4 [1]

In order to make use of the framework you need to build a kernel
with "device netmap", and patch individual drivers with the code
that you can find in

sys/dev/netmap/head.diff

The file will go away as the relevant pieces are committed to
the various device drivers, which should happen in a few days
after talking to the driver maintainers.

Netmap support is available at the moment for Intel 10G and 1G
cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re").
I have partial patches for "bge" and am starting to work on "cxgbe".
Hopefully changes are trivial enough so interested third parties
can submit their patches. Interested people can contact me
for advice on how to add netmap support to specific devices.

CREDITS:
Netmap has been developed by Luigi Rizzo and other collaborators
at the Universita` di Pisa, and supported by EU project CHANGE
(http://www.change-project.eu/)
The code is distributed under a BSD Copyright.

[1] In my opinion is a bad idea to have all manpage in one directory.
We should place kernel documentation in the same dir that contains
the code, which would make it much simpler to keep doc and code
in sync, reduce the clutter in share/man/ and incidentally is
the policy used for all of userspace code.
Makefiles and doc tools can be trivially adjusted to find the
manpages in the relevant subdirs.