History log of /freebsd-10.0-release/sys/dev/e1000/if_igb.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 259065 07-Dec-2013 gjb

- Copy stable/10 (r259064) to releng/10.0 as part of the
10.0-RELEASE cycle.
- Update __FreeBSD_version [1]
- Set branch name to -RC1

[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so
start releng/10.0 at '100' so the branch is started with
a value ending in zero.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 256200 09-Oct-2013 jfv

Update the Intel igb driver to version 2.4.0
- This version has support for the new Intel Avoton systems,
including 2.5Gb support, further it now has IPv6/TSO6 support as
well. Shared code has been updated where necessary as well. Thanks
to my new assistant Eric Joyner for doing the transmit path changes
to bring in the IPv6/TSO6 support. Thanks to Gleb for catching the
one bug and change needed in NETMAP.

Approved by: re


# 256069 05-Oct-2013 hiren

Expose system level ixgbe sysctls.
Device level sysctls are already exposed as dev.ix.<device>

Fixing the case where number of queues for igb is auto-tuned and
hw.igb.num_queues does not return current/updated value.

Reviewed by: jfv
Approved by: re (delphij)
MFC after: 2 weeks


# 254804 24-Aug-2013 andre

Restructure the mbuf pkthdr to make it fit for upcoming capabilities and
features. The changes in particular are:

o Remove rarely used "header" pointer and replace it with a 64bit protocol/
layer specific union PH_loc for local use. Protocols can flexibly overlay
their own 8 to 64 bit fields to store information while the packet is
worked on.

o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc
instead of pkthdr.header.

o Extend csum_flags to 64bits to allow for additional future offload
information to be carried (e.g. iSCSI, IPsec offload, and others).

o Move the RSS hash type enumerator from abusing m_flags to its own 8bit
rsstype field. Adjust accessor macros.

o Add cosqos field to store Class of Service / Quality of Service information
with the packet. It is not yet supported in any drivers but allows us to
get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with
a modernized ALTQ.

o Add four 8 bit fields l[2-5]hlen to store the relative header offsets
from the start of the packet. This is important for various offload
capabilities and to relieve the drivers from having to parse the packet
and protocol headers to find out location of checksums and other
information. Header parsing in drivers is a lot of copy-paste and
unhandled corner cases which we want to avoid.

o Add another flexible 64bit union to map various additional persistent
packet information, like ether_vtag, tso_segsz and csum fields.
Depending on the csum_flags settings some fields may have different usage
making it very flexible and adaptable to future capabilities.

o Restructure the CSUM flags to better signify their outbound (down the
stack) and inbound (up the stack) use. The CSUM flags used to be a bit
chaotic and rather poorly documented leading to incorrect use in many
places. Bring clarity into their use through better naming.
Compatibility mappings are provided to preserve the API. The drivers
can be corrected one by one and MFC'd without issue.

o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures).

Sponsored by: The FreeBSD Foundation


# 254264 12-Aug-2013 jfv

Alter the mq_start routine to do a TRYLOCK and call to the locked routine
rather than just queueing. The former code was an attempt at getting
UDP performance up, but there have been customer reports of problems with it,
so the ixgbe approach seems the best solution for now.


# 254263 12-Aug-2013 scottl

Update PCI drivers to no longer look at the MEMIO-enabled bit in the PCI
command register. The lazy BAR allocation code in FreeBSD sometimes
disables this bit when it detects a range conflict, and will re-enable
it on demand when a driver allocates the BAR. Thus, the bit is no longer
a reliable indication of capability, and should not be checked. This
results in the elimination of a lot of code from drivers, and also gives
the opportunity to simplify a lot of drivers to use a helper API to set
the busmaster enable bit.

This changes fixes some recent reports of disk controllers and their
associated drives/enclosures disappearing during boot.

Submitted by: jhb
Reviewed by: jfv, marius, achadd, achim
MFC after: 1 day


# 254262 12-Aug-2013 jfv

Improve the MSIX setup code in the drivers, thanks to Marius for
the changes. Make sure that pci_alloc_msix() does give us the vectors
we need and fall back to MSI when it doesn't, also release any that
were allocated when insufficient.

MFC after: 3 days


# 254008 06-Aug-2013 jfv

Make the various driver MSIX setup routines fallback to MSI more
gracefully. This change was suggested by Marius Strobl, thank you.

PR: kern/181016
MFC after: ASAP


# 254002 06-Aug-2013 jfv

When the igb driver is static there are cases when early interrupts occur,
resulting in a panic in refresh_mbufs, to prevent this add a check in the
interrupt handler for DRV_RUNNING.

MFC after: 1 day (critical for 9.2)


# 253303 12-Jul-2013 jfv

Change the E1000 driver option header handling to match the
ixgbe driver. As it was, when building them as a module INET
and INET6 are not defined. In these drivers it does not cause
a panic, however it does result in different behavior in the
ioctl routine when you are using a module vs static, and I
think the behavior should be the same.

MFC after: 3 days


# 250108 30-Apr-2013 luigi

use netmap_rx_irq() / netmap_tx_irq() to handle interrupts in
netmap mode, removing the logic from individual drivers.

(note: if_lem.c not updated yet due to some other pending modifications)


# 249339 10-Apr-2013 jfv

Simplify allocate_legacy code, txr pointer was breaking LEGACY compile,
thanks to Nick Rogers for pointing this out.


# 249074 03-Apr-2013 jfv

Correct the multicast handling in the E1000 drivers as was
done in ixgbe, thanks to Mike Karels for this fix. When exiting
promiscuous mode MPE bit was being unconditionally cleared, this
should not be done if we are in MAX multicast groups.


# 249070 03-Apr-2013 sbruno

Update man page for igb(4) with a little bit of information about
hw.igb.num_queues for those so inclined.

PR: kern/177384
Submitted by: hiren.panchasara@gmail.com
Reviewed by: sbruno@
Approved by: jfv@
Obtained from: Yahoo! Inc.
MFC after: 2 weeks


# 248906 29-Mar-2013 jfv

Change defines in the igb driver to allow an easier selection of
the older if_start/non-multiqueue interface from the stack. This
is not the default, but can be turned on in the Makefile now regardless
of the OS level to allow either testing or use of ALTQ.

MFC after: one week


# 247064 20-Feb-2013 jfv

Refresh on the shared code for the E1000 drivers.
- bear with me, there are lots of white space changes, I would not
do them, but I am a mere consumer of this stuff and if these drivers
are to stay in shape they need to be taken.

em driver changes: support for the new i217/i218 interfaces

igb driver changes:
- TX mq start has a quick turnaround to the stack
- Link/media handling improvement
- When link status changes happen the current flow control state
will now be displayed.
- A few white space/style changes.

lem driver changes:
- the shared code uncovered a bogus write to the RLPML register
(which does not exist in this hardware) in the vlan code,this
is removed.


# 246482 07-Feb-2013 rrs

This fixes a out-of-order problem with several
of the newer drivers. The basic problem was
that the driver was pulling the mbuf off the
drbr ring and then when sending with xmit(), encounting
a full transmit ring. Thus the lower layer
xmit() function would return an error, and the
drivers would then append the data back on to the ring.
For TCP this is a horrible scenario sure to bring
on a fast-retransmit.

The fix is to use drbr_peek() to pull the data pointer
but not remove it from the ring. If it fails then
we either call the new drbr_putback or drbr_advance
method. Advance moves it forward (we do this sometimes
when the xmit() function frees the mbuf). When
we succeed we always call advance. The
putback will always copy the mbuf back to the top
of the ring. Note that the putback *cannot* be used
with a drbr_dequeue() only with drbr_peek(). We most
of the time, in putback, would not need to copy it
back since most likey the mbuf is still the same, but
sometimes xmit() functions will change the mbuf via
a pullup or other call. So the optimial case for
the single consumer is to always copy it back. If
we ever do a multiple_consumer (for lagg?) we
will need a test and atomic in the put back possibly
a seperate putback_mc() in the ring buf.

Reviewed by: jhb@freebsd.org, jlv@freebsd.org


# 246128 30-Jan-2013 sbz

Use DEVMETHOD_END macro defined in sys/bus.h instead of {0, 0} sentinel on device_method_t arrays

Reviewed by: cognet
Approved by: cognet


# 245334 12-Jan-2013 smh

Fixed mbuf free when receive structures fail to allocate.

This prevents quad igb card on high core machines, without any nmbcluster or
igb queue tuning wedging the boot process if all nics are configured.

Reviewed by: jfv
Approved by: pjd (mentor)
MFC after: 1 week


# 243857 04-Dec-2012 glebius

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags in sys/dev.


# 243570 26-Nov-2012 glebius

drbr_enqueue() awlays consumes mbuf, no matter did it
fail or not. The mbuf pointer is no longer valid, so
can't be reused after.

Fix igb_mq_start() where mbuf pointer was used after
drbr_enqueue().

This eventually leads us to all invocations of
igb_mq_start_locked() called with third argument as NULL.
This allows us to simplify this function.

Submitted by: Karim Fodil-Lemelin <fodillemlinkarim gmail.com>
Reviewed by: jfv


# 241885 22-Oct-2012 eadler

This isn't functionally identical. In some cases a hint to disable
unit 0 would in fact disable all units.

This reverts r241856

Approved by: cperciva (implicit)


# 241856 22-Oct-2012 eadler

Now that device disabling is generic, remove extraneous code from the
device drivers that used to provide this feature.

Reviewed by: des
Approved by: cperciva
MFC after: 1 week


# 241037 28-Sep-2012 glebius

The drbr(9) API appeared to be so unclear, that most drivers in
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.

o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
ALTQ queueing or buf_ring(9) queueing.
- It doesn't touch the buf_ring stats any more.
- It doesn't touch ifnet stats anymore.
- drbr_stats_update() no longer exists.

o buf_ring(9) handles its stats itself:
- It handles br_drops itself.
- br_prod_bytes stats are dropped. Rationale: no one ever
reads them but update of a common counter on every packet
negatively affects performance due to excessive cache
invalidation.
- buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
we no longer account bytes.

o Drivers handle their stats theirselves: if_obytes, if_omcasts.

o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer
use drbr_stats_update(), and update ifnet stats theirselves.

o bxe(4) was the most correct driver, it didn't call
drbr_stats_update(), thus it was the only driver accurate under
moderate load. Now it also maintains stats itself.

o ixgbe(4) had already taken stats from hardware, so just
- drop software stats updating.
- take multicast packet count from hardware as well.

o mxge(4) just no longer needs NO_SLOW_STATS define.

o cxgb(4), cxgbe(4) need no change, since they obtain stats
from hardware.

Reviewed by: jfv, gnn


# 240879 23-Sep-2012 sbruno

This patch fixes a nit in the em, lem, and igb driver statistics. Increment
adapter->dropped_pkts instead of if_ierrors because if_ierrors is
overwritten by hw stats collection.

Submitted by: Andrew Boyer <aboyer@averesystems.com>
Reviewed by: Jack F Vogel <jfv@freebsd.org>
MFC after: 2 weeks


# 239109 06-Aug-2012 jfv

Make the polling interface in igb able to handle
multiqueue, and correct the rxdone handling. Update
the polling man page to include igb as well.

Thanks to Mark Johnston for these changes.


# 239105 06-Aug-2012 jfv

Correct the mq_start routine to avoid out-of-order
packet delivery, always enqueue when possible. Also
correct the DEPLETED test as multiple bits might be
set. Thanks to Randall Stewart for the changes!


# 238981 01-Aug-2012 sbruno

CPU_NEXT() already handles wrapping around to the beginning. Also, in a
system with sparse CPU IDs, you can have a valid CPU ID > mp_ncpus (e.g. if
you have two CPUs 0 and 4, with mp_maxid == 4 and mp_ncpus == 2).

Introduced at svn r235210

Submitted by: jhb@
Reviewed by: jfv@


# 238214 07-Jul-2012 jfv

Change the interface to the Energy Efficient Ethernet (EEE)
setting in the igb and em driver. This was necessitated by
a shared code change that I was given late in the game, a data
type changed from bool to int, in the last update I dealt with
it by a cast, but it was pointed out (thanks jhb) that there
was a potential problem with this. John suggested this safer
approach, and it is fine with me...

MFC after:2 days (to catch the 9.1 update)


# 238151 05-Jul-2012 jfv

Correct small regressions pointed out by jhb, thanks John.

MFC after:5 days


# 238148 05-Jul-2012 jfv

Sync with Intel internal source:
shared code update and small changes in core required
Add support for new i210/i211 devices
Improve queue calculation based on mac type

MFC after:5 days


# 236406 01-Jun-2012 jhb

Commit a portion of 233708 I missed earlier and don't include the
definition of igb_start() and igb_start_locked() (nor set if_start in
the ifnet) when igb(4) uses if_transmit.


# 235210 09-May-2012 sbruno

Modify the binding of queues to attach to as many CPUs
as possible when using more than one igb(4) adapter. This
means that queues will not be bound to the same CPUs if
there are more CPUs availble.

This is only applicable to a system that has multiple interfaces.

Obtained from: Yahoo! Inc.
MFC after: 3 days


# 234154 11-Apr-2012 jhb

Reapply r223198 which was reverted in the previous vendor import. Some
portions were already reapplied in r233708:
- Use a dedicated task to handle deferred transmits from the if_transmit
method instead of reusing the existing per-queue interrupt task.
Reusing the per-queue interrupt task could result in both an interrupt
thread and the taskqueue thread trying to handle received packets on a
single queue resulting in out-of-order packet processing.
- Call ether_ifdetach() earlier in igb_detach().
- Drain tasks and free taskqueues during igb_detach().

MFC after: 1 week


# 233708 30-Mar-2012 jhb

Fix a few issues with transmit handling in em(4) and igb(4):
- Do not define the foo_start() methods or set if_start in the ifnet if
multiq transmit is enabled. Also, set if_transmit and if_qflush before
ether_ifattach rather than after when multiq transmit is enabled. This
helps to ensure that the drivers never try to mix different transmit
methods.
- Properly restart transmit during resume. igb(4) was not restarting it
at all, and em(4) was restarting even if the link was down and was
calling the wrong method if multiq transmit was enabled.
- Remove all the 'more' handling for transmit completions. Transmit
completion processing does not have a processing limit, so it always
runs to completion and never has more work to do when it returns.
Instead, the previous code was returning 'true' anytime there were
packets in the queue that weren't still in the process of being
transmitted. The effect was that the driver would continuously
reschedule a task to process TX completions in effect running at 100%
CPU polling the hardware until it finished transmitting all of the
packets in the ring. Now it will just wait for the next TX completion
interrupt.
- Restart packet transmission when the link becomes active.
- Fix the MSI-X queue interrupt handlers to restart packet transmission if
there are pending packets in the relevant software queue (IFQ or buf_ring)
after processing TX completions. This is the root cause for the OACTIVE
hangs as if the MSI-X queue handler drained all the pending packets from
the TX ring, nothing would ever restart it. As such, remove some
previously-added workarounds to reschedule a task to poll the TX ring
anytime OACTIVE was set.

Tested by: sbruno
Reviewed by: jfv
MFC after: 1 week


# 232367 01-Mar-2012 jhb

Properly handle failures in igb_setup_msix() by returning 0 if MSI or MSI-X
allocation fails.

Reviewed by: jfv
MFC after: 2 weeks


# 232238 27-Feb-2012 luigi

A bunch of netmap fixes:

USERSPACE:
1. add support for devices with different number of rx and tx queues;

2. add better support for zero-copy operation, adding an extra field
to the netmap ring to indicate how many buffers we have already processed
but not yet released (with help from Eddie Kohler);

3. The two changes above unfortunately require an API change, so while
at it add a version field and some spares to the ioctl() argument
to help detect mismatches.

4. update the manual page for the two changes above;

5. update sample applications in tools/tools/netmap

KERNEL:

1. simplify the internal structures moving the global wait queues
to the 'struct netmap_adapter';

2. simplify the functions that map kring<->nic ring indexes

3. normalize device-specific code, helps mainteinance;

4. start exploring the impact of micro-optimizations (prefetch etc.)
in the ixgbe driver.
Use 'legacy' descriptors on the tx ring and prefetch slots gives
about 20% speedup at 900 MHz. Another 7-10% would come from removing
the explict calls to bus_dmamap* in the core (they are effectively
NOPs in this case, but it takes expensive load of the per-buffer
dma maps to figure out that they are all NULL.

Rx performance not investigated.

I am postponing the MFC so i can import a few more improvements
before merging.


# 231796 15-Feb-2012 luigi

(This commit only touches code within the DEV_NETMAP blocks)

Introduce some functions to map NIC ring indexes into netmap ring
indexes and vice versa. This way we can implement the bound
checks only in one place (and hopefully in a correct way).

On passing, make the code and comments more uniform across the
various drivers.


# 229939 10-Jan-2012 luigi

small code cleanup in preparation for future modifications in
the memory allocator used by netmap. No functional change,
two small bug fixes:
- in if_re.c add a missing bus_dmamap_sync()
- in netmap.c comment out a spurious free() in an error handling block


# 229767 07-Jan-2012 kevlo

ether_ifattach() sets if_mtu to ETHERMTU, don't bother set it again

Reviewed by: yongari


# 228803 22-Dec-2011 luigi

put back netmap support, deleted by mistake in a previous commit


# 228788 21-Dec-2011 jhb

Restore the sysctl changes from 223676 and 227309 lost in the previous
import:
- Add read-only sysctls for all of the tunables supported by the igb and
em drivers.
- Make the per-instance 'enable_aim' sysctl truly per-instance by having it
change a per-instance variable (which is used to control AIM) rather
than having all of the per-instance sysctls operate on a single global
variable.

While here, restore the previously existing hw.igb.rx_processing_limit
tunable as it is very useful to be able to set a default tunable that
applies to all adapters in the system.


# 228441 12-Dec-2011 mdf

Consistently use types in e1000 driver code:

- Two struct members eee_disable are used in a function that expects
an int *, so declare them int, not bool.
- igb_tx_ctx_setup() returns a boolean value, so declare it bool, not int.
- igb_header_split is passed to TUNABLE_INT, so delcare it int, not bool.
- igb_tso_setup() returns a bool, so declare it bool, not boolean_t.
- Do not re-define bool/true/false if the symbols already exist.

MFC after: 2 weeks
Sponsored by: Isilon Systems, LLC


# 228415 11-Dec-2011 jfv

Last change still had an issue, one more time...


# 228405 11-Dec-2011 jfv

Correct LINT build issues in the ioctl code.


# 228387 10-Dec-2011 jfv

Part 2 of 2 New deltas for the 1G drivers.

There have still been intermittent problems with apparent TX
hangs for some customers. These have been problematic to reproduce
but I believe these changes will address them. Testing on a number
of fronts have been positive.

EM: there is an important 'chicken bit' fix for 82574 in the shared
code this is supported in the core here.
- The TX path has been tightened up to improve performance. In
particular UDP with jumbo frames was having problems, and the
changes here have improved that.
- OACTIVE has been used more carefully on the theory that some
hangs may be due to a problem in this interaction
- Problems with the RX init code, the "lazy" allocation and
ring initialization has been found to cause problems in some
newer client systems, and as it really is not that big a win
(its not in a hot path) it seems best to remove it.
- HWTSO was broken when VLAN HWTAGGING or HWFILTER is used, I
found this was due to an error in setting up the descriptors
in em_xmit.

IGB:
- TX is also improved here. With multiqueue I realized its very
important to handle OACTIVE only under the CORE lock so there
are no races between the queues.
- Flow Control handling was broken in a couple ways, I have changed
and I hope improved that in this delta.
- UDP also had a problem in the TX path here, it was change to
improve that.
- On some hardware, with the driver static, a weird stray interrupt
seems to sometimes fire and cause a panic in the RX mbuf refresh
code. This is addressed by setting interrupts late in the init
path, and also to set all interrupts bits off at the start of that.


# 228281 05-Dec-2011 luigi

add netmap support for "em", "lem", "igb" and "re".

On my hardware, "em" in netmap mode does about 1.388 Mpps
on one card (on an Asus motherboard), and 1.1 Mpps on another
card (PCIe bus). Both seem to be NIC-limited, because
i have the same rate even with the CPU running at 150 MHz.

On the "re" driver the tx throughput is around 420-450 Kpps
on various (8111C and the like) chipsets. On the Rx side
performance seems much better, and i can receive the full
load generated by the "em" cards.

"igb" is untested as i don't have the hardware.


# 227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 223831 06-Jul-2011 jfv

A fix to make the LINT-NOINET build happy, if this
works out the ixgbe driver should be changed as well.


# 223676 29-Jun-2011 jhb

- Add read-only sysctls for all of the tunables supported by the igb and
em drivers.
- Make the per-instance 'enable_aim' sysctl truly per-instance by having it
change a per-instance variable (which is used to control AIM) rather
than having all of the per-instance sysctls operate on a single global
variable.

Reviewed by: jfv (earlier version)
MFC after: 1 week


# 223482 23-Jun-2011 jfv

Put back the global for rx processing due to popular demand.


# 223350 20-Jun-2011 jfv

Eliminate some global tuneables in favor of adapter-specific,
particular flow control and dma coalesce. Also improve the
sysctl operation on those too.

Add IPv6 detection in the ioctl code, this was done for
ixgbe first, carrying that over.

Add resource ability to disable particular adapter.

Add HW TSO capability so vlans can make use of TSO


# 223198 17-Jun-2011 jhb

- Use a dedicated task to handle deferred transmits from the if_transmit
method instead of reusing the existing per-queue interrupt task.
Reusing the per-queue interrupt task could result in both an interrupt
thread and the taskqueue thread trying to handle received packets on a
single queue resulting in out-of-order packet processing.
- Don't define igb_start() at all on 8.0 and where if_transmit is used.
Replace last remaining call to igb_start() with a loop to kick off
transmit on each queue instead.
- Call ether_ifdetach() earlier in igb_detach().
- Drain tasks and free taskqueues during igb_detach().

Reviewed by: jfv
MFC after: 1 week


# 220375 05-Apr-2011 jfv

Important update for the igb driver:
- Add the change made in em to the actual unrefreshed number
of descriptors is used as a basis in rxeof on the way out
to determine if more refresh is needed. NOTE: there is a
difference in the ring setup in igb, this is not accidental,
it is necessitated by hardware behavior, when you reset the
newer adapters it will not let you write RDH, it ALWAYS sets
it to 0. Thus the way em does it is not possible.
- Change the sysctl handling of flow control, it will now make
the change dynamically when the variable setting changes rather
than requiring a reset.
- Change the eee sysctl naming, validation found the old unintuitive :)
- Last but not least, some important performance tweaks in the TX
path, I found that UDP behavior could be drastically hindered or
improved with just small changes in the start loop. What I have
here is what testing has shown to be the best overall. Its interesting
to note that changing the clean threshold to start at a full half of
the ring, made a BIG difference in performance. I hope that this
will prove to be advantageous for most workloads.

MFC in a week.


# 219753 18-Mar-2011 jfv

This delta updates the em driver to version 7.2.2 which has
been undergoing test for some weeks. This improves the RX
mbuf handling to avoid system hang due to depletion. Thanks
to all those who have been testing the code, and to Beezar
Liu for the design changes.

Next the igb driver is updated for similar RX changes, but
also to add new features support for our upcoming i350 family
of adapters.

MFC after a week


# 218583 11-Feb-2011 jfv

Somehow the RX ring depletion fix got partially removed,
replace the missing pieces.


# 218530 10-Feb-2011 jfv

Add support for the new I350 family of 1G interfaces.
- this also includes virtualization support on these devices

Correct some vlan issues we were seeing in test, jumbo frames on vlans
did not work correctly, this was all due to confused logic around HW
filters, the new code should now work for all uses.

Important fix: when mbuf resources are depeleted, it was possible to
completely empty the RX ring, and then the RX engine would stall
forever. This is fixed by a flag being set whenever the refresh code
fails due to an mbuf shortage, also the local timer now makes sure
that all queues get an interrupt when it runs, the interrupt code
will then always call rxeof, and in that routine the first thing done
is now to check the refresh flag and call refresh_mbufs. This has been
verified to fix this type 'hang'. Similar code will follow in the other
drivers.

Finally, sync up shared code for the I350 support.

Thanks to everyone that has been reporting issues, and helping in the
debug/test process!!


# 217556 18-Jan-2011 mdf

Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need
to rely on the format string.


# 217318 12-Jan-2011 mdf

sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.

Commit the Intel drivers.


# 216173 04-Dec-2010 jfv

Remove the bogus test in the TX context setup for IPV6,
the size can be smaller than the constant when you are
doing HW TAGGING, and you still need to process this
packet in a normal way. I'm not sure where the notion
to just return came from, but its wrong.

MFC after: 3 days


# 215781 23-Nov-2010 jfv

- New 82580 devices supported
- Fixes from John Baldwin: vlan shadow tables made per/interface,
make vlan hw setup only happen when capability enabled, and
finally, make a tuneable interrupt rate. Thanks John!
- Tweaked watchdog handling to avoid any false positives, now
detection is in the TX clean path, with only the final check
and init happening in the local timer.
- limit queues to 8 for all devices, with 82576 or 82580 on
larger machines it can get greater than this, and it seems
mostly a resource waste to do so. Even 8 might be high but
it can be manually reduced.
- use 2k, 4k and now 9k clusters based on the MTU size.
- rework the igb_refresh_mbuf() code, its important to
make sure the descriptor is rewritten even when reusing
mbufs since writeback clobbers things.

MFC: in a few days, this delta needs to get to 8.2


# 213234 27-Sep-2010 jfv

Update code from Intel:
- Sync shared code with Intel internal
- New client chipset support added
- em driver - fixes to 82574, limit queues to 1 but use MSIX
- em driver - large changes in TX checksum offload and tso
code, thanks to yongari.
- some small changes for watchdog issues.
- igb driver - local timer watchdog code was missing locking
this and a couple other watchdog related fixes.
- bug in rx discard found by Andrew Boyer, check for null pointer

MFC: a week


# 212902 20-Sep-2010 jhb

Tweak the stats exported by the e1000 drivers:
- Add a single sysctl procedure to all three drivers to read an arbitrary
register (the register is passed as arg2). Use it to replace existing
routines in igb(4) that used a separate routine for each register, and
to add support for missing stats in em(4) and lem(4).
- Move the 'rx_overruns' and 'watchdog_timeouts' stats out of the MAC stats
section as they are driver stats, not MAC counters.
- Simplify the code that creates per-queue stats in igb(4) to use a single
loop and remove duplicated code.
- Properly read all 64 bits of the 'good octets received/transmitted' in
em(4) and lem(4).
- Actually read the interrupt count registers in em(4), and drop the
'host to card' sysctl stats from em(4) as they are not implemented in
any of the hardware this driver supports.
- Restore several stats to em(4) that were lost in the earlier stats
conversion including per-queue stats.
- Export several MAC stats in em(4) that were exported in igb(4) but not
in em(4).
- Export stats in lem(4) using individual sysctls as in em(4) and igb(4).

Reviewed by: jfv
MFC after: 1 week


# 211913 27-Aug-2010 yongari

Do not allocate multicast array memory in multicast filter
configuration function. For failed memory allocations, em(4)/lem(4)
called panic(9) which is not acceptable on production box.
igb(4)/ixgb(4)/ix(4) allocated the required memory in stack which
consumed 768 bytes of stack memory which looks too big.

To address these issues, allocate multicast array memory in device
attach time and make multicast configuration success under any
conditions. This change also removes the excessive use of memory in
stack.

Reviewed by: jfv


# 211907 27-Aug-2010 yongari

Do not call voluntary panic(9) in case of if_alloc() failure.

Reviewed by: jfv


# 211906 27-Aug-2010 yongari

Make sure not to access unallocated stats memory.

Reviewed by: jfv


# 211516 19-Aug-2010 jfv

Eliminate the ambiguous queue setting logic for
the VF, it made it possible to have 2 queues which
we don't want, the HOST is unable to handle it.


# 210968 06-Aug-2010 jfv

Put the early setting of the MAC type back, its
removal resulted in broken code in MSIX setup.


# 210452 24-Jul-2010 gnn

style(9) fix

MFC after: 1 week


# 210428 23-Jul-2010 gnn

Fix a bug in the statistics code for tracking the head and
tail pointers of the tx and rx queues. We needed a SYSCTL_PROC
to correctly get the values at run time.

Submitted by: Andrew Boyer aboyer at averesystems.com
MFC after: 1 week


# 209859 09-Jul-2010 jfv

Fix of a VLAN problem by jhb, the checksum capability
got lost along the way.

MFC: asap


# 209616 30-Jun-2010 jfv

OK, I was a bit sleep this morning and checked in
the core changes but left out the shared code, lol.
Well, and a couple fixes to the core... hopefully
this will all be complete now.

Happy happy joy joy :)


# 209611 30-Jun-2010 jfv

SR-IOV support added to igb

What this provides is support for the 'virtual function'
interface that a FreeBSD VM may be assigned from a host
like KVM on Linux, or newer versions of Xen with such
support.

When the guest is set up with the capability, a special
limited function 82576 PCI device is present in its virtual
PCI space, so with this driver installed in the guest that
device will be detected and function nearly like the bare
metal, as it were.

The interface is only allowed a single queue in this configuration
however initial performance tests have looked very good.

Enjoy!!


# 209512 24-Jun-2010 gnn

Make sure that all the exposed counters and variables are actually
being updated.

Pointed out by: jfv


# 209241 16-Jun-2010 gnn

Move statistics into the sysctl tree making it easier to find
and use them.
Add previously hidden statistics, some of which include interrupt
and host/card communication counters.


# 209238 16-Jun-2010 jfv

Changes from John Baldwin adding to last commit,
change rxeof api for poll friendliness, and
eliminate unnecessary link tasklet use. Thanks John!


# 209218 15-Jun-2010 jfv

Change to have legacy interrupts use the same
handler had a flaw, thanks to John Baldwin for
finding it. Change which queue legacy tasks are
enqueued on.

MFC: soonest


# 209071 11-Jun-2010 jfv

Put back the lost bus_describe_intr() calls.


# 209068 11-Jun-2010 jfv

Add a couple fixes from Michael Tuexen.
Remove unneeded rxtx handler, make que handler generic.
Do not allocate header mbufs in rx ring if not doing hdr split.
Release the lock in rxeof call to stack.

MFC for 8.1 asap


# 208362 20-May-2010 jhb

Restore part of 200671 which was lost in previous driver changes:
- Add interrupt descriptions when using mulitple MSI-X interrupts.


# 208103 14-May-2010 jfv

Small changes preparing for MFC, need to conditionalize
the buf_ring_free call, and lem is missing the WOL change
put into em.


# 206629 14-Apr-2010 jfv

Add a missing fragment in the tx msix handler to invoke
another if all work is not done.

Sync the igb driver with changes suggested by yongari and
made in em, these made sense to be in both drivers.


# 206431 09-Apr-2010 jfv

DUH, must be tired, I missed the second instance...
time for the weekend :)


# 206430 09-Apr-2010 jfv

Thanks to Michael Tuexen for catching this, bit set that
keeps the clock from being reset when writing to EITR was
incorrect, also there is a shared code #define for it anyway.


# 206388 07-Apr-2010 jfv

Important fix got clobbered in the em driver, keeping
VLAN HWFILTER from being used by default, this breaks
stacked pseudo devices, and as it turns out, also breaks
virtual machines that happen to use VLANS (didn't know that
before :). Put the fix back into the em driver, and for good
measure add the same code to the igb driver where it should
have been anyway.


# 206023 31-Mar-2010 jfv

The POLL code was missed in the queue conversion,
change the argument type to igb_rxeof() to the
correct type. Note, any users of POLLING must
be sure and set the number of queues to 1 for
things to work correctly.


# 206001 31-Mar-2010 marius

Hook the identification LEDs of igb(4), lem(4) and em(4) devices up with
led(4) so they can be lit or f.e. made blink via `echo f2 > /dev/led/em0`
for localization purposes.

Approved by: jfv
MFC afer: 1 week (after r205869)


# 205869 29-Mar-2010 jfv

Update to igb and em:

em revision 7.0.0:
- Using driver devclass, seperate legacy (pre-pcie) code
into a seperate source file. This will at least help
protect against regression issues. It compiles along
with em, and is transparent to end use, devices in each
appear to be 'emX'. When using em in a modular form this
also allows the legacy stuff to be defined out.
- Add tx and rx rings as in igb, in the 82574 this becomes
actual multiqueue for the first time (2 queues) while in
other PCIE adapters its just make code cleaner.
- Add RX mbuf handling logic that matches igb, this will
eliminate packet drops due to temporary mbuf shortage.

igb revision 1.9.3:
- Following the ixgbe code, use a new approach in what
was called 'get_buf', the routine now has been made
independent of rxeof, it now does the update to the
engine TDT register, this design allows temporary
mbuf resources to become non-critical, not requiring
a packet to be discarded, instead it just returns and
does not increment the tail pointer.
- With the above change it was also unnecessary to keep
'spare' maps around, since we do not have the discard
issue.
- Performance tweaks and improvements to the code also.

MFC in a week


# 203834 13-Feb-2010 mlaier

Fix drbr and altq interaction:
- introduce drbr_needs_enqueue that returns whether the interface/br needs
an enqueue operation: returns true if altq is enabled or there are
already packets in the ring (as we need to maintain packet order)
- update all drbr consumers
- fix drbr_flush
- avoid using the driver queue (IFQ_DRV_*) in the altq case as the
multiqueue consumer does not provide enough protection, serialize altq
interaction with the main queue lock
- make drbr_dequeue_cond work with altq

Discussed with: kmacy, yongari, jfv
MFC after: 4 weeks


# 203354 01-Feb-2010 jfv

A few minor changes: add altq option header, add missing conditional
around a buf_ring call that will break 7.3, and thanks to Fabien Thomas
add POLLING support for igb and a minor related fix in the em driver.


# 203090 27-Jan-2010 jfv

Add a link tasklet so updates can be sleepable.


# 203051 26-Jan-2010 jfv

Missing a fix for the new watchdog handling.


# 203049 26-Jan-2010 jfv

Update the 1G drivers, shared code sync with Intel,
igb now has a queue notion that has a single interrupt
with an RX/TX pair, this will reduce the total interrupts
seen on a system. Both em and igb have a new watchdog
method. igb has fixes from Pyun Yong-Hyeon that have
improved stability, thank you :)

I wish to MFC this for 7.3 asap, please test if able.


# 200671 18-Dec-2009 jhb

- Add missing newlines to some error messages.
- Add interrupt descriptions when using mulitple MSI-X interrupts.

Reviewed by: jfv


# 200268 08-Dec-2009 jfv

Remove phantom line of code that somehow slipped
into the checkin.


# 200243 07-Dec-2009 jfv

Resync with Intel versions of both the em and igb
drivers. These add new hardware support, most importantly
the pch (i5 chipset) in the em driver. Also, both drivers
now have the simplified (and I hope improved) watchdog
code. The igb driver uses the new RX cleanup that I
first implemented in ixgbe.

em - version 6.9.24
igb - version 1.8.4


# 199192 11-Nov-2009 jfv

With an i386 kernel the igb driver can cause a
page fault panic on initialization due to a large
number of bounce pages being allocated. This is due
to the dma tag requiring page alignment on mbuf mapping.
This was removed some time back from the ixgbe driver
and is not needed here either.


# 197079 10-Sep-2009 jfv

Fix build issue with last commit.


# 197074 10-Sep-2009 jfv

Fix an xmit mbuf leak, when transmit failed but you
still have an mbuf it was not being requeued.

MFC: 3 days


# 196386 19-Aug-2009 delphij

Temporarily enhance em(4) and igb(4) hack to take account for IFF_NOARP.
Without this changeset there will be no way to prevent these NICs from
sending ARP, which is harmful in server farms that is configured as
"Direct Server Return" behind a load balancer.

A better fix would remove the whole hack completely but it would be
later than 8.0-RELEASE.

Reviewed by: jfv, yongari
Approved by: re (kib)


# 195857 24-Jul-2009 jfv

Improvement on the last change, this gives a precise
way to tell the one and only interface that a vlan
event is for. Thanks to John Baldwin for the patch.

Approved by: re


# 195851 24-Jul-2009 jfv

This delta fixes two bugs:
- When a vlan event occurs a check was not made that
the event was actually for the interface, thus resulting
in a panic. All three drivers have this vulnerability. Add
a check for this condition.
- Secondly, there was a duplicate buf_ring free in the em
driver resulting in a panic on unload. Remove.

Approved by: re


# 195691 14-Jul-2009 bz

Re-add opt_inet.h, as we did in r193862 and lost yet again.

Approved by: re (kib)


# 195049 26-Jun-2009 rwatson

Use if_maddr_rlock()/if_maddr_runlock() rather than IF_ADDR_LOCK()/
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs. This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.

For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.

Approved by: re (kib)
MFC after: 6 weeks


# 194979 25-Jun-2009 jfv

Change intr_bind to bus_bind_intr, also limit this to
multiqueue setup which is not the shipping default for
igb (its set to 1).


# 194925 24-Jun-2009 jfv

need to make intr_bind call architecture specific for
global builds (failing sun4v lint build)


# 194911 24-Jun-2009 jfv

Fix lint issue.


# 194865 24-Jun-2009 jfv

Updates for both the em and igb drivers, add support
for multiqueue tx, shared code updates, new device
support, and some bug fixes.


# 193862 09-Jun-2009 bz

Add opt_inet.h back lost with r190872.
This will bring back improved IPv4 SIOCSIFADDR ioctl handling
not re-initializing the interface if avoidable.


# 191569 27-Apr-2009 jfv

igb_txeof also has a case where the watchdog may not
get reset when it should be

MFC after: 2 weeks


# 191065 14-Apr-2009 jfv

Thanks to Michael Tuexen and Randall Scott for providing a
few important bug fixes found while they were doing SCTP
development, and that I somehow lost during the scramble.

Thanks guys!!


# 190879 10-Apr-2009 jfv

Fix build problem with data format.


# 190872 09-Apr-2009 jfv

This delta syncs the em and igb drivers with Intel,
adds header split and SCTP support into the igb driver.
Various small improvements and fixes.

MFC after: 2 weeks


# 187123 12-Jan-2009 gnn

Fix a cut/paste bug which prevents us from setting the average
latency tunable.

Reviewed by: jfv
MFC after: 1 day


# 185355 27-Nov-2008 jfv

Thanks to the reminder from Ganbold, small fix in the RX failure
path for an infinite loop. Problem originally noticed in ixgbe
by Jeff Roberson and fixed there. Thanks to everyone involved.


# 185353 26-Nov-2008 jfv

This delta is primarily a fix for es2lan devices that
will sometimes fail to initialize problem due to a lock
contention with management hardware. However, in order to
deliver that fix it was necessary to take a shared code
update as a whole, and this required scattered changes in
the core code to be compatible.

The em driver now has VLAN HW support added as the igb
driver had previously.

MFC after: ASAP - in time for 7.1 RELEASE


# 184718 06-Nov-2008 bz

Hide AF_INET specific ioctl handling under #ifdef INET.

MFC after: 2 months


# 183714 09-Oct-2008 peter

Clean out some empty mergeinfo records, presumably by people doing local
cp/mv operations. The full repo-relative URL should be specified for the
source in these cases.


# 182416 28-Aug-2008 jfv

Update to igb driver:

- changes in support of the VLAN filter fix to 126850
- removal of a bunch of legacy code that was cruft, if not
possibly harmful.
- removal of POLLING from this driver, with multiqueue and
MSIX it just makes no sense here.
- Fix an LRO bug that I've been working on internally, intermittent
panics under stress, the problem was releasing the RX ring lock
before the LRO flushing.
- Following the above fix I now enable LRO by default
- For performance reasons increase the default number of RX queues
to 4.
- Add AIM - "Adaptive Interrupt Moderation", a fancy way of saying
that the EITR value is dynamically changed based on the size of
packets in the last interrupt interval.

- Much goodness to try, enjoy!!


# 181041 31-Jul-2008 jfv

Data type fix


# 181035 30-Jul-2008 ps

Include netinet/tcp_lro.h, unbreak the build


# 181027 30-Jul-2008 jfv

Merge of the source for igb and em into dev/e1000, this
proved to be necessary to make the static drivers work
in EITHER/OR or BOTH configurations. Modules will still
build in sys/modules/igb or em as before.

This also updates the igb driver for support for the 82576
adapter, adds shared code fixes, and etc....

MFC after: ASAP


# 178531 26-Apr-2008 jfv

Opps,missed line in the fix...


# 178524 25-Apr-2008 jfv

A change got dropped in the merge, add back


# 178523 25-Apr-2008 jfv

This delta has a few important items:

PR 122839 is fixed in both em and in igb

Second, the issue on building modules since the static kernel
build changes is now resolved. I was not able to get the fancier
directory hierarchy working, but this works, both em and igb
build as modules now.

Third, there is now support in em for two new NICs, Hartwell
(or 82574) is a low cost PCIE dual port adapter that has MSIX,
for this release it uses 3 vectors only, RX, TX, and LINK. In
the next release I will add a second TX and RX queue. Also, there
is support here for ICH10, the followon to ICH9. Both of these are
early releases, general availability will follow soon.

Fourth: On Hartwell and ICH10 we now have IEEE 1588 PTP support,
I have implemented this in a provisional way so that early adopters
may try and comment on the functionality. The IOCTL structure may
change. This feature is off by default, you need to edit the Makefile
and add the EM_TIMESYNC define to get the code.

Enjoy all!!


# 177878 03-Apr-2008 jfv

Another build fix


# 177877 03-Apr-2008 jfv

Fix a lint issue in the build.


# 177868 02-Apr-2008 jfv

Fix minor bug in last checkin, NO_STRICT_ALIGNMENT code.


# 177867 02-Apr-2008 jfv

This update primarily addresses the ability to have both the em
and the igb driver static in the kernel. But it also reflects
some other bug fixes in my development stream at Intel.
PR 122373 is also fixed in this code.


# 176685 01-Mar-2008 jfv

Change data formating in debug code.


# 176684 01-Mar-2008 jfv

An error in the poll routine turned up during LINT build


# 176682 01-Mar-2008 jfv

Missing braces in link routine.


# 176679 01-Mar-2008 jfv

Missed some code that is ifdef STRICT_ALIGN :(


# 176667 29-Feb-2008 jfv

This change introduces a split to the Intel E1000 driver, now rather than
just em, there is an igb driver (this follows behavior with our Linux drivers).
All adapters up to the 82575 are supported in em, and new client/desktop support
will continue to be in that adapter.

The igb driver is for new server NICs like the 82575 and its followons.
Advanced features for virtualization and performance will be in this driver.

Also, both drivers now have shared code that is up to the latest we have
released. Some stylistic changes as well.

Enjoy :)