#
256281 |
|
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation |
#
250665 |
|
15-May-2013 |
adrian |
Implement my first cut at "correct" node power-save and PS-POLL support.
This implements PS-POLL awareness in the driver:
* Implement frame "leaking", which allows for a software queue to be scheduled even though it's asleep
* Track whether a frame has been leaked or not
* Leak out a single non-AMPDU frame when transmitting aggregates
* Queue BAR frames if the node is asleep
* Direct-dispatch the rest of control and management frames. This allows for things like re-association to occur (which involves sending probe req/resp as well as assoc request/response) when the node is asleep and then tries reassociating.
* Limit how many frames can sit in the software node queue whilst the node is asleep. net80211 is already buffering frames for us so this is mostly just paranoia.
* Add a PS-POLL method which leaks out a frame if there's something in the software queue, else it calls net80211's ps-poll routine. Since the ath PS-POLL routine marks the node as having a single frame to leak, either a software queued frame would leak, OR the next queued frame would leak. The next queued frame could be something from the net80211 power save queue, OR it could be a NULL frame from net80211.
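A minimal C sketch of the leaking idea described above, assuming a simplified per-node state. struct sketch_node and the sketch_* functions are hypothetical and are not the real ath(4) structures or routines. A PS-POLL grants exactly one leak slot, and the software queue of a sleeping node may only be scheduled while a slot is available.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-node state (not struct ath_node). */
struct sketch_node {
	bool asleep;     /* node is in power-save mode */
	int leak_count;  /* frames allowed out while asleep */
	int swq_depth;   /* frames waiting in the software queue */
};

/*
 * PS-POLL received: mark the node as having a single frame to leak.
 * Returns true if the software queue will supply that frame; false means
 * the caller would fall back to net80211's ps-poll handling instead.
 */
static bool
sketch_node_recv_pspoll(struct sketch_node *an)
{
	an->leak_count = 1;
	return (an->swq_depth > 0);
}

/* The software queue may be scheduled if awake, or if a leak is pending. */
static bool
sketch_node_can_schedule(const struct sketch_node *an)
{
	return (!an->asleep || an->leak_count > 0);
}

/*
 * Dequeue one frame. Sets *leaked if the frame went out while the node
 * was asleep (consuming the leak slot). Returns false if nothing was sent.
 */
static bool
sketch_node_tx_one(struct sketch_node *an, bool *leaked)
{
	if (an->swq_depth == 0 || !sketch_node_can_schedule(an))
		return (false);
	*leaked = an->asleep;
	if (an->asleep)
		an->leak_count--;
	an->swq_depth--;
	return (true);
}
```

A sleeping node with two queued frames sends nothing until a PS-POLL arrives, then sends exactly one frame and stops again.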
TODO:
* Don't transmit further BAR frames (eg via a timeout) if the node is currently asleep. Otherwise we may end up exhausting management frames due to lots of queued BAR frames.
I may just undo this bit later on and direct-dispatch BAR frames even if the node is asleep.
* It would be nice to burst out a single A-MPDU frame if both ends support this. I may end up adding a FreeBSD IE soon to negotiate this power save behaviour.
* I should make STAs time out of power save mode if they've been in power save for more than a handful of seconds. This way cards that get "stuck" in power save mode don't stay there for the "inactivity" timeout in net80211.
* Move the queue depth check into the driver layer (ath_start / ath_transmit) rather than doing it in the TX path.
* There could be some naughty corner cases with ps-poll leaking. Specifically, if net80211 generates a NULL data frame whilst another transmitter sends a normal data frame via net80211 output / transmit, we need to ensure that the NULL data frame goes out first. This is one of those things that should occur inside the VAP/ic TX lock. Grr, more investigation to do.
Tested:
* STA: AR5416, AR9280
* AP: AR5416, AR9280, AR9160
|
#
249578 |
|
17-Apr-2013 |
adrian |
Update the rate series setup code to use the decisions already made in ath_tx_rate_fill_rcflags(). Include setting up the TX power cap in the rate scenario setup code being passed to the HAL.
Other things:
* Add a tx power cap field in ath_rc;
* Add a three-stream flag in ath_rc;
* Delete the LDPC flag from ath_rc - it's not a per-rate flag, it's a global flag for the transmission.
|
#
247507 |
|
28-Feb-2013 |
adrian |
Oops - fix an incorrect test.
|
#
247506 |
|
28-Feb-2013 |
adrian |
Don't enable the HT flags for legacy rates.
I stumbled across this whilst trying to debug another weird hang reported on the freebsd-wireless list.
Whilst here, add in the STBC check to ath_rateseries_setup().
Whilst here, fix the short preamble flag to be set only for legacy rates.
Whilst here, comment that we should be using the full set of decisions made by ath_rateseries_setup() rather than recalculating them!
|
#
247368 |
|
26-Feb-2013 |
adrian |
Enable STBC for the given rate series if it's negotiated:
* If both ends have negotiated (at least) one stream;
* Only if it's a single stream rate (MCS0-7);
* Only if there's more than one TX chain enabled.
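The three conditions above can be expressed as a small predicate. This is a sketch: sketch_use_stbc(), its parameters and the chainmask encoding (one bit per TX chain) are assumptions for illustration, not the driver's code.

```c
#include <stdbool.h>

/* Count enabled TX chains in a chainmask (eg 0x3 -> 2 chains). */
static int
sketch_chain_count(unsigned chainmask)
{
	int n = 0;

	for (; chainmask != 0; chainmask >>= 1)
		n += chainmask & 1;
	return (n);
}

/*
 * Hypothetical helper: enable STBC for one rate series entry only if
 * the peer negotiated at least one STBC stream, the rate is a
 * single-stream MCS (0-7) and more than one TX chain is enabled.
 */
static bool
sketch_use_stbc(int mcs, int stbc_streams, unsigned tx_chainmask)
{
	if (stbc_streams < 1)
		return (false);
	if (mcs < 0 || mcs > 7)
		return (false);
	return (sketch_chain_count(tx_chainmask) > 1);
}
```

With two TX chains (0x3), MCS2 gets STBC and MCS12 doesn't, matching the test notes; a single chain (0x1) disables it.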
Tested:
* AR9280 STA mode -> Atheros AP; tested both MCS2 (STBC) and MCS12 (no STBC.) Verified using athalq to inspect the TX descriptors.
TODO:
* Test AR5416 - no STBC should be enabled;
* Test AR9280 with one TX chain enabled - no STBC should be enabled.
|
#
247287 |
|
25-Feb-2013 |
adrian |
Part #2 of the TX chainmask changes:
* Remove ar5416UpdateChainmasks();
* Remove the TX chainmask override code from the ar5416 TX descriptor setup routines;
* Write a driver method to calculate the current chainmask based on the operating mode and update the driver state;
* Call the HAL chainmask method before calling ath_hal_reset();
* Use the currently configured chainmask in the TX descriptors rather than the hardware TX chainmasks.
Tested:
* AR5416, STA/AP mode - legacy and 11n modes
|
#
247087 |
|
21-Feb-2013 |
adrian |
Add an option to allow the minimum number of delimiters to be tweaked.
This is primarily for debugging purposes.
Tested:
* AR5416, STA mode
|
#
247085 |
|
21-Feb-2013 |
adrian |
Add a new option to limit the maximum size of aggregates. The default is to limit them to what the hardware is capable of.
Add sysctl twiddles for both the non-RTS and RTS protected aggregate generation.
Whilst here, add some comments about stuff that I've discovered during my exploration of the TX aggregate / delimiter setup path from the reference driver.
|
#
246536 |
|
08-Feb-2013 |
adrian |
Fix a corner case that I noticed with the AR5416 (and its currently crappy 802.11n performance, sigh.)
With the AR5416, aggregates need to be limited to 8KiB if RTS/CTS is enabled. However, larger aggregates were going out with RTSCTS enabled. The following was going on:
* The first buffer in the list would have RTS/CTS enabled in bf->bf_state.txflags;
* The aggregate would be formed;
* The "copy over the txflags from the first buffer" logic that I added blanked the RTS/CTS TX flags fields, and then copied the bf_first RTS/CTS flags over;
* .. but that'd cause bf_first to be blanked out! And thus the flag was cleared;
* So the rest of the aggregate formation would run with those flags cleared, and thus > 8KiB aggregates were formed.
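A minimal C sketch of the aliasing problem, assuming hypothetical flag values and a stripped-down buffer type (struct sketch_buf is not struct ath_buf). When the buffer being updated is bf_first itself, clearing its RTS/CTS bits also clears the source of the copy. Taking a snapshot of the first buffer's bits before modifying anything is one way to avoid it; this is not necessarily how the commit fixed it.

```c
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define SK_TXDESC_RTSENA	0x0001
#define SK_TXDESC_CTSENA	0x0002
#define SK_PROT_MASK		(SK_TXDESC_RTSENA | SK_TXDESC_CTSENA)

struct sketch_buf {
	uint32_t txflags;
};

/* Buggy: if bf == first, the clear wipes the bits before they're copied. */
static void
sketch_copy_flags_buggy(struct sketch_buf *bf, const struct sketch_buf *first)
{
	bf->txflags &= ~(uint32_t)SK_PROT_MASK;
	bf->txflags |= first->txflags & SK_PROT_MASK;
}

/* Safe even when bf == first: read the source bits before modifying bf. */
static void
sketch_copy_flags_fixed(struct sketch_buf *bf, const struct sketch_buf *first)
{
	uint32_t prot = first->txflags & SK_PROT_MASK;

	bf->txflags &= ~(uint32_t)SK_PROT_MASK;
	bf->txflags |= prot;
}
```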
The driver is now (again) correctly limiting aggregate formation for the AR5416 but there are still other pending issues to resolve.
Tested:
* AR5416, STA mode
|
#
243786 |
|
02-Dec-2012 |
adrian |
Delete the per-TXQ locks and replace them with a single TX lock.
I couldn't think of a way to maintain the hardware TXQ locks _and_ layer on top of that per-TXQ software queuing and any other kind of fine-grained locks (eg per-TID, or per-node locks.)
So for now, to facilitate some further code refactoring and development as part of the final push to get software queue ps-poll and u-apsd handling into this driver, just do away with them entirely.
I may eventually bring them back at some point, when it looks slightly more architecturally cleaner to do so. But as it stands at the present, it's not really buying us much:
* in order to properly serialise things and not get bitten by scheduling and locking interactions with things higher up in the stack, we need to wrap the whole TX path in a long held lock. Otherwise we can end up being pre-empted during frame handling, resulting in some out of order frame handling between sequence number allocation and encryption handling (ie, the seqno and the CCMP IV get out of sequence);
* .. so whilst that's the case, holding the lock for that long means that we're acquiring and releasing the TXQ lock _inside_ that context;
* And we also acquire it per-frame during frame completion, but we currently can't hold the lock for the duration of the TX completion as we need to call net80211 layer things with the locks _unheld_ to avoid LOR.
* .. the other places where we grab that lock are reset/flush, which don't happen often.
My eventual aim is to change the TX path so that, for all rejected frame transmissions and all frame completions, any ieee80211_free_node() calls occur outside of the TX lock; then I can cut back on the amount of locking that goes on here.
There may be some LORs that occur when ieee80211_free_node() is called when the TX queue path fails; I'll begin to address these in follow-up commits.
|
#
242528 |
|
03-Nov-2012 |
adrian |
For AR9380 NICs - the non-enterprise versions don't support RTS protection of small (< 256 byte) aggregate frames.
This needs to be done or 11n aggregation TX just simply doesn't work on these NICs.
Whilst here, extend some debug printing; I was using this whilst debugging the TX power setup in the TX descriptor(s) on the AR9380.
|
#
241336 |
|
07-Oct-2012 |
adrian |
Migrate the TID TXQ accesses to a new set of macros, rather than reusing the ATH_TXQ_* macros.
* Introduce the new macros;
* Rename the TID queue and TID filtered frame queue so the compiler tells me I'm using the wrong macro.
These should correspond 1:1 to the existing code.
|
#
240226 |
|
08-Sep-2012 |
adrian |
Correctly mask out the RTS/CTS flags when forming aggregates.
This had the side effect of clearing HAL_TXDESC_CLRDMASK for a bunch of frames, meaning they'd end up being potentially filtered if there were an error. This is fine in the previous world as they'd just be software retried but now that I'm working on filtered frames, these descriptors would be endlessly retried until another valid frame would come along that had CLRDMASK set.
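A sketch of the difference between clearing only the RTS/CTS bits and accidentally dropping the other TX flags, assuming hypothetical bit values (the real HAL_TXDESC_* constants differ). The log doesn't show the original faulty expression; the "wrong" variant below is just one way such a bug can look.

```c
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define SK_TXDESC_RTSENA	0x0001
#define SK_TXDESC_CTSENA	0x0002
#define SK_TXDESC_CLRDMASK	0x0004

/* Wrong: ANDing with the mask keeps only RTS/CTS, dropping CLRDMASK. */
static uint32_t
sketch_strip_prot_wrong(uint32_t txflags)
{
	return (txflags & (SK_TXDESC_RTSENA | SK_TXDESC_CTSENA));
}

/* Right: ANDing with the complement clears only RTS/CTS. */
static uint32_t
sketch_strip_prot(uint32_t txflags)
{
	return (txflags & ~(uint32_t)(SK_TXDESC_RTSENA | SK_TXDESC_CTSENA));
}
```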
|
#
238949 |
|
31-Jul-2012 |
adrian |
Shuffle the call to ath_hal_setuplasttxdesc() to _after_ the rate control code is called and remove it from ath_buf_set_rate().
For the legacy (non-11n API) TX routines, ath_hal_filltxdesc() takes care of setting up the intermediary and final descriptors right, complete with copying the rate control info into the final descriptor so the rate modules can grab it.
The 11n version doesn't do this - ath_hal_chaintxdesc() doesn't copy the rate control bits over, nor does it clear isaggr/moreaggr/pad delimiters. So the call to setuplasttxdesc() is needed here.
So:
* Legacy NICs - never call the 11n rate control stuff, so filltxdesc copies the rate control info right;
* 11n NICs transmitting legacy or 11n non-aggregate frames - ath_hal_set11nratescenario() is called to set up rate control and then ath_hal_filltxdesc() chains them together - so the rate control info is right;
* 11n aggregate frames - set11nratescenario() is called, then ath_hal_chaintxdesc() is called to chain a list of aggregate and subframes together. This requires a call to ath_hal_setuplasttxdesc() to complete things.
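The three cases above amount to a small decision table. A sketch of it in C; the enum and function names are hypothetical and only model which HAL calls the log says each case uses, they don't call the HAL.

```c
#include <stdbool.h>

/* Which HAL descriptor setup sequence a frame would take (hypothetical). */
enum sketch_desc_path {
	SK_PATH_LEGACY_FILL,    /* ath_hal_filltxdesc() only */
	SK_PATH_11N_FILL,       /* set11nratescenario(), then filltxdesc() */
	SK_PATH_11N_AGGR_CHAIN  /* set11nratescenario(), chaintxdesc(), setuplasttxdesc() */
};

static enum sketch_desc_path
sketch_pick_desc_path(bool nic_is_11n, bool is_aggregate)
{
	if (!nic_is_11n)
		return (SK_PATH_LEGACY_FILL);
	return (is_aggregate ? SK_PATH_11N_AGGR_CHAIN : SK_PATH_11N_FILL);
}

/* Only the chained-aggregate path needs ath_hal_setuplasttxdesc(). */
static bool
sketch_needs_setuplasttxdesc(enum sketch_desc_path p)
{
	return (p == SK_PATH_11N_AGGR_CHAIN);
}
```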
Tested:
* AR9280 in station mode
TODO:
* I really should make sure that the descriptor contents get blanked out correctly or garbage left over from aggregate frames may show up in non-aggregate frames, leading to badness.
|
#
238839 |
|
27-Jul-2012 |
adrian |
Introduce a couple more fields in the rate scenario setup as part of (future) TPC support in the AR9300 HAL.
This is effectively a no-op for the moment as (a) TPC isn't really supported, (b) the AR9300 HAL isn't yet public, and (c) the existing HAL code doesn't use these fields.
Obtained from: Qualcomm Atheros
|
#
238711 |
|
23-Jul-2012 |
adrian |
Revert this; it wasn't supposed to be part of this commit.
|
#
238710 |
|
23-Jul-2012 |
adrian |
Begin separating out the TX DMA setup in preparation for TX EDMA support.
* Introduce TX DMA setup/teardown methods, mirroring what's done in the RX path.
Although the TX DMA descriptor is set up via ath_desc_alloc() / ath_desc_free(), the TX status descriptor ring will be allocated in this path.
* Remove some of the TX EDMA capability probing from the RX path and push it into the new TX EDMA path.
|
#
237171 |
|
16-Jun-2012 |
adrian |
A few nitpicks:
* Use ATH_RC_NUM instead of '4' when iterating over the ratecontrol series array.
* A few style(9) fixes, hopefully no regressions here.
* Add some comments that better describe what's going on.
|
#
236995 |
|
13-Jun-2012 |
adrian |
Remove a duplicate definition.
|
#
236872 |
|
11-Jun-2012 |
adrian |
Revert r233227 and followup commits as it breaks CCMP PN replay detection.
This showed up when doing heavy UDP throughput on SMP machines.
The problem is that the 802.11 sequence number is being allocated separately from the CCMP PN replay number (which is assigned during ieee80211_crypto_encap()).
Under significant throughput (200+ MBps) the TX path would be stressed enough that frame TX/retry would force sequence number and PN allocation to be out of order. So once the frames were reordered via 802.11 seqnos, the CCMP PN would be far out of order, causing most frames to be discarded by the receiver.
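A small simulation of why the receiver discards frames, assuming a simplified replay check (accept a frame only if its PN is strictly greater than the last accepted PN) applied after frames have been put back into sequence number order. struct sketch_frame and sketch_count_replay_drops() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_frame {
	uint16_t seqno;  /* 802.11 sequence number */
	uint64_t pn;     /* CCMP packet number */
};

/*
 * Frames are assumed to already be in seqno order (eg after A-MPDU
 * reordering). A frame whose PN isn't strictly greater than the last
 * accepted PN is treated as a replay. Returns the number dropped.
 */
static size_t
sketch_count_replay_drops(const struct sketch_frame *f, size_t n)
{
	uint64_t last = 0;
	bool have_last = false;
	size_t drops = 0;

	for (size_t i = 0; i < n; i++) {
		if (have_last && f[i].pn <= last) {
			drops++;
			continue;
		}
		last = f[i].pn;
		have_last = true;
	}
	return (drops);
}
```

If seqno 0 happens to receive PN 4 while seqnos 1-3 receive PNs 1-3, accepting seqno 0 first makes the next three frames look like replays, so three of four frames are dropped.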
I've fixed this in some local work by being forced to:
(a) deal with the issues that lead to the parallel TX causing out of order sequence numbers in the first place;
(b) fix all the packet queuing issues which lead to strange (but mostly valid) TX.
I'll begin fixing these in a subsequent commit or five.
PR: kern/166190
|
#
233988 |
|
07-Apr-2012 |
adrian |
As I thought, this is a bad idea. When forming aggregates, the RTS/CTS stuff and rate control lookup is only done on the first frame.
|
#
233970 |
|
07-Apr-2012 |
adrian |
Enforce the RTS aggregation limit if RTS/CTS protection is enabled; if any subframes in an aggregate have different protection from the first frame in the formed aggregate, don't add that frame to the aggregate.
This is likely a suboptimal method (I think we'll mostly be OK marking frames that have seqno's with the same protection as normal data frames) but I'll just be cautious for now.
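A sketch of the check as described in this entry, assuming a simplified queue of subframes; struct sketch_sub, sketch_form_aggr() and the hw_limit parameter are hypothetical. Note that r233988 (above) backed part of this behaviour out, so treat this only as an illustration of the r233970 description.

```c
#include <stdbool.h>
#include <stddef.h>

/* AR5416 limit for RTS/CTS protected aggregates, per the log (8KiB). */
#define SK_RTS_AGGR_LIMIT	(8 * 1024)

/* Hypothetical queued subframe: length in bytes and protection choice. */
struct sketch_sub {
	size_t len;
	bool rts_prot;
};

/*
 * Return how many leading subframes from q[] go into one aggregate.
 * A subframe is refused if its protection differs from the first frame,
 * or if adding it would exceed the size limit (capped at 8KiB when the
 * first frame is RTS/CTS protected). The first frame is always taken.
 */
static size_t
sketch_form_aggr(const struct sketch_sub *q, size_t n, size_t hw_limit)
{
	size_t limit = hw_limit;
	size_t total = 0;
	size_t i;

	if (n == 0)
		return (0);
	if (q[0].rts_prot && limit > SK_RTS_AGGR_LIMIT)
		limit = SK_RTS_AGGR_LIMIT;
	for (i = 0; i < n; i++) {
		if (q[i].rts_prot != q[0].rts_prot)
			break;
		if (i > 0 && total + q[i].len > limit)
			break;
		total += q[i].len;
	}
	return (i);
}
```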
|
#
233966 |
|
07-Apr-2012 |
adrian |
Remove duplicate txflags field from ath_buf.
Rename bf_state.bfs_flags to bf_state.bfs_txflags, as that is what it effectively is.
|
#
233514 |
|
26-Mar-2012 |
adrian |
Use the assigned sequence number when checking if a retried packet is within the BAW.
This regression was introduced in an earlier commit by me to fix the BAW seqno allocation-but-not-insertion-into-BAW race. Since it was only ever using the to-be-allocated sequence number, any frame retries with the first frame in the BAW still in the software queue would have constantly failed, as ni_txseqs[tid] would always be outside the BAW.
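The window test itself is usually done modulo 4096, since 802.11 sequence numbers are 12 bits. A sketch with hypothetical names; the point of the fix is which sequence number is passed in, ie the retried frame's own assigned seqno rather than the next-to-allocate counter (ni_txseqs[tid] in the log).

```c
#include <stdbool.h>

#define SK_SEQ_RANGE	4096	/* 12-bit 802.11 sequence number space */

/* Is seqno within [start, start + size), with wraparound at 4096? */
static bool
sketch_baw_within(unsigned start, unsigned size, unsigned seqno)
{
	return (((seqno - start) & (SK_SEQ_RANGE - 1)) < size);
}
```

With a 64-frame window starting at 100, a retried frame assigned seqno 100 is inside the window, while the next-to-allocate value 164 is not, so checking the latter makes the retry fail every time.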
TODO:
* Extract out the mostly common code here in the agg and non-agg ADDBA case and stuff it into a single function.
PR: kern/166357
|
#
233227 |
|
20-Mar-2012 |
adrian |
Delay sequence number allocation for A-MPDU until just before the frame is queued to the hardware.
Because multiple concurrent paths can execute ath_start(), multiple concurrent paths can push frames into the software/hardware TX queue and since preemption/interrupting can occur, there's the possibility that a gap in time will occur between allocating the sequence number and queuing it to the hardware.
Because of this, it's possible that a thread will have allocated a sequence number and then be preempted by another thread doing the same. If the second thread sneaks the frame into the BAW, the (earlier) sequence number of the first frame will be now outside the BAW and will result in the frame being constantly re-added to the tail of the queue. There it will live until the sequence numbers cycle around again.
This also creates a hole in the RX BAW tracking which can also cause issues.
This patch delays the sequence number allocation to occur only just before the frame is going to be added to the BAW. I've been wanting to do this anyway as part of a general code tidyup but I've not gotten around to it. This fixes the PR.
However, it still makes it quite difficult to try and ensure in-order queuing and dequeuing of frames. Since multiple copies of ath_start() can be run at the same time (eg one TXing process thread, one TX completion task/one RX task) the driver may end up having frames dequeued and pushed into the hardware slightly/occasionally out of order.
And, to make matters more annoying, net80211 may have the same behaviour - in the non-aggregation case, the TX code allocates sequence numbers before it's thrown to the driver. I'll open another PR to investigate this and potentially introduce some kind of final-pass TX serialisation before frames are thrown to the hardware. It's also very likely worthwhile adding some debugging code into ath(4) and net80211 to catch when/if this does occur.
PR: kern/166190
|
#
227364 |
|
08-Nov-2011 |
adrian |
Introduce TX aggregation and software TX queue management for Atheros AR5416 and later wireless devices.
This is a very large commit - the complete history can be found in the user/adrian/if_ath_tx branch.
Legacy (ie, pre-AR5416) devices also use the per-software TXQ support and (in theory) can support non-aggregation ADDBA sessions. However, the net80211 stack doesn't currently support this.
In summary:
TX path:
* Queued frames normally go onto a per-TID, per-node queue.
* Some special frames (eg ADDBA control frames) are thrown directly onto the relevant hardware queue so they can go out before any software queued frames are queued.
* Add methods to create, suspend, resume and tear down an aggregation session.
* Add in software retransmission of both normal and aggregate frames.
* Add in completion handling of aggregate frames, including parsing the block ack bitmap provided by the hardware.
* Write an aggregation function which can assemble frames into an aggregate based on the selected rate control and channel configuration.
* The per-TID queues are locked based on their target hardware TX queue. This matches what ath9k/atheros does, and thus simplified porting over some of the aggregation logic.
* When doing TX aggregation, stick the sequence number allocation in the TX path rather than net80211 TX path, and protect it by the TXQ lock.
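Block ack bitmap parsing boils down to mapping each subframe's sequence number to a bit offset from the block ack's starting sequence number. A sketch assuming a single 64-bit bitmap; the hardware status layout and the real driver helpers aren't shown in the log, so sketch_ba_isset() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Was seqno acknowledged by a block ack with starting sequence number
 * ba_start and a 64-bit bitmap? Bit i covers seqno (ba_start + i) mod 4096.
 */
static bool
sketch_ba_isset(uint16_t ba_start, uint64_t ba_bitmap, uint16_t seqno)
{
	unsigned off = ((unsigned)seqno - (unsigned)ba_start) & 4095u;

	if (off >= 64)
		return (false);
	return (((ba_bitmap >> off) & 1) != 0);
}
```

Subframes whose bit is clear would be candidates for software retransmission.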
Rate control:
* Delay rate control selection until the frame is about to be queued to the hardware, so retried frames can have their rate control choices changed. Frames with a static rate control selection have that applied before each TX, just to simplify the TX path (ie, not have "static" and "dynamic" rate control special cased.)
* Teach ath_rate_sample about aggregates - both completion and errors.
* Add an EWMA for tracking what the current "good" MCS rate is based on failure rates.
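An EWMA keeps a running estimate that moves a fixed fraction towards each new sample. A sketch in integer percent, assuming a per-aggregate success ratio as the sample; the weight and the helper names are hypothetical, not the values ath_rate_sample uses.

```c
/*
 * Exponentially weighted moving average of a success percentage.
 * weight_pct is how much (out of 100) the newest sample counts.
 */
static int
sketch_ewma_update(int avg_pct, int sample_pct, int weight_pct)
{
	return ((avg_pct * (100 - weight_pct) + sample_pct * weight_pct) / 100);
}

/* Success percentage of one aggregate: acked subframes out of those sent. */
static int
sketch_aggr_success_pct(int nacked, int nframes)
{
	return (nframes > 0 ? (100 * nacked) / nframes : 0);
}
```

For example, a running average of 80% updated with an aggregate where 3 of 4 subframes were acked (75%) at weight 25 gives 78% (integer arithmetic).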
Misc:
* Introduce a bunch of dirty hacks and workarounds so TID mapping and net80211 frame inspection can be kept out of the net80211 layer. Because of the way this code works (and it's from Atheros and Linux ath9k), there is a consistent, 1:1 mapping between TID and AC. So we need to ensure that frames going to a specific TID will _always_ end up on the right AC, and vice versa, or the completion/locking will simply get very confused. I plan on addressing this mess in the future.
Known issues:
* There is no BAR frame transmission just yet. A whole lot of tidying up needs to occur before BAR frame TX can occur in the "correct" place - ie, once the TID TX queue has been drained.
* Interface reset/purge/etc results in frames in the TX and RX queues being removed. This creates holes in the sequence numbers being assigned and the TX/RX AMPDU code (on either side) just hangs.
* There's no filtered frame support at the present moment, so stations going into power saving mode will simply have a number of frames dropped - likely resulting in a traffic "hang".
* Raw frame TX is going to just not function with 11n aggregation. Likely this needs to be modified to always override the sequence number if the frame is going into an aggregation session. However, raw frame injection currently doesn't work in general in net80211, so let's just ignore this for now until this is sorted out.
* HT protection is just not implemented and won't be until the above is sorted out. In addition, the AR5416 has issues RTS protecting large aggregates (anything >8k), so the work around needs to be ported and tested. Thus, this will be put on hold until the above work is complete.
* The rate control module 'sample' is the only currently supported module; onoe/amrr haven't been tested and have likely bit rotted a little. I'll follow up with some commits to make them work again for non-11n rates, but they won't be updated to handle 11n and aggregation. If someone wishes to do so then they're welcome to send along patches.
* .. and "sample" doesn't really do a good job of 11n TX. Specifically, the metrics used (packet TX time and failure/success rates) aren't as useful for 11n. It's likely that it should be extended to take into account the aggregate throughput possible and then choose a rate which maximises that. Ie, it may be acceptable for a higher MCS rate with a higher failure rate to be used if it gives a more acceptable throughput/latency than a lower MCS rate @ a lower error rate. Again, patches will be gratefully accepted.
Because of this, ATH_ENABLE_11N is still not enabled by default.
Sponsored by: Hobnob, Inc.
Obtained from: Linux, Atheros
|
#
222498 |
|
30-May-2011 |
adrian |
Enable setting the short-GI bit when TX'ing HT rates but only if the hardware supports it.
Since ni->ni_htcap in hostap mode is what the remote end has advertised, not what has been negotiated/decided, we need to check ourselves what the current channel width is and what the hardware supports before enabling short-GI.
It's important that short-GI isn't enabled when it isn't negotiated and when the hardware doesn't support it (ie, short-gi for 20mhz channels on any chip < AR9287.)
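A sketch of the decision described above, with hypothetical capability fields. chan_is_ht40 stands for the negotiated channel width (ni->ni_chw, per r219985); the structure and function names are assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical short-GI capabilities: local hardware and remote peer. */
struct sketch_sgi_caps {
	bool hw_sgi20;    /* our chip can TX short-GI at 20MHz */
	bool hw_sgi40;    /* our chip can TX short-GI at 40MHz */
	bool peer_sgi20;  /* peer advertised short-GI at 20MHz */
	bool peer_sgi40;  /* peer advertised short-GI at 40MHz */
};

/* Enable short-GI only when both ends support it for the current width. */
static bool
sketch_use_short_gi(const struct sketch_sgi_caps *c, bool chan_is_ht40)
{
	if (chan_is_ht40)
		return (c->hw_sgi40 && c->peer_sgi40);
	return (c->hw_sgi20 && c->peer_sgi20);
}
```

A pre-AR9287 chip would have hw_sgi20 false, so short-GI is never used on a 20MHz channel regardless of what the peer advertises.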
I've quickly verified this on the AR9285 in 11n mode.
|
#
219985 |
|
25-Mar-2011 |
adrian |
After discussing with Bernhard, the "right" way in net80211 to check the channel width is ni->ni_chw, which is set to the negotiated channel width. ni->ni_htflags is the capability, rather than the negotiated value.
Teach both the TX path and the sample rate module about this.
|
#
219981 |
|
25-Mar-2011 |
adrian |
Re-disable the setting of 2040/shortgi bits for now.
This seems to work fine for STA but not HT/20 AP mode.
Further discussion with net80211 people will need to take place to ensure that the right flags are set based on the negotiated capabilities of the remote peer, rather than whatever the local parameters are.
Sending short-gi frames in 20mhz may work on some chips but it certainly isn't supported on anything currently supported by the HAL; and sending HT40 frames in HT20 mode just plain won't work.
|
#
219962 |
|
24-Mar-2011 |
adrian |
Flip back HT/40 and Short-GI (for 40mhz operation). These are now verified to work.
|
#
219870 |
|
22-Mar-2011 |
adrian |
Clean up setting the short preamble bit in the rate - this way it is very obvious (and cleanly so) that it occurs for non-11n rates.
|
#
219588 |
|
13-Mar-2011 |
adrian |
The number of streams is not based on the interface stream count, but the number of streams needed for that MCS rate.
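For the equal-modulation 802.11n rates the stream count follows directly from the MCS index: MCS 0-7 use one stream, 8-15 two, and so on. A sketch; the function name is hypothetical.

```c
/*
 * Spatial streams needed for an 802.11n MCS index. Only the
 * equal-modulation MCS 0-31 are handled (8 MCS values per stream);
 * anything else returns -1.
 */
static int
sketch_mcs_to_streams(int mcs)
{
	if (mcs < 0 || mcs > 31)
		return (-1);
	return (mcs / 8 + 1);
}
```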
|
#
219214 |
|
03-Mar-2011 |
adrian |
Disable trying to do HT/40 and short-GI TX.
These flags are just plain wrong - they're the node flags from negotiation, not the configured flags. I'll jump in later on and figure out exactly what should be done to properly set these two flags when in both STA mode (ie, what the AP says is possible and what's configured) and AP mode (ie, where the AP has a configuration, but then negotiates what's possible with each node, so per-node configuration can and will differ.)
This allows the 11n 2.4ghz/ht20 mode to associate (but perform poorly still) and exchange MCS rates with atheros reference APs and a Cisco/Linksys E3000 AP.
|
#
218935 |
|
22-Feb-2011 |
adrian |
Don't set the RTS/CTS enable bit per-scenario if the global RTS/CTS flags aren't set.
|
#
218931 |
|
21-Feb-2011 |
adrian |
* Don't set up the scenario if the try count is 0;
* Comment what else is going on during rate scenario setup.
|
#
218907 |
|
21-Feb-2011 |
adrian |
Implement setting the short preamble bit if it's needed for the current node.
Short preamble rates are only for legacy rates; MCS rate codes don't have a short preamble code like this.
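A sketch of applying the short preamble bits only to legacy rate codes. It assumes that MCS rate codes are tagged with a high bit (0x80 here) and that the short-preamble variant of a legacy rate is formed by ORing in per-rate bits; both are illustrative assumptions, as are the names.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_RATE_MCS	0x80	/* hypothetical: marks an MCS rate code */

/* OR in the short-preamble bits only for legacy rates, and only if the node wants it. */
static uint8_t
sketch_apply_short_preamble(uint8_t ratecode, uint8_t shortpre_bits,
    bool node_shortpre)
{
	if ((ratecode & SK_RATE_MCS) != 0 || !node_shortpre)
		return (ratecode);
	return (ratecode | shortpre_bits);
}
```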
|
#
218779 |
|
17-Feb-2011 |
adrian |
Just be double-sure short-gi isn't being enabled in 20mhz mode.
|
#
218642 |
|
13-Feb-2011 |
adrian |
This should be TX stream, not RX stream.
|
#
218593 |
|
12-Feb-2011 |
adrian |
The current code used the fields in ath_set11nratescenario(). Use them correctly:
* pass in whether to allow the hardware to override the duration field in the main data frame (durupdate_en) - PS_POLL frames in particular don't have the duration field overridden;
* there's no rts/cts duration here; that's done elsewhere
|
#
218566 |
|
11-Feb-2011 |
adrian |
.. how'd this compile before I committed it and then not now?
Fixed.
|
#
218556 |
|
11-Feb-2011 |
adrian |
The last parameter to ath_computedur_ht() is short-GI, not short-preamble.
|
#
218159 |
|
01-Feb-2011 |
adrian |
Include some preliminary TX HT rate scenario setup code.
The AR5416 and later TX descriptors have new fields for supporting 11n bits (eg 20/40mhz mode, short/long GI) and enabling/disabling RTS/CTS protection per rate.
These functions will be responsible for initialising the TX descriptors for the AR5416 and later chips for both HT and legacy frames.
Beacon frames will remain using the non-11n TX descriptor setup for now; Linux ath9k does much the same.
Note that these functions aren't yet used anywhere; a few more framework changes are needed before all of the right rate information is available for TX.
|