History log of /linux-master/drivers/net/ethernet/rocker/rocker_main.c
Revision Date Author Comments
# 1eb2cded 06-May-2024 Eric Dumazet <edumazet@google.com>

net: annotate writes on dev->mtu from ndo_change_mtu()

Simon reported that ndo_change_mtu() methods were never
updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted
in commit 501a90c94510 ("inet: protect against too small
mtu values.")

We read dev->mtu without holding RTNL in many places,
with READ_ONCE() annotations.

It is time to take care of ndo_change_mtu() methods
to use corresponding WRITE_ONCE()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 23fe265f 22-Feb-2024 John Garry <john.g.garry@oracle.com>

rocker: Don't bother filling in ethtool driver version

The version is same as the default, so don't bother filling it in.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240222090042.12609-2-john.g.garry@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 197173db 05-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

treewide: use get_random_bytes() when possible

The prandom_bytes() function has been a deprecated inline wrapper around
get_random_bytes() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. This was done as a basic find and replace.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> # powerpc
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a251c17a 05-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

treewide: use get_random_u32() when possible

The prandom_u32() function has been a deprecated inline wrapper around
get_random_u32() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. The same also applies to get_random_int(), which is
just a wrapper around get_random_u32(). This was done as a basic find
and replace.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Acked-by: Helge Deller <deller@gmx.de> # for parisc
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# b48b89f9 27-Sep-2022 Jakub Kicinski <kuba@kernel.org>

net: drop the weight argument from netif_napi_add

We tell driver developers to always pass NAPI_POLL_WEIGHT
as the weight to netif_napi_add(). This may be confusing
to newcomers, drop the weight argument, those who really
need to tweak the weight can use netif_napi_add_weight().

Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN
Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f029c781 30-Aug-2022 Wolfram Sang <wsa+renesas@sang-engineering.com>

net: ethernet: move from strlcpy with unused retval to strscpy

Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # For drivers/net/ethernet/mellanox/mlxsw
Acked-by: Geoff Levand <geoff@infradead.org> # For ps3_gelic_net and spider_net_ethtool
Acked-by: Tom Lendacky <thomas.lendacky@amd.com> # For drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
Acked-by: Marcin Wojtas <mw@semihalf.com> # For drivers/net/ethernet/marvell/mvpp2
Reviewed-by: Leon Romanovsky <leonro@nvidia.com> # For drivers/net/ethernet/mellanox/mlx{4|5}
Reviewed-by: Shay Agroskin <shayagr@amazon.com> # For drivers/net/ethernet/amazon/ena
Acked-by: Krzysztof Hałasa <khalasa@piap.pl> # For IXP4xx Ethernet
Link: https://lore.kernel.org/r/20220830201457.7984-3-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 16d083e2 04-May-2022 Jakub Kicinski <kuba@kernel.org>

net: switch to netif_napi_add_tx()

Switch net callers to the new API not requiring
the NAPI_POLL_WEIGHT argument.

Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20220504163725.550782-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 7ac2d77c 09-Jan-2022 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

rocker: Remove useless DMA-32 fallback configuration

As stated in [1], dma_set_mask() with a 64-bit mask never fails if
dev->dma_mask is non-NULL.
So, if it fails, the 32 bits case will also fail for the same reason.

Simplify code and remove some dead code accordingly.

[1]: https://lkml.org/lkml/2021/6/7/398

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/9ba2d13099d216f3df83e50ad33a05504c90fe7c.1641744274.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 2106efda 22-Nov-2021 Jakub Kicinski <kuba@kernel.org>

net: remove .ndo_change_proto_down

.ndo_change_proto_down was added seemingly to enable out-of-tree
implementations. Over 2.5yrs later we still have no real users
upstream. Hardwire the generic implementation for now, we can
revert once real users materialize. (rocker is a test vehicle,
not a user.)

We need to drop the optimization on the sysfs side, because
unlike ndos priv_flags will be changed at runtime, so we'd
need READ_ONCE/WRITE_ONCE everywhere..

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 298b0e0c 18-Oct-2021 Jakub Kicinski <kuba@kernel.org>

ethernet: rocker: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a05e4c0a 04-Oct-2021 Jakub Kicinski <kuba@kernel.org>

ethernet: use eth_hw_addr_set() for dev->addr_len cases

Convert all Ethernet drivers from memcpy(... dev->addr_len)
to eth_hw_addr_set():

@@
expression dev, np;
@@
- memcpy(dev->dev_addr, np, dev->addr_len)
+ eth_hw_addr_set(dev, np)

In theory addr_len may not be ETH_ALEN, but we don't expect
non-Ethernet devices to live under this directory, and only
the following cases of setting addr_len exist:
- cxgb4 for mgmt device,
and the drivers which set it to ETH_ALEN: s2io, mlx4, vxge.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2f5dc00f 21-Jul-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: bridge: switchdev: let drivers inform which bridge ports are offloaded

On reception of an skb, the bridge checks if it was marked as 'already
forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it
is, it assigns the source hardware domain of that skb based on the
hardware domain of the ingress port. Then during forwarding, it enforces
that the egress port must have a different hardware domain than the
ingress one (this is done in nbp_switchdev_allowed_egress).

Non-switchdev drivers don't report any physical switch id (neither
through devlink nor .ndo_get_port_parent_id), therefore the bridge
assigns them a hardware domain of 0, and packets coming from them will
always have skb->offload_fwd_mark = 0. So there aren't any restrictions.

Problems appear due to the fact that DSA would like to perform software
fallback for bonding and team interfaces that the physical switch cannot
offload.

+-- br0 ---+
/ / | \
/ / | \
/ | | bond0
/ | | / \
swp0 swp1 swp2 swp3 swp4

There, it is desirable that the presence of swp3 and swp4 under a
non-offloaded LAG does not preclude us from doing hardware bridging
beteen swp0, swp1 and swp2. The bandwidth of the CPU is often times high
enough that software bridging between {swp0,swp1,swp2} and bond0 is not
impractical.

But this creates an impossible paradox given the current way in which
port hardware domains are assigned. When the driver receives a packet
from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to
something.

- If we set it to 0, then the bridge will forward it towards swp1, swp2
and bond0. But the switch has already forwarded it towards swp1 and
swp2 (not to bond0, remember, that isn't offloaded, so as far as the
switch is concerned, ports swp3 and swp4 are not looking up the FDB,
and the entire bond0 is a destination that is strictly behind the
CPU). But we don't want duplicated traffic towards swp1 and swp2, so
it's not ok to set skb->offload_fwd_mark = 0.

- If we set it to 1, then the bridge will not forward the skb towards
the ports with the same switchdev mark, i.e. not to swp1, swp2 and
bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should
have forwarded the skb there.

So the real issue is that bond0 will be assigned the same hardware
domain as {swp0,swp1,swp2}, because the function that assigns hardware
domains to bridge ports, nbp_switchdev_add(), recurses through bond0's
lower interfaces until it finds something that implements devlink (calls
dev_get_port_parent_id with bool recurse = true). This is a problem
because the fact that bond0 can be offloaded by swp3 and swp4 in our
example is merely an assumption.

A solution is to give the bridge explicit hints as to what hardware
domain it should use for each port.

Currently, the bridging offload is very 'silent': a driver registers a
netdevice notifier, which is put on the netns's notifier chain, and
which sniffs around for NETDEV_CHANGEUPPER events where the upper is a
bridge, and the lower is an interface it knows about (one registered by
this driver, normally). Then, from within that notifier, it does a bunch
of stuff behind the bridge's back, without the bridge necessarily
knowing that there's somebody offloading that port. It looks like this:

ip link set swp0 master br0
|
v
br_add_if() calls netdev_master_upper_dev_link()
|
v
call_netdevice_notifiers
|
v
dsa_slave_netdevice_event
|
v
oh, hey! it's for me!
|
v
.port_bridge_join

What we do to solve the conundrum is to be less silent, and change the
switchdev drivers to present themselves to the bridge. Something like this:

ip link set swp0 master br0
|
v
br_add_if() calls netdev_master_upper_dev_link()
|
v bridge: Aye! I'll use this
call_netdevice_notifiers ^ ppid as the
| | hardware domain for
v | this port, and zero
dsa_slave_netdevice_event | if I got nothing.
| |
v |
oh, hey! it's for me! |
| |
v |
.port_bridge_join |
| |
+------------------------+
switchdev_bridge_port_offload(swp0, swp0)

Then stacked interfaces (like bond0 on top of swp3/swp4) would be
treated differently in DSA, depending on whether we can or cannot
offload them.

The offload case:

ip link set bond0 master br0
|
v
br_add_if() calls netdev_master_upper_dev_link()
|
v bridge: Aye! I'll use this
call_netdevice_notifiers ^ ppid as the
| | switchdev mark for
v | bond0.
dsa_slave_netdevice_event | Coincidentally (or not),
| | bond0 and swp0, swp1, swp2
v | all have the same switchdev
hmm, it's not quite for me, | mark now, since the ASIC
but my driver has already | is able to forward towards
called .port_lag_join | all these ports in hw.
for it, because I have |
a port with dp->lag_dev == bond0. |
| |
v |
.port_bridge_join |
for swp3 and swp4 |
| |
+------------------------+
switchdev_bridge_port_offload(bond0, swp3)
switchdev_bridge_port_offload(bond0, swp4)

And the non-offload case:

ip link set bond0 master br0
|
v
br_add_if() calls netdev_master_upper_dev_link()
|
v bridge waiting:
call_netdevice_notifiers ^ huh, switchdev_bridge_port_offload
| | wasn't called, okay, I'll use a
v | hwdom of zero for this one.
dsa_slave_netdevice_event : Then packets received on swp0 will
| : not be software-forwarded towards
v : swp1, but they will towards bond0.
it's not for me, but
bond0 is an upper of swp3
and swp4, but their dp->lag_dev
is NULL because they couldn't
offload it.

Basically we can draw the conclusion that the lowers of a bridge port
can come and go, so depending on the configuration of lowers for a
bridge port, it can dynamically toggle between offloaded and unoffloaded.
Therefore, we need an equivalent switchdev_bridge_port_unoffload too.

This patch changes the way any switchdev driver interacts with the
bridge. From now on, everybody needs to call switchdev_bridge_port_offload
and switchdev_bridge_port_unoffload, otherwise the bridge will treat the
port as non-offloaded and allow software flooding to other ports from
the same ASIC.

Note that these functions lay the ground for a more complex handshake
between switchdev drivers and the bridge in the future.

For drivers that will request a replay of the switchdev objects when
they offload and unoffload a bridge port (DSA, dpaa2-switch, ocelot), we
place the call to switchdev_bridge_port_unoffload() strategically inside
the NETDEV_PRECHANGEUPPER notifier's code path, and not inside
NETDEV_CHANGEUPPER. This is because the switchdev object replay helpers
need the netdev adjacency lists to be valid, and that is only true in
NETDEV_PRECHANGEUPPER.

Cc: Vadym Kochan <vkochan@marvell.com>
Cc: Taras Chornyi <tchornyi@marvell.com>
Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
Cc: Lars Povlsen <lars.povlsen@microchip.com>
Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch: regression
Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch
Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> # ocelot-switch
Signed-off-by: David S. Miller <davem@davemloft.net>


# c35b57ce 10-Aug-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge

The blamed commit added a new field to struct switchdev_notifier_fdb_info,
but did not make sure that all call paths set it to something valid.
For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE
notifier, and since the 'is_local' flag is not set, it contains junk
from the stack, so the bridge might interpret those notifications as
being for local FDB entries when that was not intended.

To avoid that now and in the future, zero-initialize all
switchdev_notifier_fdb_info structures created by drivers such that all
newly added fields to not need to touch drivers again.

Fixes: 2c4eca3ef716 ("net: bridge: switchdev: include local flag in FDB notifications")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20210810115024.1629983-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 2c4eca3e 14-Apr-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: bridge: switchdev: include local flag in FDB notifications

As explained in bugfix commit 6ab4c3117aec ("net: bridge: don't notify
switchdev for local FDB addresses") as well as in this discussion:
https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/

the switchdev notifiers for FDB entries managed to have a zero-day bug,
which was that drivers would not know what to do with local FDB entries,
because they were not told that they are local. The bug fix was to
simply not notify them of those addresses.

Let us now add the 'is_local' bit to bridge FDB entries, and make all
drivers ignore these entries by their own choice.

Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e18f4c18 12-Feb-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes

This switchdev attribute offers a counterproductive API for a driver
writer, because although br_switchdev_set_port_flag gets passed a
"flags" and a "mask", those are passed piecemeal to the driver, so while
the PRE_BRIDGE_FLAGS listener knows what changed because it has the
"mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
value. But certain drivers can offload only certain combinations of
settings, like for example they cannot change unicast flooding
independently of multicast flooding - they must be both on or both off.
The way the information is passed to switchdev makes drivers not
expressive enough, and unable to reject this request ahead of time, in
the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
the deferred BRIDGE_FLAGS attribute, where the rejection is currently
ignored.

This patch also changes drivers to make use of the "mask" field for edge
detection when possible.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bae33f2b 08-Jan-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: remove the transaction structure from port attributes

Since the introduction of the switchdev API, port attributes were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.

Some classes of failures can never be avoided, like hardware access, or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, since commit 91cf8eceffc1 ("switchdev: Remove unused
transaction item queue") which has removed the unused mechanism of
passing on the allocated memory between one phase and another.

It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.

This patch removes the struct switchdev_trans member from switchdev port
attribute notifier structures, and converts drivers to not look at this
member.

In part, this patch contains a revert of my previous commit 2e554a7a5d8a
("net: dsa: propagate switchdev vlan_filtering prepare phase to
drivers").

For the most part, the conversion was trivial except for:
- Rocker's world implementation based on Broadcom OF-DPA had an odd
implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
done mechanically, by pasting the implementation twice, then only
keeping the code that would get executed during prepare phase on top,
then only keeping the code that gets executed during the commit phase
on bottom, then simplifying the resulting code until this was obtained.
- DSA's offloading of STP state, bridge flags, VLAN filtering and
multicast router could be converted right away. But the ageing time
could not, so a shim was introduced and this was left for a further
commit.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ffb68fc5 08-Jan-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: remove the transaction structure from port object notifiers

Since the introduction of the switchdev API, port objects were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.

Some classes of failures can never be avoided, like hardware access, or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, since commit 91cf8eceffc1 ("switchdev: Remove unused
transaction item queue") which has removed the unused mechanism of
passing on the allocated memory between one phase and another.

It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.

This patch removes the struct switchdev_trans member from switchdev port
object notifier structures, and converts drivers to not look at this
member.

Where driver conversion is trivial (like in the case of the Marvell
Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
done in this patch.

Where driver conversion needs more attention (DSA, Mellanox Spectrum),
the conversion is left for subsequent patches and here we only fake the
prepare/commit phases at a lower level, just not in the switchdev
notifier itself.

Where the code has a natural structure that is best left alone as a
preparation and a commit phase (as in the case of the Ocelot switch),
that structure is left in place, just made to not depend upon the
switchdev transactional model.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# eff74233 25-Sep-2020 Taehee Yoo <ap420073@gmail.com>

net: core: introduce struct netdev_nested_priv for nested interface infrastructure

Functions related to nested interface infrastructure such as
netdev_walk_all_{ upper | lower }_dev() pass both private functions
and "data" pointer to handle their own things.
At this point, the data pointer type is void *.
In order to make it easier to expand common variables and functions,
this new netdev_nested_priv structure is added.

In the following patch, a new member variable will be added into this
struct to fix the lockdep issue.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c68d0ceb 12-Sep-2020 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

rocker: switch from 'pci_' to 'dma_' API

The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.

When memory is allocated in 'rocker_dma_ring_create()' GFP_KERNEL can be
used because it is already used in the same function just a few lines
above.

@@
@@
- PCI_DMA_BIDIRECTIONAL
+ DMA_BIDIRECTIONAL

@@
@@
- PCI_DMA_TODEVICE
+ DMA_TO_DEVICE

@@
@@
- PCI_DMA_FROMDEVICE
+ DMA_FROM_DEVICE

@@
@@
- PCI_DMA_NONE
+ DMA_NONE

@@
expression e1, e2, e3;
@@
- pci_alloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
- pci_zalloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
- pci_free_consistent(e1, e2, e3, e4)
+ dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
- pci_map_single(e1, e2, e3, e4)
+ dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
- pci_unmap_single(e1, e2, e3, e4)
+ dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
- pci_map_page(e1, e2, e3, e4, e5)
+ dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
- pci_unmap_page(e1, e2, e3, e4)
+ dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
- pci_map_sg(e1, e2, e3, e4)
+ dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
- pci_unmap_sg(e1, e2, e3, e4)
+ dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+ dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_device(e1, e2, e3, e4)
+ dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+ dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+ dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
- pci_dma_mapping_error(e1, e2)
+ dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
- pci_set_dma_mask(e1, e2)
+ dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
- pci_set_consistent_dma_mask(e1, e2)
+ dma_set_coherent_mask(&e1->dev, e2)

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# 58d0c864 12-Jun-2020 Aditya Pakki <pakki001@umn.edu>

rocker: fix incorrect error handling in dma_rings_init

In rocker_dma_rings_init, the goto blocks in case of errors
caused by the functions rocker_dma_cmd_ring_waits_alloc() and
rocker_dma_ring_create() are incorrect. The patch fixes the
order consistent with cleanup in rocker_dma_rings_fini().

Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 446f7391 14-Dec-2019 Ido Schimmel <idosch@mellanox.com>

ipv4: Remove old route notifications and convert listeners

Unlike mlxsw, the other listeners to the FIB notification chain do not
require any special modifications as they never considered multiple
identical routes.

This patch removes the old route notifications and converts all the
listeners to use the new replace / delete notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b7a59557 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: fib_notifier: propagate extack down to the notifier block callback

Since errors are propagated all the way up to the caller, propagate
possible extack of the caller all the way down to the notifier block
callback.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7c550daf 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: fib_notifier: make FIB notifier per-netns

Currently all users of FIB notifier only cares about events in init_net.
Later in this patchset, users get interested in other namespaces too.
However, for every registered block user is interested only about one
namespace. Make the FIB notifier registration per-netns and avoid
unnecessary calls of notifier block for other namespaces.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8330f73f 04-Sep-2019 Jiri Pirko <jiri@mellanox.com>

rocker: add missing init_net check in FIB notifier

Take only FIB events that are happening in init_net into account. No other
namespaces are supported.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 011f1754 27-Jul-2019 Colin Ian King <colin.king@canonical.com>

rocker: fix memory leaks of fib_work on two error return paths

Currently there are two error return paths that leak memory allocated
to fib_work. Fix this by kfree'ing fib_work before returning.

Addresses-Coverity: ("Resource leak")
Fixes: 19a9d136f198 ("ipv4: Flag fib_info with a fib_nh using IPv6 gateway")
Fixes: dbcc4fa718ee ("rocker: Fail attempts to use routes with nexthop objects")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dbcc4fa7 03-Jun-2019 David Ahern <dsahern@gmail.com>

rocker: Fail attempts to use routes with nexthop objects

Fail attempts to use nexthop objects with routes until support can be
properly added.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 19a9d136 05-Apr-2019 David Ahern <dsahern@gmail.com>

ipv4: Flag fib_info with a fib_nh using IPv6 gateway

Until support is added to the offload drivers, they need to be able to
reject routes with an IPv6 gateway. To that end add a flag to fib_info
that indicates if any fib_nh has a v6 gateway. The flag allows the drivers
to efficiently know the use of a v6 gateway without walking all fib_nh
tied to a fib_info each time a route is added.

Update mlxsw and rocker to reject the routes with extack message as to why.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5c149314 11-Mar-2019 Kangjie Lu <kjlu@umn.edu>

net: rocker: fix a potential NULL pointer dereference

In case kzalloc fails, the fix releases resources and returns
NOTIFY_BAD to avoid NULL pointer dereference.

Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3d705f07 27-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

net: Remove switchdev_ops

Now that we have converted all possible callers to using a switchdev
notifier for attributes we do not have a need for implementing
switchdev_ops anymore, and this can be removed from all drivers the
net_device structure.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4f705486 27-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

rocker: Handle SWITCHDEV_PORT_ATTR_SET

Following patches will change the way we communicate setting a port's
attribute and use notifiers towards that goal.

Prepare rocker to support receiving notifier events targeting
SWITCHDEV_PORT_ATTR_SET from both atomic and process context and use a
small helper to translate the event notifier into something that
rocker_port_attr_set() can process.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7a25c6c0 21-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

rocker: Add missing break for PRE_BRIDGE_FLAGS

A missing break keyword should have been added after adding support for
PRE_BRIDGE_FLAGS.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 93700458ff63 ("rocker: Check Handle PORT_PRE_BRIDGE_FLAGS")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 010c8f01 20-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

net: Get rid of switchdev_port_attr_get()

With the bridge no longer calling switchdev_port_attr_get() to obtain
the supported bridge port flags from a driver but instead trying to set
the bridge port flags directly and relying on driver to reject
unsupported configurations, we can effectively get rid of
switchdev_port_attr_get() entirely since this was the only place where
it was called.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cc0c207a 20-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

net: Remove SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT

Now that we have converted the bridge code and the drivers to check for
bridge port(s) flags at the time we try to set them, there is no need
for a get() -> set() sequence anymore and
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT therefore becomes unused.

Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 93700458 20-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

rocker: Check Handle PORT_PRE_BRIDGE_FLAGS

In preparation for getting rid of switchdev_port_attr_get(), have rocker
check for the bridge flags being set through switchdev_port_attr_set()
with the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 610d2b60 11-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

rocker: Remove getting PORT_BRIDGE_FLAGS

There is no code that attempts to get the
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS attribute, remove support for that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7026b8a6 06-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

rocker: Implement ndo_get_port_parent_id()

mlxsw implements SWITCHDEV_ATTR_ID_PORT_PARENT_ID and we want to get rid
of switchdev_ops eventually, ease that migration by implementing a
ndo_get_port_parent_id() function which returns what
switchdev_port_attr_get() would do.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6685987c 16-Jan-2019 Petr Machata <petrm@mellanox.com>

switchdev: Add extack argument to call_switchdev_notifiers()

A follow-up patch will enable vetoing of FDB entries. Make it possible
to communicate details of why an FDB entry is not acceptable back to the
user.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ab4a1686 22-Nov-2018 Petr Machata <petrm@mellanox.com>

rocker, dsa, ethsw: Don't filter VLAN events on bridge itself

Due to an explicit check in rocker_world_port_obj_vlan_add(),
dsa_slave_switchdev_event() resp. port_switchdev_event(), VLAN objects
that are added to a device that is not a front-panel port device are
ignored. Therefore this check is immaterial.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d17d9f5e 22-Nov-2018 Petr Machata <petrm@mellanox.com>

switchdev: Replace port obj add/del SDO with a notification

Drop switchdev_ops.switchdev_port_obj_add and _del. Drop the uses of
this field from all clients, which were migrated to use switchdev
notification in the previous patches.

Add a new function switchdev_port_obj_notify() that sends the switchdev
notifications SWITCHDEV_PORT_OBJ_ADD and _DEL.

Update switchdev_port_obj_del_now() to dispatch to this new function.
Drop __switchdev_port_obj_add() and update switchdev_port_obj_add()
likewise.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c6fa35b2 22-Nov-2018 Petr Machata <petrm@mellanox.com>

rocker: Handle SWITCHDEV_PORT_OBJ_ADD/_DEL

Following patches will change the way of distributing port object
changes from a switchdev operation to a switchdev notifier. The
switchdev code currently recursively descends through layers of lower
devices, eventually calling the op on a front-panel port device. The
notifier will instead be sent referencing the bridge port device, which
may be a stacking device that's one of front-panel ports uppers, or a
completely unrelated device.

rocker currently doesn't support any uppers other than bridge. Thus the
only case that a stacked device could be validly referenced by port
object notifications are bridge notifications for VLAN objects added to
the bridge itself. But the driver explicitly rejects such notifications
in rocker_world_port_obj_vlan_add(). It is therefore safe to assume that
the only interesting case is that the notification is on a front-panel
port netdevice.

Subscribe to the blocking notifier chain. In the handler, filter out
notifications on any foreign netdevices. Dispatch the new notifiers to
rocker_port_obj_add() resp. _del() to maintain the behavior that the
switchdev operation based code currently has.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9333f207 18-Oct-2018 YueHaibing <yuehaibing@huawei.com>

rocker: Drop pointless static qualifier

There is no need to have the 'struct rocker_desc_info *desc_info'
variable static since new value always be assigned before use it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e9ba0fbc 17-Oct-2018 Ido Schimmel <idosch@mellanox.com>

bridge: switchdev: Allow clearing FDB entry offload indication

Currently, an FDB entry only ceases being offloaded when it is deleted.
This changes with VxLAN encapsulation.

Devices capable of performing VxLAN encapsulation usually have only one
FDB table, unlike the software data path which has two - one in the
bridge driver and another in the VxLAN driver.

Therefore, bridge FDB entries pointing to a VxLAN device are only
offloaded if there is a corresponding entry in the VxLAN FDB.

Allow clearing the offload indication in case the corresponding entry
was deleted from the VxLAN FDB.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2855118f 29-May-2018 Petr Machata <petrm@mellanox.com>

rocker: rocker_main: Ignore bridge VLAN events

A follow-up patch enables emitting VLAN notifications for the bridge CPU
port in addition to the existing slave port notifications. These
notifications have orig_dev set to the bridge in question.

Because there's no specific support for these VLANs, just ignore the
notifications to maintain the current behavior.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ec9efb52 10-May-2018 Petr Machata <petrm@mellanox.com>

rocker: Postpone filtering of !added_by_user FDB

Breaking out of the switch in rocker_switchdev_event() still ends up
scheduling work, except an ill-defined one. This leads to an OOPS cited
below. Fix by postponing the check until rocker_switchdev_event_work().

[ 23.148476] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 23.148810] PGD 0 P4D 0
[ 23.148982] Oops: 0000 [#1] PREEMPT SMP PTI
[ 23.149190] Modules linked in: bridge stp llc iptable_nat nf_nat_ipv4 nf_nat e1000 rocker
[ 23.149768] CPU: 0 PID: 239 Comm: kworker/u2:4 Not tainted 4.17.0-rc3-net_next_queue-custom #6
[ 23.150298] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[ 23.150868] Workqueue: rocker rocker_switchdev_event_work [rocker]
[ 23.151258] RIP: 0010:ofdpa_port_fdb+0x7b/0x230 [rocker]
[ 23.151597] RSP: 0018:ffffc900004b3e18 EFLAGS: 00010246
[ 23.151952] RAX: 00000000fffbc68c RBX: 0000000000000000 RCX: 0000000000000000
[ 23.152363] RDX: 0000000000000010 RSI: ffff88003b4471e0 RDI: 00000000ffffffff
[ 23.152768] RBP: ffff88003b4471c0 R08: ffff88003b4471e0 R09: ffff88003b4471c0
[ 23.153141] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880036caf000
[ 23.153515] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88003bc00000
[ 23.153919] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 23.154444] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 23.154806] CR2: 0000000000000000 CR3: 0000000036eb6000 CR4: 00000000000006f0
[ 23.155194] Call Trace:
[ 23.155472] rocker_switchdev_event_work+0x9b/0xd0 [rocker]
[ 23.155850] ? __schedule+0x231/0x700
[ 23.156175] process_one_work+0x1cf/0x3e0
[ 23.156490] worker_thread+0x26/0x3d0
[ 23.156795] ? trace_event_raw_event_workqueue_execute_start+0x80/0x80
[ 23.157181] kthread+0x10e/0x130
[ 23.157485] ? kthread_create_worker_on_cpu+0x40/0x40
[ 23.157824] ret_from_fork+0x35/0x40
[ 23.158174] Code: 00 00 c1 e8 02 4c 8d 45 20 bf ff ff ff ff 83 e0 01 4c 89 65 20 88 45 14 48 8b 05 21 da 1f e2 4c 89 c6 4c 89 44 24 10 48 89 45 18 <41> 8b 45 00 89 45 28 41 0f b7 45 04 66 89 45 2c 0f b7 44 24 04
[ 23.159346] RIP: ofdpa_port_fdb+0x7b/0x230 [rocker] RSP: ffffc900004b3e18
[ 23.159742] CR2: 0000000000000000
[ 23.160088] ---[ end trace f9b16d4cb6df0629 ]---

Fixes: 816a3bed9549 ("switchdev: Add fdb.added_by_user to switchdev notifications")
Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 816a3bed 03-May-2018 Petr Machata <petrm@mellanox.com>

switchdev: Add fdb.added_by_user to switchdev notifications

The following patch enables sending notifications also for events on FDB
entries that weren't added by the user. Give the drivers the information
necessary to distinguish between the two origins of FDB entries.

To maintain the current behavior, have switchdev-implementing drivers
bail out on notifications about non-user-added FDB entries. In case of
mlxsw driver, allow a call to mlxsw_sp_span_respin() so that SPAN over
bridge catches up with the changed FDB.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a83165f0 31-Jan-2018 Jiri Pirko <jiri@mellanox.com>

rocker: fix possible null pointer dereference in rocker_router_fib_event_work

Currently, rocker user may experience following null pointer
derefence bug:

[ 3.062141] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
[ 3.065163] IP: rocker_router_fib_event_work+0x36/0x110 [rocker]

The problem is uninitialized rocker->wops pointer that is initialized
only with the first initialized port. So move the port initialization
before registering the fib events.

Fixes: 936bd486564a ("rocker: use FIB notifications instead of switchdev calls")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d371ac1e 03-Aug-2017 Ido Schimmel <idosch@mellanox.com>

rocker: Ignore address families other than IPv4

As in previous patch, ignore IPv6 notifications since the driver doesn't
support these.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 04b1d4e5 03-Aug-2017 Ido Schimmel <idosch@mellanox.com>

net: core: Make the FIB notification chain generic

The FIB notification chain is currently soley used by IPv4 code.
However, we're going to introduce IPv6 FIB offload support, which
requires these notification as well.

As explained in commit c3852ef7f2f8 ("ipv4: fib: Replay events when
registering FIB notifier"), upon registration to the chain, the callee
receives a full dump of the FIB tables and rules by traversing all the
net namespaces. The integrity of the dump is ensured by a per-namespace
sequence counter that is incremented whenever a change to the tables or
rules occurs.

In order to allow more address families to use the chain, each family is
expected to register its fib_notifier_ops in its pernet init. These
operations allow the common code to read the family's sequence counter
as well as dump its tables and rules in the given net namespace.

Additionally, a 'family' parameter is added to sent notifications, so
that listeners could distinguish between the different families.

Implement the common code that allows listeners to register to the chain
and for address families to register their fib_notifier_ops. Subsequent
patches will implement these operations in IPv6.

In the future, ipmr and ip6mr will be extended to provide these
notifications as well.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# abfbf8a0 08-Jun-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

rocker: Remove support bridge bypass FDB

The FDB add/delete are now done through the notification chain. The FDBs
are synced with the bridge and there is no need for extra dumping.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 403caa7a 08-Jun-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

rocker: Remove support for bypass bridge port attributes/vlan set

The bridge port attributes/vlan for mlxsw devices should be set only
from bridge code. The vlans are synced totally with the bridge so
there is no need to special dump support.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 726fd42f 08-Jun-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

rocker: Add support for learning FDB through notification

Add support for learning FDB through notification. The driver defers
the hardware update via ordered work queue.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 00fc0c51 08-Jun-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

rocker: Change world_ops API and implementation to be switchdev independant

Currently the switchdev_trans struct is embedded in the world_ops API.
In order to add support for adding FDB via a notfication chain the API should
be switchdev independent.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 96673a30 08-Jun-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

rocker: Add support for querying supported bridge flags

Add support for querying supported bridge flags.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5d7bfd14 16-Mar-2017 Ido Schimmel <idosch@mellanox.com>

ipv4: fib_rules: Dump FIB rules when registering FIB notifier

In commit c3852ef7f2f8 ("ipv4: fib: Replay events when registering FIB
notifier") we dumped the FIB tables and replayed the events to the
passed notification block.

However, we merely sent a RULE_ADD notification in case custom rules
were in use. As explained in previous patches, this approach won't work
anymore. Instead, we should notify the caller about all the FIB rules
and let it act accordingly.

Upon registration to the FIB notification chain, replay a RULE_ADD
notification for each programmed FIB rule, custom or not. The integrity
of the dump is ensured by the mechanism introduced in the above
mentioned commit.

Prevent regressions by making sure current listeners correctly sanitize
the notified rules.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# de480150 26-Feb-2017 Philippe Reynes <tremyfr@gmail.com>

net: rocker: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6ad20165 30-Jan-2017 Eric Dumazet <edumazet@google.com>

drivers: net: generalize napi_complete_done()

napi_complete_done() allows to opt-in for gro_flush_timeout,
added back in linux-3.19, commit 3b47d30396ba
("net: gro: add a per device gro flush timer")

This allows for more efficient GRO aggregation without
sacrifying latencies.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c3852ef7 03-Dec-2016 Ido Schimmel <idosch@mellanox.com>

ipv4: fib: Replay events when registering FIB notifier

Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
introduced a new notification chain to notify listeners (f.e., switchdev
drivers) about addition and deletion of routes.

However, upon registration to the chain the FIB tables can already be
populated, which means potential listeners will have an incomplete view
of the tables.

Solve that by dumping the FIB tables and replaying the events to the
passed notification block. The dump itself is done using RCU in order
not to starve consumers that need RTNL to make progress.

The integrity of the dump is ensured by reading the FIB change sequence
counter before and after the dump under RTNL. This allows us to avoid
the problematic situation in which the dumping process sends a ENTRY_ADD
notification following ENTRY_DEL generated by another process holding
RTNL.

Callers of the registration function may pass a callback that is
executed in case the dump was inconsistent with current FIB tables.

The number of retries until a consistent dump is achieved is set to a
fixed number to prevent callers from looping for long periods of time.
In case current limit proves to be problematic in the future, it can be
easily converted to be configurable using a sysctl.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 17f8be7d 03-Dec-2016 Ido Schimmel <idosch@mellanox.com>

rocker: Register FIB notifier before creating ports

We can miss FIB notifications sent between the time the ports were
created and the FIB notification block registered.

Instead of receiving these notifications only when they are replayed for
the FIB notification block during registration, just register the
notification block before the ports are created.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# db701955 03-Dec-2016 Ido Schimmel <idosch@mellanox.com>

rocker: Implement FIB offload in deferred work

Convert rocker to offload FIBs in deferred work in a similar fashion to
mlxsw, which was converted in the previous commits.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c1bb279c 03-Dec-2016 Ido Schimmel <idosch@mellanox.com>

rocker: Create an ordered workqueue for FIB offload

As explained in the previous commits, we need to process FIB entries
addition / deletion events in FIFO order or otherwise we can have a
mismatch between the kernel's FIB table and the device's.

Create an ordered workqueue for rocker to which these work items will be
submitted to.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 97699056 27-Oct-2016 Jiri Pirko <jiri@mellanox.com>

rocker: set physical device for port netdevice

Do this so the sysfs has "device" link correctly set.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 92d230dd 22-Oct-2016 Wei Yongjun <weiyongjun1@huawei.com>

rocker: fix error return code in rocker_world_check_init()

Fix to return error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: e420114eef4a ("rocker: introduce worlds infrastructure")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cf2d6740 17-Oct-2016 David Ahern <dsa@cumulusnetworks.com>

rocker: Flip to the new dev walk API

Convert rocker to the new dev walk API. This is just a code conversion;
no functional change is intended.

v2
- removed typecast of data

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 44770e11 17-Oct-2016 Jarod Wilson <jarod@redhat.com>

ethernet: use core min/max MTU checking

et131x: min_mtu 64, max_mtu 9216

altera_tse: min_mtu 64, max_mtu 1500

amd8111e: min_mtu 60, max_mtu 9000

bnad: min_mtu 46, max_mtu 9000

macb: min_mtu 68, max_mtu 1500 or 10240 depending on hardware capability

xgmac: min_mtu 46, max_mtu 9000

cxgb2: min_mtu 68, max_mtu 9582 (pm3393) or 9600 (vsc7326)

enic: min_mtu 68, max_mtu 9000

gianfar: min_mtu 50, max_mu 9586

hns_enet: min_mtu 68, max_mtu 9578 (v1) or 9706 (v2)

ksz884x: min_mtu 60, max_mtu 1894

myri10ge: min_mtu 68, max_mtu 9000

natsemi: min_mtu 64, max_mtu 2024

nfp: min_mtu 68, max_mtu hardware-specific

forcedeth: min_mtu 64, max_mtu 1500 or 9100, depending on hardware

pch_gbe: min_mtu 46, max_mtu 10300

pasemi_mac: min_mtu 64, max_mtu 9000

qcaspi: min_mtu 46, max_mtu 1500
- remove qcaspi_netdev_change_mtu as it is now redundant

rocker: min_mtu 68, max_mtu 9000

sxgbe: min_mtu 68, max_mtu 9000

stmmac: min_mtu 46, max_mtu depends on hardware

tehuti: min_mtu 60, max_mtu 16384
- driver had no max mtu checking, but product docs say 16k jumbo packets
are supported by the hardware

netcp: min_mtu 68, max_mtu 9486
- remove netcp_ndo_change_mtu as it is now redundant

via-velocity: min_mtu 64, max_mtu 9000

octeon: min_mtu 46, max_mtu 65370

CC: netdev@vger.kernel.org
CC: Mark Einon <mark.einon@gmail.com>
CC: Vince Bridgers <vbridger@opensource.altera.com>
CC: Rasesh Mody <rasesh.mody@qlogic.com>
CC: Nicolas Ferre <nicolas.ferre@atmel.com>
CC: Santosh Raspatur <santosh@chelsio.com>
CC: Hariprasad S <hariprasad@chelsio.com>
CC: Christian Benvenuti <benve@cisco.com>
CC: Sujith Sankar <ssujith@cisco.com>
CC: Govindarajulu Varadarajan <_govind@gmx.com>
CC: Neel Patel <neepatel@cisco.com>
CC: Claudiu Manoil <claudiu.manoil@freescale.com>
CC: Yisen Zhuang <yisen.zhuang@huawei.com>
CC: Salil Mehta <salil.mehta@huawei.com>
CC: Hyong-Youb Kim <hykim@myri.com>
CC: Jakub Kicinski <jakub.kicinski@netronome.com>
CC: Olof Johansson <olof@lixom.net>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Byungho An <bh74.an@samsung.com>
CC: Girish K S <ks.giri@samsung.com>
CC: Vipul Pandya <vipul.pandya@samsung.com>
CC: Giuseppe Cavallaro <peppe.cavallaro@st.com>
CC: Alexandre Torgue <alexandre.torgue@st.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Wingman Kwok <w-kwok2@ti.com>
CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 936bd486 25-Sep-2016 Jiri Pirko <jiri@mellanox.com>

rocker: use FIB notifications instead of switchdev calls

Until now, in order to offload a FIB entry to HW we use switchdev op.
Use the newly introduced FIB notifier infrasturucture to process
FIB entries to offload the in HW.
Abort mechanism is now handled within the rocker driver.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6bc506b4 25-Aug-2016 Ido Schimmel <idosch@mellanox.com>

bridge: switchdev: Add forward mark support for stacked devices

switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
port netdevs so that packets being flooded by the device won't be
flooded twice.

It works by assigning a unique identifier (the ifindex of the first
bridge port) to bridge ports sharing the same parent ID. This prevents
packets from being flooded twice by the same switch, but will flood
packets through bridge ports belonging to a different switch.

This method is problematic when stacked devices are taken into account,
such as VLANs. In such cases, a physical port netdev can have upper
devices being members in two different bridges, thus requiring two
different 'offload_fwd_mark's to be configured on the port netdev, which
is impossible.

The main problem is that packet and netdev marking is performed at the
physical netdev level, whereas flooding occurs between bridge ports,
which are not necessarily port netdevs.

Instead, packet and netdev marking should really be done in the bridge
driver with the switch driver only telling it which packets it already
forwarded. The bridge driver will mark such packets using the mark
assigned to the ingress bridge port and will prevent the packet from
being forwarded through any bridge port sharing the same mark (i.e.
having the same parent ID).

Remove the current switchdev 'offload_fwd_mark' implementation and
instead implement the proposed method. In addition, make rocker - the
sole user of the mark - use the proposed method.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 503eebc2 05-Jul-2016 Jiri Pirko <jiri@mellanox.com>

net: add dev arg to ndo_neigh_construct/destroy

As the following patch will allow upper devices to follow the call down
lower devices, we need to add dev here and not rely on n->dev.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3a8befcd 11-Mar-2016 Jiri Pirko <jiri@mellanox.com>

rocker: move ageing_time from struct rocker to struct ofdpa

This is OF-DPA specific, used only there, similar to
ofdpa_port->ageing_time. So move it to OF-DPA code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 88de1cd4 08-Mar-2016 Ido Schimmel <idosch@mellanox.com>

rocker: set FDB cleanup timer according to lowest ageing time

In rocker, ageing time is a per-port attribute, so the next time the FDB
cleanup timer fires should be set according to the lowest ageing time.

This will later allow us to delete the BR_MIN_AGEING_TIME macro, which was
added to guarantee minimum ageing time in the bridge layer, thereby breaking
existing behavior.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a30a9ea6 22-Feb-2016 Dan Carpenter <dan.carpenter@oracle.com>

rocker: fix rocker_world_port_obj_vlan_add()

We were changing return values and accidentally made
rocker_world_port_obj_vlan_add() into a no-op.

Fixes: fccd84d44912 ('rocker: return -EOPNOTSUPP for undefined world ops')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fccd84d4 16-Feb-2016 Jiri Pirko <jiri@resnulli.us>

rocker: return -EOPNOTSUPP for undefined world ops

Suggested-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3fbcdbf3 16-Feb-2016 Jiri Pirko <jiri@mellanox.com>

rocker: move OF-DPA stuff into separate file

Carve out OF-DPA would specific code from the common file to the world
file. This change required struct rocker and struct rocker_port split
into world specific struct ofdpa and struct ofdpa_port. Along with this
the world specific functions and defines were renamed from prefix
"rocker_" to "ofdpa_".

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 53901cc0 16-Feb-2016 Jiri Pirko <jiri@mellanox.com>

rocker: call rocker_cmd_exec function with "nowait" boolean instead of flags

No need to push down rocker flags just to check if this is nowait or
not. Let the caller handle that.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ae3907ec 16-Feb-2016 Jiri Pirko <jiri@mellanox.com>

rocker: remove trans parameter to rocker_cmd_exec function

The only purpose of passing this parameter is to check for
prepare phase. The only reason for a failure in that state is if
TLVs don't fit into descriptor. That is highly unlikely and if that
happens, it is a driver bug. So remove this parameter from
rocker_cmd_exec, and check for prepare phase in caller.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ca0a5f2a 16-Feb-2016 Jiri Pirko <jiri@mellanox.com>

rocker: pre-allocate wait structures during cmd ring init

This avoids need to alloc/free wait structure for every command call.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c1fe922e 16-Feb-2016 Jiri Pirko <jiri@mellanox.com>

rocker: pass "learning" value as a parameter to rocker_port_set_learning

Be consistent with the rest of the setting functions, and pass
"learning" as a bool function parameter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e420114e 16-Feb-2016 Jiri Pirko <jiri@mellanox.com>

rocker: introduce worlds infrastructure

This is another step on the way to per-world clean cut. Introduce world
ops hooks which each world can implement in world-specific way.
Also introduce world infrastructure along with OF-DPA world stub.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0514c4e8 16-Feb-2016 Jiri Pirko <jiri@mellanox.com>

rocker: move rocker and rocker_port structs into header

And take some other related thing along. They are going to be pushed
into of-dpa part anyway.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e1ba3dee 16-Feb-2016 Jiri Pirko <jiri@mellanox.com>

rocker: implement get settings mode command

Introduce a helper to ask HW for the port mode (world).

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# de152192 16-Feb-2016 Jiri Pirko <jiri@mellanox.com>

rocker: push tlv processing into separate files

Carve out TLV processing helpers into separate files.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 11ce2ba3 16-Feb-2016 Jiri Pirko <jiri@mellanox.com>

rocker: rename rocker.c to rocker_main.c

Since "rocker.c" is going to be split into multiple files, start with
renaming original "rocker.c" file to "rocker_main.c". Multiple code
parts are going to be cut from "rocker_main.c" later on.

Fix couple of checkpatch issues on the way.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>