History log of /linux-master/net/sched/act_tunnel_key.c
Revision Date Author Comments
# 5832c4a7 27-Mar-2024 Alexander Lobakin <aleksander.lobakin@intel.com>

ip_tunnel: convert __be16 tunnel flags to bitmaps

Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT
have been defined as __be16. Now all of those 16 bits are occupied
and there's no more free space for new flags.
It can't be simply switched to a bigger container with no
adjustments to the values, since it's an explicit Endian storage,
and on LE systems (__be16)0x0001 equals to
(__be64)0x0001000000000000.
We could probably define new 64-bit flags depending on the
Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on
LE, but that would introduce an Endianness dependency and spawn a
ton of Sparse warnings. To mitigate them, all of those places which
were adjusted with this change would be touched anyway, so why not
define stuff properly if there's no choice.

Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the
value already coded and a fistful of <16 <-> bitmap> converters and
helpers. The two flags which have a different bit position are
SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as
__cpu_to_be16(), but as (__force __be16), i.e. had different
positions on LE and BE. Now they both have strongly defined places.
Change all __be16 fields which were used to store those flags, to
IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) ->
unsigned long[1] for now, and replace all TUNNEL_* occurrences to
their bitmap counterparts. Use the converters in the places which talk
to the userspace, hardware (NFP) or other hosts (GRE header). The rest
must explicitly use the new flags only. This must be done at once,
otherwise there will be too many conversions throughout the code in
the intermediate commits.
Finally, disable the old __be16 flags for use in the kernel code
(except for the two 'irregular' flags mentioned above), to prevent
any accidental (mis)use of them. For the userspace, nothing is
changed, only additions were made.

Most noticeable bloat-o-meter difference (.text):

vmlinux: 307/-1 (306)
gre.ko: 62/0 (62)
ip_gre.ko: 941/-217 (724) [*]
ip_tunnel.ko: 390/-900 (-510) [**]
ip_vti.ko: 138/0 (138)
ip6_gre.ko: 534/-18 (516) [*]
ip6_tunnel.ko: 118/-10 (108)

[*] gre_flags_to_tnl_flags() grew, but still is inlined
[**] ip_tunnel_find() got uninlined, hence such decrease

The average code size increase in non-extreme case is 100-200 bytes
per module, mostly due to sizeof(long) > sizeof(__be16), as
%__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers
are able to expand the majority of bitmap_*() calls here into direct
operations on scalars.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 241a94ab 01-Feb-2024 Michal Koutný <mkoutny@suse.com>

net/sched: Add module aliases for cls_,sch_,act_ modules

No functional change intended, aliases will be used in followup commits.
Note for backporters: you may need to add aliases also for modules that
are already removed in mainline kernel but still in your version.

Patches were generated with the help of Coccinelle scripts like:

cat >scripts/coccinelle/misc/tcf_alias.cocci <<EOD
virtual patch
virtual report

@ haskernel @
@@

@ tcf_has_kind depends on report && haskernel @
identifier ops;
constant K;
@@

static struct tcf_proto_ops ops = {
.kind = K,
...
};
+char module_alias = K;
EOD

/usr/bin/spatch -D report --cocci-file scripts/coccinelle/misc/tcf_alias.cocci \
--dir . \
-I ./arch/x86/include -I ./arch/x86/include/generated -I ./include \
-I ./arch/x86/include/uapi -I ./arch/x86/include/generated/uapi \
-I ./include/uapi -I ./include/generated/uapi \
--include ./include/linux/compiler-version.h --include ./include/linux/kconfig.h \
--jobs 8 --chunksize 1 2>/dev/null | \
sed 's/char module_alias = "\([^"]*\)";/MODULE_ALIAS_NET_CLS("\1");/'

And analogously for:

static struct tc_action_ops ops = {
.kind = K,

static struct Qdisc_ops ops = {
.id = K,

(Someone familiar would be able to fit those into one .cocci file
without sed post processing.)

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240201130943.19536-3-mkoutny@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c2a67de9 29-Dec-2023 Pedro Tammela <pctammela@mojatatu.com>

net/sched: introduce ACT_P_BOUND return code

Bound actions always return '0' and as of today we rely on '0'
being returned in order to properly skip bound actions in
tcf_idr_insert_many. In order to further improve maintainability,
introduce the ACT_P_BOUND return code.

Actions are updated to return 'ACT_P_BOUND' instead of plain '0'.
tcf_idr_insert_many is then updated to check for 'ACT_P_BOUND'.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231229132642.1489088-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 2384127e 29-Mar-2023 Davide Caratti <dcaratti@redhat.com>

net/sched: act_tunnel_key: add support for "don't fragment"

extend "act_tunnel_key" to allow specifying TUNNEL_DONT_FRAGMENT.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 871cf386 06-Dec-2022 Pedro Tammela <pctammela@mojatatu.com>

net/sched: avoid indirect act functions on retpoline kernels

Expose the necessary tc act functions and wire up act_api to use
direct calls in retpoline kernels.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f6ffa368 07-Sep-2022 Zhengchao Shao <shaozhengchao@huawei.com>

net: sched: act_tunnel_key: get rid of tunnel_key_walker and tunnel_key_search

tunnel_key_walker() and tunnel_key_search() do the same thing as generic
walk/search function, so remove them.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# acd0a7ab 07-Sep-2022 Zhengchao Shao <shaozhengchao@huawei.com>

net: sched: act: move global static variable net_id to tc_action_ops

Each tc action module has a corresponding net_id, so put net_id directly
into the structure tc_action_ops.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ee367d44 07-Apr-2022 Ido Schimmel <idosch@nvidia.com>

net/sched: act_tunnel_key: Add extack message for offload failure

For better error reporting to user space, add an extack message when
tunnel_key action offload fails.

Currently, the failure cannot be triggered, but add a message in case
the action is extended in the future to support more than set/release
modes.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c2ccf84e 07-Apr-2022 Ido Schimmel <idosch@nvidia.com>

net/sched: act_api: Add extack to offload_act_setup() callback

The callback is used by various actions to populate the flow action
structure prior to offload. Pass extack to this callback so that the
various actions will be able to report accurate error messages to user
space.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8cbfe939 17-Dec-2021 Baowen Zheng <baowen.zheng@corigine.com>

flow_offload: allow user to offload tc action to net device

Use flow_indr_dev_register/flow_indr_dev_setup_offload to
offload tc action.

We need to call tc_cleanup_flow_action to clean up tc action entry since
in tc_setup_action, some actions may hold dev refcnt, especially the mirror
action.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c54e1d92 17-Dec-2021 Baowen Zheng <baowen.zheng@corigine.com>

flow_offload: add ops to tc_action_ops for flow action setup

Add a new ops to tc_action_ops for flow action setup.

Refactor function tc_setup_flow_action to use this new ops.

We make this change to facilitate to add standalone action module.

We will also use this ops to offload action independent of filter
in following patch.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 695176bf 29-Jul-2021 Cong Wang <cong.wang@bytedance.com>

net_sched: refactor TC action init API

TC action ->init() API has 10 parameters, it becomes harder
to read. Some of them are just boolean and can be replaced
by flags. Similarly for the internal API tcf_action_init()
and tcf_exts_validate().

This patch converts them to flags and fold them into
the upper 16 bits of "flags", whose lower 16 bits are still
reserved for user-space. More specifically, the following
kernel flags are introduced:

TCA_ACT_FLAGS_POLICE replace 'name' in a few contexts, to
distinguish whether it is compatible with policer.

TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether
this action is bound to a filter.

TCA_ACT_FLAGS_REPLACE replaces 'ovr' in most contexts,
means we are replacing an existing action.

TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the
opposite meaning, because we still hold RTNL in most
cases.

The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is
untouched and still stored as before.

I have tested this patch with tdc and I do not see any
failure related to this patch.

Tested-by: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a7a12b5a 20-Oct-2020 Davide Caratti <dcaratti@redhat.com>

net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels

the following command

# tc action add action tunnel_key \
> set src_ip 2001:db8::1 dst_ip 2001:db8::2 id 10 erspan_opts 1:6789:0:0

generates the following splat:

BUG: KASAN: slab-out-of-bounds in tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key]
Write of size 4 at addr ffff88813f5f1cc8 by task tc/873

CPU: 2 PID: 873 Comm: tc Not tainted 5.9.0+ #282
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
dump_stack+0x99/0xcb
print_address_description.constprop.7+0x1e/0x230
kasan_report.cold.13+0x37/0x7c
tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key]
tunnel_key_init+0x160c/0x1f40 [act_tunnel_key]
tcf_action_init_1+0x5b5/0x850
tcf_action_init+0x15d/0x370
tcf_action_add+0xd9/0x2f0
tc_ctl_action+0x29b/0x3a0
rtnetlink_rcv_msg+0x341/0x8d0
netlink_rcv_skb+0x120/0x380
netlink_unicast+0x439/0x630
netlink_sendmsg+0x719/0xbf0
sock_sendmsg+0xe2/0x110
____sys_sendmsg+0x5ba/0x890
___sys_sendmsg+0xe9/0x160
__sys_sendmsg+0xd3/0x170
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f872a96b338
Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
RSP: 002b:00007ffffe367518 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000005f8f5aed RCX: 00007f872a96b338
RDX: 0000000000000000 RSI: 00007ffffe367580 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000001c
R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001
R13: 0000000000686760 R14: 0000000000000601 R15: 0000000000000000

Allocated by task 873:
kasan_save_stack+0x19/0x40
__kasan_kmalloc.constprop.7+0xc1/0xd0
__kmalloc+0x151/0x310
metadata_dst_alloc+0x20/0x40
tunnel_key_init+0xfff/0x1f40 [act_tunnel_key]
tcf_action_init_1+0x5b5/0x850
tcf_action_init+0x15d/0x370
tcf_action_add+0xd9/0x2f0
tc_ctl_action+0x29b/0x3a0
rtnetlink_rcv_msg+0x341/0x8d0
netlink_rcv_skb+0x120/0x380
netlink_unicast+0x439/0x630
netlink_sendmsg+0x719/0xbf0
sock_sendmsg+0xe2/0x110
____sys_sendmsg+0x5ba/0x890
___sys_sendmsg+0xe9/0x160
__sys_sendmsg+0xd3/0x170
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at ffff88813f5f1c00
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 200 bytes inside of
256-byte region [ffff88813f5f1c00, ffff88813f5f1d00)
The buggy address belongs to the page:
page:0000000011b48a19 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13f5f0
head:0000000011b48a19 order:1 compound_mapcount:0
flags: 0x17ffffc0010200(slab|head)
raw: 0017ffffc0010200 0000000000000000 0000000d00000001 ffff888107c43400
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88813f5f1b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88813f5f1c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88813f5f1c80: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
^
ffff88813f5f1d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88813f5f1d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

using IPv6 tunnels, act_tunnel_key allocates a fixed amount of memory for
the tunnel metadata, but then it expects additional bytes to store tunnel
specific metadata with tunnel_key_copy_opts().

Fix the arguments of __ipv6_tun_set_dst(), so that 'md_size' contains the
size previously computed by tunnel_key_get_opts_len(), like it's done for
IPv4 tunnels.

Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://lore.kernel.org/r/36ebe969f6d13ff59912d6464a4356fe6f103766.1603231100.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# e49d8c22 22-Sep-2020 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: defer tcf_idr_insert() in tcf_action_init_1()

All TC actions call tcf_idr_insert() for new action at the end
of their ->init(), so we can actually move it to a central place
in tcf_action_init_1().

And once the action is inserted into the global IDR, other parallel
process could free it immediately as its refcnt is still 1, so we can
not fail after this, we need to move it after the goto action
validation to avoid handling the failure case after insertion.

This is found during code review, is not directly triggered by syzbot.
And this prepares for the next patch.

Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 13e6ce98 13-Sep-2020 Xin Long <lucien.xin@gmail.com>

net: sched: only keep the available bits when setting vxlan md->gbp

As we can see from vxlan_build/parse_gbp_hdr(), when processing metadata
on vxlan rx/tx path, only dont_learn/policy_applied/policy_id fields can
be set to or parse from the packet for vxlan gbp option.

So we'd better do the mask when set it in act_tunnel_key and cls_flower.
Otherwise, when users don't know these bits, they may configure with a
value which can never be matched.

Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e20d4ff2 21-Nov-2019 Xin Long <lucien.xin@gmail.com>

net: sched: add erspan option support to act_tunnel_key

This patch is to allow setting erspan options using the
act_tunnel_key action. Different from geneve options,
only one option can be set. And also, geneve options,
vxlan options or erspan options can't be set at the
same time.

Options are expressed as ver:index:dir:hwid, when ver
is set to 1, index will be applied while dir and hwid
will be ignored, and when ver is set to 2, dir and
hwid will be used while index will be ignored.

# ip link add name erspan1 type erspan external
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 \
ip_proto udp \
action tunnel_key \
set src_ip 10.0.99.192 \
dst_ip 10.0.99.193 \
dst_port 6081 \
id 11 \
erspan_opts 1:2:0:0 \
action mirred egress redirect dev erspan1

v1->v2:
- do the validation when dst is not yet allocated as Jakub suggested.
- use Duplicate instead of Wrong in err msg for extack.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fca3f91c 21-Nov-2019 Xin Long <lucien.xin@gmail.com>

net: sched: add vxlan option support to act_tunnel_key

This patch is to allow setting vxlan options using the
act_tunnel_key action. Different from geneve options,
only one option can be set. And also, geneve options
and vxlan options can't be set at the same time.

gbp is the only param for vxlan options:

# ip link add name vxlan0 type vxlan dstport 0 external
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 \
ip_proto udp \
action tunnel_key \
set src_ip 10.0.99.192 \
dst_ip 10.0.99.193 \
dst_port 6081 \
id 11 \
vxlan_opts 01020304 \
action mirred egress redirect dev vxlan0

v1->v2:
- add .strict_start_type for enc_opts_policy as Jakub noticed.
- use Duplicate instead of Wrong in err msg for extack as Jakub
suggested.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4f0e97d0 18-Nov-2019 Xin Long <lucien.xin@gmail.com>

net: sched: ensure opts_len <= IP_TUNNEL_OPTS_MAX in act_tunnel_key

info->options_len is 'u8' type, and when opts_len with a value >
IP_TUNNEL_OPTS_MAX, 'info->options_len = opts_len' will cast int
to u8 and set a wrong value to info->options_len.

Kernel crashed in my test when doing:

# opts="0102:80:00800022"
# for i in {1..99}; do opts="$opts,0102:80:00800022"; done
# ip link add name geneve0 type geneve dstport 0 external
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 ip_proto udp action tunnel_key \
set src_ip 10.0.99.192 dst_ip 10.0.99.193 \
dst_port 6081 id 11 geneve_opts $opts \
action mirred egress redirect dev geneve0

So we should do the similar check as cls_flower does, return error
when opts_len > IP_TUNNEL_OPTS_MAX in tunnel_key_copy_opts().

Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e3822678 30-Oct-2019 Vlad Buslov <vladbu@mellanox.com>

net: sched: update action implementations to support flags

Extend struct tc_action with new "tcfa_flags" field. Set the field in
tcf_idr_create() function and provide new helper
tcf_idr_create_from_flags() that derives 'cpustats' boolean from flags
value. Update individual hardware-offloaded actions init() to pass their
"flags" argument to new helper in order to skip percpu stats allocation
when user requested it through flags.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# abbb0d33 30-Oct-2019 Vlad Buslov <vladbu@mellanox.com>

net: sched: extend TCA_ACT space with TCA_ACT_FLAGS

Extend TCA_ACT space with nla_bitfield32 flags. Add
TCA_ACT_FLAGS_NO_PERCPU_STATS as the only allowed flag. Parse the flags in
tcf_action_init_1() and pass resulting value as additional argument to
a_o->init().

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5e1ad95b 30-Oct-2019 Vlad Buslov <vladbu@mellanox.com>

net: sched: extract bstats update code into function

Extract common code that increments cpu_bstats counter into standalone act
API function. Change hardware offloaded actions that use percpu counter
allocation to use the new function instead of incrementing cpu_bstats
directly.

This commit doesn't change functionality.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 445d3749 23-Sep-2019 Paul E. McKenney <paulmck@kernel.org>

net/sched: Replace rcu_swap_protected() with rcu_replace_pointer()

This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace_pointer() as a step towards removing
rcu_swap_protected().

Link: https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4gg6Hw@mail.gmail.com/
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
[ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <netdev@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>


# 981471bd 25-Aug-2019 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: fix a NULL pointer deref in ipt action

The net pointer in struct xt_tgdtor_param is not explicitly
initialized therefore is still NULL when dereferencing it.
So we have to find a way to pass the correct net pointer to
ipt_destroy_target().

The best way I find is just saving the net pointer inside the per
netns struct tcf_idrinfo, which could make this patch smaller.

Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset")
Reported-and-tested-by: itugrok@yahoo.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7be8ef2c 01-Aug-2019 Dmytro Linkin <dmitrolin@mellanox.com>

net: sched: use temporary variable for actions indexes

Currently init call of all actions (except ipt) init their 'parm'
structure as a direct pointer to nla data in skb. This leads to race
condition when some of the filter actions were initialized successfully
(and were assigned with idr action index that was written directly
into nla data), but then were deleted and retried (due to following
action module missing or classifier-initiated retry), in which case
action init code tries to insert action to idr with index that was
assigned on previous iteration. During retry the index can be reused
by another action that was inserted concurrently, which causes
unintended action sharing between filters.
To fix described race condition, save action idr index to temporary
stack-allocated variable instead on nla data.

Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 8cb08174 26-Apr-2019 Johannes Berg <johannes.berg@intel.com>

netlink: make validation more configurable for future strictness

We currently have two levels of strict validation:

1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted

Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.

We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)

@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ae0be8de 26-Apr-2019 Michal Kubecek <mkubecek@suse.cz>

netlink: make nla_nest_start() add NLA_F_NESTED flag

Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e5fdabac 20-Mar-2019 Davide Caratti <dcaratti@redhat.com>

net/sched: act_tunnel_key: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action tunnel_key set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 \
> nocsum id 1 pass index 90
# tc actions replace action tunnel_key \
> set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 nocsum id 1 \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action tunnel_key

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: tunnel_key set
src_ip 10.10.10.1
dst_ip 20.20.2.0
key_id 1
dst_port 3128
nocsum goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 800000002aba4067 P4D 800000002aba4067 PUD 795f9067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9346bdb83be0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff9346bb795c00 RCX: 0000000000000002
RDX: 0000000000000000 RSI: ffff93466c881700 RDI: 0000000000000246
RBP: ffff9346bdb83c80 R08: ffff9346b3e1e0c8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9346b978f000
R13: ffff9346b978f008 R14: 0000000000000001 R15: ffff93466dceeb40
FS: 0000000000000000(0000) GS:ffff9346bdb80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000007a6c2002 CR4: 00000000001606e0
Call Trace:
<IRQ>
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ip6_finish_output2+0x369/0x590
ip6_finish_output2+0x369/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.35+0x79/0xc0
mld_sendpack+0x16f/0x220
mld_ifc_timer_expire+0x195/0x2c0
? igmp6_timer_handler+0x70/0x70
call_timer_fn+0x2b/0x130
run_timer_softirq+0x3e8/0x440
? tick_sched_timer+0x37/0x70
__do_softirq+0xe3/0x2f5
irq_exit+0xf0/0x100
smp_apic_timer_interrupt+0x6c/0x130
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0010:native_safe_halt+0x2/0x10
Code: 55 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
RSP: 0018:ffffa48a8038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: ffffffffaa8184f0 RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0011251c6fcfac49 R09: ffff9346b995be00
R10: ffffa48a805e7ce8 R11: 00000000024c38dd R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
? __sched_text_end+0x1/0x1
default_idle+0x1c/0x140
do_idle+0x1c4/0x280
cpu_startup_entry+0x19/0x20
start_secondary+0x1a7/0x200
secondary_startup_64+0xa4/0xb0
Modules linked in: act_tunnel_key veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel mbcache snd_hda_intel jbd2 snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer snd pcspkr virtio_balloon soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops ttm net_failover virtio_console virtio_blk failover drm serio_raw crc32c_intel ata_piix virtio_pci floppy virtio_ring libata virtio dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_tunnel_key_init() proved to fix
the above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 85d0966f 20-Mar-2019 Davide Caratti <dcaratti@redhat.com>

net/sched: prepare TC actions to properly validate the control action

- pass a pointer to struct tcf_proto in each actions's init() handler,
to allow validating the control action, checking whether the chain
exists and (eventually) refcounting it.
- remove code that validates the control action after a successful call
to the action's init() handler, and replace it with a test that forbids
addition of actions having 'goto_chain' and NULL goto_chain pointer at
the same time.
- add tcf_action_check_ctrlact(), that will validate the control action
and eventually allocate the action 'goto_chain' within the init()
handler.
- add tcf_action_set_ctrlact(), that will assign the control action and
swap the current 'goto_chain' pointer with the new given one.

This disallows 'goto_chain' on actions that don't initialize it properly
in their init() handler, i.e. calling tcf_action_check_ctrlact() after
successful IDR reservation and then calling tcf_action_set_ctrlact()
to assign 'goto_chain' and 'tcf_action' consistently.

By doing this, the kernel does not leak anymore refcounts when a valid
'goto chain' handle is replaced in TC actions, causing kmemleak splats
like the following one:

# tc chain add dev dd0 chain 42 ingress protocol ip flower \
> ip_proto tcp action drop
# tc chain add dev dd0 chain 43 ingress protocol ip flower \
> ip_proto udp action drop
# tc filter add dev dd0 ingress matchall \
> action gact goto chain 42 index 66
# tc filter replace dev dd0 ingress matchall \
> action gact goto chain 43 index 66
# echo scan >/sys/kernel/debug/kmemleak
<...>
unreferenced object 0xffff93c0ee09f000 (size 1024):
comm "tc", pid 2565, jiffies 4295339808 (age 65.426s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 08 00 06 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<000000009b63f92d>] tc_ctl_chain+0x3d2/0x4c0
[<00000000683a8d72>] rtnetlink_rcv_msg+0x263/0x2d0
[<00000000ddd88f8e>] netlink_rcv_skb+0x4a/0x110
[<000000006126a348>] netlink_unicast+0x1a0/0x250
[<00000000b3340877>] netlink_sendmsg+0x2c1/0x3c0
[<00000000a25a2171>] sock_sendmsg+0x36/0x40
[<00000000f19ee1ec>] ___sys_sendmsg+0x280/0x2f0
[<00000000d0422042>] __sys_sendmsg+0x5e/0xa0
[<000000007a6c61f9>] do_syscall_64+0x5b/0x180
[<00000000ccd07542>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0000000013eaa334>] 0xffffffffffffffff

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4177c5d9 04-Mar-2019 wenxu <wenxu@ucloud.cn>

net/sched: act_tunnel_key: Fix double free dst_cache

dst_cache_destroy will be called in dst_release

dst_release-->dst_destroy_rcu-->dst_destroy-->metadata_dst_free
-->dst_cache_destroy

It should not call dst_cache_destroy before dst_release

Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 096461de 04-Mar-2019 Arnd Bergmann <arnd@arndb.de>

net/sched: avoid unused-label warning

The label is only used from inside the #ifdef and should be
hidden the same way, to avoid this warning:

net/sched/act_tunnel_key.c: In function 'tunnel_key_init':
net/sched/act_tunnel_key.c:389:1: error: label 'release_tun_meta' defined but not used [-Werror=unused-label]
release_tun_meta:

Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 87750d17 25-Feb-2019 Vlad Buslov <vladbu@mellanox.com>

net: sched: act_tunnel_key: fix metadata handling

Tunnel key action params->tcft_enc_metadata is only set when action is
TCA_TUNNEL_KEY_ACT_SET. However, metadata pointer is incorrectly
dereferenced during tunnel key init and release without verifying that
action is if correct type, which causes NULL pointer dereference. Metadata
tunnel dst_cache is also leaked on action overwrite.

Fix metadata handling:
- Verify that metadata pointer is not NULL before dereferencing it in
tunnel_key_init error handling code.
- Move dst_cache destroy code into tunnel_key_release_params() function
that is called in both action overwrite and release cases (fixes resource
leak) and verifies that actions has correct type before dereferencing
metadata pointer (fixes NULL pointer dereference).

Oops with KASAN enabled during tdc tests execution:

[ 261.080482] ==================================================================
[ 261.088049] BUG: KASAN: null-ptr-deref in dst_cache_destroy+0x21/0xa0
[ 261.094613] Read of size 8 at addr 00000000000000b0 by task tc/2976
[ 261.102524] CPU: 14 PID: 2976 Comm: tc Not tainted 5.0.0-rc7+ #157
[ 261.108844] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 261.116726] Call Trace:
[ 261.119234] dump_stack+0x9a/0xeb
[ 261.122625] ? dst_cache_destroy+0x21/0xa0
[ 261.126818] ? dst_cache_destroy+0x21/0xa0
[ 261.131004] kasan_report+0x176/0x192
[ 261.134752] ? idr_get_next+0xd0/0x120
[ 261.138578] ? dst_cache_destroy+0x21/0xa0
[ 261.142768] dst_cache_destroy+0x21/0xa0
[ 261.146799] tunnel_key_release+0x3a/0x50 [act_tunnel_key]
[ 261.152392] tcf_action_cleanup+0x2c/0xc0
[ 261.156490] tcf_generic_walker+0x4c2/0x5c0
[ 261.160794] ? tcf_action_dump_1+0x390/0x390
[ 261.165163] ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key]
[ 261.170865] ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key]
[ 261.176641] tca_action_gd+0x600/0xa40
[ 261.180482] ? tca_get_fill.constprop.17+0x200/0x200
[ 261.185548] ? __lock_acquire+0x588/0x1d20
[ 261.189741] ? __lock_acquire+0x588/0x1d20
[ 261.193922] ? mark_held_locks+0x90/0x90
[ 261.197944] ? mark_held_locks+0x90/0x90
[ 261.202018] ? __nla_parse+0xfe/0x190
[ 261.205774] tc_ctl_action+0x218/0x230
[ 261.209614] ? tcf_action_add+0x230/0x230
[ 261.213726] rtnetlink_rcv_msg+0x3a5/0x600
[ 261.217910] ? lock_downgrade+0x2d0/0x2d0
[ 261.222006] ? validate_linkmsg+0x400/0x400
[ 261.226278] ? find_held_lock+0x6d/0xd0
[ 261.230200] ? match_held_lock+0x1b/0x210
[ 261.234296] ? validate_linkmsg+0x400/0x400
[ 261.238567] netlink_rcv_skb+0xc7/0x1f0
[ 261.242489] ? netlink_ack+0x470/0x470
[ 261.246319] ? netlink_deliver_tap+0x1f3/0x5a0
[ 261.250874] netlink_unicast+0x2ae/0x350
[ 261.254884] ? netlink_attachskb+0x340/0x340
[ 261.261647] ? _copy_from_iter_full+0xdd/0x380
[ 261.268576] ? __virt_addr_valid+0xb6/0xf0
[ 261.275227] ? __check_object_size+0x159/0x240
[ 261.282184] netlink_sendmsg+0x4d3/0x630
[ 261.288572] ? netlink_unicast+0x350/0x350
[ 261.295132] ? netlink_unicast+0x350/0x350
[ 261.301608] sock_sendmsg+0x6d/0x80
[ 261.307467] ___sys_sendmsg+0x48e/0x540
[ 261.313633] ? copy_msghdr_from_user+0x210/0x210
[ 261.320545] ? save_stack+0x89/0xb0
[ 261.326289] ? __lock_acquire+0x588/0x1d20
[ 261.332605] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 261.340063] ? mark_held_locks+0x90/0x90
[ 261.346162] ? do_filp_open+0x138/0x1d0
[ 261.352108] ? may_open_dev+0x50/0x50
[ 261.357897] ? match_held_lock+0x1b/0x210
[ 261.364016] ? __fget_light+0xa6/0xe0
[ 261.369840] ? __sys_sendmsg+0xd2/0x150
[ 261.375814] __sys_sendmsg+0xd2/0x150
[ 261.381610] ? __ia32_sys_shutdown+0x30/0x30
[ 261.388026] ? lock_downgrade+0x2d0/0x2d0
[ 261.394182] ? mark_held_locks+0x1c/0x90
[ 261.400230] ? do_syscall_64+0x1e/0x280
[ 261.406172] do_syscall_64+0x78/0x280
[ 261.411932] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 261.419103] RIP: 0033:0x7f28e91a8b87
[ 261.424791] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48
[ 261.448226] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 261.458183] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87
[ 261.467728] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003
[ 261.477342] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c
[ 261.486970] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
[ 261.496599] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480
[ 261.506281] ==================================================================
[ 261.516076] Disabling lock debugging due to kernel taint
[ 261.523979] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[ 261.534413] #PF error: [normal kernel read fault]
[ 261.541730] PGD 8000000317400067 P4D 8000000317400067 PUD 316878067 PMD 0
[ 261.551294] Oops: 0000 [#1] SMP KASAN PTI
[ 261.557985] CPU: 14 PID: 2976 Comm: tc Tainted: G B 5.0.0-rc7+ #157
[ 261.568306] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 261.578874] RIP: 0010:dst_cache_destroy+0x21/0xa0
[ 261.586413] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff <49> 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40
[ 261.611247] RSP: 0018:ffff888316447160 EFLAGS: 00010282
[ 261.619564] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071
[ 261.629862] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297
[ 261.640149] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89
[ 261.650467] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0
[ 261.660785] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001
[ 261.671110] FS: 00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000
[ 261.682447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 261.691491] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0
[ 261.701283] Call Trace:
[ 261.706374] tunnel_key_release+0x3a/0x50 [act_tunnel_key]
[ 261.714522] tcf_action_cleanup+0x2c/0xc0
[ 261.721208] tcf_generic_walker+0x4c2/0x5c0
[ 261.728074] ? tcf_action_dump_1+0x390/0x390
[ 261.734996] ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key]
[ 261.743247] ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key]
[ 261.751557] tca_action_gd+0x600/0xa40
[ 261.757991] ? tca_get_fill.constprop.17+0x200/0x200
[ 261.765644] ? __lock_acquire+0x588/0x1d20
[ 261.772461] ? __lock_acquire+0x588/0x1d20
[ 261.779266] ? mark_held_locks+0x90/0x90
[ 261.785880] ? mark_held_locks+0x90/0x90
[ 261.792470] ? __nla_parse+0xfe/0x190
[ 261.798738] tc_ctl_action+0x218/0x230
[ 261.805145] ? tcf_action_add+0x230/0x230
[ 261.811760] rtnetlink_rcv_msg+0x3a5/0x600
[ 261.818564] ? lock_downgrade+0x2d0/0x2d0
[ 261.825433] ? validate_linkmsg+0x400/0x400
[ 261.832256] ? find_held_lock+0x6d/0xd0
[ 261.838624] ? match_held_lock+0x1b/0x210
[ 261.845142] ? validate_linkmsg+0x400/0x400
[ 261.851729] netlink_rcv_skb+0xc7/0x1f0
[ 261.857976] ? netlink_ack+0x470/0x470
[ 261.864132] ? netlink_deliver_tap+0x1f3/0x5a0
[ 261.870969] netlink_unicast+0x2ae/0x350
[ 261.877294] ? netlink_attachskb+0x340/0x340
[ 261.883962] ? _copy_from_iter_full+0xdd/0x380
[ 261.890750] ? __virt_addr_valid+0xb6/0xf0
[ 261.897188] ? __check_object_size+0x159/0x240
[ 261.903928] netlink_sendmsg+0x4d3/0x630
[ 261.910112] ? netlink_unicast+0x350/0x350
[ 261.916410] ? netlink_unicast+0x350/0x350
[ 261.922656] sock_sendmsg+0x6d/0x80
[ 261.928257] ___sys_sendmsg+0x48e/0x540
[ 261.934183] ? copy_msghdr_from_user+0x210/0x210
[ 261.940865] ? save_stack+0x89/0xb0
[ 261.946355] ? __lock_acquire+0x588/0x1d20
[ 261.952358] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 261.959468] ? mark_held_locks+0x90/0x90
[ 261.965248] ? do_filp_open+0x138/0x1d0
[ 261.970910] ? may_open_dev+0x50/0x50
[ 261.976386] ? match_held_lock+0x1b/0x210
[ 261.982210] ? __fget_light+0xa6/0xe0
[ 261.987648] ? __sys_sendmsg+0xd2/0x150
[ 261.993263] __sys_sendmsg+0xd2/0x150
[ 261.998613] ? __ia32_sys_shutdown+0x30/0x30
[ 262.004555] ? lock_downgrade+0x2d0/0x2d0
[ 262.010236] ? mark_held_locks+0x1c/0x90
[ 262.015758] ? do_syscall_64+0x1e/0x280
[ 262.021234] do_syscall_64+0x78/0x280
[ 262.026500] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 262.033207] RIP: 0033:0x7f28e91a8b87
[ 262.038421] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48
[ 262.060708] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 262.070112] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87
[ 262.079087] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003
[ 262.088122] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c
[ 262.097157] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
[ 262.106207] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480
[ 262.115271] Modules linked in: act_tunnel_key act_skbmod act_simple act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 act_csum libcrc32c act_meta_skbtcindex act_meta_skbprio act_meta_mark act_ife ife act_police act_sample psample act_gact veth nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc intel_rapl sb_edac mlx5_ib x86_pkg_temp_thermal sunrpc intel_powerclamp coretemp ib_uverbs kvm_intel ib_core kvm irqbypass mlx5_core crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel intel_cstate mlxfw iTCO_wdt devlink intel_uncore iTCO_vendor_support ipmi_ssif ptp mei_me intel_rapl_perf ioatdma joydev pps_core ses mei i2c_i801 pcspkr enclosure lpc_ich dca wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter pcc_cpufreq ast i2c_algo_bit drm_kms_helper ttm drm mpt3sas raid_class scsi_transport_sas
[ 262.204393] CR2: 00000000000000b0
[ 262.210390] ---[ end trace 2e41d786f2c7901a ]---
[ 262.226790] RIP: 0010:dst_cache_destroy+0x21/0xa0
[ 262.234083] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff <49> 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40
[ 262.258311] RSP: 0018:ffff888316447160 EFLAGS: 00010282
[ 262.266304] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071
[ 262.276251] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297
[ 262.286208] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89
[ 262.296183] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0
[ 262.306157] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001
[ 262.316139] FS: 00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000
[ 262.327146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 262.335815] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0

Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a3df633a 25-Feb-2019 Vlad Buslov <vladbu@mellanox.com>

net: sched: act_tunnel_key: fix NULL pointer dereference during init

Metadata pointer is only initialized for action TCA_TUNNEL_KEY_ACT_SET, but
it is unconditionally dereferenced in tunnel_key_init() error handler.
Verify that metadata pointer is not NULL before dereferencing it in
tunnel_key_init error handling code.

Fixes: ee28bb56ac5b ("net/sched: fix memory leak in act_tunnel_key_init()")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 41411e2f 22-Feb-2019 wenxu <wenxu@ucloud.cn>

net/sched: act_tunnel_key: Add dst_cache support

The metadata_dst is not init the dst_cache which make the
ip_md_tunnel_xmit can't use the dst_cache. It will lookup
route table every packets.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eddd2cf1 10-Feb-2019 Eli Cohen <eli@mellanox.com>

net: Change TCA_ACT_* to TCA_ID_* to match that of TCA_ID_POLICE

Modify the kernel users of the TCA_ACT_* macros to use TCA_ID_*. For
example, use TCA_ID_GACT instead of TCA_ACT_GACT. This will align with
TCA_ID_POLICE and also differentiates these identifier, used in struct
tc_action_ops type field, from other macros starting with TCA_ACT_.

To make things clearer, we name the enum defining the TCA_ID_*
identifiers and also change the "type" field of struct tc_action to
id.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9174c3df 10-Jan-2019 Davide Caratti <dcaratti@redhat.com>

net/sched: act_tunnel_key: fix memory leak in case of action replace

running the following TDC test cases:

7afc - Replace tunnel_key set action with all parameters
364d - Replace tunnel_key set action with all parameters and cookie

it's possible to trigger kmemleak warnings like:

unreferenced object 0xffff94797127ab40 (size 192):
comm "tc", pid 3248, jiffies 4300565293 (age 1006.862s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 c0 93 f9 8a ff ff ff ff ................
41 84 ee 89 ff ff ff ff 00 00 00 00 00 00 00 00 A...............
backtrace:
[<000000001e85b61c>] tunnel_key_init+0x31d/0x820 [act_tunnel_key]
[<000000007f3f6ee7>] tcf_action_init_1+0x384/0x4c0
[<00000000e89e3ded>] tcf_action_init+0x12b/0x1a0
[<00000000c1c8c0f8>] tcf_action_add+0x73/0x170
[<0000000095a9fc28>] tc_ctl_action+0x122/0x160
[<000000004bebeac5>] rtnetlink_rcv_msg+0x263/0x2d0
[<000000009fd862dd>] netlink_rcv_skb+0x4a/0x110
[<00000000b55199e7>] netlink_unicast+0x1a0/0x250
[<000000004996cd21>] netlink_sendmsg+0x2c1/0x3c0
[<000000004d6a94b4>] sock_sendmsg+0x36/0x40
[<000000005d9f0208>] ___sys_sendmsg+0x280/0x2f0
[<00000000dec19023>] __sys_sendmsg+0x5e/0xa0
[<000000004b82ac81>] do_syscall_64+0x5b/0x180
[<00000000a0f1209a>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<000000002926b2ab>] 0xffffffffffffffff

when the tunnel_key action is replaced, the kernel forgets to release the
dst metadata: ensure they are released by tunnel_key_init(), the same way
it's done in tunnel_key_release().

Fixes: d0f6dd8a914f4 ("net/sched: Introduce act_tunnel_key")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1c25324c 02-Dec-2018 Adi Nissim <adin@mellanox.com>

net/sched: act_tunnel_key: Don't dump dst port if it wasn't set

It's possible to set a tunnel without a destination port. However,
on dump(), a zero dst port is returned to user space even if it was not
set, fix that.

Note that so far it wasn't required, b/c key less tunnels were not
supported and the UDP tunnels do require destination port.

Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 80ef0f22 02-Dec-2018 Adi Nissim <adin@mellanox.com>

net/sched: act_tunnel_key: Allow key-less tunnels

Allow setting a tunnel without a tunnel key. This is required for
tunneling protocols, such as GRE, that define the key as an optional
field.

Signed-off-by: Adi Nissim <adin@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a162c351 06-Sep-2018 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: properly cancel netlink dump on failure

When nla_put*() fails after nla_nest_start(), we need
to call nla_nest_cancel() to cancel the message, otherwise
we end up calling nla_nest_end() like a success.

Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key")
Cc: Davide Caratti <dcaratti@redhat.com>
Cc: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ee28bb56 04-Sep-2018 Davide Caratti <dcaratti@redhat.com>

net/sched: fix memory leak in act_tunnel_key_init()

If users try to install act_tunnel_key 'set' rules with duplicate values
of 'index', the tunnel metadata are allocated, but never released. Then,
kmemleak complains as follows:

# tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
# echo clear > /sys/kernel/debug/kmemleak
# tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
Error: TC IDR already exists.
We have an error talking to the kernel
# echo scan > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800574e6c80 (size 256):
comm "tc", pid 5617, jiffies 4298118009 (age 57.990s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 1c e8 b0 ff ff ff ff ................
81 24 c2 ad ff ff ff ff 00 00 00 00 00 00 00 00 .$..............
backtrace:
[<00000000b7afbf4e>] tunnel_key_init+0x8a5/0x1800 [act_tunnel_key]
[<000000007d98fccd>] tcf_action_init_1+0x698/0xac0
[<0000000099b8f7cc>] tcf_action_init+0x15c/0x590
[<00000000dc60eebe>] tc_ctl_action+0x336/0x5c2
[<000000002f5a2f7d>] rtnetlink_rcv_msg+0x357/0x8e0
[<000000000bfe7575>] netlink_rcv_skb+0x124/0x350
[<00000000edab656f>] netlink_unicast+0x40f/0x5d0
[<00000000b322cdcb>] netlink_sendmsg+0x6e8/0xba0
[<0000000063d9d490>] sock_sendmsg+0xb3/0xf0
[<00000000f0d3315a>] ___sys_sendmsg+0x654/0x960
[<00000000c06cbd42>] __sys_sendmsg+0xd3/0x170
[<00000000ce72e4b0>] do_syscall_64+0xa5/0x470
[<000000005caa2d97>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<00000000fac1b476>] 0xffffffffffffffff

This problem theoretically happens also in case users attempt to setup a
geneve rule having wrong configuration data, or when the kernel fails to
allocate 'params_new'. Ensure that tunnel_key_init() releases the tunnel
metadata also in the above conditions.

Addresses-Coverity-ID: 1373974 ("Resource leak")
Fixes: d0f6dd8a914f4 ("net/sched: Introduce act_tunnel_key")
Fixes: 0ed5269f9e41f ("net/sched: add tunnel option support to act_tunnel_key")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f061b48c 29-Aug-2018 Cong Wang <xiyou.wangcong@gmail.com>

Revert "net: sched: act: add extack for lookup callback"

This reverts commit 331a9295de23 ("net: sched: act: add extack for lookup callback").

This extack is never used after 6 months... In fact, it can be just
set in the caller, right after ->lookup().

Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 97a3f84f 19-Aug-2018 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: remove unnecessary ops->delete()

All ops->delete() wants is getting the tn->idrinfo, but we already
have tc_action before calling ops->delete(), and tc_action has
a pointer ->idrinfo.

More importantly, each type of action does the same thing, that is,
just calling tcf_idr_delete_index().

So it can be just removed.

Fixes: b409074e6693 ("net: sched: add 'delete' function to action ops")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 653cd284 14-Aug-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: always disable bh when taking tcf_lock

Recently, ops->init() and ops->dump() of all actions were modified to
always obtain tcf_lock when accessing private action state. Actions that
don't depend on tcf_lock for synchronization with their data path use
non-bh locking API. However, tcf_lock is also used to protect rate
estimator stats in softirq context by timer callback.

Change ops->init() and ops->dump() of all actions to disable bh when using
tcf_lock to prevent deadlock reported by following lockdep warning:

[ 105.470398] ================================
[ 105.475014] WARNING: inconsistent lock state
[ 105.479628] 4.18.0-rc8+ #664 Not tainted
[ 105.483897] --------------------------------
[ 105.488511] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 105.494871] swapper/16/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 105.500449] 00000000f86c012e (&(&p->tcfa_lock)->rlock){+.?.}, at: est_fetch_counters+0x3c/0xa0
[ 105.509696] {SOFTIRQ-ON-W} state was registered at:
[ 105.514925] _raw_spin_lock+0x2c/0x40
[ 105.519022] tcf_bpf_init+0x579/0x820 [act_bpf]
[ 105.523990] tcf_action_init_1+0x4e4/0x660
[ 105.528518] tcf_action_init+0x1ce/0x2d0
[ 105.532880] tcf_exts_validate+0x1d8/0x200
[ 105.537416] fl_change+0x55a/0x268b [cls_flower]
[ 105.542469] tc_new_tfilter+0x748/0xa20
[ 105.546738] rtnetlink_rcv_msg+0x56a/0x6d0
[ 105.551268] netlink_rcv_skb+0x18d/0x200
[ 105.555628] netlink_unicast+0x2d0/0x370
[ 105.559990] netlink_sendmsg+0x3b9/0x6a0
[ 105.564349] sock_sendmsg+0x6b/0x80
[ 105.568271] ___sys_sendmsg+0x4a1/0x520
[ 105.572547] __sys_sendmsg+0xd7/0x150
[ 105.576655] do_syscall_64+0x72/0x2c0
[ 105.580757] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 105.586243] irq event stamp: 489296
[ 105.590084] hardirqs last enabled at (489296): [<ffffffffb507e639>] _raw_spin_unlock_irq+0x29/0x40
[ 105.599765] hardirqs last disabled at (489295): [<ffffffffb507e745>] _raw_spin_lock_irq+0x15/0x50
[ 105.609277] softirqs last enabled at (489292): [<ffffffffb413a6a3>] irq_enter+0x83/0xa0
[ 105.618001] softirqs last disabled at (489293): [<ffffffffb413a800>] irq_exit+0x140/0x190
[ 105.626813]
other info that might help us debug this:
[ 105.633976] Possible unsafe locking scenario:

[ 105.640526] CPU0
[ 105.643325] ----
[ 105.646125] lock(&(&p->tcfa_lock)->rlock);
[ 105.650747] <Interrupt>
[ 105.653717] lock(&(&p->tcfa_lock)->rlock);
[ 105.658514]
*** DEADLOCK ***

[ 105.665349] 1 lock held by swapper/16/0:
[ 105.669629] #0: 00000000a640ad99 ((&est->timer)){+.-.}, at: call_timer_fn+0x10b/0x550
[ 105.678200]
stack backtrace:
[ 105.683194] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 4.18.0-rc8+ #664
[ 105.690249] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 105.698626] Call Trace:
[ 105.701421] <IRQ>
[ 105.703791] dump_stack+0x92/0xeb
[ 105.707461] print_usage_bug+0x336/0x34c
[ 105.711744] mark_lock+0x7c9/0x980
[ 105.715500] ? print_shortest_lock_dependencies+0x2e0/0x2e0
[ 105.721424] ? check_usage_forwards+0x230/0x230
[ 105.726315] __lock_acquire+0x923/0x26f0
[ 105.730597] ? debug_show_all_locks+0x240/0x240
[ 105.735478] ? mark_lock+0x493/0x980
[ 105.739412] ? check_chain_key+0x140/0x1f0
[ 105.743861] ? __lock_acquire+0x836/0x26f0
[ 105.748323] ? lock_acquire+0x12e/0x290
[ 105.752516] lock_acquire+0x12e/0x290
[ 105.756539] ? est_fetch_counters+0x3c/0xa0
[ 105.761084] _raw_spin_lock+0x2c/0x40
[ 105.765099] ? est_fetch_counters+0x3c/0xa0
[ 105.769633] est_fetch_counters+0x3c/0xa0
[ 105.773995] est_timer+0x87/0x390
[ 105.777670] ? est_fetch_counters+0xa0/0xa0
[ 105.782210] ? lock_acquire+0x12e/0x290
[ 105.786410] call_timer_fn+0x161/0x550
[ 105.790512] ? est_fetch_counters+0xa0/0xa0
[ 105.795055] ? del_timer_sync+0xd0/0xd0
[ 105.799249] ? __lock_is_held+0x93/0x110
[ 105.803531] ? mark_held_locks+0x20/0xe0
[ 105.807813] ? _raw_spin_unlock_irq+0x29/0x40
[ 105.812525] ? est_fetch_counters+0xa0/0xa0
[ 105.817069] ? est_fetch_counters+0xa0/0xa0
[ 105.821610] run_timer_softirq+0x3c4/0x9f0
[ 105.826064] ? lock_acquire+0x12e/0x290
[ 105.830257] ? __bpf_trace_timer_class+0x10/0x10
[ 105.835237] ? __lock_is_held+0x25/0x110
[ 105.839517] __do_softirq+0x11d/0x7bf
[ 105.843542] irq_exit+0x140/0x190
[ 105.847208] smp_apic_timer_interrupt+0xac/0x3b0
[ 105.852182] apic_timer_interrupt+0xf/0x20
[ 105.856628] </IRQ>
[ 105.859081] RIP: 0010:cpuidle_enter_state+0xd8/0x4d0
[ 105.864395] Code: 46 ff 48 89 44 24 08 0f 1f 44 00 00 31 ff e8 cf ec 46 ff 80 7c 24 07 00 0f 85 1d 02 00 00 e8 9f 90 4b ff fb 66 0f 1f 44 00 00 <4c> 8b 6c 24 08 4d 29 fd 0f 80 36 03 00 00 4c 89 e8 48 ba cf f7 53
[ 105.884288] RSP: 0018:ffff8803ad94fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 105.892494] RAX: 0000000000000000 RBX: ffffe8fb300829c0 RCX: ffffffffb41e19e1
[ 105.899988] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8803ad9358ac
[ 105.907503] RBP: ffffffffb6636300 R08: 0000000000000004 R09: 0000000000000000
[ 105.914997] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
[ 105.922487] R13: ffffffffb6636140 R14: ffffffffb66362d8 R15: 000000188d36091b
[ 105.929988] ? trace_hardirqs_on_caller+0x141/0x2d0
[ 105.935232] do_idle+0x28e/0x320
[ 105.938817] ? arch_cpu_idle_exit+0x40/0x40
[ 105.943361] ? mark_lock+0x8c1/0x980
[ 105.947295] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 105.952619] cpu_startup_entry+0xc2/0xd0
[ 105.956900] ? cpu_in_idle+0x20/0x20
[ 105.960830] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 105.966146] ? trace_hardirqs_on_caller+0x141/0x2d0
[ 105.971391] start_secondary+0x2b5/0x360
[ 105.975669] ? set_cpu_sibling_map+0x1330/0x1330
[ 105.980654] secondary_startup_64+0xa5/0xb0

Taking tcf_lock in sample action with bh disabled causes lockdep to issue a
warning regarding possible irq lock inversion dependency between tcf_lock,
and psample_groups_lock that is taken when holding tcf_lock in sample init:

[ 162.108959] Possible interrupt unsafe locking scenario:

[ 162.116386] CPU0 CPU1
[ 162.121277] ---- ----
[ 162.126162] lock(psample_groups_lock);
[ 162.130447] local_irq_disable();
[ 162.136772] lock(&(&p->tcfa_lock)->rlock);
[ 162.143957] lock(psample_groups_lock);
[ 162.150813] <Interrupt>
[ 162.153808] lock(&(&p->tcfa_lock)->rlock);
[ 162.158608]
*** DEADLOCK ***

In order to prevent potential lock inversion dependency between tcf_lock
and psample_groups_lock, extract call to psample_group_get() from tcf_lock
protected section in sample action init function.

Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock")
Fixes: 764e9a24480f ("net: sched: act_vlan: remove dependency on rtnl lock")
Fixes: 729e01260989 ("net: sched: act_tunnel_key: remove dependency on rtnl lock")
Fixes: d77284956656 ("net: sched: act_sample: remove dependency on rtnl lock")
Fixes: e8917f437006 ("net: sched: act_gact: remove dependency on rtnl lock")
Fixes: b6a2b971c0b0 ("net: sched: act_csum: remove dependency on rtnl lock")
Fixes: 2142236b4584 ("net: sched: act_bpf: remove dependency on rtnl lock")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 729e0126 10-Aug-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: act_tunnel_key: remove dependency on rtnl lock

Use tcf lock to protect tunnel key action struct private data from
concurrent modification in init and dump. Use rcu swap operation to
reassign params pointer under protection of tcf lock. (old params value is
not used by init, so there is no need of standalone rcu dereference step)

Remove rtnl lock assertion that is no longer required.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7fd4b288 30-Jul-2018 Paolo Abeni <pabeni@redhat.com>

tc/act: remove unneeded RCU lock in action callback

Each lockless action currently does its own RCU locking in ->act().
This allows using plain RCU accessor, even if the context
is really RCU BH.

This change drops the per action RCU lock, replace the accessors
with the _bh variant, cleans up a bit the surrounding code and
documents the RCU status in the relevant header.
No functional nor performance change is intended.

The goal of this patch is clarifying that the RCU critical section
used by the tc actions extends up to the classifier's caller.

v1 -> v2:
- preserve rcu lock in act_bpf: it's needed by eBPF helpers,
as pointed out by Daniel

v3 -> v4:
- fixed some typos in the commit message (JiriP)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 07a557f4 17-Jul-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/sched: tunnel_key: Allow to set tos and ttl for tc based ip tunnels

Allow user-space to provide tos and ttl to be set for the tunnel headers.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0190c1d4 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: atomically check-allocate action

Implement function that atomically checks if action exists and either takes
reference to it, or allocates idr slot for action index to prevent
concurrent allocations of actions with same index. Use EBUSY error pointer
to indicate that idr slot is reserved.

Implement cleanup helper function that removes temporary error pointer from
idr. (in case of error between idr allocation and insertion of newly
created action to specified index)

Refactor all action init functions to insert new action to idr using this
API.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4e8ddd7f 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: don't release reference on action overwrite

Return from action init function with reference to action taken,
even when overwriting existing action.

Action init API initializes its fourth argument (pointer to pointer to tc
action) to either existing action with same index or newly created action.
In case of existing index(and bind argument is zero), init function returns
without incrementing action reference counter. Caller of action init then
proceeds working with action, without actually holding reference to it.
This means that action could be deleted concurrently.

Change action init behavior to always take reference to action before
returning successfully, in order to protect from concurrent deletion.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b409074e 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: add 'delete' function to action ops

Extend action ops with 'delete' function. Each action type to implements
its own delete function that doesn't depend on rtnl lock.

Implement delete function that is required to delete actions without
holding rtnl lock. Use action API function that atomically deletes action
only if it is still in action idr. This implementation prevents concurrent
threads from deleting same action twice.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 789871bb 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: implement unlocked action init API

Add additional 'rtnl_held' argument to act API init functions. It is
required to implement actions that need to release rtnl lock before loading
kernel module and reacquire if afterwards.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 036bb443 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: change type of reference and bind counters

Change type of action reference counter to refcount_t.

Change type of action bind counter to atomic_t.
This type is used to allow decrementing bind counter without testing
for 0 result.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 38230a3e 06-Jul-2018 Davide Caratti <dcaratti@redhat.com>

net/sched: act_tunnel_key: fix NULL dereference when 'goto chain' is used

the control action in the common member of struct tcf_tunnel_key must be a
valid value, as it can contain the chain index when 'goto chain' is used.
Ensure that the control action can be read as x->tcfa_action, when x is a
pointer to struct tc_action and x->ops->type is TCA_ACT_TUNNEL_KEY, to
prevent the following command:

# tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
> $tcflags dst_mac $h2mac action tunnel_key unset goto chain 1

from causing a NULL dereference when a matching packet is received:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 80000001097ac067 P4D 80000001097ac067 PUD 103b0a067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 3491 Comm: mausezahn Tainted: G E 4.18.0-rc2.auguri+ #421
Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.58 02/07/2013
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff95145ea03c40 EFLAGS: 00010246
RAX: 0000000020000001 RBX: ffff9514499e5800 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
RBP: ffff95145ea03e60 R08: 0000000000000000 R09: ffff95145ea03c9c
R10: ffff95145ea03c78 R11: 0000000000000008 R12: ffff951456a69800
R13: ffff951456a69808 R14: 0000000000000001 R15: ffff95144965ee40
FS: 00007fd67ee11740(0000) GS:ffff95145ea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001038a2006 CR4: 00000000001606f0
Call Trace:
<IRQ>
fl_classify+0x1ad/0x1c0 [cls_flower]
? __update_load_avg_se.isra.47+0x1ca/0x1d0
? __update_load_avg_se.isra.47+0x1ca/0x1d0
? update_load_avg+0x665/0x690
? update_load_avg+0x665/0x690
? kmem_cache_alloc+0x38/0x1c0
tcf_classify+0x89/0x140
__netif_receive_skb_core+0x5ea/0xb70
? enqueue_entity+0xd0/0x270
? process_backlog+0x97/0x150
process_backlog+0x97/0x150
net_rx_action+0x14b/0x3e0
__do_softirq+0xde/0x2b4
do_softirq_own_stack+0x2a/0x40
</IRQ>
do_softirq.part.18+0x49/0x50
__local_bh_enable_ip+0x49/0x50
__dev_queue_xmit+0x4ab/0x8a0
? wait_woken+0x80/0x80
? packet_sendmsg+0x38f/0x810
? __dev_queue_xmit+0x8a0/0x8a0
packet_sendmsg+0x38f/0x810
sock_sendmsg+0x36/0x40
__sys_sendto+0x10e/0x140
? do_vfs_ioctl+0xa4/0x630
? syscall_trace_enter+0x1df/0x2e0
? __audit_syscall_exit+0x22a/0x290
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fd67e18dc93
Code: 48 8b 0d 18 83 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c7 20 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 2b f7 ff ff 48 89 04 24
RSP: 002b:00007ffe0189b748 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00000000020ca010 RCX: 00007fd67e18dc93
RDX: 0000000000000062 RSI: 00000000020ca322 RDI: 0000000000000003
RBP: 00007ffe0189b780 R08: 00007ffe0189b760 R09: 0000000000000014
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000062
R13: 00000000020ca322 R14: 00007ffe0189b760 R15: 0000000000000003
Modules linked in: act_tunnel_key act_gact cls_flower sch_ingress vrf veth act_csum(E) xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_rapl snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic kvm_intel kvm irqbypass snd_hda_intel crct10dif_pclmul crc32_pclmul hp_wmi ghash_clmulni_intel pcbc snd_hda_codec aesni_intel sparse_keymap rfkill snd_hda_core snd_hwdep snd_seq crypto_simd iTCO_wdt gpio_ich iTCO_vendor_support wmi_bmof cryptd mei_wdt glue_helper snd_seq_device snd_pcm pcspkr snd_timer snd i2c_i801 lpc_ich sg soundcore wmi mei_me
mei ie31200_edac nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod sr_mod cdrom i915 video i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci crc32c_intel libahci serio_raw sfc libata mtd drm ixgbe mdio i2c_core e1000e dca
CR2: 0000000000000000
---[ end trace 1ab8b5b5d4639dfc ]---
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff95145ea03c40 EFLAGS: 00010246
RAX: 0000000020000001 RBX: ffff9514499e5800 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
RBP: ffff95145ea03e60 R08: 0000000000000000 R09: ffff95145ea03c9c
R10: ffff95145ea03c78 R11: 0000000000000008 R12: ffff951456a69800
R13: ffff951456a69808 R14: 0000000000000001 R15: ffff95144965ee40
FS: 00007fd67ee11740(0000) GS:ffff95145ea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001038a2006 CR4: 00000000001606f0
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x11400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0ed5269f 26-Jun-2018 Simon Horman <simon.horman@netronome.com>

net/sched: add tunnel option support to act_tunnel_key

Allow setting tunnel options using the act_tunnel_key action.

Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.

# ip link add name geneve0 type geneve dstport 0 external
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 \
ip_proto udp \
action tunnel_key \
set src_ip 10.0.99.192 \
dst_ip 10.0.99.193 \
dst_port 6081 \
id 11 \
geneve_opts 0102:80:00800022,0102:80:00800022 \
action mirred egress redirect dev geneve0

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9d7298cd 26-Jun-2018 Simon Horman <simon.horman@netronome.com>

net/sched: act_tunnel_key: add extended ack support

Add extended ack support for the tunnel key action by using NL_SET_ERR_MSG
during validation of user input.

Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a1165b59 26-Jun-2018 Simon Horman <simon.horman@netronome.com>

net/sched: act_tunnel_key: disambiguate metadata dst error cases

Metadata may be NULL for one of two reasons:
* Missing user input
* Failure to allocate the metadata dst

Disambiguate these case by returning -EINVAL for the former and -ENOMEM
for the latter rather than -EINVAL for both cases.

This is in preparation for using extended ack to provide more information
to users when parsing their input.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2f635cee 27-Mar-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Drop pernet_operations::async

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# abdadd3c 15-Mar-2018 Davide Caratti <dcaratti@redhat.com>

net/sched: fix NULL dereference in the error path of tunnel_key_init()

when the following command

# tc action add action tunnel_key unset index 100

is run for the first time, and tunnel_key_init() fails to allocate struct
tcf_tunnel_key_params, tunnel_key_release() dereferences NULL pointers.
This causes the following error:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: tunnel_key_release+0xd/0x40 [act_tunnel_key]
PGD 8000000033787067 P4D 8000000033787067 PUD 74646067 PMD 0
Oops: 0000 [#1] SMP PTI
Modules linked in: act_tunnel_key(E) act_csum ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel pcbc snd_hda_codec snd_hda_core snd_hwdep snd_seq aesni_intel snd_seq_device crypto_simd glue_helper snd_pcm cryptd joydev snd_timer pcspkr virtio_balloon snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net virtio_blk drm virtio_console crc32c_intel ata_piix serio_raw i2c_core virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CPU: 2 PID: 3101 Comm: tc Tainted: G E 4.16.0-rc4.act_vlan.orig+ #403
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:tunnel_key_release+0xd/0x40 [act_tunnel_key]
RSP: 0018:ffffba46803b7768 EFLAGS: 00010286
RAX: ffffffffc09010a0 RBX: 0000000000000000 RCX: 0000000000000024
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff99ee336d7480
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
R10: 0000000000000220 R11: ffff99ee79d73131 R12: 0000000000000000
R13: ffff99ee32d67610 R14: ffff99ee7671dc38 R15: 00000000fffffff4
FS: 00007febcb2cd740(0000) GS:ffff99ee7fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000007c8e4005 CR4: 00000000001606e0
Call Trace:
__tcf_idr_release+0x79/0xf0
tunnel_key_init+0xd9/0x460 [act_tunnel_key]
tcf_action_init_1+0x2cc/0x430
tcf_action_init+0xd3/0x1b0
tc_ctl_action+0x18b/0x240
rtnetlink_rcv_msg+0x29c/0x310
? _cond_resched+0x15/0x30
? __kmalloc_node_track_caller+0x1b9/0x270
? rtnl_calcit.isra.28+0x100/0x100
netlink_rcv_skb+0xd2/0x110
netlink_unicast+0x17c/0x230
netlink_sendmsg+0x2cd/0x3c0
sock_sendmsg+0x30/0x40
___sys_sendmsg+0x27a/0x290
__sys_sendmsg+0x51/0x90
do_syscall_64+0x6e/0x1a0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7febca6deba0
RSP: 002b:00007ffe7b0dd128 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffe7b0dd250 RCX: 00007febca6deba0
RDX: 0000000000000000 RSI: 00007ffe7b0dd1a0 RDI: 0000000000000003
RBP: 000000005aaa90cb R08: 0000000000000002 R09: 0000000000000000
R10: 00007ffe7b0dcba0 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe7b0dd264 R14: 0000000000000001 R15: 0000000000669f60
Code: 44 00 00 8b 0d b5 23 00 00 48 8b 87 48 10 00 00 48 8b 3c c8 e9 a5 e5 d8 c3 0f 1f 44 00 00 0f 1f 44 00 00 53 48 8b 9f b0 00 00 00 <83> 7b 10 01 74 0b 48 89 df 31 f6 5b e9 f2 fa 7f c3 48 8b 7b 18
RIP: tunnel_key_release+0xd/0x40 [act_tunnel_key] RSP: ffffba46803b7768
CR2: 0000000000000010

Fix this in tunnel_key_release(), ensuring 'param' is not NULL before
dereferencing it.

Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 51d4740f 12-Mar-2018 Roman Mashak <mrv@mojatatu.com>

net sched actions: return explicit error when tunnel_key mode is not specified

If set/unset mode of the tunnel_key action is not provided, ->init() still
returns 0, and the caller proceeds with bogus 'struct tc_action *' object,
this results in crash:

% tc actions add action tunnel_key src_ip 1.1.1.1 dst_ip 2.2.2.1 id 7 index 1

[ 35.805515] general protection fault: 0000 [#1] SMP PTI
[ 35.806161] Modules linked in: act_tunnel_key kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64
crypto_simd glue_helper cryptd serio_raw
[ 35.808233] CPU: 1 PID: 428 Comm: tc Not tainted 4.16.0-rc4+ #286
[ 35.808929] RIP: 0010:tcf_action_init+0x90/0x190
[ 35.809457] RSP: 0018:ffffb8edc068b9a0 EFLAGS: 00010206
[ 35.810053] RAX: 1320c000000a0003 RBX: 0000000000000001 RCX: 0000000000000000
[ 35.810866] RDX: 0000000000000070 RSI: 0000000000007965 RDI: ffffb8edc068b910
[ 35.811660] RBP: ffffb8edc068b9d0 R08: 0000000000000000 R09: ffffb8edc068b808
[ 35.812463] R10: ffffffffc02bf040 R11: 0000000000000040 R12: ffffb8edc068bb38
[ 35.813235] R13: 0000000000000000 R14: 0000000000000000 R15: ffffb8edc068b910
[ 35.814006] FS: 00007f3d0d8556c0(0000) GS:ffff91d1dbc40000(0000)
knlGS:0000000000000000
[ 35.814881] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.815540] CR2: 000000000043f720 CR3: 0000000019248001 CR4: 00000000001606a0
[ 35.816457] Call Trace:
[ 35.817158] tc_ctl_action+0x11a/0x220
[ 35.817795] rtnetlink_rcv_msg+0x23d/0x2e0
[ 35.818457] ? __slab_alloc+0x1c/0x30
[ 35.819079] ? __kmalloc_node_track_caller+0xb1/0x2b0
[ 35.819544] ? rtnl_calcit.isra.30+0xe0/0xe0
[ 35.820231] netlink_rcv_skb+0xce/0x100
[ 35.820744] netlink_unicast+0x164/0x220
[ 35.821500] netlink_sendmsg+0x293/0x370
[ 35.822040] sock_sendmsg+0x30/0x40
[ 35.822508] ___sys_sendmsg+0x2c5/0x2e0
[ 35.823149] ? pagecache_get_page+0x27/0x220
[ 35.823714] ? filemap_fault+0xa2/0x640
[ 35.824423] ? page_add_file_rmap+0x108/0x200
[ 35.825065] ? alloc_set_pte+0x2aa/0x530
[ 35.825585] ? finish_fault+0x4e/0x70
[ 35.826140] ? __handle_mm_fault+0xbc1/0x10d0
[ 35.826723] ? __sys_sendmsg+0x41/0x70
[ 35.827230] __sys_sendmsg+0x41/0x70
[ 35.827710] do_syscall_64+0x68/0x120
[ 35.828195] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 35.828859] RIP: 0033:0x7f3d0ca4da67
[ 35.829331] RSP: 002b:00007ffc9f284338 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[ 35.830304] RAX: ffffffffffffffda RBX: 00007ffc9f284460 RCX: 00007f3d0ca4da67
[ 35.831247] RDX: 0000000000000000 RSI: 00007ffc9f2843b0 RDI: 0000000000000003
[ 35.832167] RBP: 000000005aa6a7a9 R08: 0000000000000001 R09: 0000000000000000
[ 35.833075] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000
[ 35.833997] R13: 00007ffc9f2884c0 R14: 0000000000000001 R15: 0000000000674640
[ 35.834923] Code: 24 30 bb 01 00 00 00 45 31 f6 eb 5e 8b 50 08 83 c2 07 83 e2
fc 83 c2 70 49 8b 07 48 8b 40 70 48 85 c0 74 10 48 89 14 24 4c 89 ff <ff> d0 48
8b 14 24 48 01 c2 49 01 d6 45 85 ed 74 05 41 83 47 2c
[ 35.837442] RIP: tcf_action_init+0x90/0x190 RSP: ffffb8edc068b9a0
[ 35.838291] ---[ end trace a095c06ee4b97a26 ]---

Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 685ecfb1 26-Feb-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Convert tc_action_net_init() and tc_action_net_exit() based pernet_operations

These pernet_operations are from net/sched directory, and they call only
tc_action_net_init() and tc_action_net_exit():

bpf_net_ops
connmark_net_ops
csum_net_ops
gact_net_ops
ife_net_ops
ipt_net_ops
xt_net_ops
mirred_net_ops
nat_net_ops
pedit_net_ops
police_net_ops
sample_net_ops
simp_net_ops
skbedit_net_ops
skbmod_net_ops
tunnel_key_net_ops
vlan_net_ops

1)tc_action_net_init() just allocates and initializes per-net memory.
2)There should not be in-flight packets at the time of tc_action_net_exit()
call, or another pernet_operations send packets to dying net (except
netlink). So, it seems they can be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b3620145 15-Feb-2018 Alexander Aring <aring@mojatatu.com>

net: sched: act: handle extack in tcf_generic_walker

This patch adds extack handling for a common used TC act function
"tcf_generic_walker()" to add an extack message on failures.
The tcf_generic_walker() function can fail if get a invalid command
different than DEL and GET. The naming "action" here is wrong, the
correct naming would be command.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 41780105 15-Feb-2018 Alexander Aring <aring@mojatatu.com>

net: sched: act: add extack for walk callback

This patch adds extack support for act walker callback api. This
prepares to handle extack support inside each specific act
implementation.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 331a9295 15-Feb-2018 Alexander Aring <aring@mojatatu.com>

net: sched: act: add extack for lookup callback

This patch adds extack support for act lookup callback api. This
prepares to handle extack support inside each specific act
implementation.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 589dad6d 15-Feb-2018 Alexander Aring <aring@mojatatu.com>

net: sched: act: add extack to init callback

This patch adds extack support for act init callback api. This
prepares to handle extack support inside each specific act
implementation.

Based on work by David Ahern <dsahern@gmail.com>

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 039af9c6 11-Dec-2017 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: switch to exit_batch for action pernet ops

Since we now hold RTNL lock in tc_action_net_exit(), it is good to
batch them to speedup tc action dismantle.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9a63b255 05-Dec-2017 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: remove unused parameter from act cleanup ops

No one actually uses it.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c7e460ce 06-Nov-2017 Cong Wang <xiyou.wangcong@gmail.com>

Revert "net_sched: hold netns refcnt for each action"

This reverts commit ceffcc5e254b450e6159f173e4538215cebf1b59.
If we hold that refcnt, the netns can never be destroyed until
all actions are destroyed by user, this breaks our netns design
which we expect all actions are destroyed when we destroy the
whole netns.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ceffcc5e 01-Nov-2017 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: hold netns refcnt for each action

TC actions have been destroyed asynchronously for a long time,
previously in a RCU callback and now in a workqueue. If we
don't hold a refcnt for its netns, we could use the per netns
data structure, struct tcf_idrinfo, after it has been freed by
netns workqueue.

Hold refcnt to ensure netns destroy happens after all actions
are gone.

Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Reported-by: Lucas Bates <lucasb@mojatatu.com>
Tested-by: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 65a206c0 30-Aug-2017 Chris Mi <chrism@mellanox.com>

net/sched: Change act_api and act_xxx modules to use IDR

Typically, each TC filter has its own action. All the actions of the
same type are saved in its hash table. But the hash buckets are too
small that it degrades to a list. And the performance is greatly
affected. For example, it takes about 0m11.914s to insert 64K rules.
If we convert the hash table to IDR, it only takes about 0m1.500s.
The improvement is huge.

But please note that the test result is based on previous patch that
cls_flower uses IDR.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 86087e17 14-Jun-2017 Jiri Benc <jbenc@redhat.com>

net: sched: act_tunnel_key: make UDP checksum configurable

Allow requesting of zero UDP checksum for encapsulated packets. The name and
meaning of the attribute is "NO_CSUM" in order to have the same meaning of
the attribute missing and being 0.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 63fe4c39 14-Jun-2017 Jiri Benc <jbenc@redhat.com>

net: sched: act_tunnel_key: request UDP checksum by default

There's currently no way to request (outer) UDP checksum with
act_tunnel_key. This is problem especially for IPv6. Right now, tunnel_key
action with IPv6 does not work without going through hassles: both sides
have to have udp6zerocsumrx configured on the tunnel interface. This is
obviously not a good solution universally.

It makes more sense to compute the UDP checksum by default even for IPv4.
Just set the default to request the checksum when using act_tunnel_key.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fceb6435 12-Apr-2017 Johannes Berg <johannes.berg@intel.com>

netlink: pass extended ACK struct to parsing functions

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dc594ecd 22-Dec-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6

The UDP dst port was provided to the helper function which sets the
IPv6 IP tunnel meta-data under a wrong param order, fix that.

Fixes: 75bfbca01e48 ('net/sched: act_tunnel_key: Add UDP dst port option')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c7d03a00 16-Nov-2016 Alexey Dobriyan <adobriyan@gmail.com>

netns: make struct pernet_operations::id unsigned int

Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.

2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

void f(long *p, int i)
{
g(p[i]);
}

roughly translates to

movsx rsi, esi
mov rdi, [rsi+...]
call g

MOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

static inline void *net_generic(const struct net *net, int id)
{
...
ptr = ng->ptr[id - 1];
...
}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]

However, overall balance is in negative direction:

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
function old new delta
nfsd4_lock 3886 3959 +73
tipc_link_build_proto_msg 1096 1140 +44
mac80211_hwsim_new_radio 2776 2808 +32
tipc_mon_rcv 1032 1058 +26
svcauth_gss_legacy_init 1413 1429 +16
tipc_bcbase_select_primary 379 392 +13
nfsd4_exchange_id 1247 1260 +13
nfsd4_setclientid_confirm 782 793 +11
...
put_client_renew_locked 494 480 -14
ip_set_sockfn_get 730 716 -14
geneve_sock_add 829 813 -16
nfsd4_sequence_done 721 703 -18
nlmclnt_lookup_host 708 686 -22
nfsd4_lockt 1085 1063 -22
nfs_get_client 1077 1050 -27
tcf_bpf_init 1106 1076 -30
nfsd4_encode_fattr 5997 5930 -67
Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 75bfbca0 07-Nov-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/sched: act_tunnel_key: Add UDP dst port option

The current tunnel set action supports only IP addresses and key
options. Add UDP dst port option.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 24ba898d 07-Nov-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/dst: Add dst port to dst_metadata utility functions

Add dst port parameter to __ip_tun_set_dst and __ipv6_tun_set_dst
utility functions.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9ce183b4 07-Nov-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/sched: act_tunnel_key: add helper inlines to access tcf_tunnel_key

Needed for drivers to pick the relevant action when offloading tunnel
key act.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 07c0f09e 12-Sep-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/sched: act_tunnel_key: Remove rcu_read_lock protection

Remove rcu_read_lock protection from tunnel_key_dump and use
rtnl_dereference, dump operation is protected by rtnl lock.

Also, remove rcu_read_lock from tunnel_key_release and use
rcu_dereference_protected.

Both operations are running exclusively and a writer couldn't modify
t->params while those functions are executed.

Fixes: 54d94fd89d90 ('net/sched: Introduce act_tunnel_key')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d0f6dd8a 08-Sep-2016 Amir Vadai <amir@vadai.me>

net/sched: Introduce act_tunnel_key

This action could be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from a such a device.

The action will release the metadata created by the tunnel device
(decap), or set the metadata with the specified values for encap
operation.

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, a metadata for the vxlan tunnel is created using the
tunnel_key action and it's arguments:

$ tc filter add dev net0 protocol ip parent ffff: \
flower \
ip_proto 1 \
dst_ip 11.11.11.2 \
action tunnel_key set \
src_ip 11.11.0.1 \
dst_ip 11.11.0.2 \
id 11 \
action mirred egress redirect dev vxlan0

Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>