#
47a92dfb |
|
21-Mar-2024 |
Suren Baghdasaryan <surenb@google.com> |
lib: prevent module unloading if memory is not freed Skip freeing module's data section if there are non-zero allocation tags because otherwise, once these allocations are freed, the access to their code tag would cause UAF. Link: https://lkml.kernel.org/r/20240321163705.3067592-13-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
a4735739 |
|
21-Mar-2024 |
Suren Baghdasaryan <surenb@google.com> |
lib: code tagging module support Add support for code tagging from dynamically loaded modules. Link: https://lkml.kernel.org/r/20240321163705.3067592-12-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
223b5e57 |
|
05-May-2024 |
Mike Rapoport (IBM) <rppt@kernel.org> |
mm/execmem, arch: convert remaining overrides of module_alloc to execmem Extend execmem parameters to accommodate more complex overrides of module_alloc() by architectures. This includes specification of a fallback range required by arm, arm64 and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for allocation of KASAN shadow required by s390 and x86 and support for late initialization of execmem required by arm64. The core implementation of execmem_alloc() takes care of suppressing warnings when the initial allocation fails but there is a fallback range defined. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Song Liu <song@kernel.org> Tested-by: Liviu Dudau <liviu@dudau.co.uk> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
12af2b83 |
|
05-May-2024 |
Mike Rapoport (IBM) <rppt@kernel.org> |
mm: introduce execmem_alloc() and execmem_free() module_alloc() is used everywhere as a mean to allocate memory for code. Beside being semantically wrong, this unnecessarily ties all subsystems that need to allocate code, such as ftrace, kprobes and BPF to modules and puts the burden of code allocation to the modules code. Several architectures override module_alloc() because of various constraints where the executable memory can be located and this causes additional obstacles for improvements of code allocation. Start splitting code allocation from modules by introducing execmem_alloc() and execmem_free() APIs. Initially, execmem_alloc() is a wrapper for module_alloc() and execmem_free() is a replacement of module_memfree() to allow updating all call sites to use the new APIs. Since architectures define different restrictions on placement, permissions, alignment and other parameters for memory that can be used by different subsystems that allocate executable memory, execmem_alloc() takes a type argument, that will be used to identify the calling subsystem and to allow architectures define parameters for ranges suitable for that subsystem. No functional changes. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
bc6b94d3 |
|
05-May-2024 |
Mike Rapoport (IBM) <rppt@kernel.org> |
module: make module_memory_{alloc,free} more self-contained Move the logic related to the memory allocation and freeing into module_memory_alloc() and module_memory_free(). Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
8f8cd6c0 |
|
26-Feb-2024 |
Changbin Du <changbin.du@huawei.com> |
modules: wait do_free_init correctly The synchronization here is to ensure the ordering of freeing of a module init so that it happens before W+X checking. It is worth noting it is not that the freeing was not happening, it is just that our sanity checkers raced against the permission checkers which assume init memory is already gone. Commit 1a7b7d922081 ("modules: Use vmalloc special flag") moved calling do_free_init() into a global workqueue instead of relying on it being called through call_rcu(..., do_free_init), which used to allowed us call do_free_init() asynchronously after the end of a subsequent grace period. The move to a global workqueue broke the gaurantees for code which needed to be sure the do_free_init() would complete with rcu_barrier(). To fix this callers which used to rely on rcu_barrier() must now instead use flush_work(&init_free_wq). Without this fix, we still could encounter false positive reports in W+X checking since the rcu_barrier() here can not ensure the ordering now. Even worse, the rcu_barrier() can introduce significant delay. Eric Chanudet reported that the rcu_barrier introduces ~0.1s delay on a PREEMPT_RT kernel. [ 0.291444] Freeing unused kernel memory: 5568K [ 0.402442] Run /sbin/init as init process With this fix, the above delay can be eliminated. Link: https://lkml.kernel.org/r/20240227023546.2490667-1-changbin.du@huawei.com Fixes: 1a7b7d922081 ("modules: Use vmalloc special flag") Signed-off-by: Changbin Du <changbin.du@huawei.com> Tested-by: Eric Chanudet <echanude@redhat.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Xiaoyi Su <suxiaoyi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
d1909c02 |
|
16-Feb-2024 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Don't ignore errors from set_memory_XX() set_memory_ro(), set_memory_nx(), set_memory_x() and other helpers can fail and return an error. In that case the memory might not be protected as expected and the module loading has to be aborted to avoid security issues. Check return value of all calls to set_memory_XX() and handle error if any. Add a check to not call set_memory_XX() on NULL pointers as some architectures may not like it allthough numpages is always 0 in that case. This also avoid a useless call to set_vm_flush_reset_perms(). Link: https://github.com/KSPP/linux/issues/7 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
3559ad39 |
|
21-Dec-2023 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Change module_enable_{nx/x/ro}() to more explicit names It's a bit puzzling to see a call to module_enable_nx() followed by a call to module_enable_x(). This is because one applies on text while the other applies on data. Change name to make that more clear. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
ac88ee7d |
|
21-Dec-2023 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Use set_memory_rox() A couple of architectures seem concerned about calling set_memory_ro() and set_memory_x() too frequently and have implemented a version of set_memory_rox(), see commit 60463628c9e0 ("x86/mm: Implement native set_memory_rox()") and commit 22e99fa56443 ("s390/mm: implement set_memory_rox()") Use set_memory_rox() in modules when STRICT_MODULES_RWX is set. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
d81f0d7b |
|
13-Dec-2023 |
Rae Moar <rmoar@google.com> |
kunit: add KUNIT_INIT_TABLE to init linker section Add KUNIT_INIT_TABLE to the INIT_DATA linker section. Alter the KUnit macros to create init tests: kunit_test_init_section_suites Update lib/kunit/executor.c to run both the suites in KUNIT_TABLE and KUNIT_INIT_TABLE. Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Rae Moar <rmoar@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
#
2abcc4b5 |
|
01-Aug-2023 |
James Morse <james.morse@arm.com> |
module: Expose module_init_layout_section() module_init_layout_section() choses whether the core module loader considers a section as init or not. This affects the placement of the exit section when module unloading is disabled. This code will never run, so it can be free()d once the module has been initialised. arm and arm64 need to count the number of PLTs they need before applying relocations based on the section name. The init PLTs are stored separately so they can be free()d. arm and arm64 both use within_module_init() to decide which list of PLTs to use when applying the relocation. Because within_module_init()'s behaviour changes when module unloading is disabled, both architecture would need to take this into account when counting the PLTs. Today neither architecture does this, meaning when module unloading is disabled there are insufficient PLTs in the init section to load some modules, resulting in warnings: | WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc | Modules linked in: crct10dif_common | CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208 | Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 | pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : module_emit_plt_entry+0x184/0x1cc | lr : module_emit_plt_entry+0x94/0x1cc | sp : ffffffc0803bba60 [...] | Call trace: | module_emit_plt_entry+0x184/0x1cc | apply_relocate_add+0x2bc/0x8e4 | load_module+0xe34/0x1bd4 | init_module_from_file+0x84/0xc0 | __arm64_sys_finit_module+0x1b8/0x27c | invoke_syscall.constprop.0+0x5c/0x104 | do_el0_svc+0x58/0x160 | el0_svc+0x38/0x110 | el0t_64_sync_handler+0xc0/0xc4 | el0t_64_sync+0x190/0x194 Instead of duplicating module_init_layout_section()s logic, expose it. Reported-by: Adam Johnston <adam.johnston@arm.com> Fixes: 055f23b74b20 ("module: check for exit sections in layout_sections() instead of module_init_section()") Cc: stable@vger.kernel.org Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
9011e49d |
|
01-Aug-2023 |
Christoph Hellwig <hch@lst.de> |
modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules It has recently come to my attention that nvidia is circumventing the protection added in 262e6ae7081d ("modules: inherit TAINT_PROPRIETARY_MODULE") by importing exports from their proprietary modules into an allegedly GPL licensed module and then rexporting them. Given that symbol_get was only ever intended for tightly cooperating modules using very internal symbols it is logical to restrict it to being used on EXPORT_SYMBOL_GPL and prevent nvidia from costly DMCA Circumvention of Access Controls law suites. All symbols except for four used through symbol_get were already exported as EXPORT_SYMBOL_GPL, and the remaining four ones were switched over in the preparation patches. Fixes: 262e6ae7081d ("modules: inherit TAINT_PROPRIETARY_MODULE") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
f1962207 |
|
04-Jul-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
module: fix init_module_from_file() error handling Vegard Nossum pointed out two different problems with the error handling in init_module_from_file(): (a) the idempotent loading code didn't clean up properly in some error cases, leaving the on-stack 'struct idempotent' element still in the hash table (b) failure to read the module file would nonsensically update the 'invalid_kread_bytes' stat counter with the error value The first error is quite nasty, in that it can then cause subsequent idempotent loads of that same file to access stale stack contents of the previous failure. The case may not happen in any normal situation (explaining all the "Tested-by's on the original change), and requires admin privileges, but syzkaller triggers random bad behavior as a result: BUG: soft lockup in sys_finit_module BUG: unable to handle kernel paging request in init_module_from_file general protection fault in init_module_from_file INFO: task hung in init_module_from_file KASAN: out-of-bounds Read in init_module_from_file KASAN: slab-out-of-bounds Read in init_module_from_file ... The second error is fairly benign and just leads to nonsensical stats (and has been around since the debug stats were added). Vegard also provided a patch for the idempotent loading issue, but I'd rather re-organize the code and make it more legible using another level of helper functions than add the usual "goto out" error handling. Link: https://lore.kernel.org/lkml/20230704100852.23452-1-vegard.nossum@oracle.com/ Fixes: 9b9879fc0327 ("modules: catch concurrent module loads, treat them as idempotent") Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reported-by: syzbot+9c2bdc9d24e4a7abe741@syzkaller.appspotmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9b9879fc |
|
29-May-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
modules: catch concurrent module loads, treat them as idempotent This is the new-and-improved attempt at avoiding huge memory load spikes when the user space boot sequence tries to load hundreds (or even thousands) of redundant duplicate modules in parallel. See commit 9828ed3f695a ("module: error out early on concurrent load of the same module file") for background and an earlier failed attempt that was reverted. That earlier attempt just said "concurrently loading the same module is silly, just open the module file exclusively and return -ETXTBSY if somebody else is already loading it". While it is true that concurrent module loads of the same module is silly, the reason that earlier attempt then failed was that the concurrently loaded module would often be a prerequisite for another module. Thus failing to load the prerequisite would then cause cascading failures of the other modules, rather than just short-circuiting that one unnecessary module load. At the same time, we still really don't want to load the contents of the same module file hundreds of times, only to then wait for an eventually successful load, and have everybody else return -EEXIST. As a result, this takes another approach, and treats concurrent module loads from the same file as "idempotent" in the inode. So if one module load is ongoing, we don't start a new one, but instead just wait for the first one to complete and return the same return value as it did. So unlike the first attempt, this does not return early: the intent is not to speed up the boot, but to avoid a thundering herd problem in allocating memory (both physical and virtual) for a module more than once. Also note that this does change behavior: it used to be that when you had concurrent loads, you'd have one "winner" that would return success, and everybody else would return -EEXIST. In contrast, this idempotent logic goes all Oprah on the problem, and says "You are a winner! And you are a winner! We are ALL winners". But since there's no possible actual real semantic difference between "you loaded the module" and "somebody else already loaded the module", this is more of a feel-good change than an actual honest-to-goodness semantic change. Of course, any true Johnny-come-latelies that don't get caught in the concurrency filter will still return -EEXIST. It's no different from not even getting a seat at an Oprah taping. That's life. See the long thread on the kernel mailing list about this all, which includes some numbers for memory use before and after the patch. Link: https://lore.kernel.org/lkml/20230524213620.3509138-1-mcgrof@kernel.org/ Reviewed-by: Johan Hovold <johan@kernel.org> Tested-by: Johan Hovold <johan@kernel.org> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Rudi Heitbaum <rudi@heitbaum..com> Tested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
054a7300 |
|
29-May-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
module: split up 'finit_module()' into init_module_from_file() helper This will simplify the next step, where we can then key off the inode to do one idempotent module load. Let's do the obvious re-organization in one step, and then the new code in another. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
db3e33dd |
|
28-May-2023 |
Song Liu <song@kernel.org> |
module: fix module load for ia64 Frank reported boot regression in ia64 as: ELILO v3.16 for EFI/IA-64 .. Uncompressing Linux... done Loading file AC100221.initrd.img...done [ 0.000000] Linux version 6.4.0-rc3 (root@x4270) (ia64-linux-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #1 SMP Thu May 25 15:52:20 CEST 2023 [ 0.000000] efi: EFI v1.1 by HP [ 0.000000] efi: SALsystab=0x3ee7a000 ACPI 2.0=0x3fe2a000 ESI=0x3ee7b000 SMBIOS=0x3ee7c000 HCDP=0x3fe28000 [ 0.000000] PCDP: v3 at 0x3fe28000 [ 0.000000] earlycon: uart8250 at MMIO 0x00000000f4050000 (options '9600n8') [ 0.000000] printk: bootconsole [uart8250] enabled [ 0.000000] ACPI: Early table checksum verification disabled [ 0.000000] ACPI: RSDP 0x000000003FE2A000 000028 (v02 HP ) [ 0.000000] ACPI: XSDT 0x000000003FE2A02C 0000CC (v01 HP rx2620 00000000 HP 00000000) [...] [ 3.793350] Run /init as init process Loading, please wait... Starting systemd-udevd version 252.6-1 [ 3.951100] ------------[ cut here ]------------ [ 3.951100] WARNING: CPU: 6 PID: 140 at kernel/module/main.c:1547 __layout_sections+0x370/0x3c0 [ 3.949512] Unable to handle kernel paging request at virtual address 1000000000000000 [ 3.951100] Modules linked in: [ 3.951100] CPU: 6 PID: 140 Comm: (udev-worker) Not tainted 6.4.0-rc3 #1 [ 3.956161] (udev-worker)[142]: Oops 11003706212352 [1] [ 3.951774] Hardware name: hp server rx2620 , BIOS 04.29 11/30/2007 [ 3.951774] [ 3.951774] Call Trace: [ 3.958339] Unable to handle kernel paging request at virtual address 1000000000000000 [ 3.956161] Modules linked in: [ 3.951774] [<a0000001000156d0>] show_stack.part.0+0x30/0x60 [ 3.951774] sp=e000000183a67b20 bsp=e000000183a61628 [ 3.956161] [ 3.956161] which bisect to module_memory change [1]. Debug showed that ia64 uses some special sections: __layout_sections: section .got (sh_flags 10000002) matched to MOD_INVALID __layout_sections: section .sdata (sh_flags 10000003) matched to MOD_INVALID __layout_sections: section .sbss (sh_flags 10000003) matched to MOD_INVALID All these sections are loaded to module core memory before [1]. Fix ia64 boot by loading these sections to MOD_DATA (core rw data). [1] commit ac3b43283923 ("module: replace module_layout with module_memory") Fixes: ac3b43283923 ("module: replace module_layout with module_memory") Reported-by: Frank Scheiner <frank.scheiner@web.de> Closes: https://lists.debian.org/debian-ia64/2023/05/msg00010.html Closes: https://marc.info/?l=linux-ia64&m=168509859125505 Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Song Liu <song@kernel.org> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
ac2263b5 |
|
29-May-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Revert "module: error out early on concurrent load of the same module file" This reverts commit 9828ed3f695a138f7add89fa2a186ababceb8006. Sadly, it does seem to cause failures to load modules. Johan Hovold reports: "This change breaks module loading during boot on the Lenovo Thinkpad X13s (aarch64). Specifically it results in indefinite probe deferral of the display and USB (ethernet) which makes it a pain to debug. Typing in the dark to acquire some logs reveals that other modules are missing as well" Since this was applied late as a "let's try this", I'm reverting it asap, and we can try to figure out what goes wrong later. The excessive parallel module loading problem is annoying, but not noticeable in normal situations, and this was only meant as an optimistic workaround for a user-space bug. One possible solution may be to do the optimistic exclusive open first, and then use a lock to serialize loading if that fails. Reported-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/lkml/ZHRpH-JXAxA6DnzR@hovoldconsulting.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9828ed3f |
|
25-May-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
module: error out early on concurrent load of the same module file It turns out that udev under certain circumstances will concurrently try to load the same modules over-and-over excessively. This isn't a kernel bug, but it ends up affecting the kernel, to the point that under certain circumstances we can fail to boot, because the kernel uses a lot of memory to read all the module data all at once. Note that it isn't a memory leak, it's just basically a thundering herd problem happening at bootup with a lot of CPUs, with the worst cases then being pretty bad. Admittedly the worst situations are somewhat contrived: lots and lots of CPUs, not a lot of memory, and KASAN enabled to make it all slower and as such (unintentionally) exacerbate the problem. Luis explains: [1] "My best assessment of the situation is that each CPU in udev ends up triggering a load of duplicate set of modules, not just one, but *a lot*. Not sure what heuristics udev uses to load a set of modules per CPU." Petr Pavlu chimes in: [2] "My understanding is that udev workers are forked. An initial kmod context is created by the main udevd process but no sharing happens after the fork. It means that the mentioned memory pool logic doesn't really kick in. Multiple parallel load requests come from multiple udev workers, for instance, each handling an udev event for one CPU device and making the exactly same requests as all others are doing at the same time. The optimization idea would be to recognize these duplicate requests at the udevd/kmod level and converge them" Note that module loading has tried to mitigate this issue before, see for example commit 064f4536d139 ("module: avoid allocation if module is already present and ready"), which has a few ASCII graphs on memory use due to this same issue. However, while that noticed that the module was already loaded, and exited with an error early before spending any more time on setting up the module, it didn't handle the case of multiple concurrent module loads all being active - but not complete - at the same time. Yes, one of them will eventually win the race and finalize its copy, and the others will then notice that the module already exists and error out, but while this all happens, we have tons of unnecessary concurrent work being done. Again, the real fix is for udev to not do that (maybe it should use threads instead of fork, and have actual shared data structures and not cause duplicate work). That real fix is apparently not trivial. But it turns out that the kernel already has a pretty good model for dealing with concurrent access to the same file: the i_writecount of the inode. In fact, the module loading already indirectly uses 'i_writecount' , because 'kernel_file_read()' will in fact do ret = deny_write_access(file); if (ret) return ret; ... allow_write_access(file); around the read of the file data. We do not allow concurrent writes to the file, and return -ETXTBUSY if the file was open for writing at the same time as the module data is loaded from it. And the solution to the reader concurrency problem is to simply extend this "no concurrent writers" logic to simply be "exclusive access". Note that "exclusive" in this context isn't really some absolute thing: it's only exclusion from writers and from other "special readers" that do this writer denial. So we simply introduce a variation of that "deny_write_access()" logic that not only denies write access, but also requires that this is the _only_ such access that denies write access. Which means that you can't start loading a module that is already being loaded as a module by somebody else, or you will get the same -ETXTBSY error that you would get if there were writers around. [ It also means that you can't try to load a currently executing executable as a module, for the same reason: executables do that same "deny_write_access()" thing, and that's obviously where the whole ETXTBSY logic traditionally came from. This is not a problem for kernel modules, since the set of normal executable files and kernel module files is entirely disjoint. ] This new function is called "exclusive_deny_write_access()", and the implementation is trivial, in that it's just an atomic decrement of i_writecount if it was 0 before. To use that new exclusivity check, all we then do is wrap the module loading with that exclusive_deny_write_access()() / allow_write_access() pair. The actual patch is a bit bigger than that, because we want to surround not just the "load file data" part, but the whole module setup, to get maximum exclusion. So this ends up splitting up "finit_module()" into a few helper functions to make it all very clear and legible. In Luis' test-case (bringing up 255 vcpu's in a virtual machine [3]), the "wasted vmalloc" space (ie module data read into a vmalloc'ed area in order to be loaded as a module, but then discarded because somebody else loaded the same module instead) dropped from 1.8GiB to 474kB. Yes, that's gigabytes to kilobytes. It doesn't drop completely to zero, because even with this change, you can still end up having completely serial pointless module loads, where one udev process has loaded a module fully (and thus the kernel has released that exclusive lock on the module file), and then another udev process tries to load the same module again. So while we cannot fully get rid of the fundamental bug in user space, we _can_ get rid of the excessive concurrent thundering herd effect. A couple of final side notes on this all: - This tweak only affects the "finit_module()" system call, which gives the kernel a file descriptor with the module data. You can also just feed the module data as raw data from user space with "init_module()" (note the lack of 'f' at the beginning), and obviously for that case we do _not_ have any "exclusive read" logic. So if you absolutely want to do things wrong in user space, and try to load the same module multiple times, and error out only later when the kernel ends up saying "you can't load the same module name twice", you can still do that. And in fact, some distros will do exactly that, because they will uncompress the kernel module data in user space before feeding it to the kernel (mainly because they haven't started using the new kernel side decompression yet). So this is not some absolute "you can't do concurrent loads of the same module". It's literally just a very simple heuristic that will catch it early in case you try to load the exact same module file at the same time, and in that case avoid a potentially nasty situation. - There is another user of "deny_write_access()": the verity code that enables fs-verity on a file (the FS_IOC_ENABLE_VERITY ioctl). If you use fs-verity and you care about verifying the kernel modules (which does make sense), you should do it *before* loading said kernel module. That may sound obvious, but now the implementation basically requires it. Because if you try to do it concurrently, the kernel may refuse to load the module file that is being set up by the fs-verity code. - This all will obviously mean that if you insist on loading the same module in parallel, only one module load will succeed, and the others will return with an error. That was true before too, but what is different is that the -ETXTBSY error can be returned *before* the success case of another process fully loading and instantiating the module. Again, that might sound obvious, and it is indeed the whole point of the whole change: we are much quicker to notice the whole "you're already in the process of loading this module". So it's very much intentional, but it does mean that if you just spray the kernel with "finit_module()", and expect that the module is immediately loaded afterwards without checking the return value, you are doing something horribly horribly wrong. I'd like to say that that would never happen, but the whole _reason_ for this commit is that udev is currently doing something horribly horribly wrong, so ... Link: https://lore.kernel.org/all/ZEGopJ8VAYnE7LQ2@bombadil.infradead.org/ [1] Link: https://lore.kernel.org/all/23bd0ce6-ef78-1cd8-1f21-0e706a00424a@suse.com/ [2] Link: https://lore.kernel.org/lkml/ZG%2Fa+nrt4%2FAAUi5z@bombadil.infradead.org/ [3] Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cb0b50b8 |
|
09-May-2023 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
module: Remove preempt_disable() from module reference counting. The preempt_disable() section in module_put() was added in commit e1783a240f491 ("module: Use this_cpu_xx to dynamically allocate counters") while the per-CPU counter were switched to another API. The API requires that during the RMW operation the CPU remained the same. This counting API was later replaced with atomic_t in commit 2f35c41f58a97 ("module: Replace module_ref with atomic_t refcnt") Since this atomic_t replacement there is no need to keep preemption disabled while the reference counter is modified. Remove preempt_disable() from module_put(), __module_get() and try_module_get(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
064f4536 |
|
10-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: avoid allocation if module is already present and ready The finit_module() system call can create unnecessary virtual memory pressure for duplicate modules. This is because load_module() can in the worse case allocate more than twice the size of a module in virtual memory. This saves at least a full size of the module in wasted vmalloc space memory by trying to avoid duplicates as soon as we can validate the module name in the read module structure. This can only be an issue if a system is getting hammered with userspace loading modules. There are two ways to load modules typically on systems, one is the kernel moduile auto-loading (*request_module*() calls in-kernel) and the other is things like udev. The auto-loading is in-kernel, but that pings back to userspace to just call modprobe. We already have a way to restrict the amount of concurrent kernel auto-loads in a given time, however that still allows multiple requests for the same module to go through and force two threads in userspace racing to call modprobe for the same exact module. Even though libkmod which both modprobe and udev does check if a module is already loaded prior calling finit_module() races are still possible and this is clearly evident today when you have multiple CPUs. To avoid memory pressure for such stupid cases put a stop gap for them. The *earliest* we can detect duplicates from the modules side of things is once we have blessed the module name, sadly after the first vmalloc allocation. We can check for the module being present *before* a secondary vmalloc() allocation. There is a linear relationship between wasted virtual memory bytes and the number of CPU counts. The reason is that udev ends up racing to call tons of the same modules for each of the CPUs. We can see the different linear relationships between wasted virtual memory and CPU count during after boot in the following graph: +----------------------------------------------------------------------------+ 14GB |-+ + + + + *+ +-| | **** | | *** | | ** | 12GB |-+ ** +-| | ** | | ** | | ** | | ** | 10GB |-+ ** +-| | ** | | ** | | ** | 8GB |-+ ** +-| waste | ** ### | | ** #### | | ** ####### | 6GB |-+ **** #### +-| | * #### | | * #### | | ***** #### | 4GB |-+ ** #### +-| | ** #### | | ** #### | | ** #### | 2GB |-+ ** ##### +-| | * #### | | * #### Before ******* | | **## + + + + After ####### | +----------------------------------------------------------------------------+ 0 50 100 150 200 250 300 CPUs count On the y-axis we can see gigabytes of wasted virtual memory during boot due to duplicate module requests which just end up failing. Trying to infer the slope this ends up being about ~463 MiB per CPU lost prior to this patch. After this patch we only loose about ~230 MiB per CPU, for a total savings of about ~233 MiB per CPU. This is all *just on bootup*! On a 8vcpu 8 GiB RAM system using kdevops and testing against selftests kmod.sh -t 0008 I see a saving in the *highest* side of memory consumption of up to ~ 84 MiB with the Linux kernel selftests kmod test 0008. With the new stress-ng module test I see a 145 MiB difference in max memory consumption with 100 ops. The stress-ng module ops tests can be pretty pathalogical -- it is not realistic, however it was used to finally successfully reproduce issues which are only reported to happen on system with over 400 CPUs [0] by just usign 100 ops on a 8vcpu 8 GiB RAM system. Running out of virtual memory space is no surprise given the above graph, since at least on x86_64 we're capped at 128 MiB, eventually we'd hit a series of errors and once can use the above graph to guestimate when. This of course will vary depending on the features you have enabled. So for instance, enabling KASAN seems to make this much worse. The results with kmod and stress-ng can be observed and visualized below. The time it takes to run the test is also not affected. The kmod tests 0008: The gnuplot is set to a range from 400000 KiB (390 Mib) - 580000 (566 Mib) given the tests peak around that range. cat kmod.plot set term dumb set output fileout set yrange [400000:580000] plot filein with linespoints title "Memory usage (KiB)" Before: root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008 root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-before.txt ^C root@kmod ~ # sort -n -r log-0008-before.txt | head -1 528732 So ~516.33 MiB After: root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008 root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-after.txt ^C root@kmod ~ # sort -n -r log-0008-after.txt | head -1 442516 So ~432.14 MiB That's about 84 ~MiB in savings in the worst case. The graphs: root@kmod ~ # gnuplot -e "filein='log-0008-before.txt'; fileout='graph-0008-before.txt'" kmod.plot root@kmod ~ # gnuplot -e "filein='log-0008-after.txt'; fileout='graph-0008-after.txt'" kmod.plot root@kmod ~ # cat graph-0008-before.txt 580000 +-----------------------------------------------------------------+ | + + + + + + + | 560000 |-+ Memory usage (KiB) ***A***-| | | 540000 |-+ +-| | | | *A *AA*AA*A*AA *A*AA A*A*A *AA*A*AA*A A | 520000 |-+A*A*AA *AA*A *A*AA*A*AA *A*A A *A+-| |*A | 500000 |-+ +-| | | 480000 |-+ +-| | | 460000 |-+ +-| | | | | 440000 |-+ +-| | | 420000 |-+ +-| | + + + + + + + | 400000 +-----------------------------------------------------------------+ 0 5 10 15 20 25 30 35 40 root@kmod ~ # cat graph-0008-after.txt 580000 +-----------------------------------------------------------------+ | + + + + + + + | 560000 |-+ Memory usage (KiB) ***A***-| | | 540000 |-+ +-| | | | | 520000 |-+ +-| | | 500000 |-+ +-| | | 480000 |-+ +-| | | 460000 |-+ +-| | | | *A *A*A | 440000 |-+A*A*AA*A A A*A*AA A*A*AA*A*AA*A*AA*A*AA*AA*A*AA*A*AA-| |*A *A*AA*A | 420000 |-+ +-| | + + + + + + + | 400000 +-----------------------------------------------------------------+ 0 5 10 15 20 25 30 35 40 The stress-ng module tests: This is used to run the test to try to reproduce the vmap issues reported by David: echo 0 > /proc/sys/vm/oom_dump_tasks ./stress-ng --module 100 --module-name xfs Prior to this commit: root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > baseline-stress-ng.txt root@kmod ~ # sort -n -r baseline-stress-ng.txt | head -1 5046456 After this commit: root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > after-stress-ng.txt root@kmod ~ # sort -n -r after-stress-ng.txt | head -1 4896972 5046456 - 4896972 149484 149484/1024 145.98046875000000000000 So this commit using stress-ng reveals saving about 145 MiB in memory using 100 ops from stress-ng which reproduced the vmap issue reported. cat kmod.plot set term dumb set output fileout set yrange [4700000:5070000] plot filein with linespoints title "Memory usage (KiB)" root@kmod ~ # gnuplot -e "filein='baseline-stress-ng.txt'; fileout='graph-stress-ng-before.txt'" kmod-simple-stress-ng.plot root@kmod ~ # gnuplot -e "filein='after-stress-ng.txt'; fileout='graph-stress-ng-after.txt'" kmod-simple-stress-ng.plot root@kmod ~ # cat graph-stress-ng-before.txt +---------------------------------------------------------------+ 5.05e+06 |-+ + A + + + + + + +-| | * Memory usage (KiB) ***A*** | | * A | 5e+06 |-+ ** ** +-| | ** * * A | 4.95e+06 |-+ * * A * A* +-| | * * A A * * * * A | | * * * * * * *A * * * A * | 4.9e+06 |-+ * * * A*A * A*AA*A A *A **A **A*A *+-| | A A*A A * A * * A A * A * ** | | * ** ** * * * * * * * | 4.85e+06 |-+ A A A ** * * ** *-| | * * * * ** * | | * A * * * * | 4.8e+06 |-+ * * * A A-| | * * * | 4.75e+06 |-+ * * * +-| | * ** | | * + + + + + + ** + | 4.7e+06 +---------------------------------------------------------------+ 0 5 10 15 20 25 30 35 40 root@kmod ~ # cat graph-stress-ng-after.txt +---------------------------------------------------------------+ 5.05e+06 |-+ + + + + + + + +-| | Memory usage (KiB) ***A*** | | | 5e+06 |-+ +-| | | 4.95e+06 |-+ +-| | | | | 4.9e+06 |-+ *AA +-| | A*AA*A*A A A*AA*AA*A*AA*A A A A*A *AA*A*A A A*AA*AA | | * * ** * * * ** * *** * | 4.85e+06 |-+* *** * * * * *** A * * +-| | * A * * ** * * A * * | | * * * * ** * * | 4.8e+06 |-+* * * A * * * +-| | * * * A * * | 4.75e+06 |-* * * * * +-| | * * * * * | | * + * *+ + + + + * *+ | 4.7e+06 +---------------------------------------------------------------+ 0 5 10 15 20 25 30 35 40 [0] https://lkml.kernel.org/r/20221013180518.217405-1-david@redhat.com Reported-by: David Hildenbrand <david@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
df3e764d |
|
28-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: add debug stats to help identify memory pressure Loading modules with finit_module() can end up using vmalloc(), vmap() and vmalloc() again, for a total of up to 3 separate allocations in the worst case for a single module. We always kernel_read*() the module, that's a vmalloc(). Then vmap() is used for the module decompression, and if so the last read buffer is freed as we use the now decompressed module buffer to stuff data into our copy module. The last allocation is specific to each architectures but pretty much that's generally a series of vmalloc() calls or a variation of vmalloc to handle ELF sections with special permissions. Evaluation with new stress-ng module support [1] with just 100 ops is proving that you can end up using GiBs of data easily even with all care we have in the kernel and userspace today in trying to not load modules which are already loaded. 100 ops seems to resemble the sort of pressure a system with about 400 CPUs can create on module loading. Although issues relating to duplicate module requests due to each CPU inucurring a new module reuest is silly and some of these are being fixed, we currently lack proper tooling to help diagnose easily what happened, when it happened and who likely is to blame -- userspace or kernel module autoloading. Provide an initial set of stats which use debugfs to let us easily scrape post-boot information about failed loads. This sort of information can be used on production worklaods to try to optimize *avoiding* redundant memory pressure using finit_module(). There's a few examples that can be provided: A 255 vCPU system without the next patch in this series applied: Startup finished in 19.143s (kernel) + 7.078s (userspace) = 26.221s graphical.target reached after 6.988s in userspace And 13.58 GiB of virtual memory space lost due to failed module loading: root@big ~ # cat /sys/kernel/debug/modules/stats Mods ever loaded 67 Mods failed on kread 0 Mods failed on decompress 0 Mods failed on becoming 0 Mods failed on load 1411 Total module size 11464704 Total mod text size 4194304 Failed kread bytes 0 Failed decompress bytes 0 Failed becoming bytes 0 Failed kmod bytes 14588526272 Virtual mem wasted bytes 14588526272 Average mod size 171115 Average mod text size 62602 Average fail load bytes 10339140 Duplicate failed modules: module-name How-many-times Reason kvm_intel 249 Load kvm 249 Load irqbypass 8 Load crct10dif_pclmul 128 Load ghash_clmulni_intel 27 Load sha512_ssse3 50 Load sha512_generic 200 Load aesni_intel 249 Load crypto_simd 41 Load cryptd 131 Load evdev 2 Load serio_raw 1 Load virtio_pci 3 Load nvme 3 Load nvme_core 3 Load virtio_pci_legacy_dev 3 Load virtio_pci_modern_dev 3 Load t10_pi 3 Load virtio 3 Load crc32_pclmul 6 Load crc64_rocksoft 3 Load crc32c_intel 40 Load virtio_ring 3 Load crc64 3 Load The following screen shot, of a simple 8vcpu 8 GiB KVM guest with the next patch in this series applied, shows 226.53 MiB are wasted in virtual memory allocations which due to duplicate module requests during boot. It also shows an average module memory size of 167.10 KiB and an an average module .text + .init.text size of 61.13 KiB. The end shows all modules which were detected as duplicate requests and whether or not they failed early after just the first kernel_read*() call or late after we've already allocated the private space for the module in layout_and_allocate(). A system with module decompression would reveal more wasted virtual memory space. We should put effort now into identifying the source of these duplicate module requests and trimming these down as much possible. Larger systems will obviously show much more wasted virtual memory allocations. root@kmod ~ # cat /sys/kernel/debug/modules/stats Mods ever loaded 67 Mods failed on kread 0 Mods failed on decompress 0 Mods failed on becoming 83 Mods failed on load 16 Total module size 11464704 Total mod text size 4194304 Failed kread bytes 0 Failed decompress bytes 0 Failed becoming bytes 228959096 Failed kmod bytes 8578080 Virtual mem wasted bytes 237537176 Average mod size 171115 Average mod text size 62602 Avg fail becoming bytes 2758544 Average fail load bytes 536130 Duplicate failed modules: module-name How-many-times Reason kvm_intel 7 Becoming kvm 7 Becoming irqbypass 6 Becoming & Load crct10dif_pclmul 7 Becoming & Load ghash_clmulni_intel 7 Becoming & Load sha512_ssse3 6 Becoming & Load sha512_generic 7 Becoming & Load aesni_intel 7 Becoming crypto_simd 7 Becoming & Load cryptd 3 Becoming & Load evdev 1 Becoming serio_raw 1 Becoming nvme 3 Becoming nvme_core 3 Becoming t10_pi 3 Becoming virtio_pci 3 Becoming crc32_pclmul 6 Becoming & Load crc64_rocksoft 3 Becoming crc32c_intel 3 Becoming virtio_pci_modern_dev 2 Becoming virtio_pci_legacy_dev 1 Becoming crc64 2 Becoming virtio 2 Becoming virtio_ring 2 Becoming [0] https://github.com/ColinIanKing/stress-ng.git [1] echo 0 > /proc/sys/vm/oom_dump_tasks ./stress-ng --module 100 --module-name xfs Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
f71afa6a |
|
10-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: extract patient module check into helper The patient module check inside add_unformed_module() is large enough as we need it. It is a bit hard to read too, so just move it to a helper and do the inverse checks first to help shift the code and make it easier to read. The new helper then is module_patient_check_exists(). To make this work we need to mvoe the finished_loading() up, we do that without making any functional changes to that routine. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
430bb0d1 |
|
04-Apr-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: fix kmemleak annotations for non init ELF sections Commit ac3b43283923 ("module: replace module_layout with module_memory") reworked the way to handle memory allocations to make it clearer. But it lost in translation how we handled kmemleak_ignore() or kmemleak_not_leak() for different ELF sections. Fix this and clarify the comments a bit more. Contrary to the old way of using kmemleak_ignore() for init.* ELF sections we stick now only to kmemleak_not_leak() as per suggestion by Catalin Marinas so to avoid any false positives and simplify the code. Fixes: ac3b43283923 ("module: replace module_layout with module_memory") Reported-by: Jim Cromie <jim.cromie@gmail.com> Acked-by: Song Liu <song@kernel.org> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
33c951f6 |
|
21-Mar-2023 |
Jim Cromie <jim.cromie@gmail.com> |
module: already_uses() - reduce pr_debug output volume already_uses() is unnecessarily chatty. `modprobe i915` yields 491 messages like: [ 64.108744] i915 uses drm! This is a normal situation, and isn't worth all the log entries. NOTE: I've preserved the "does not use %s" messages, which happens less often, but does happen. Its not clear to me what it tells a reader, or what info might improve the pr_debug's utility. [ 6847.584999] main:already_uses:569: amdgpu does not use ttm! [ 6847.585001] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585014] main:already_uses:569: amdgpu does not use drm! [ 6847.585016] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585024] main:already_uses:569: amdgpu does not use drm_display_helper! [ 6847.585025] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585084] main:already_uses:569: amdgpu does not use drm_kms_helper! [ 6847.585086] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585175] main:already_uses:569: amdgpu does not use drm_buddy! [ 6847.585176] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585202] main:already_uses:569: amdgpu does not use i2c_algo_bit! [ 6847.585204] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585249] main:already_uses:569: amdgpu does not use gpu_sched! [ 6847.585250] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585314] main:already_uses:569: amdgpu does not use video! [ 6847.585315] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585409] main:already_uses:569: amdgpu does not use iommu_v2! [ 6847.585410] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585816] main:already_uses:569: amdgpu does not use drm_ttm_helper! [ 6847.585818] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6848.762268] dyndbg: add-module: amdgpu.2533 sites no functional changes. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
66a2301e |
|
21-Mar-2023 |
Jim Cromie <jim.cromie@gmail.com> |
module: add section-size to move_module pr_debug move_module() pr_debug's "Final section addresses for $modname". Add section addresses to the message, for anyone looking at these. no functional changes. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
b10addf3 |
|
21-Mar-2023 |
Jim Cromie <jim.cromie@gmail.com> |
module: add symbol-name to pr_debug Absolute symbol The pr_debug("Absolute symbol" ..) reports value, (which is usually 0), but not the name, which is more informative. So add it. no functional changes Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
6ed81802 |
|
21-Mar-2023 |
Jim Cromie <jim.cromie@gmail.com> |
module: in layout_sections, move_module: add the modname layout_sections() and move_module() each issue ~50 messages for each module loaded. Add mod-name into their 2 header lines, to help the reader find his module. no functional changes. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
3d40bb90 |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: merge remnants of setup_load_info() to elf validation The setup_load_info() was actually had ELF validation checks of its own. To later cache useful variables as an secondary step just means looping again over the ELF sections we just validated. We can simply keep tabs of the key sections of interest as we validate the module ELF section in one swoop, so do that and merge the two routines together. Expand a bit on the documentation / intent / goals. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
1bb49db9 |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: move more elf validity checks to elf_validity_check() The symbol and strings section validation currently happen in setup_load_info() but since they are also doing validity checks move this to elf_validity_check(). Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
c7ee8aeb |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: add stop-grap sanity check on module memcpy() The integrity of the struct module we load is important, and although our ELF validator already checks that the module section must match struct module, add a stop-gap check before we memcpy() the final minted module. This also makes those inspecting the code what the goal is. While at it, clarify the goal behind updating the sh_addr address. The current comment is pretty misleading. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
46752820 |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: add sanity check for ELF module section The ELF ".gnu.linkonce.this_module" section is special, it is what we use to construct the struct module __this_module, which THIS_MODULE points to. When userspace loads a module we always deal first with a copy of the userspace buffer, and twiddle with the userspace copy's version of the struct module. Eventually we allocate memory to do a memcpy() of that struct module, under the assumption that the module size is right. But we have no validity checks against the size or the requirements for the section. Add some validity checks for the special module section early and while at it, cache the module section index early, so we don't have to do that later. While at it, just move over the assigment of the info->mod to make the code clearer. The validity checker also adds an explicit size check to ensure the module section size matches the kernel's run time size for sizeof(struct module). This should prevent sloppy loads of modules which are built today *without* actually increasing the size of the struct module. A developer today can for example expand the size of struct module, rebuild a directoroy 'make fs/xfs/' for example and then try to insmode the driver there. That module would in effect have an incorrect size. This new size check would put a stop gap against such mistakes. This also makes the entire goal of ".gnu.linkonce.this_module" pretty clear. Before this patch verification of the goal / intent required some Indian Jones whips, torches and cleaning up big old spider webs. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
419e1a20 |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: rename check_module_license_and_versions() to check_export_symbol_versions() This makes the routine easier to understand what the check its checking for. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
72f08b3c |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: converge taint work together Converge on a compromise: so long as we have a module hit our linked list of modules we taint. That is, the module was about to become live. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
c3bbf62e |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: move signature taint to module_augment_kernel_taints() Just move the signature taint into the helper: module_augment_kernel_taints() Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
a12b9451 |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: move tainting until after a module hits our linked list It is silly to have taints spread out all over, we can just compromise and add them if the module ever hit our linked list. Our sanity checkers should just prevent crappy drivers / bogus ELF modules / etc and kconfig options should be enough to let you *not* load things you don't want. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
437c1f9c |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: split taint adding with info checking check_modinfo() actually does two things: a) sanity checks, some of which are fatal, and so we prevent the user from completing trying to load a module b) taints the kernel The taints are pretty heavy handed because we're tainting the kernel *before* we ever even get to load the module into the modules linked list. That is, it it can fail for other reasons later as we review the module's structure. But this commit makes no functional changes, it just makes the intent clearer and splits the code up where needed to make that happen. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
ed52cabe |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: split taint work out of check_modinfo_livepatch() The work to taint the kernel due to a module should be split up eventually. To aid with this, split up the tainting on check_modinfo_livepatch(). This let's us bring more early checks together which do return a value, and makes changes easier to read later where we stuff all the work to do the taints in one single routine. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
ad8d3a36 |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: rename set_license() to module_license_taint_check() The set_license() routine would seem to a reader to do some sort of setting, but it does not. It just adds a taint if the license is not set or proprietary. This makes what the code is doing clearer, so much we can remove the comment about it. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
02da2cba |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: move check_modinfo() early to early_mod_check() This moves check_modinfo() to early_mod_check(). This doesn't make any functional changes either, as check_modinfo() was the first call on layout_and_allocate(), so we're just moving it back one routine and at the end. This let's us keep separate the checkers from the allocator. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
85e6f61c |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: move early sanity checks into a helper Move early sanity checkers for the module into a helper. This let's us make it clear when we are working with the local copy of the module prior to allocation. This produces no functional changes, it just makes subsequent changes easier to read. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
1e684172 |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: add a for_each_modinfo_entry() Add a for_each_modinfo_entry() to make it easier to read and use. This produces no functional changes but makes this code easiert to read as we are used to with loops in the kernel and trims more lines of code. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
feb5b784 |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: rename next_string() to module_next_tag_pair() This makes it clearer what it is doing. While at it, make it available to other code other than main.c. This will be used in the subsequent patch and make the changes easier to read. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
b66973b8 |
|
19-Mar-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: move get_modinfo() helpers all above Instead of forward declaring routines for get_modinfo() just move everything up. This makes no functional changes. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
7deabd67 |
|
03-Mar-2023 |
Jason Baron <jbaron@akamai.com> |
dyndbg: use the module notifier callbacks Bring dynamic debug in line with other subsystems by using the module notifier callbacks. This results in a net decrease in core module code. Additionally, Jim Cromie has a new dynamic debug classmap feature, which requires that jump labels be initialized prior to dynamic debug. Specifically, the new feature toggles a jump label from the existing dynamic_debug_setup() function. However, this does not currently work properly, because jump labels are initialized via the 'module_notify_list' notifier chain, which is invoked after the current call to dynamic_debug_setup(). Thus, this patch ensures that jump labels are initialized prior to dynamic debug by setting the dynamic debug notifier priority to 0, while jump labels have the higher priority of 1. Tested by Jim using his new test case, and I've verfied the correct printing via: # modprobe test_dynamic_debug dyndbg. Link: https://lore.kernel.org/lkml/20230113193016.749791-21-jim.cromie@gmail.com/ Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202302190427.9iIK2NfJ-lkp@intel.com/ Tested-by: Jim Cromie <jim.cromie@gmail.com> Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> CC: Jim Cromie <jim.cromie@gmail.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
9e07f161 |
|
09-Feb-2023 |
Jiapeng Chong <jiapeng.chong@linux.alibaba.com> |
module: Remove the unused function within The function within is defined in the main.c file, but not called elsewhere, so remove this unused function. This routine became no longer used after commit ("module: replace module_layout with module_memory"). kernel/module/main.c:3007:19: warning: unused function 'within'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4035 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> [mcgrof: adjust commit log to explain why this change is needed] Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
ac3b4328 |
|
06-Feb-2023 |
Song Liu <song@kernel.org> |
module: replace module_layout with module_memory module_layout manages different types of memory (text, data, rodata, etc.) in one allocation, which is problematic for some reasons: 1. It is hard to enable CONFIG_STRICT_MODULE_RWX. 2. It is hard to use huge pages in modules (and not break strict rwx). 3. Many archs uses module_layout for arch-specific data, but it is not obvious how these data are used (are they RO, RX, or RW?) Improve the scenario by replacing 2 (or 3) module_layout per module with up to 7 module_memory per module: MOD_TEXT, MOD_DATA, MOD_RODATA, MOD_RO_AFTER_INIT, MOD_INIT_TEXT, MOD_INIT_DATA, MOD_INIT_RODATA, and allocating them separately. This adds slightly more entries to mod_tree (from up to 3 entries per module, to up to 7 entries per module). However, this at most adds a small constant overhead to __module_address(), which is expected to be fast. Various archs use module_layout for different data. These data are put into different module_memory based on their location in module_layout. IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT; data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc. module_memory simplifies quite some of the module code. For example, ARCH_WANTS_MODULES_DATA_IN_VMALLOC is a lot cleaner, as it just uses a different allocator for the data. kernel/module/strict_rwx.c is also much cleaner with module_memory. Signed-off-by: Song Liu <song@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
fbed4fea |
|
01-Nov-2022 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
module: Use kstrtobool() instead of strtobool() strtobool() is the same as kstrtobool(). However, the latter is more used within the kernel. In order to remove strtobool() and slightly simplify kstrtox.h, switch to the other function name. While at it, include the corresponding header file (<linux/kstrtox.h>) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
0254127a |
|
05-Dec-2022 |
Petr Pavlu <petr.pavlu@suse.com> |
module: Don't wait for GOING modules During a system boot, it can happen that the kernel receives a burst of requests to insert the same module but loading it eventually fails during its init call. For instance, udev can make a request to insert a frequency module for each individual CPU when another frequency module is already loaded which causes the init function of the new module to return an error. Since commit 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading"), the kernel waits for modules in MODULE_STATE_GOING state to finish unloading before making another attempt to load the same module. This creates unnecessary work in the described scenario and delays the boot. In the worst case, it can prevent udev from loading drivers for other devices and might cause timeouts of services waiting on them and subsequently a failed boot. This patch attempts a different solution for the problem 6e6de3dee51a was trying to solve. Rather than waiting for the unloading to complete, it returns a different error code (-EBUSY) for modules in the GOING state. This should avoid the error situation that was described in 6e6de3dee51a (user space attempting to load a dependent module because the -EEXIST error code would suggest to user space that the first module had been loaded successfully), while avoiding the delay situation too. This has been tested on linux-next since December 2022 and passes all kmod selftests except test 0009 with module compression enabled but it has been confirmed that this issue has existed and has gone unnoticed since prior to this commit and can also be reproduced without module compression with a simple usleep(5000000) on tools/modprobe.c [0]. These failures are caused by hitting the kernel mod_concurrent_max and can happen either due to a self inflicted kernel module auto-loead DoS somehow or on a system with large CPU count and each CPU count incorrectly triggering many module auto-loads. Both of those issues need to be fixed in-kernel. [0] https://lore.kernel.org/all/Y9A4fiobL6IHp%2F%2FP@bombadil.infradead.org/ Fixes: 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading") Co-developed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Petr Mladek <pmladek@suse.com> [mcgrof: enhance commit log with testing and kmod test result interpretation ] Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
f9231a99 |
|
27-Nov-2022 |
Nicholas Piggin <npiggin@gmail.com> |
module: add module_elf_check_arch for module-specific checks The elf_check_arch() function is also used to test compatibility of usermode binaries. Kernel modules may have more specific requirements, for example powerpc would like to test for ABI version compatibility. Add a weak module_elf_check_arch() that defaults to true, and call it from elf_validity_check(). Signed-off-by: Jessica Yu <jeyu@kernel.org> [np: added changelog, adjust name, rebase] Acked-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221128041539.1742489-2-npiggin@gmail.com
|
#
89a6b591 |
|
24-Sep-2022 |
Chen Zhongjin <chenzhongjin@huawei.com> |
module: Remove unused macros module_addr_min/max Unused macros reported by [-Wunused-macros]. These macros are introduced to record the bound address of modules. Commit 80b8bf436990 ("module: Always have struct mod_tree_root") made "struct mod_tree_root" always present and its members addr_min and addr_max can be directly accessed. Macros module_addr_min and module_addr_min are not used anymore, so remove them. Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mcgrof: massaged the commit messsage as suggested by Miroslav] Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
66f4006b |
|
04-Sep-2022 |
Jim Cromie <jim.cromie@gmail.com> |
kernel/module: add __dyndbg_classes section Add __dyndbg_classes section, using __dyndbg as a model. Use it: vmlinux.lds.h: KEEP the new section, which also silences orphan section warning on loadable modules. Add (__start_/__stop_)__dyndbg_classes linker symbols for the c externs (below). kernel/module/main.c: - fill new fields in find_module_sections(), using section_objs() - extend callchain prototypes to pass classes, length load_module(): pass new info to dynamic_debug_setup() dynamic_debug_setup(): new params, pass through to ddebug_add_module() dynamic_debug.c: - add externs to the linker symbols. ddebug_add_module(): - It currently builds a debug_table, and *will* find and attach classes. dynamic_debug_init(): - add class fields to the _ddebug_info cursor var: di. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20220904214134.408619-16-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
b7b4eebd |
|
04-Sep-2022 |
Jim Cromie <jim.cromie@gmail.com> |
dyndbg: gather __dyndbg[] state into struct _ddebug_info This new struct composes the linker provided (vector,len) section, and provides a place to add other __dyndbg[] state-data later: descs - the vector of descriptors in __dyndbg section. num_descs - length of the data/section. Use it, in several different ways, as follows: In lib/dynamic_debug.c: ddebug_add_module(): Alter params-list, replacing 2 args (array,index) with a struct _ddebug_info * containing them both, with room for expansion. This helps future-proof the function prototype against the looming addition of class-map info into the dyndbg-state, by providing a place to add more member fields later. NB: later add static struct _ddebug_info builtins_state declaration, not needed yet. ddebug_add_module() is called in 2 contexts: In dynamic_debug_init(), declare, init a struct _ddebug_info di auto-var to use as a cursor. Then iterate over the prdbg blocks of the builtin modules, and update the di cursor before calling _add_module for each. Its called from kernel/module/main.c:load_info() for each loaded module: In internal.h, alter struct load_info, replacing the dyndbg array,len fields with an embedded _ddebug_info containing them both; and populate its members in find_module_sections(). The 2 calling contexts differ in that _init deals with contiguous subranges of __dyndbgs[] section, packed together, while loadable modules are added one at a time. So rename ddebug_add_module() into outer/__inner fns, call __inner from _init, and provide the offset into the builtin __dyndbgs[] where the module's prdbgs reside. The cursor provides start, len of the subrange for each. The offset will be used later to pack the results of builtin __dyndbg_sites[] de-duplication, and is 0 and unneeded for loadable modules, Note: kernel/module/main.c includes <dynamic_debug.h> for struct _ddeubg_info. This might be prone to include loops, since its also included by printk.h. Nothing has broken in robot-land on this. cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20220904214134.408619-12-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
89245600 |
|
08-Sep-2022 |
Sami Tolvanen <samitolvanen@google.com> |
cfi: Switch to -fsanitize=kcfi Switch from Clang's original forward-edge control-flow integrity implementation to -fsanitize=kcfi, which is better suited for the kernel, as it doesn't require LTO, doesn't use a jump table that requires altering function references, and won't break cross-module function address equality. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220908215504.3686827-6-samitolvanen@google.com
|
#
9fca7115 |
|
08-Sep-2022 |
Sami Tolvanen <samitolvanen@google.com> |
cfi: Remove CONFIG_CFI_CLANG_SHADOW In preparation to switching to -fsanitize=kcfi, remove support for the CFI module shadow that will no longer be needed. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220908215504.3686827-4-samitolvanen@google.com
|
#
41a55567 |
|
12-Jul-2022 |
David Gow <davidgow@google.com> |
module: kunit: Load .kunit_test_suites section when CONFIG_KUNIT=m The new KUnit module handling has KUnit test suites listed in a .kunit_test_suites section of each module. This should be loaded when the module is, but at the moment this only happens if KUnit is built-in. Also load this when KUnit is enabled as a module: it'll not be usable unless KUnit is loaded, but such modules are likely to depend on KUnit anyway, so it's unlikely to ever be loaded needlessly. Fixes: 3d6e44623841 ("kunit: unify module and builtin suite definitions") Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
#
6f1dae1d |
|
14-Jul-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Show the last unloaded module's taint flag(s) For diagnostic purposes, this patch, in addition to keeping a record/or track of the last known unloaded module, we now will include the module's taint flag(s) too e.g: " [last unloaded: fpga_mgr_mod(OE)]" Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
dbf0ae65 |
|
14-Jul-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Use strscpy() for last_unloaded_module The use of strlcpy() is considered deprecated [1]. In this particular context, there is no need to remain with strlcpy(). Therefore we transition to strscpy(). [1]: https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
17dd25c29 |
|
14-Jul-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Modify module_flags() to accept show_state argument No functional change. With this patch a given module's state information (i.e. 'mod->state') can be omitted from the specified buffer. Please note that this is in preparation to include the last unloaded module's taint flag(s), if available. Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
2b9401e9 |
|
04-Jul-2022 |
Yang Yingliang <yangyingliang@huawei.com> |
module: Use vzalloc() instead of vmalloc()/memset(0) Use vzalloc() instead of vmalloc() and memset(0) to simpify the code. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
ecc726f1 |
|
13-Jun-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Fix ERRORs reported by checkpatch.pl Checkpatch reports following errors: ERROR: do not use assignment in if condition + if ((colon = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) { ERROR: do not use assignment in if condition + if ((mod = find_module_all(name, colon - name, false)) != NULL) ERROR: do not use assignment in if condition + if ((ret = find_kallsyms_symbol_value(mod, name)) != 0) ERROR: do not initialise globals to 0 +int modules_disabled = 0; Fix them. The following one has to remain, because the condition has to be evaluated multiple times by the macro wait_event_interruptible_timeout(). ERROR: do not use assignment in if condition + if (wait_event_interruptible_timeout(module_wq, Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
ae39e9ed |
|
03-Jun-2022 |
Saravana Kannan <saravanak@google.com> |
module: Add support for default value for module async_probe Add a module.async_probe kernel command line option that allows enabling async probing for all modules. When this command line option is used, there might still be some modules for which we want to explicitly force synchronous probing, so extend <modulename>.async_probe to take an optional bool input so that async probing can be disabled for a specific module. Signed-off-by: Saravana Kannan <saravanak@google.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
3d6e4462 |
|
08-Jul-2022 |
Jeremy Kerr <jk@codeconstruct.com.au> |
kunit: unify module and builtin suite definitions Currently, KUnit runs built-in tests and tests loaded from modules differently. For built-in tests, the kunit_test_suite{,s}() macro adds a list of suites in the .kunit_test_suites linker section. However, for kernel modules, a module_init() function is used to run the test suites. This causes problems if tests are included in a module which already defines module_init/exit_module functions, as they'll conflict with the kunit-provided ones. This change removes the kunit-defined module inits, and instead parses the kunit tests from their own section in the module. After module init, we call __kunit_test_suites_init() on the contents of that section, which prepares and runs the suite. This essentially unifies the module- and non-module kunit init formats. Tested-by: Maíra Canal <maira.canal@usp.br> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Daniel Latypov <dlatypov@google.com> Signed-off-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
#
74829ddf |
|
07-Jul-2022 |
David Gow <davidgow@google.com> |
module: panic: Taint the kernel when selftest modules load Taint the kernel with TAINT_TEST whenever a test module loads, by adding a new "TEST" module property, and setting it for all modules in the tools/testing directory. This property can also be set manually, for tests which live outside the tools/testing directory with: MODULE_INFO(test, "Y"); Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Acked-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
#
f963ef12 |
|
12-Jun-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Fix "warning: variable 'exit' set but not used" When CONFIG_MODULE_UNLOAD is not selected, 'exit' is set but never used. It is not possible to replace the #ifdef CONFIG_MODULE_UNLOAD by IS_ENABLED(CONFIG_MODULE_UNLOAD) because mod->exit doesn't exist when CONFIG_MODULE_UNLOAD is not selected. And because of the rcu_read_lock_sched() section it is not easy to regroup everything in a single #ifdef. Let's regroup partially and add missing #ifdef to completely opt out the use of 'exit' when CONFIG_MODULE_UNLOAD is not selected. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
7390b94a |
|
04-May-2022 |
Masahiro Yamada <masahiroy@kernel.org> |
module: merge check_exported_symbol() into find_exported_symbol_in_section() Now check_exported_symbol() always succeeds. Merge it into find_exported_symbol_in_search() to make the code concise. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
cdd66eb5 |
|
04-May-2022 |
Masahiro Yamada <masahiroy@kernel.org> |
module: do not binary-search in __ksymtab_gpl if fsa->gplok is false Currently, !fsa->gplok && syms->license == GPL_ONLY) is checked after bsearch() succeeds. It is meaningless to do the binary search in the GPL symbol table when fsa->gplok is false because we know find_exported_symbol_in_section() will fail anyway. This check should be done before bsearch(). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
c6eee9df |
|
04-May-2022 |
Masahiro Yamada <masahiroy@kernel.org> |
module: do not pass opaque pointer for symbol search There is no need to use an opaque pointer for check_exported_symbol() or find_exported_symbol_in_section. Pass (struct find_symbol_arg *) explicitly. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
8eac910a |
|
27-Apr-2022 |
Lecopzer Chen <lecopzer.chen@mediatek.com> |
module: show disallowed symbol name for inherit_taint() The error log for inherit_taint() doesn't really help to find the symbol which violates GPL rules. For example, if a module has 300 symbol and includes 50 disallowed symbols, the log only shows the content below and we have no idea what symbol is. AAA: module using GPL-only symbols uses symbols from proprietary module BBB. It's hard for user who doesn't really know how the symbol was parsing. This patch add symbol name to tell the offending symbols explicitly. AAA: module using GPL-only symbols uses symbols SSS from proprietary module BBB. Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
391e982b |
|
03-May-2022 |
Alexey Dobriyan <adobriyan@gmail.com> |
module: fix [e_shstrndx].sh_size=0 OOB access It is trivial to craft a module to trigger OOB access in this line: if (info->secstrings[strhdr->sh_size - 1] != '\0') { BUG: unable to handle page fault for address: ffffc90000aa0fff PGD 100000067 P4D 100000067 PUD 100066067 PMD 10436f067 PTE 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 7 PID: 1215 Comm: insmod Not tainted 5.18.0-rc5-00007-g9bf578647087-dirty #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 RIP: 0010:load_module+0x19b/0x2391 Fixes: ec2a29593c83 ("module: harden ELF info handling") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> [rebased patch onto modules-next] Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
99bd9956 |
|
02-May-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Introduce module unload taint tracking Currently, only the initial module that tainted the kernel is recorded e.g. when an out-of-tree module is loaded. The purpose of this patch is to allow the kernel to maintain a record of each unloaded module that taints the kernel. So, in addition to displaying a list of linked modules (see print_modules()) e.g. in the event of a detected bad page, unloaded modules that carried a taint/or taints are displayed too. A tainted module unload count is maintained. The number of tracked modules is not fixed. This feature is disabled by default. Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
6fb0538d |
|
02-May-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Move module_assert_mutex_or_preempt() to internal.h No functional change. This patch migrates module_assert_mutex_or_preempt() to internal.h. So, the aforementiond function can be used outside of main/or core module code yet will remain restricted for internal use only. Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
c14e522bc |
|
02-May-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Make module_flags_taint() accept a module's taints bitmap and usable outside core code No functional change. The purpose of this patch is to modify module_flags_taint() to accept a module's taints bitmap as a parameter and modifies all users accordingly. Furthermore, it is now possible to access a given module's taint flags data outside of non-essential code yet does remain for internal use only. This is in preparation for module unload taint tracking support. Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
55ce556d |
|
23-Feb-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Remove module_addr_min and module_addr_max Replace module_addr_min and module_addr_max by mod_tree.addr_min and mod_tree.addr_max Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
01dc0386 |
|
23-Feb-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC to allow architectures to request having modules data in vmalloc area instead of module area. This is required on powerpc book3s/32 in order to set data non executable, because it is not possible to set executability on page basis, this is done per 256 Mbytes segments. The module area has exec right, vmalloc area has noexec. This can also be useful on other powerpc/32 in order to maximize the chance of code being close enough to kernel core to avoid branch trampolines. Cc: Jason Wessel <jason.wessel@windriver.com> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Douglas Anderson <dianders@chromium.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mcgrof: rebased in light of kernel/module/kdb.c move] Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
6ab9942c |
|
23-Feb-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Introduce data_layout In order to allow separation of data from text, add another layout, called data_layout. For architectures requesting separation of text and data, only text will go in core_layout and data will go in data_layout. For architectures which keep text and data together, make data_layout an alias of core_layout, that way data_layout can be used for all data manipulations, regardless of whether data is in core_layout or data_layout. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
446d5566 |
|
23-Feb-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Prepare for handling several RB trees In order to separate text and data, we need to setup two rb trees. Modify functions to give the tree as a parameter. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
80b8bf43 |
|
23-Feb-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Always have struct mod_tree_root In order to separate text and data, we need to setup two rb trees. This means that struct mod_tree_root is required even without MODULES_TREE_LOOKUP. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
7337f929 |
|
23-Feb-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Rename debug_align() as strict_align() debug_align() was added by commit 84e1c6bb38eb ("x86: Add RO/NX protection for loadable kernel modules") At that time the config item was CONFIG_DEBUG_SET_MODULE_RONX. But nowadays it has changed to CONFIG_STRICT_MODULE_RWX and debug_align() is confusing because it has nothing to do with DEBUG. Rename it strict_align() Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
ef505058 |
|
23-Feb-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Rework layout alignment to avoid BUG_ON()s Perform layout alignment verification up front and WARN_ON() and fail module loading instead of crashing the machine. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
32a08c17 |
|
23-Feb-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Move module_enable_x() and frob_text() in strict_rwx.c Move module_enable_x() together with module_enable_nx() and module_enable_ro(). Those three functions are going together, they are all used to set up the correct page flags on the different sections. As module_enable_x() is used independently of CONFIG_STRICT_MODULE_RWX, build strict_rwx.c all the time and use IS_ENABLED(CONFIG_STRICT_MODULE_RWX) when relevant. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
05975793 |
|
23-Feb-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Make module_enable_x() independent of CONFIG_ARCH_HAS_STRICT_MODULE_RWX module_enable_x() has nothing to do with CONFIG_ARCH_HAS_STRICT_MODULE_RWX allthough by coincidence architectures who need module_enable_x() are selection CONFIG_ARCH_HAS_STRICT_MODULE_RWX. Enable module_enable_x() for everyone everytime. If an architecture already has module text set executable, it's a no-op. Don't check text_size alignment. When CONFIG_STRICT_MODULE_RWX is set the verification is already done in frob_rodata(). When CONFIG_STRICT_MODULE_RWX is not set it is not a big deal to have the start of data as executable. Just make sure we entirely get the last page when the boundary is not aligned. And don't BUG on misaligned base as some architectures like nios2 use kmalloc() for allocating modules. So just bail out in that case. If that's a problem, a page fault will occur later anyway. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
47889798 |
|
22-Mar-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Move version support into a separate file No functional change. This patch migrates module version support out of core code into kernel/module/version.c. In addition simple code refactoring to make this possible. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
f64205a4 |
|
22-Mar-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Move kdb module related code out of main kdb code No functional change. This patch migrates the kdb 'lsmod' command support out of main kdb code into its own file under kernel/module. In addition to the above, a minor style warning i.e. missing a blank line after declarations, was resolved too. The new file was added to MAINTAINERS. Finally we remove linux/module.h as it is entirely redundant. Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
44c09535 |
|
22-Mar-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Move sysfs support into a separate file No functional change. This patch migrates module sysfs support out of core code into kernel/module/sysfs.c. In addition simple code refactoring to make this possible. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
0ffc40f6 |
|
22-Mar-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Move procfs support into a separate file No functional change. This patch migrates code that allows one to generate a list of loaded/or linked modules via /proc when procfs support is enabled into kernel/module/procfs.c. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
91fb02f3 |
|
22-Mar-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Move kallsyms support into a separate file No functional change. This patch migrates kallsyms code out of core module code kernel/module/kallsyms.c Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
473c84d1 |
|
22-Mar-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Move kmemleak support to a separate file No functional change. This patch migrates kmemleak code out of core module code into kernel/module/debug_kmemleak.c Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
0c1e4280 |
|
22-Mar-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Move extra signature support out of core code No functional change. This patch migrates additional module signature check code from core module code into kernel/module/signing.c. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
b33465fe |
|
22-Mar-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Move strict rwx support to a separate file No functional change. This patch migrates code that makes module text and rodata memory read-only and non-text memory non-executable from core module code into kernel/module/strict_rwx.c. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
58d208de |
|
22-Mar-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Move latched RB-tree support to a separate file No functional change. This patch migrates module latched RB-tree support (e.g. see __module_address()) from core module code into kernel/module/tree_lookup.c. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
1be9473e |
|
22-Mar-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Move livepatch support to a separate file No functional change. This patch migrates livepatch support (i.e. used during module add/or load and remove/or deletion) from core module code into kernel/module/livepatch.c. At the moment it contains code to persist Elf information about a given livepatch module, only. The new file was added to MAINTAINERS. Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
8ab4ed08 |
|
22-Mar-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Simple refactor in preparation for split No functional change. This patch makes it possible to move non-essential code out of core module code. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
cfc1d277 |
|
22-Mar-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Move all into module/ No functional changes. This patch moves all module related code into a separate directory, modifies each file name and creates a new Makefile. Note: this effort is in preparation to refactor core module code. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|