
History log of /freebsd-current/sys/kern/subr_param.c
Revision	Date	Author	Comments
# 13a5a46c	29-Apr-2024	Andrew Gallatin <gallatin@FreeBSD.org>	Fix new users of MAXPHYS and hide it from the kernel namespace In cd8537910406, kib made maxphys a load-time tunable. This made the #define MAXPHYS in sys/param.h almost entirely obsolete, as it could now be overridden by kern.maxphys at boot time, or by opt_maxphys.h. However, decades of tradition have led to several new, incorrect, uses of MAXPHYS in other parts of the kernel, mostly by seasoned developers. I've corrected those uses here in a mechanical fashion, and verified that it fixes a bug in the md driver that I was experiencing. Since using MAXPHYS is such an easy mistake to make, it is best to hide it from the kernel namespace. So I've moved its definition to _maxphys.h, which is now included in param.h only for userspace. That brings up the fact that lots of userspace programs use MAXPHYS for different reasons, most of them probably wrong. Userspace consumers that really need to know the value of maxphys should probably be changed to use the kern.maxphys sysctl. But that's outside the scope of this change. Reviewed by: imp, jkim, kib, markj Fixes: 30038a8b4efc ("md: Get rid of the pbuf zone") Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D44986
# b0aaf8be	18-Mar-2024	Mateusz Guzik <mjg@FreeBSD.org>	Rename VM_LAST to more appropriate VM_GUEST_LAST NFC Sponsored by: Rubicon Communications, LLC ("Netgate")
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# 733a6684	06-Nov-2023	Mateusz Guzik <mjg@FreeBSD.org>	Fix up the vm_guest_sysctl_names size assert. As VM_LAST was included in the array, the size check had to always pass. While here modernize the assert itself. Sponsored by: Rubicon Communications, LLC ("Netgate")
# 39cddbd7	09-Oct-2023	Konstantin Belousov <kib@FreeBSD.org>	arm64: do not disable the kern.kstack_pages tunable on arm64 Add a comment explaining what is not quite correct with arm64 and riscv. Reviewed by: jhb, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D42143
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
# 9d6ae1e3	04-Jun-2023	Colin Percival <cperciva@FreeBSD.org>	Revert "Revert "tslog: Annotate some early boot functions"" Now that <sys/tslog.h> is wrapped in #ifdef _KERNEL, it's safe to have tslog annotations in files which might be built from userland (i.e. in subr_boot.c, which is built as part of the boot loader). This reverts commit 59588a546f55523d6fd37ab42eb08b719311d7d6.
# 59588a54	04-Jun-2023	Colin Percival <cperciva@FreeBSD.org>	Revert "tslog: Annotate some early boot functions" The change to subr_boot.c broke the libsa build because the TSLOG macros have their own definitions for the boot loader -- I didn't realize that the loader code used subr_boot.c. I'm currently testing a fix and I'll revert this revert once I'm satisfied that everything works, but I don't want to leave the tree broken for too long. This reverts commit 469cfa3c30ee7a5ddeb597d0a8c3e7cac909b27a.
# 469cfa3c	22-May-2023	Colin Percival <cperciva@FreeBSD.org>	tslog: Annotate some early boot functions Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM, hammer_time takes roughly 2740 us: * 55 us in xen_pvh_parse_preload_data * 20 us in boot_parse_cmdline_delim * 20 us in boot_env_to_howto * 15 us in identify_hypervisor * 1320 us in link_elf_reloc * 1310 us in relocate_file1 handling ef->rela * 25 us in init_param1 * 30 us in dpcpu_init * 355 us in initializecpu * 255 us in initializecpu calling load_cr4 * 425 us in getmemsize * 280 us in pmap_bootstrap * 205 us in create_pagetables * 10 us in init_param2 * 25 us in pci_early_quirks * 60 us in cninit * 90 us in kdb_init * 105 us in msgbufinit * 20 us in fpuinit * 205 us elsewhere in hammer_time Some of these are unavoidable (e.g. identify_hypervisor uses CPUID and load_cr4 loads the CR4 register, both of which trap to the hypervisor) but others may deserve attention. Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D40325
# a3571129	01-Mar-2023	Mateusz Guzik <mjg@FreeBSD.org>	kern: whack __mips__ leftover Sponsored by: Rubicon Communications, LLC ("Netgate")
# 35a33d14	20-Oct-2022	Hans Petter Selasky <hselasky@FreeBSD.org>	time(3): Optimize tvtohz() function. List of changes: - Use integer multiplication instead of long multiplication, because the result is an integer. - Remove multiple if-statements and predict new if-statements. - Rename local variable name, "ticks" into "retval" to avoid shadowing the system "ticks" global variable. Reviewed by: kib@ and imp@ MFC after: 1 week Sponsored by: NVIDIA Networking Differential Revision: https://reviews.freebsd.org/D36859
# ee29897f	03-Oct-2022	Hans Petter Selasky <hselasky@FreeBSD.org>	time(3): Declare the minimum and maximum hz values supported. Reviewed by: kib@ and imp@ MFC after: 1 week Sponsored by: NVIDIA Networking Differential Revision: https://reviews.freebsd.org/D37072
# 2b7d656f	03-Sep-2022	Gordon Bergling <gbe@FreeBSD.org>	kern: Fix a typo in a source code comment - s/overriden/overridden/ MFC after: 3 days
# 0a994229	16-Jun-2021	Warner Losh <imp@FreeBSD.org>	Move mips and arm to 1000Hz by default. armv6 and armv7 systems already were 1000Hz. The other armv5 were a mix of 100 and 1000. This changes them to 1000. Should there be issues, we can add options HZ=100 to the systems that have bad performance at the drop of a hat. mips is a lot more complicated. But most of the systems are already 1000HZ. The hardware exceptions are all fast enough to run at 1000Hz. MALTA is our primary emulator, and history has shown emulators tend to like 100Hz better, so run those systems at 100Hz. As with arm, any system that shows a huge performance regression can be reverted to 100Hz easily. This was going to be committed well in advance of the 13 branch, but it was delayed and forgotten til now. Discussed on: #bsdmips ages ago Sponsored by: Netflix
# cd853791	27-Nov-2020	Konstantin Belousov <kib@FreeBSD.org>	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allows several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initializes maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embedded structures, get private MAXPHYS-like constant; their conversion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, were either submitted by, or based on changes by mav. Suggested by: mav () Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225
# 3b1f974b	26-Nov-2020	Konstantin Belousov <kib@FreeBSD.org>	Make max ticks for pause in vn_lock_pair() adjustable at runtime. Reduce default value from hz / 10 to hz / 100. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation
# 7029da5c	26-Feb-2020	Pawel Biernacki <kaktus@FreeBSD.org>	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
# 58aa35d4	03-Feb-2020	Warner Losh <imp@FreeBSD.org>	Remove sparc64 kernel support Remove all sparc64 specific files Remove all sparc64 ifdefs Remove indirect sparc64 ifdefs
# bdc786cc	06-Sep-2019	Philip Paeps <philip@FreeBSD.org>	riscv: restore default HZ=1000, keep QEMU at HZ=100 This reverts r351918 and r351919. Discussed with: br, ian, imp
# 7f0851ab	05-Sep-2019	Philip Paeps <philip@FreeBSD.org>	riscv: default to HZ=100 Most current RISC-V development platforms are not fast enough to benefit from the increased granularity provided by HZ=1000. Sponsored by: Axiado
# 1177d38c	21-May-2019	Stephen J. Kiernan <stevek@FreeBSD.org>	The older detection methods (smbios.bios.vendor and smbios.system.product) are able to determine some virtual machines, but the vm_guest variable was still only being set to VM_GUEST_VM. Since we do know what some of them specifically are, we can set vm_guest appropriately. Also, if we see the CPUID has the HV flag, but we were unable to find a definitive vendor in the Hypervisor CPUID Information Leaf, fall back to the older detection methods, as they may be able to determine a specific HV type. Add VM_GUEST_PARALLELS value to VM_GUEST for Parallels. Approved by: cem Differential Revision: https://reviews.freebsd.org/D20305
# 949f834a	17-May-2019	Stephen J. Kiernan <stevek@FreeBSD.org>	Instead of individual conditional statements to look for each hypervisor type, use a table to make it easier to add more in the future, if needed. Add VirtualBox detection to the table ("VBoxVBoxVBox" is the hypervisor vendor string to look for.) Also add VM_GUEST_VBOX to the VM_GUEST enumeration to indicate VirtualBox. Save the CPUID base for the hypervisor entry that we detected. Driver code may need to know about it in order to obtain additional CPUID features. Approved by: bryanv, jhb Differential Revision: https://reviews.freebsd.org/D16305
# d1bb5d7d	16-Jan-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Fix mistake in r343030: move nswbuf calculation back to kern_vfs_bio_buffer_alloc(), because in init_param2() nbuf isn't really initialized yet. Pointed out by: bde
# 756a5412	14-Jan-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Allocate pager bufs from UMA instead of 80-ish mutex protected linked list. o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many pbufs are we going to have set. In various subsystems that are going to utilize pbufs create private zones via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(), and sets a limit on created zone. After startup preallocate pbufs according to requirements of all pbuf zones. Subsystems that used to have a private limit with old allocator now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS, swap, vnode pager. The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9), aio(4). They should have their private limits, but changing that is out of scope of this commit. o Fetch tunable value of kern.nswbuf from init_param2() and while here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only this option. Default values aren't touched by this commit, but they probably should be reviewed wrt to modern hardware. This change removes a tight bottleneck from sendfile(2) operation, that uses pbufs in vnode pager. Other pagers also would benefit from faster allocation. Together with: gallatin Tested by: pho
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# e0a6a23c	07-Jun-2017	Marcelo Araujo <araujo@FreeBSD.org>	Allow sysctl kern.vm_guest to return bhyve when running under bhyve. Submitted by: Sean Fagan <sef@ixsystems.com> Reviewed by: grehan MFH: 4 weeks. Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D11090
# 5d8cce17	21-Nov-2016	John Baldwin <jhb@FreeBSD.org>	Initialize 'ticks' earlier in boot after 'hz' is set. This avoids the time-warp after kthreads have started running and the required fixup to td_slptick and td_blktick in the EARLY_AP_STARTUP case. Now, 'ticks' is initialized before any kthreads are created or any context switches are performed. Tested by: gavin MFC after: 2 weeks Sponsored by: Netflix
# 69a28758	15-Sep-2016	Ed Maste <emaste@FreeBSD.org>	Renumber license clauses in sys/kern to avoid skipping #3
# fdb6320d	13-Jul-2016	Eric Badger <badger@FreeBSD.org>	Add explicit detection of KVM hypervisor Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected and use vm_guest in conditionals testing for KVM. Also, fix a conditional checking if we're running in a VM which caught only the generic VM case, but not more specific VMs (KVM, VMWare, etc.). (Spotted by: vangyzen). Differential revision: https://reviews.freebsd.org/D7172 Sponsored by: Dell Inc. Approved by: kib (mentor), vangyzen (mentor) Reviewed by: alc MFC after: 4 weeks
# 1f57d8c6	21-Sep-2015	Konstantin Belousov <kib@FreeBSD.org>	Ensure that maxproc does not exceed pid_max, at the time of boot. Changes to kern.pid_max mib after the boot can break this relation. The maxfiles value was calculated by the MAXFILES formula based on maxproc value, but this change decouples them, and MAXFILES now references maxusers. Without manual tuning, the maxfiles default value remains as it was prior to this commit. But for systems which have tuned maxproc and rely on maxfiles to adjust, additional reconfiguration is needed. Reported by: rwatson Reviewed by: emaste Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 32766cd2	09-Sep-2015	Adrian Chadd <adrian@FreeBSD.org>	Also make kern.maxfilesperproc a boot time tunable. Auto-tuning threshold discussions aside, it turns out that if you want to lower this on say, rather memory-packed machines, you either set maxusers or kern.maxfiles, or you set it in sysctl. The former is a non-exact way to tune this; the latter doesn't actually affect anything in the startup scripts. This first occurred because I wondered why the hell screen would take upwards of 10 seconds to spawn a new screen. I then found python doing the same thing during fork/exec of child processes - it calls close() on each FD up to the current openfiles limit. On a 1TB machine this is like, 26 million FDs per process. Ugh. So: * This allows it to be set early in /boot/loader.conf; * It can be used to work around the ridiculous situation of screen, python, etc doing a close() on potentially millions of FDs even though you only have four open. Tested: * 4GB, 32GB, 64GB, 128GB, 384GB, 1TB systems with autotune, ensuring screen and python forking doesn't result in some pretty hilariously bad behaviour. TODO: * Note that the default login.conf sets openfiles-cur to unlimited, effectively obeying kern.maxfilesperproc. Perhaps we should fix this. * .. and even if we do, we need to also ensure that daemons get a soft limit of something reasonable and capped - they can request more FDs themselves. MFC after: 1 week Sponsored by: Norse Corp, Inc.
# edc82223	10-Aug-2015	Konstantin Belousov <kib@FreeBSD.org>	Make kstack_pages a tunable on arm, x86, and powerpc. On i386, the initial thread stack is not adjusted by the tunable, the stack is allocated too early to get access to the kernel environment. See TD0_KSTACK_PAGES for the thread0 stack sizing on i386. The tunable was tested on x86 only. From the visual inspection, it seems that it might work on arm and powerpc. The arm USPACE_SVC_STACK_TOP and powerpc USPACE macros seem to be already incorrect for the threads with non-default kstack size. I only changed the macros to use variable instead of constant, since I cannot test. On arm64, mips and sparc64, some static data structures are sized by KSTACK_PAGES, so the tunable is disabled. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 98082691	28-Jul-2015	Jeff Roberson <jeff@FreeBSD.org>	- Make 'struct buf *buf' private to vfs_bio.c. Having a global variable 'buf' is inconvenient and has led me to some irritating-to-discover bugs over the years. It also makes it more challenging to refactor the buf allocation system. - Move swbuf and declare it as an extern in vfs_bio.c. This is still not perfect but better than it was before. - Eliminate the unused ffs function that relied on knowledge of the buf array. - Move the shutdown code that iterates over the buf array into vfs_bio.c. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division
# ed95805e	30-Apr-2015	John Baldwin <jhb@FreeBSD.org>	Remove support for Xen PV domU kernels. Support for HVM domU kernels remains. Xen is planning to phase out support for PV upstream since it is harder to maintain and has more overhead. Modern x86 CPUs include virtualization extensions that support HVM guests instead of PV guests. In addition, the PV code was i386 only and not as well maintained recently as the HVM code. - Remove the i386-only NATIVE option that was used to disable certain components for PV kernels. These components are now standard as they are on amd64. - Remove !XENHVM bits from PV drivers. - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3, etc.) - Remove duplicate copy of <xen/features.h>. - Remove unused, i386-only xenstored.h. Differential Revision: https://reviews.freebsd.org/D2362 Reviewed by: royger Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0) Relnotes: yes
# acfc962f	14-Mar-2015	Ian Lepore <ian@FreeBSD.org>	Use SYSCTL_OUT_STR() to return strings. PR: 195668
# 01e1933d	28-Oct-2014	John Baldwin <jhb@FreeBSD.org>	Rework virtual machine hypervisor detection. - Move the existing code to x86/x86/identcpu.c since it is x86-specific. - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl. - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare. Differential Revision: https://reviews.freebsd.org/D1010 Reviewed by: delphij, jkim, neel
# 2be111bf	16-Oct-2014	Davide Italiano <davide@FreeBSD.org>	Follow up to r225617. In order to maximize the re-usability of kernel code in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe
# af3b2549	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Pull in r267961 and r267973 again. Fix for issues reported will follow.
# 37a107a4	27-Jun-2014	Glen Barber <gjb@FreeBSD.org>	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory
# 3da1cf1e	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies
# 903093ec	12-Nov-2013	Sergey Kandaurov <pluknet@FreeBSD.org>	Add VM_LAST, a special last element in enum VM_GUEST and use it in CTASSERT to ensure that vm_guest range is covered by vm_guest_sysctl_names. Suggested by: mjg
# 40b4cb0f	11-Nov-2013	Sergey Kandaurov <pluknet@FreeBSD.org>	Set description string for VM_GUEST_HV (HyperV guest). This fixes fallout from r256425. Reported by: Pavel Timofeev <timp87@gmail com> Tested by: Pavel Timofeev <timp87@gmail com> Reviewed by: Roger Pau Monné MFC after: 3 days
# 46038d7f	27-Oct-2013	Konstantin Belousov <kib@FreeBSD.org>	Fix typo. MFC after: 3 days
# ee75e7de	19-Mar-2013	Konstantin Belousov <kib@FreeBSD.org>	Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminates the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitly requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitly specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. 
In the rework, filesystem metadata is not subject to the maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, are accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not work, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks
# f8ccf82a	08-Mar-2013	Andre Oppermann <andre@FreeBSD.org>	Move the auto-sizing of the callout array from init_param2() to kern_timeout_callwheel_alloc() where it is actually used. This is a mechanical move and no tuning parameters are changed. The pre-allocated callout array is only used for legacy timeout(9) calls and is only allocated and active on cpu0. Eventually all remaining users of timeout(9) should switch to the callout_* API. Reviewed by: davide
# 5b999a6b	04-Mar-2013	Davide Italiano <davide@FreeBSD.org>	- Make callout(9) tickless, relying on eventtimers(4) as backend for precise time event generation. This greatly improves granularity of callouts which are not anymore constrained to wait next tick to be scheduled. - Extend the callout KPI introducing a set of callout_reset_sbt* functions, which take a sbintime_t as timeout argument. The new KPI also offers a way for consumers to specify precision tolerance they allow, so that callout can coalesce events and reduce number of interrupts as well as potentially avoid scheduling a SWI thread. - Introduce support for dispatching callouts directly from hardware interrupt context, specifying an additional flag. This feature should be used carefully, as long as interrupt context has some limitations (e.g. no sleeping locks can be held). - Enhance mechanisms to gather information about callwheel, introducing a new sysctl to obtain stats. This change breaks the KBI. struct callout fields have been changed, in particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t' (8 bytes) and another 'sbintime_t' field was added for precision. Together with: mav Reviewed by: attilio, bde, luigi, phk Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm), markj (amd64), mav, Fabian Keil
# 37140716	17-Jan-2013	Andre Oppermann <andre@FreeBSD.org>	Move the mbuf memory limit calculations from init_param2() to tunable_mbinit() where it is next to where it is used later. Change the sysinit level of tunable_mbinit() from SI_SUB_TUNABLES to SI_SUB_KMEM after the VM is running. This allows to use better methods to determine the effectively available physical and virtual memory available to the kernel. Update comments. In a second step it can be merged into mbuf_init().
# 17ebe960	15-Jan-2013	Alfred Perlstein <alfred@FreeBSD.org>	Do not autotune ncallout to be greater than 18508. When maxusers was unrestricted and maxfiles was allowed to autotune much higher the result was that ncallout which was based on maxfiles and maxproc grew much higher than was needed. To fix this clip autotuning to the same number we would get with the old maxusers algorithm which would stop scaling at 384 maxusers. Growing ncallout higher is not likely to be needed since most consumers of timeout(9) are gone and any higher value for ncallout causes the callwheel hashes to be much larger than will ever be needed for most applications. MFC after: 1 month Reviewed by: mav
# 0165f660	15-Jan-2013	Andrey Zonov <zont@FreeBSD.org>	- Detect when we are in KVM. Silence on: emulation Approved by: kib (mentor) MFC after: 1 week
# a09580e7	05-Jan-2013	Neel Natu <neel@FreeBSD.org>	Teach the kernel to recognize that it is executing inside a bhyve virtual machine. Obtained from: NetApp
# 0060bab5	09-Dec-2012	Andre Oppermann <andre@FreeBSD.org>	Prevent long type overflow of realmem calculation on ILP32 by forcing calculation to be in quad_t space. Fix style issue with second parameter to qmin(). Reported by: alc Reviewed by: bde, alc
# df905a2b	29-Nov-2012	Andre Oppermann <andre@FreeBSD.org>	Using a long is the wrong type to represent the realmem and maxmbufmem variable as they may overflow on i386/PAE and i386 with > 2GB RAM. Use 64bit quad_t instead. It has broader kernel infrastructure support with TUNABLE_QUAD_FETCH() and qmin/qmax() than other available types. Pointed out by: alc, bde
# ead46972	27-Nov-2012	Andre Oppermann <andre@FreeBSD.org>	Base the mbuf related limits on the available physical memory or kernel memory, whichever is lower. The overall mbuf related memory limit must be set so that mbufs (and clusters of various sizes) can't exhaust physical RAM or KVM. The limit is set to half of the physical RAM or KVM (whichever is lower) as the baseline. In any normal scenario we want to leave at least half of the physmem/kvm for other kernel functions and userspace to prevent it from swapping too easily. Via a tunable kern.maxmbufmem the limit can be upped to at most 3/4 of physmem/kvm. At the same time divorce maxfiles from maxusers and set maxfiles to physpages / 8 with a floor based on maxusers. This way busy servers can make use of the significantly increased mbuf limits with a much larger number of open sockets. Tidy up ordering in init_param2() and check up on some users of those values calculated here. Out of the overall mbuf memory limit 2K clusters and 4K (page size) clusters to get 1/4 each because these are the most heavily used mbuf sizes. 2K clusters are used for MTU 1500 ethernet inbound packets. 4K clusters are used whenever possible for sends on sockets and thus outbound packets. The larger cluster sizes of 9K and 16K are limited to 1/6 of the overall mbuf memory limit. When jumbo MTU's are used these large clusters will end up only on the inbound path. They are not used on outbound, there it's still 4K. Yes, that will stay that way because otherwise we run into lots of complications in the stack. And it really isn't a problem, so don't make a scene. Normal mbufs (256B) weren't limited at all previously. This was problematic as there are certain places in the kernel that on allocation failure of clusters try to piece together their packet from smaller mbufs. The mbuf limit is the number of all other mbuf sizes together plus some more to allow for standalone mbufs (ACK for example) and to send off a copy of a cluster. 
Unfortunately there isn't a way to set an overall limit for all mbuf memory together as UMA doesn't support such a limiting. NB: Every cluster also has an mbuf associated with it. Two examples on the revised mbuf sizing limits: 1GB KVM: 512MB limit for mbufs 419,430 mbufs 65,536 2K mbuf clusters 32,768 4K mbuf clusters 9,709 9K mbuf clusters 5,461 16K mbuf clusters 16GB RAM: 8GB limit for mbufs 33,554,432 mbufs 1,048,576 2K mbuf clusters 524,288 4K mbuf clusters 155,344 9K mbuf clusters 87,381 16K mbuf clusters These defaults should be sufficient for even the most demanding network loads. MFC after: 1 month
# 79f62ed6	09-Nov-2012	Alfred Perlstein <alfred@FreeBSD.org>	Allow maxusers to scale on machines with large address space. Some hooks are added to clamp down maxusers and nmbclusters for small address space systems. VM_MAX_AUTOTUNE_MAXUSERS - the max maxusers that will be autotuned based on physical memory. VM_MAX_AUTOTUNE_NMBCLUSTERS - max nmbclusters based on physical memory. These are set to the old values on i386 to preserve the clamping that was being done to all arches. Another macro VM_AUTOTUNE_NMBCLUSTERS is provided to allow an override for the calculation on a MD basis. Currently no arch defines this. Reviewed by: peter MFC after: 2 weeks
# 9b80cdf5	29-Oct-2012	Neel Natu <neel@FreeBSD.org>	Teach FreeBSD to detect that it is a guest running inside BHyVe. Reviewed by: grehan Obtained from: NetApp
# 7b6d92c0	24-Oct-2012	Alfred Perlstein <alfred@FreeBSD.org>	Allow autotune maxusers > 384 on 64 bit machines. A default install on large memory machines with multiple 10gigE interfaces was not being given enough mbufs to do full bandwidth TCP or NFS traffic. To keep the value somewhat reasonable, we scale back the number of maxusers by 1/6 past the 384 point. This gives us enough mbufs for most of our pretty basic 10gigE line-speed tests to complete.
# ceb0f715	03-Sep-2012	Andrey Zonov <zont@FreeBSD.org>	- Mark some sysctls with CTLFLAG_TUN flag instead of CTLFLAG_RDTUN. Pointed out by: avg Approved by: kib (mentor) MFC after: 1 week
# c3927cd9	02-Sep-2012	Andrey Zonov <zont@FreeBSD.org>	- Make kern.maxtsiz, kern.dfldsiz, kern.maxdsiz, kern.dflssiz, kern.maxssiz and kern.sgrowsiz sysctls writable. Approved by: kib (mentor)
# 3fa615bc	16-Aug-2012	Konstantin Belousov <kib@FreeBSD.org>	As a safety measure, disable lowering pid_max too much. Requested by: Peter Jeremy <peter@rulingia.com> MFC after: 1 week
# 02c6fc21	15-Aug-2012	Konstantin Belousov <kib@FreeBSD.org>	Add a sysctl kern.pid_max, which limits the maximum pid the system is allowed to allocate, and corresponding tunable with the same name. Note that existing processes with higher pids are left intact. MFC after: 1 week
# e9a3f785	23-Mar-2011	Alan Cox <alc@FreeBSD.org>	Modestly increase the maximum allowed size of the kmem map on i386. Also, express this new maximum as a fraction of the kernel's address space size rather than a constant so that increasing KVA_PAGES will automatically increase this maximum. As a side-effect of this change, kern.maxvnodes will automatically increase by a proportional amount. While I'm here ensure that this change doesn't result in an unintended increase in maxpipekva on i386. Calculate maxpipekva based upon the size of the kernel address space and the amount of physical memory instead of the size of the kmem map. The memory backing pipes is not allocated from the kmem map. It is allocated from its own submap of the kernel map. In short, it has no real connection to the kmem map. (In fact, the commit messages for the maxpipekva auto-sizing talk about using the kernel map size, cf. r117325 and r117391, even though the implementation actually used the kmem map size.) Although the calculation is now done differently, the resulting value for maxpipekva should remain almost the same on i386. However, on amd64, the value will be reduced by 2/3. This is intentional. The recent change to VM_KMEM_SIZE_SCALE on amd64 for the benefit of ZFS also had the unnecessary side-effect of increasing maxpipekva. This change is effectively restoring maxpipekva on amd64 to its prior value. Eliminate init_param3() since it is no longer used.
# 4053b05b	21-Jan-2011	Sergey Kandaurov <pluknet@FreeBSD.org>	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# ea235a14	06-Aug-2010	Christian S.J. Peron <csjp@FreeBSD.org>	Add Xen to the list of virtual vendors. In the non PV (HVM) case this fixes the virtualization detection successfully disabling the clflush instruction. This fixes insta-panics for XEN hvm users when the hw.clflush_disable tunable is -1 or 0 (-1 by default). Discussed with: jhb
# bcebf6a1	23-Jun-2010	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	Reverse the logic of the if statement that sets the default value of HZ; the list of 1000 Hz platforms was getting unwieldy. Suggested by: marcel
# e864acd4	23-Jun-2010	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	Move default HZ from 100 to 1000 on powerpc. Reviewed by: marcel MFC after: 2 weeks
# 90cb9e50	06-Mar-2010	Ivan Voras <ivoras@FreeBSD.org>	MFC r204611, r204633: Comment and better sysctl documentation string for VM guest detection variable and sysctl.
# 07ce8d9b	02-Mar-2010	Ivan Voras <ivoras@FreeBSD.org>	Document the VM detection type and sysctl a bit better.
# c288186f	02-Mar-2010	Alan Cox <alc@FreeBSD.org>	MFC r204420 When running as a guest operating system, the FreeBSD kernel must assume that the virtual machine monitor has enabled machine check exceptions. Unfortunately, on AMD Family 10h processors the machine check hardware has a bug (Erratum 383) that can result in a false machine check exception when a superpage promotion occurs. Thus, I am disabling superpage promotion when the FreeBSD kernel is running as a guest operating system on an AMD Family 10h processor.
# 0b993ee5	27-Feb-2010	Alan Cox <alc@FreeBSD.org>	When running as a guest operating system, the FreeBSD kernel must assume that the virtual machine monitor has enabled machine check exceptions. Unfortunately, on AMD Family 10h processors the machine check hardware has a bug (Erratum 383) that can result in a false machine check exception when a superpage promotion occurs. Thus, I am disabling superpage promotion when the FreeBSD kernel is running as a guest operating system on an AMD Family 10h processor. Reviewed by: jhb, kib MFC after: 3 days
# 3c48c089	24-Feb-2010	Brooks Davis <brooks@FreeBSD.org>	MFC r202143,202163,202341,202342,204278 Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to somewhere in the neighborhood of INT_MAX/4 on a system with sufficient RAM and memory bandwidth. Given that the Windows group limit is 1024, this range should be sufficient for most applications. r202342: Only allocate the space we need before calling kern_getgroups instead of allocating whatever the user asks for up to "ngroups_max + 1". On systems with large values of kern.ngroups this will be more efficient. The now redundant check that the array is large enough in kern_getgroups() is deliberate to allow this change to be merged to stable/8 without breaking potential third party consumers of the API.
# 2e5514de	24-Feb-2010	Brooks Davis <brooks@FreeBSD.org>	Don't enforce an upper bound on kern.ngroups. The INT_MAX-1 limit was too high due to several overflows. The actual limit is somewhere in the neighborhood of INT_MAX/4 on 64-bit machines, but most systems could not support such a limit due to a lack of memory and the cost of duplicate credentials. Reported by: bde
# 412f9500	12-Jan-2010	Brooks Davis <brooks@FreeBSD.org>	Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to INT_MAX-1. Given that the Windows group limit is 1024, this range should be sufficient for most applications. MFC after: 1 month
# 347e22a6	07-Jul-2009	Mike Silbersack <silby@FreeBSD.org>	Increase HZ_VM from 10 to 100. While 10 hz saves cpu time under VM environments, it's too slow for FreeBSD to work properly. For example, ping at 10hz pings about every 600ms instead of about every second. Approved by: re (kib)
# 9b84ba1c	23-Mar-2009	John Baldwin <jhb@FreeBSD.org>	Improve the description of a few sysctls. Submitted by: bde (partially) MFC after: 3 days
# 42dd14ba	12-Mar-2009	John Baldwin <jhb@FreeBSD.org>	Change the sysctls for maxbcache and maxswzone from int to long. I missed this earlier since these sysctls don't exist in 7.x yet.
# b9f2a7da	12-Mar-2009	John Baldwin <jhb@FreeBSD.org>	Export the current values of nbuf, ncallout, and nswbuf via read-only sysctls that match the tunable names. MFC after: 3 days
# 64ecd139	10-Mar-2009	John Baldwin <jhb@FreeBSD.org>	- Make maxpipekva a signed long rather than an unsigned long as overflow is more likely to be noticed with signed types. - Make amountpipekva a long as well to match maxpipekva. Discussed with: bde
# 5bd65606	09-Mar-2009	John Baldwin <jhb@FreeBSD.org>	Adjust some variables (mostly related to the buffer cache) that hold address space sizes to be longs instead of ints. Specifically, the following values are now longs: runningbufspace, bufspace, maxbufspace, bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a relatively small number (~ 44000) of buffers set in kern.nbuf would result in integer overflows resulting either in hangs or bogus values of hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see such problems. There was a check for a nbuf setting that would cause overflows in the auto-tuning of nbuf. I've changed it to always check and cap nbuf but warn if a user-supplied tunable would cause overflow. Note that this changes the ABI of several sysctls that are used by things like top(1), etc., so any MFC would probably require some gross shims to allow for that. MFC after: 1 month
# 98a44311	30-Dec-2008	Ivan Voras <ivoras@FreeBSD.org>	Document the relationship between enum VM_GUEST and the vm_guest_sysctl_names array. Approved by: gnn (original version)
# 34820bbf	27-Dec-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Hide detect_virtual() along with the accompanying string arrays under #ifndef XEN to make XEN config compile again. In case of Xen vm_guest is hard coded. Move the list for the vm_guest sysctl out of the restrictive bounds as the sysctl is there in either case.
# 3610a226	18-Dec-2008	Ivan Voras <ivoras@FreeBSD.org>	By popular request, stringify kern.vm_guest sysctl. Now it returns a short, self-documenting string describing the detected virtual environment. Approved by: gnn (mentor) (earlier version)
# 3dc30911	17-Dec-2008	Ivan Voras <ivoras@FreeBSD.org>	Introduce a sysctl kern.vm_guest that reflects what the kernel knows about it running under a virtual environment. This also introduces a globally accessible variable vm_guest that can be used where appropriate in the kernel to inspect this environment. To make it easier for the long run, an enum VM_GUEST is also introduced, which could possibly be factored out in a header somewhere (but the question is where - vm/vm_param.h? sys/param.h?) so it eventually becomes a part of the standard KPI. In any case, it's a start. The purpose of all this isn't to absolutely detect that the OS is running under a virtual environment (cf. "redpill") but to allow the parts of the kernel and the userland that care about this particular aspect and can do something useful depending on it to have a standardised interface. Reducing kern.hz is one example but there are other things that could be done like avoiding context switches, not using CPU instructions that are known to be slow in emulation, possibly different strategies in VM (memory) allocation, CPU scheduling, etc. It isn't clear if the JAILS/VIMAGE functionality should also be exposed by this particular mechanism (probably not since they're not "full" virtual hardware environments). Sometime in the future another sysctl and a variable could be introduced to reflect if the kernel supports any kind of virtual hosting (e.g. VMWare VMI, Xen dom0). Reviewed by: silence from src-commiters@, virtualization@, kmacy@ Approved by: gnn (mentor) Security: Obscurity doesn't help.
# 9bd2cbe4	08-Dec-2008	Jung-uk Kim <jkim@FreeBSD.org>	- Detect Bochs BIOS variants and use HZ_VM as well. - Free kernel environment variable after its use. - Fix style(9) nits.
# 4da059f3	27-Oct-2008	Maxim Sobolev <sobomax@FreeBSD.org>	vm_pnames should be "const char *const[]". Submitted by: Christoph Mallon
# fa38c351	27-Oct-2008	Maxim Sobolev <sobomax@FreeBSD.org>	vm_pnames has no reason to be global. MFC after: 2 weeks
# 7f03c419	27-Oct-2008	Maxim Sobolev <sobomax@FreeBSD.org>	Default HZ value (1,000) on i386/amd64 is not very virtual machine friendly. Due to the nature of the beast it causes lot of unproductive overhead. This is especially bad when running SMP kernel on VMWare with several virtual processors - idle FreeBSD guest with SMP kernel takes 150% host CPU time on my dual-core MacBook Pro when I am enabling two virtual CPUs, making even host not very usable. Detect when we are running in the sandbox and reduce HZ to 10 (can be adjusted via VM_HZ in the kernel config) in such cases. This brings host CPU usage of idle FreeBSD/SMP on two virtual processors down to 10%. Detect most popular VM platforms out there - VMWare, Parallels, VirtualBox and VirtualPC. MFC after: 2 weeks
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 6819e13e	04-Jul-2008	Alan Cox <alc@FreeBSD.org>	Correct an error in the comments for init_param3(). Discussed with: silby
# b109dd74	09-May-2008	Pawel Jakub Dawidek <pjd@FreeBSD.org>	- Export HZ value via kern.hz sysctl (this is the same name as for the loader tunable). - Document other sysctls in this file and also mark them as loader tunable via CTLFLAG_RDTUN flag. Reviewed by: roberto
# 7c45a9c4	16-Oct-2007	Alfred Perlstein <alfred@FreeBSD.org>	Export maxswzone, maxbcache, maxtsiz, dfldsiz, maxdsiz, dflssiz, maxssiz, and sgrowsiz via sysctl. MFC after: 1 week
# f098dcde	14-Oct-2005	Kris Kennaway <kris@FreeBSD.org>	Partially revert revision 1.66, which contained a change that did not correspond to the commit log. It changed the maxswzone and maxbcache parameters from int to long, without changing the extern definitions in <sys/buf.h>. In fact it's a good thing it did not, because other parts of the system are not yet ready for this, and on large-memory sparc machines it causes severe filesystem damage if you try. The worst effect of the change was that the tunables controlling the above variables stopped working. These were necessary to allow such large sparc64 machines (with >12GB RAM) to boot, since sparc64 did not set a hard-coded upper limit on these parameters and they ended up overflowing an int, causing an infinite loop at boot in bufinit(). Reviewed by: mlaier
# ea35b592	16-Apr-2005	Marius Strobl <marius@FreeBSD.org>	Increase default HZ for sparc64 to 1000.
# 9454b2d8	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for copyright notices, minor format tweaks as necessary
# 4ac33532	29-Nov-2004	Bruce M Simpson <bms@FreeBSD.org>	Fix the build.
# 1114f4f9	29-Nov-2004	Peter Wemm <peter@FreeBSD.org>	Switch from 1024hz to 1000hz on amd64 to match i386. 1024 is a bad choice because it is so in sync with stathz (128hz or 4096hz etc).
# 7419d1e2	08-Nov-2004	Dag-Erling Smørgrav <des@FreeBSD.org>	#include <vm/vm_param.h> instead of <machine/vmparam.h> (the former includes the latter, but also declares variables which are defined in kern/subr_param.c). Change some VM parameters from quad_t to unsigned long. They refer to quantities (size limits for text, heap and stack segments) which must necessarily be smaller than the size of the address space, so long is adequate on all platforms. MFC after: 1 week
# 02a20471	07-Nov-2004	Marcel Moolenaar <marcel@FreeBSD.org>	Increase default HZ for ia64 to 1000.
# 0c7d0f96	06-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Increase default HZ for i386 to 1000
# 7f8a436f	05-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
# 1dc10fce	30-Mar-2004	Alan Cox <alc@FreeBSD.org>	White space and wording changes to init_param3(). Mostly submitted by: bde
# e3b19536	27-Mar-2004	Alan Cox <alc@FreeBSD.org>	Revise the direct or optimized case to use uiomove_fromphys() by the reader instead of ephemeral mappings using pmap_qenter() by the writer. The writer is still, however, responsible for wiring the pages, just not mapping them. Consequently, the allocation of KVA for the direct case is unnecessary. Remove it and the sysctls limiting it, i.e., kern.ipc.maxpipekvawired and kern.ipc.amountpipekvawired. The number of temporarily wired pages is still, however, limited by kern.ipc.maxpipekva. Note: On platforms lacking a direct virtual-to-physical mapping, uiomove_fromphys() uses sf_bufs to cache ephemeral mappings. Thus, the number of available sf_bufs can influence the performance of pipes on platforms such as i386. Surprisingly, I saw the greatest gain from this change on such a machine: lmbench's pipe bandwidth result increased from ~1050MB/s to ~1850MB/s on my 2.4GHz, 400MHz FSB P4 Xeon.
# 8f650450	13-Mar-2004	Peter Wemm <peter@FreeBSD.org>	Set default HZ to 1024 for amd64. The comment in kern/tty.c doesn't apply here because we have 64 bit longs and don't suffer the hz > 169 overflows.
# cebde069	10-Aug-2003	Mike Silbersack <silby@FreeBSD.org>	More pipe changes: From alc: Move pageable pipe memory to a separate kernel submap to avoid awkward vm map interlocking issues. (Bad explanation provided by me.) From me: Rework pipespace accounting code to handle this new layout, and adjust our default values to account for the fact that we now have a solid limit on allocations. Also, remove the "maxpipes" limit, as it no longer has a purpose. (The limit on kva usage solves the problem of having too many pipes.)
# 347194c1	10-Jul-2003	Mike Silbersack <silby@FreeBSD.org>	Add init_param3() to subr_param. This function is called immediately after the kernel map has been sized, and is the optimal place for the autosizing of memory allocations which occur within the kernel map to occur. Suggested by: bde
# 41f16f82	08-Jul-2003	Mike Silbersack <silby@FreeBSD.org>	Pull in the entire kmem_map size calculation from kern_malloc, rather than the shortcircuited version I had been using, which only worked properly on i386 & amd64. Also, change an autoscale constant to account for the more correct kmem_map size. Problem noticed by: mux
# 289016f2	07-Jul-2003	Mike Silbersack <silby@FreeBSD.org>	Put some concrete limits on pipe memory consumption: - Limit the total number of pipes so that we do not exhaust all vm objects in the kernel map. When this limit is reached, a ratelimited message will be printed to the console. - Put a soft limit on the amount of memory consumable by pipes. Once the limit has been reached, all new pipes will be limited to 4K in size, rather than the default of 16K. - Put a limit on the number of pages that may be used for high speed page flipping in order to reduce the amount of wired memory. Pipe writes that occur while this limit is exceeded will fall back to non-page flipping mode. The above values are auto-tuned in subr_param.c and are scaled to take into account both the size of physical memory and the size of the kernel map. These limits help to reduce the "kernel resources exhausted" panics that could be caused by opening a large number of pipes. (Pipes alone are no longer able to exhaust all resources, but other kernel memory hogs in league with pipes may still be able to do so.) PR: 53627 Ideas / comments from: hsu, tjr, dillon@apollo.backplane.com MFC after: 1 week
# 677b542e	10-Jun-2003	David E. O'Brien <obrien@FreeBSD.org>	Use __FBSDID().
# 447b3772	29-Aug-2002	Peter Wemm <peter@FreeBSD.org>	Change hw.physmem and hw.usermem to unsigned long like they used to be in the original hardwired sysctl implementation. The buf size calculator still overflows an integer on machines with large KVA (eg: ia64) where the number of pages does not fit into an int. Use 'long' there. Change Maxmem and physmem and related variables to 'long', mostly for completeness. Machines are not likely to overflow 'int' pages in the near term, but then again, 640K ought to be enough for anybody. This comes for free on 32 bit machines, so why not?
# e1d970f1	14-Apr-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Improve the implementation of adjtime(2). Apply the change as a continuous slew rather than as a series of discrete steps and make it possible to adjust arbitrarily huge amounts of time in either direction. In practice this is done by hooking into the same once-per-second loop as the NTP PLL and setting a suitable frequency offset deducting the amount slewed from the remainder. If the remaining delta is larger than 1 second we slew at 5000PPM (5msec/sec), for a delta less than a second we slew at 500PPM (500usec/sec) and for the last one second period we will slew at whatever rate (less than 500PPM) it takes to eliminate the delta entirely. The old implementation stepped the clock a number of microseconds every HZ to achieve the same effect, using the same rates of change. Eliminate the global variables tickadj, tickdelta and timedelta and their various use and initializations. This removes the most significant obstacle to running timecounter and NTP housekeeping from a timeout rather than hardclock.
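The slew-rate selection described in the adjtime(2) entry above can be sketched as a once-per-second step function. This is a minimal illustration under stated assumptions, not the kernel's implementation: the function name is hypothetical, the delta is tracked in microseconds, and a delta of exactly one second is treated as the 500PPM case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USEC_PER_SEC	1000000LL
#define FAST_SLEW_US	5000LL	/* 5000PPM: 5 msec per second */
#define SLOW_SLEW_US	500LL	/* 500PPM: 500 usec per second */

/*
 * Pick the amount to slew during the next second and deduct it from
 * the remaining delta. Returns the signed adjustment in microseconds.
 */
static int64_t
slew_for_next_second(int64_t *remaining_us)
{
	int64_t mag, step;

	assert(remaining_us != NULL);
	mag = *remaining_us < 0 ? -*remaining_us : *remaining_us;

	if (mag > USEC_PER_SEC)
		step = FAST_SLEW_US;	/* more than a second left */
	else if (mag > SLOW_SLEW_US)
		step = SLOW_SLEW_US;	/* less than a second left */
	else
		step = mag;		/* final period: finish exactly */

	if (*remaining_us < 0)
		step = -step;
	*remaining_us -= step;
	return (step);
}
```

Calling the function once per second until the remainder reaches zero mirrors the continuous slew the message describes, in either direction.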
# 77a7d074	06-Mar-2002	Mike Silbersack <silby@FreeBSD.org>	Unconditionally limit maxproc so that it is not possible to exhaust all kmaps. The only reward for setting maxproc to a value which will cause kmap exhaustion is a panic during a forkbomb attack. MFC after: 3 days
# 0b94a0e9	05-Feb-2002	Matthew Dillon <dillon@FreeBSD.org>	Allow the kern.maxusers boot tuneable to be set to 0 (previously only the kernel config's maxusers could be set to 0 for autosizing to work). Reviewed by: rwatson, imp MFC after: 3 days
# 4fbd563e	24-Jan-2002	Matthew Dillon <dillon@FreeBSD.org>	Make the 'maxusers 0' auto-sizing code slightly more conservative. Change from 1 megabyte of ram per user to 2 megabytes of ram per user, and reduce the cap from 512 to 384. 512 leaves around 240 MB of KVM available while 384 leaves 270 MB of KVM available. Available KVM is important in order to deal with zalloc and kernel malloc area growth. Reviewed by: mckusick MFC: either before 4.5 if re's agree, or after 4.5
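The revised 'maxusers 0' sizing rule above amounts to a divide and a clamp. The sketch below is illustrative only: the function name is hypothetical, and the lower bound of 32 is an assumption taken from the original auto-sizing entry (66a11b9f), which this message does not say it changed.

```c
#include <assert.h>
#include <stdint.h>

#define MAXUSERS_FLOOR	32ULL	/* assumption: floor from the original auto-sizing */
#define MAXUSERS_CAP	384ULL	/* cap reduced from 512 by this change */
#define BYTES_PER_USER	(2ULL * 1024 * 1024)	/* 2 MB of RAM per user */

/* Derive maxusers from the amount of physical memory, in bytes. */
static int
autosize_maxusers(uint64_t physmem_bytes)
{
	uint64_t n;

	n = physmem_bytes / BYTES_PER_USER;
	if (n < MAXUSERS_FLOOR)
		n = MAXUSERS_FLOOR;
	if (n > MAXUSERS_CAP)
		n = MAXUSERS_CAP;
	assert(n >= MAXUSERS_FLOOR && n <= MAXUSERS_CAP);
	return ((int)n);
}
```

Under these assumptions a 256 MB machine gets 128, and any machine with 768 MB or more reaches the cap of 384.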
# f6916f66	14-Dec-2001	Peter Wemm <peter@FreeBSD.org>	Proper fix for old config setting maxusers to 8.
# 7ca592e0	13-Dec-2001	Matthew Dillon <dillon@FreeBSD.org>	Too many people are compiling kernels with maxusers set to 0 without the new config. Hack the kernel to force auto-sizing if the old config is used.
# ebacce5e	13-Dec-2001	Mike Silbersack <silby@FreeBSD.org>	Limit maxprocperuid to 9/10 maxproc, and limit maxfilesperproc to 9/10 maxfiles. This should make local resource exhaustion attacks easier to handle with a non-tweaked setup. MFC after: 3 days
# 66a11b9f	08-Dec-2001	Matthew Dillon <dillon@FreeBSD.org>	Allow maxusers to be specified as 0 in the kernel config, which will cause the system to auto-size to between 32 and 512 depending on the amount of memory. MFC after: 1 week
# f2860039	13-Nov-2001	Matthew Dillon <dillon@FreeBSD.org>	Create a mutex pool API for short term leaf mutexes. Replace the manual mutex pool in kern_lock.c (lockmgr locks) with the new API. Replace the mutexes embedded in sxlocks with the new API.
# cbc89bfb	10-Oct-2001	Paul Saab <ps@FreeBSD.org>	Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader tunable. Reviewed by: peter MFC after: 2 weeks
# e1616f3a	20-Aug-2001	Matthew Dillon <dillon@FreeBSD.org>	Conditionalize VM_SWZONE_SIZE_MAX and VM_BCACHE_SIZE_MAX so MD sections that don't define these constants don't break.
# 2f9e4e80	19-Aug-2001	Matthew Dillon <dillon@FreeBSD.org>	Limit the amount of KVM reserved for the buffer cache and for swap-meta information. The default limits only affect machines with > 1GB of ram and can be overridden with two new kernel conf variables VM_SWZONE_SIZE_MAX and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and kern.maxbcache. This has the effect of leaving more KVM available for sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad adds memory to a machine and then sees the kernel panic on boot due to running out of KVM. Also change the default swap-meta auto-sizing calculation to allocate half of what it was previously allocating. The prior defaults were way too high. Note that we cannot afford to run out of swap-meta structures so we still stay somewhat conservative here.
# ee342e1b	26-Jul-2001	Peter Wemm <peter@FreeBSD.org>	Move param.c out of the conf directory and make it fully dynamic. Tunables are now derived at boot time from maxusers. ie: change maxusers via a tunable and all the derivative settings change. You can change the other tunables individually as well. Even hz etc is tunable.
# 08442f8a	22-Jun-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Introduce numerous SMP friendly changes to the mbuf allocator. Namely, introduce a modified allocation mechanism for mbufs and mbuf clusters; one which can scale under SMP and which offers the possibility of resource reclamation to be implemented in the future. Notable advantages: o Reduce contention for SMP by offering per-CPU pools and locks. o Better use of data cache due to per-CPU pools. o Much less code cache pollution due to excessively large allocation macros. o Framework for `grouping' objects from same page together so as to be able to possibly free wired-down pages back to the system if they are no longer needed by the network stacks. Additional things changed with this addition: - Moved some mbuf specific declarations and initializations from sys/conf/param.c into mbuf-specific code where they belong. - m_getclr() has been renamed to m_get_clrd() because the old name is really confusing. m_getclr() HAS been preserved though and is defined to the new name. No tree sweep has been done "to change the interface," as the old name will continue to be supported and is not deprecated. The change was merely done because m_getclr() sounds too much like "m_get a cluster." - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and systat(1) (see TODO below). - Fixed systat(1) to display number of "free mbufs" based on new per-CPU stat structures. - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported per-CPU stat structures. All infos are fetched via sysctl. TODO (in order of priority): - Re-enable mbtypes statistics in both netstat(1) and systat(1) after introducing an SMP friendly way to collect the mbtypes stats under the already introduced per-CPU locks (i.e. hopefully don't use atomic() - it seems too costly for a mere stat update, especially when other locks are already present). 
- Optionally have systat(1) display not only "total free mbufs" but also "total free mbufs per CPU pool." - Fix minor length-fetching issues in netstat(1) related to recently re-enabled option to read mbuf stats from a core file. - Move reference counters at least for mbuf clusters into an unused portion of the cluster itself, to save space and need to allocate a counter. - Look into introducing resource freeing possibly from a kproc. Reviewed by (in parts): jlemon, jake, silby, terry Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha) Preliminary performance measurements: jlemon (and me, obviously) URL: http://people.freebsd.org/~bmilekic/mb_alloc/
# da936bf8	29-Oct-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Remove unneeded <stddef.h> #includes.
# 9722d88f	12-Oct-2000	Jason Evans <jasone@FreeBSD.org>	For lockmgr mutex protection, use an array of mutexes that are allocated and initialized during boot. This avoids bloating sizeof(struct lock). As a side effect, it is no longer necessary to enforce the assumption that lockinit()/lockdestroy() calls are paired, so the LK_VALID flag has been removed. Idea taken from: BSD/OS.
# ab063af9	01-May-2000	Peter Wemm <peter@FreeBSD.org>	Move the MSG* and SEM* options to opt_sysvipc.h Remove evil allocation macros from machdep.c (why was that there???) and use malloc() instead. Move parameters out of param.h and into the code itself. Move a bunch of internal definitions from public sys/.h headers (without #ifdef _KERNEL even) into the code itself. I had hoped to make some of this more dynamic, but the cost of doing wakeups on all sleeping processes on old arrays was too frightening. The other possibility is to initialize on the first use, and allow dynamic sysctl changes to parameters right until that point. That would allow /etc/rc.sysctl to change SEM and MSG* defaults as we presently do with SHM*, but without the nightmare of changing a running system.
# 255108f3	30-Mar-2000	Peter Wemm <peter@FreeBSD.org>	Make sysv-style shared memory tuneable params fully runtime adjustable via sysctl. It's done pretty simply but it should be quite adequate. Also move SHMMAXPGS from $machine/include/vmparam.h as the comments that went with it were wrong... we don't allocate KVM space for the pages so that comment is bogus.. The only practical limit is how much physical ram you want to lock up as this stuff isn't paged out or swap backed.
# f48b807f	11-Dec-1999	Brian Feldman <green@FreeBSD.org>	This is Bosko Milekic's mbuf allocation waiting code. Basically, this means that running out of mbuf space isn't a panic anymore, and code which runs out of network memory will sleep to wait for it. Submitted by: Bosko Milekic <bmilekic@dsuper.net> Reviewed by: green, wollman
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# 134c934c	05-Jul-1999	Mike Smith <msmith@FreeBSD.org>	Move the initialisation/tuning of nmbclusters from param.c/machdep.c into uipc_mbuf.c. This reduces three sets of identical tunable code to one set, and puts the initialisation with the mbuf code proper. Make NMBUFs tunable as well. Move the nmbclusters sysctl here as well. Move the initialisation of maxsockets from param.c to uipc_socket2.c, next to its corresponding sysctl. Use the new tunable macros for the kern.vm.kmem.size tunable (this should have been in a separate commit, whoops).
# 5a00f364	09-Apr-1999	Dag-Erling Smørgrav <des@FreeBSD.org>	Allow setting MAXFILES in the kernel config.
# 3ea57f9d	14-Dec-1998	Matthew Dillon <dillon@FreeBSD.org>	Fixed problems with kernel config file overrides of sysv semaphore parameters. Prior to this fix a kernel config override would affect only some of the kernel files, resulting in panics. PR: kern/9068
# dd0b2081	05-Nov-1998	David Greenman <dg@FreeBSD.org>	Implemented zero-copy TCP/IP extensions via sendfile(2) - send a file to a stream socket. sendfile(2) is similar to implementations in HP-UX, Linux, and other systems, but the API is more extensive and addresses many of the complaints that the Apache Group and others have had with those other implementations. Thanks to Marc Slemko of the Apache Group for helping me work out the best API for this. Anyway, this has the "net" result of speeding up sends of files over TCP/IP sockets by about 10X (that is to say, uses 1/10th of the CPU cycles) when compared to a traditional read/write loop.
# bef7db2e	11-Jul-1998	Bruce Evans <bde@FreeBSD.org>	Moved definition of fscale from param.c to kern_synch.c where it should always have been (it has no user-serviceable parts even at compile time) and staticized it.
# 8cb52667	30-Jun-1998	Poul-Henning Kamp <phk@FreeBSD.org>	Add 3 sysctl variables for future use by ps(1).
# df471779	20-Jun-1998	Bruce Evans <bde@FreeBSD.org>	Round tickadj up. This prevents tickadj from being 0 when HZ > 500, which makes adjtime(2) useless and confuses xntpd(8) into refusing to start even when it would use the kernel PLL instead of adjtime(). The result is the same as recommended by tickadj(8), at least when HZ divides 10^6. Of course, you wouldn't want to actually use adjtime() when HZ is large. In the silly boundary case of HZ == 10^6, tickadj == tick == 1 so the clock stops while adjtime() is active.
# 98271db4	15-May-1998	Garrett Wollman <wollman@FreeBSD.org>	Convert socket structures to be type-stable and add a version number. Define a parameter which indicates the maximum number of sockets in a system, and use this to size the zone allocators used for sockets and for certain PCBs. Convert PF_LOCAL PCB structures to be type-stable and add a version number. Define an external format for infomation about socket structures and use it in several places. Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through sysctl(3) without blocking network interrupts for an unreasonable length of time. This probably still has some bugs and/or race conditions, but it seems to work well enough on my machines. It is now possible for `netstat' to get almost all of its information via the sysctl(3) interface rather than reading kmem (changes to follow).
# d3c0af69	27-Feb-1998	Guido van Rooij <guido@FreeBSD.org>	Raise ncallout from NPROC + 16 to NPROC + 16 + MAXFILES. This should prevent a possible DoS attack. The proper fix (to dynamically grow the callout list) is in the making. Submitted by: Paul Traina
# ac5fcb6d	14-Jun-1997	Bruce Evans <bde@FreeBSD.org>	Removed unused #includes.
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 5131d64e	16-Jan-1997	Bruce Evans <bde@FreeBSD.org>	Removed option EXTRAVNODES. All versions of FreeBSD-2.x have a sysctl variable `kern.maxvnodes' which gives much better control over vnode allocation than EXTRAVNODES (except in -current between 1995/10/28 and 1996/11/12, kern.maxvnodes was read-only and thus useless).
# 649c409d	15-Jan-1997	David Greenman <dg@FreeBSD.org>	Fix bug related to map entry allocations where a sleep might be attempted when allocating memory for network buffers at interrupt time. This is due to inadequate checking for the new mcl_map. Fixed by merging mb_map and mcl_map into a single mb_map. Reviewed by: wollman
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 114a8cff	30-May-1996	Peter Wemm <peter@FreeBSD.org>	Add an option "EXTRA_VNODES" to cause an extra number of vnode structures to be allocated at boot time. This is an expensive option, as they consume physical ram and are not pageable etc. In certain situations, this kind of option is quite useful, especially for news servers that access a large number of directories at random and torture the name cache. Defining 5000 or 10000 extra vnodes should cut down the amount of vnode recycling somewhat, which should allow better name and directory caching etc. This is a "your mileage may vary" option, with no real indication of what works best for your machine except trial and error. Too many will cost you ram that you could otherwise use for disk buffers etc. This is based on something John Dyson mentioned to me a while ago.
# cb7545a9	10-May-1996	Garrett Wollman <wollman@FreeBSD.org>	Allocate mbufs from a separate submap so that NMBCLUSTERS works as expected.
# e911eafc	02-May-1996	Poul-Henning Kamp <phk@FreeBSD.org>	removed: CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei() ptei() kvtopte() ptetov() ispt() ptetoav() &c &c new: NPDEPG Major macro cleanup.
# f8845af0	02-May-1996	Poul-Henning Kamp <phk@FreeBSD.org>	First pass at cleaning up macros relating to pages, clusters and all that.
# 70b012ca	10-Mar-1996	Jeffrey Hsu <hsu@FreeBSD.org>	Merge in Lite2: proc LIST changes. Reviewed by: david & bde
# 4bd49128	02-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Add more options into the conf/options and i386/conf/options.i386 files and the #include hooks so that 'make depend' is more useful. This covers most of the options I regularly use (but not all) and some other easy ones.
# 50c73f36	04-Jan-1996	Garrett Wollman <wollman@FreeBSD.org>	Convert SYSV IPC to new-style options. (I hope I got everything...) The LKMs will need an extra file, to come later.
# 154c04e5	10-Dec-1995	Poul-Henning Kamp <phk@FreeBSD.org>	Last commit this round: Staticize. We are now down to about 1146 symbols being global, of which I estimate that about 100 are validly so.
# 28f8db14	29-Jul-1995	Bruce Evans <bde@FreeBSD.org>	Eliminate sloppy common-style declarations. There should be none left for the LINT configuration.
# 3d5e37c5	29-Jun-1995	David Greenman <dg@FreeBSD.org>	Removed "GATEWAY" consideration when calculating number of mbuf clusters. It now always uses the value that was used for the GATEWAY case.
# ac7e6123	29-Jun-1995	David Greenman <dg@FreeBSD.org>	Killed "TIMEZONE" and "DST" options. They have been forced to 0 by config for more than a year now. Moved the declaration of 'tz' into kern_time.c.
# cddc961a	25-May-1995	David Greenman <dg@FreeBSD.org>	Made "NMBCLUSTERS" calculation dynamic and fixed bogus use of "NMBCLUSTERS" in machdep.c (it should use the global nmbclusters). Moved the calculation of nmbclusters into conf/param.c (same place where nmbclusters has always been assigned), and made the calculation include an extra amount based on "maxusers". NMBCLUSTERS can still be overridden in the kernel config file as always, but this change will make that generally unnecessary. This fixes the "bug" reports from people who have misconfigured kernels seeing the network "hang" when the mbuf cluster pool runs out. Reviewed by: John Dyson
# e6373c9e	20-Feb-1995	Guido van Rooij <guido@FreeBSD.org>	Implement maxprocperuid and maxfilesperproc. They are tunable via sysctl(8). The initial value of maxprocperuid is maxproc-1, that of maxfilesperproc is maxfiles (until maxfile disappears). Now it is at least possible to prohibit one user opening maxfiles -Guido
# 3ab1adc5	16-Feb-1995	Joerg Wunsch <joerg@FreeBSD.org>	Allow overriding of the various SHM* options. Submitted by: Heikki Suonsivu <hsu@fx7.cs.hut.fi>
# ec2bb6ad	11-Jan-1995	David Greenman <dg@FreeBSD.org>	Increase maxfiles to NPROC*2. This makes the per-process open file limit highly bogus, however, and this needs to be fixed.
# 0d94caff	09-Jan-1995	David Greenman <dg@FreeBSD.org>	These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D. The majority of the merged VM/cache work is by John Dyson. The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme. vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering. vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff. vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption. vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up. vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme. pmap.c vm_map.c Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of kernel PTs. vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping. proc.h Fixed the problem that the p_lock flag was not being cleared on a fork. swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore. machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme. machdep.c, kern_exec.c, vm_glue.c Implemented a separate submap for temporary exec string space and another one to contain process upages.
This eliminates all map fragmentation problems that previously existed. ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers. Submitted by: John Dyson and David Greenman
# 3d903220	13-Sep-1994	Doug Rabson <dfr@FreeBSD.org>	Added SYSV ipcs. Obtained from: NetBSD and FreeBSD-1.1.5
# 3c4dd356	02-Aug-1994	David Greenman <dg@FreeBSD.org>	Added $Id$
# 26f9a767	25-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
# df8bae1d	24-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	BSD 4.4 Lite Kernel Sources